Author: Mathias Kuhring; Joerg Doellinger; Andreas Nitsche; Thilo Muth; Bernhard Y. Renard
Title: An iterative and automated computational pipeline for untargeted strain-level identification using MS/MS spectra from pathogenic samples
  • Document date: 2019_10_24
  • ID: k7hm3aow_26
    Document: bioRxiv preprint (not peer-reviewed), https://doi.org/10.1101/812313. [...] spectra matching to distinct proteins and proteomes remains a valuable parameter for strain differentiation when considering and weighing both unique matches and the plethora of non-unique matches. TaxIt has a comparable computational runtime for small samples and databases (such as the viral data) despite using a constrained search space. This is primarily a result of the additional strain proteome downloads: the NCBI Entrez API is not designed or optimized for large-scale downloads, so proteins need to be fetched in numerous iterations of small chunks. However, the download overhead fades into the background for full bacterial samples such as bacillus all (Table 2 and Figure 5), giving way to a runtime reduction of three quarters compared to NCBI Blast NR database searches. In contrast, the runtime of Pipasic is burdened by the additional sequence comparisons necessary for constructing the similarity matrix, which is strongly affected by increasing numbers of PSMs and taxa to compare. Finally, the memory footprint of TaxIt remains consistently smaller than that of the unique-PSM-based and Pipasic-based strategies for all samples, as would be expected when using substantially fewer proteins in the search databases. On a final note, strain-level identification performance is generally limited by the availability and integrity of taxa and proteomes in the databases used. However, the steadily increasing quality and quantity of the NCBI Taxonomy and Protein databases will continue to improve strain-level identification strategies such as the presented iterative workflow.
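
    The chunked download pattern mentioned above can be illustrated with a minimal Python sketch. It is not TaxIt's actual implementation: the function names, the chunk size of 200, and the injectable `get` callable are assumptions made for illustration. Only the E-utilities `efetch.fcgi` endpoint and its `db`, `id`, `rettype` and `retmode` parameters come from the public NCBI interface.

```python
from typing import Callable, Iterator, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint for record retrieval.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` with at most `size` entries each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def efetch_url(ids: List[str]) -> str:
    """Build an efetch URL requesting FASTA text for a batch of protein IDs."""
    params = {
        "db": "protein",
        "id": ",".join(ids),
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def http_get(url: str) -> str:
    """Default transport: a plain HTTP GET returning the response body."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def fetch_proteins(
    ids: List[str],
    chunk_size: int = 200,  # assumed value, not taken from the paper
    get: Callable[[str], str] = http_get,
) -> str:
    """Fetch protein FASTA records in small batches, one request per chunk."""
    return "".join(get(efetch_url(chunk)) for chunk in chunked(ids, chunk_size))
```

    Each chunk costs one request, so the number of round trips grows linearly with the number of proteins, which is consistent with download overhead dominating the runtime for small samples. A real client would also need to respect NCBI's request rate limits.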
