Selected article for: "database size and identification confidence"

Author: Mathias Kuhring; Joerg Doellinger; Andreas Nitsche; Thilo Muth; Bernhard Y. Renard
Title: An iterative and automated computational pipeline for untargeted strain-level identification using MS/MS spectra from pathogenic samples
  • Document date: 2019_10_24
  • ID: k7hm3aow_1_1
    Document: eased search spaces is the application of multiple identification steps in general, independent of the target application, such as strain-level identification. These multi-stage identification search strategies are described by several different terms, such as serial search [23], multi-step [24], iterative [25, 26], multi-stage [27], two-step [28–30] and cascade search [31], and they are applied in proteomics, metaproteomics [26, 28] and proteogenomics [30, 32]. Most of these strategies not only share the objective of increasing the identification rate or identification confidence, but also share methodological principles. These include identifying primarily unassigned spectra using databases of increasing complexity (for instance, by employing altered digestion parameters, additional post-translational modifications, or additional spectral and genomic databases) [24–27, 31], as well as the recurring theme of database size reduction [24, 26, 28–30, 32]. In addition, some methods rely on spectral quality assessment to enhance subsequent identification steps [25, 32] or focus on reducing algorithmic runtime [24]. Apart from database size, multi-proteome databases pose an additional challenge for taxonomic assignment: high sequence similarity between proteomes, in particular between related species and strains. These similarities give rise to taxonomically ambiguous, widespread and false assignments, which therefore need to be accounted for. Several methods provide strategies to handle peptide spectrum matches that are ambiguous due to sequence similarity. Dworzanski et al. apply a proteogenomic mapping approach in combination with discriminant analysis to infer the most likely bacterial assignments [33].
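The principle shared by these multi-stage strategies, where only the spectra left unassigned by one step are passed on to a search against a larger or more complex database, can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any cited tool: `toy_search` is a hypothetical stand-in for a real search engine that matches only on precursor mass, the databases are plain peptide lists, and scoring, FDR control and modifications are ignored.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O, added once per peptide


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def toy_search(precursor_mass: float, database: Sequence[str],
               tol: float = 0.01) -> Optional[str]:
    """Hypothetical stand-in for a search engine: return the database peptide
    whose mass is closest to the observed precursor mass, if within `tol` Da."""
    best, best_err = None, tol
    for pep in database:
        err = abs(peptide_mass(pep) - precursor_mass)
        if err <= best_err:
            best, best_err = pep, err
    return best


def multi_stage_search(spectra: Dict[str, float],
                       stages: List[Sequence[str]]) -> Dict[str, Tuple[int, str]]:
    """Search each stage's database only with the spectra left unassigned by
    all earlier stages. Returns spectrum id -> (stage index, peptide)."""
    assignments: Dict[str, Tuple[int, str]] = {}
    unassigned = dict(spectra)
    for stage, database in enumerate(stages):
        for sid, mass in list(unassigned.items()):
            hit = toy_search(mass, database)
            if hit is not None:
                assignments[sid] = (stage, hit)
                del unassigned[sid]  # later stages never see this spectrum
    return assignments


if __name__ == "__main__":
    spectra = {"s1": peptide_mass("PEPTIDEK"), "s2": peptide_mass("GAVLIK")}
    # Stage 0: small database; stage 1: larger database (both hypothetical).
    stages = [["PEPTIDEK"], ["PEPTIDEK", "GAVLIK", "SAMPLER"]]
    print(multi_stage_search(spectra, stages))
    # prints {'s1': (0, 'PEPTIDEK'), 's2': (1, 'GAVLIK')}
```

Because each later stage is queried only with the remaining spectra, the larger database is searched with a smaller set of spectra, which mirrors the search-space and runtime motivation described above.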
The proteogenomic approach was further developed in BACid, which applies several statistical measures to account for similarity, including comparing the ratios of taxonomic differences and of unique peptides to known error rates, i.e. the noise levels [34, 35]. BACid has since been applied to and extended for samples of unknown origin [36] as well as mixtures of bacteria [36, 37]. With a focus on metaproteomic analysis, MEGAN [38], UniPept [39], MetaProteomeAnalyzer [40, 41] and TCUP [6] rely on lowest common ancestor or most specific taxonomy approaches to assign peptide spectrum matches to taxonomic levels. MiCId uses a clustering approach based on the peptidome similarity of taxa at different levels, in combination with unified E-values, to infer statistically significant cluster representatives as identified or classified microorganisms [13, 42]. Tracz et al. use a direct assignment strategy, concatenating all proteins of a proteome into one pseudo-polyprotein and counting only the top-scored spectrum matches to a peptide towards bacterial candidates [43]. In contrast, Pipasic explicitly makes use of ambiguous peptide spectrum matches (PSMs) by applying an abundance correction based on intensity-weighted proteome similarities of organisms in metaproteomic samples [44]. While these methods share the objective of reducing uncertainty in taxonomic identification due to ambiguous assignments in the context of the protein inference issue [45], their application to untargeted strain-level identification is not straightforward for several reasons. For example, unique peptides are not necessarily available for all candidate strains in the sequence databases for which the

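The lowest common ancestor (LCA) principle mentioned for tools such as MEGAN and UniPept can be illustrated on a toy taxonomy. This is a minimal sketch, not the implementation of any cited tool: it assumes a hypothetical child-to-parent dictionary as the taxonomy and a hypothetical mapping from each identified peptide to the taxa whose proteomes contain it. It also shows why shared peptides limit strain-level resolution, since a peptide found in two strains can only be assigned to their common species.

```python
from collections import Counter
from typing import Dict, Iterable, List


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Path from the root down to `taxon` (the root has no entry in `parent`)."""
    path = [taxon]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa: Iterable[str], parent: Dict[str, str]) -> str:
    """Deepest taxon shared by the lineages of all given taxa."""
    paths = [lineage(t, parent) for t in taxa]
    lca = None
    # zip stops at the shortest lineage; walk down while all paths agree.
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        lca = level[0]
    if lca is None:
        raise ValueError("taxa must be non-empty and share a common root")
    return lca


def assign_peptides(peptide_taxa: Dict[str, Iterable[str]],
                    parent: Dict[str, str]) -> Dict[str, str]:
    """Assign every identified peptide to the LCA of all taxa containing it."""
    return {pep: lowest_common_ancestor(taxa, parent)
            for pep, taxa in peptide_taxa.items()}


if __name__ == "__main__":
    # Hypothetical toy taxonomy (child -> parent).
    parent = {
        "Bacteria": "root", "Genus_X": "Bacteria",
        "Species_A": "Genus_X", "Species_B": "Genus_X",
        "Strain_A1": "Species_A", "Strain_A2": "Species_A",
        "Strain_B1": "Species_B",
    }
    peptide_taxa = {
        "PEP1": ["Strain_A1"],               # unique to one strain
        "PEP2": ["Strain_A1", "Strain_A2"],  # shared by two strains
        "PEP3": ["Strain_A1", "Strain_B1"],  # shared across species
    }
    assignments = assign_peptides(peptide_taxa, parent)
    print(assignments)
    # prints {'PEP1': 'Strain_A1', 'PEP2': 'Species_A', 'PEP3': 'Genus_X'}
    print(Counter(assignments.values()))
```

Only the strain-unique peptide reaches strain level here; without such unique peptides, an LCA-based assignment stops at species level or above.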