Selected article for: "genome sequence and input sequence"

Author: Jakub M Bartoszewicz; Anja Seidel; Bernhard Y Renard
Title: Interpretable detection of novel human viruses from genome sequencing data
  • Document date: 2020_1_30
  • ID: ac00tai9_57
    Document: Compared to the previous state of the art in viral host prediction directly from next-generation sequencing reads (Zhang et al., 2019), our models drastically reduce the error rates. This also holds for novel viruses not present in the training set. In the paired-read scenario, the previously described method fails, and the standard alignment-based homology-testing algorithm cannot find any matches in more than 10% of the cases, resulting in relatively low accuracy. On a real human virome sample, where a main source of negative (Moustafa et al., 2017), our method filters out non-human viruses with high specificity. In this scenario, the BLAST-derived ground-truth labels were mined using the complete database (as opposed to just the training set). In all cases, our results are only as good as the training data used; high-quality labels and sequences are needed to develop trustworthy models. Ideally, sources of error should be investigated through an in-depth analysis of a model's performance on multiple genomes covering a wide selection of taxonomic units. This is especially important because the method assumes no mechanistic link between an input sequence and the phenotype of interest, and the input sequence constitutes only a small fraction of the target genome, lacking wider biological context. Still, it is possible to predict a label even from these small, local fragments. A similar effect has been observed for image classification with CNNs (Brendel & Bethge, 2019).
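    The read-level evaluation described above (error rates, paired reads, specificity) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes each mate of a read pair already has a model-predicted probability of a human host, combines the two mates by simple averaging (a hypothetical aggregation rule), applies an assumed 0.5 cut-off, and treats human-infecting viruses as the positive class, so specificity measures how well non-human viruses are filtered out. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_pairs(p_r1: Sequence[float], p_r2: Sequence[float]) -> List[float]:
    """Combine mate-level probabilities by averaging (hypothetical rule)."""
    if len(p_r1) != len(p_r2):
        raise ValueError("both mate lists must have the same length")
    return [(a + b) / 2 for a, b in zip(p_r1, p_r2)]


def confusion(y_true: Sequence[int], probs: Sequence[float],
              threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN with label 1 = human host (assumed convention)."""
    if len(y_true) != len(probs):
        raise ValueError("labels and predictions must have the same length")
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0  # assumed decision threshold
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def error_rate(y_true: Sequence[int], probs: Sequence[float],
               threshold: float = 0.5) -> float:
    """Fraction of misclassified reads or read pairs."""
    tp, fp, tn, fn = confusion(y_true, probs, threshold)
    return (fp + fn) / (tp + fp + tn + fn)


def specificity(y_true: Sequence[int], probs: Sequence[float],
                threshold: float = 0.5) -> float:
    """True-negative rate: share of non-human viruses correctly rejected."""
    tp, fp, tn, fn = confusion(y_true, probs, threshold)
    return tn / (tn + fp) if (tn + fp) else float("nan")


if __name__ == "__main__":
    labels = [1, 1, 0, 0]                    # toy labels, not real data
    pairs = average_pairs([0.9, 0.4, 0.2, 0.7],
                          [0.8, 0.8, 0.1, 0.5])
    print(error_rate(labels, pairs))   # 0.25 (one false positive out of four)
    print(specificity(labels, pairs))  # 0.5  (one of two negatives rejected)
```

    Averaging is only one possible way to use both mates; the point of the sketch is that a pair yields a single decision, so a confident mate can compensate for an ambiguous one.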
