Author: Jakub M Bartoszewicz; Anja Seidel; Bernhard Y Renard
Title: Interpretable detection of novel human viruses from genome sequencing data
Document date: 2020_1_30
ID: ac00tai9_57
Document: Compared to the previous state of the art in viral host prediction directly from next-generation sequencing reads (Zhang et al., 2019), our models drastically reduce the error rates. This also holds for novel viruses not present in the training set. In the paired-read scenario, the previously described method fails, and the standard alignment-based homology testing algorithm cannot find any matches in more than 10% of the cases, resulting in relatively low accuracy. On a real human virome sample, where a main source of negative (Moustafa et al., 2017), our method filters out non-human viruses with high specificity. In this scenario, the BLAST-derived ground-truth labels were mined using the complete database (as opposed to just a training set). In all cases, our results are only as good as the training data used; high-quality labels and sequences are needed to develop trustworthy models. Ideally, sources of error should be investigated with an in-depth analysis of a model's performance on multiple genomes covering a wide selection of taxonomic units. This is especially important as the method assumes no mechanistic link between an input sequence and the phenotype of interest, and the input sequence constitutes only a small fraction of the target genome without a wider biological context. Still, it is possible to predict a label even from those small, local fragments. A similar effect was also observed for image classification with CNNs (Brendel & Bethge, 2019).
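The link between missing alignments and low accuracy can be made concrete with a small sketch. This is not the authors' evaluation code. It assumes a hypothetical convention in which a read with no match is recorded as `None` and counted as an error, it treats the non-human class (label 0) as the negative class when computing specificity, and the labels are made-up toy values rather than results from the paper.

```python
from typing import Optional, Sequence


def missing_rate(y_pred: Sequence[Optional[int]]) -> float:
    """Fraction of reads for which the classifier returned no prediction."""
    return sum(1 for p in y_pred if p is None) / len(y_pred)


def accuracy(y_true: Sequence[int], y_pred: Sequence[Optional[int]]) -> float:
    """Fraction of reads labelled correctly; a missing prediction counts as an error."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if p is not None and p == t)
    return correct / len(y_true)


def specificity(y_true: Sequence[int], y_pred: Sequence[Optional[int]]) -> float:
    """Share of truly negative (label 0) reads that were predicted as negative.

    Assumption: a negative read with a missing prediction is not counted
    as correctly filtered out.
    """
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(1 for p in negatives if p == 0) / len(negatives)


# Toy data (hypothetical): 1 = human virus, 0 = non-human virus, None = no match.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, None, 1, 0, 0, 0, 1, 0]

# With missing predictions counted as errors, accuracy cannot exceed
# 1 - missing_rate, no matter how good the remaining predictions are.
accuracy_ceiling = 1 - missing_rate(y_pred)

print("missing rate:", missing_rate(y_pred))
print("accuracy:", accuracy(y_true, y_pred))
print("accuracy ceiling:", accuracy_ceiling)
print("specificity:", specificity(y_true, y_pred))
```

Under this convention, a method that finds no match for more than 10% of reads is capped below 90% accuracy before any misclassification is counted.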