NIST Coronavirus Research Data DocResults - alignment read and method sequence

Selected article for: "alignment read and method sequence"

Author: Jakub M Bartoszewicz; Anja Seidel; Bernhard Y Renard

Title: Interpretable detection of novel human viruses from genome sequencing data

Document date: 2020_1_30

ID: ac00tai9_57

Complete Snippet

Document: Compared to the previous state-of-the-art in viral host prediction directly from next-generation sequencing reads (Zhang et al., 2019), our models drastically reduce the error rates. This also holds for novel viruses not present in the training set. In the paired-read scenario, the previously described method fails, and the standard alignment-based homology testing algorithm cannot find any matches in more than 10% of the cases, resulting in relatively low accuracy. On a real human virome sample, where a main source of negative (Moustafa et al., 2017), our method filters out non-human viruses with high specificity. In this scenario, the BLAST-derived ground-truth labels were mined using the complete database (as opposed to just a training set). In all cases, our results are only as good as the training data used; high-quality labels and sequences are needed to develop trustworthy models. Ideally, sources of error should be investigated with an in-depth analysis of a model's performance on multiple genomes covering a wide selection of taxonomic units. This is especially important because the method assumes no mechanistic link between an input sequence and the phenotype of interest, and the input sequence constitutes only a small fraction of the target genome, without a wider biological context. Still, it is possible to predict a label even from those small, local fragments. A similar effect was also observed for image classification with CNNs (Brendel & Bethge, 2019).
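The suggestion to analyse a model's performance across a wide selection of taxonomic units can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function name `per_group_metrics`, the group names `family_A` and `family_B`, and the toy records are all hypothetical, and labels are assumed to be 1 for human-infecting and 0 otherwise. The sketch breaks read-level predictions down by group so that errors concentrated in one group are not hidden by an overall average.

```python
from collections import defaultdict


def per_group_metrics(records):
    """Summarise read-level predictions per taxonomic group.

    records: iterable of (group, true_label, predicted_label) tuples,
    where labels are 1 (human-infecting) or 0 (not human-infecting).
    Returns a dict mapping group -> n, accuracy, sensitivity, specificity.
    """
    counts = defaultdict(lambda: {"tp": 0, "tn": 0, "fp": 0, "fn": 0})
    for group, truth, pred in records:
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "tn" if pred == 0 else "fp"
        counts[group][key] += 1

    metrics = {}
    for group, c in counts.items():
        total = sum(c.values())
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        metrics[group] = {
            "n": total,
            "accuracy": (c["tp"] + c["tn"]) / total,
            # Undefined (None) when a group has no reads of that class.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return metrics


# Hypothetical toy data: (group, true label, predicted label).
records = [
    ("family_A", 1, 1), ("family_A", 1, 1), ("family_A", 1, 0), ("family_A", 0, 0),
    ("family_B", 0, 0), ("family_B", 0, 1), ("family_B", 0, 0), ("family_B", 0, 0),
]

results = per_group_metrics(records)
for group, m in sorted(results.items()):
    print(group, m)
```

In practice the grouping key could be a species, genus, or family, depending on which taxonomic level the error analysis targets.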
