Author: Jakub M Bartoszewicz; Anja Seidel; Bernhard Y Renard
Title: Interpretable detection of novel human viruses from genome sequencing data
Document date: 2020_1_30
ID: ac00tai9_9
Document: In this paper, we first improve the performance of read-based predictions of the viral host (human or non-human) from next-generation sequencing reads. We show that reverse-complement (RC) neural networks (Bartoszewicz et al., 2019) significantly outperform both the previous state of the art (Zhang et al., 2019) and the traditional, alignment-based algorithm BLAST (Altschul et al., 1990), which constitutes a gold standard in homology-based bioinformatics analyses. We show that defining the negative (non-human) class is non-trivial and compare different ways of constructing the training set. Strikingly, a model trained to distinguish between viruses infecting humans and viruses infecting other chordates (a phylum of animals including vertebrates) generalizes well to evolutionarily distant non-human hosts, including even bacteria. This suggests that the host-related signal is strong and that the learned decision boundary separates human viruses from other DNA sequences surprisingly well.
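A sequencing read may come from either DNA strand, so a read and its reverse complement should receive the same host prediction. The RC networks cited above are designed with this property in mind. The sketch below is not the authors' architecture. It illustrates the simplest way to obtain strand-invariant predictions: average a model's output over a read and its reverse complement. `toy_model` is a hypothetical, deliberately strand-sensitive stand-in for a trained classifier that outputs a score for "human host". The helper names and the [A, C, G, T] one-hot layout are illustrative assumptions.

```python
from typing import Callable, List

# Complement map for DNA bases; 'N' (unknown base) maps to itself.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
BASES = "ACGT"  # assumed channel order for the one-hot encoding


def reverse_complement(read: str) -> str:
    """Return the reverse complement of a read (the opposite strand, read 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(read.upper()))


def one_hot(read: str) -> List[List[int]]:
    """Encode each base as an [A, C, G, T] indicator vector; 'N' becomes all zeros."""
    return [[int(base == b) for b in BASES] for base in read.upper()]


def toy_model(x: List[List[int]]) -> float:
    """Hypothetical stand-in for a trained network: fraction of A's in the
    first half of the read. It is strand-sensitive on purpose."""
    half = x[: max(1, len(x) // 2)]
    return sum(row[0] for row in half) / len(half)


def rc_averaged_score(model: Callable[[List[List[int]]], float], read: str) -> float:
    """Average the model's score over a read and its reverse complement.

    Swapping the read for its reverse complement only swaps the two terms,
    so the result is identical for both strands.
    """
    forward = model(one_hot(read))
    reverse = model(one_hot(reverse_complement(read)))
    return (forward + reverse) / 2.0


if __name__ == "__main__":
    read = "AAAACCGT"
    print(toy_model(one_hot(read)))                         # strand-dependent score
    print(toy_model(one_hot(reverse_complement(read))))     # differs on the other strand
    print(rc_averaged_score(toy_model, read))               # same value for both strands
    print(rc_averaged_score(toy_model, reverse_complement(read)))
```

Averaging at inference time leaves the underlying model unchanged. Architectures that share weights between the forward and reverse-complement strands instead build the constraint into the network itself.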