Author: Jakub M Bartoszewicz; Anja Seidel; Bernhard Y Renard
Title: Interpretable detection of novel human viruses from genome sequencing data
  • Document date: 2020-01-30
  • ID: ac00tai9_9
    Document: In this paper, we first improve the performance of read-based predictions of the viral host (human or non-human) from next-generation sequencing reads. We show that reverse-complement (RC) neural networks (Bartoszewicz et al., 2019) significantly outperform both the previous state-of-the-art (Zhang et al., 2019) and the traditional, alignment-based algorithm BLAST (Altschul et al., 1990), which constitutes a gold standard in homology-based bioinformatics analyses. We show that defining the negative (non-human) class is non-trivial and compare different ways of constructing the training set. Strikingly, a model trained to distinguish between viruses infecting humans and viruses infecting other chordates (a phylum of animals including vertebrates) generalizes well to evolutionarily distant non-human hosts, including even bacteria. This suggests that the host-related signal is strong and the learned decision boundary separates human viruses from other DNA sequences surprisingly well.
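
    The idea behind strand-aware prediction can be illustrated with a minimal Python sketch. It is not the RC architecture of the cited work: it uses plain strand averaging as a simpler stand-in, and the names `one_hot`, `reverse_complement`, `rc_averaged_score`, and `score_fn` are hypothetical helpers introduced only for illustration. The sketch shows that, with channels ordered A, C, G, T, the one-hot encoding of a read's reverse complement is the original encoding flipped along both axes, and that averaging a score over both strands gives the same value for a read and its reverse complement.

```python
import numpy as np

# Channel order A, C, G, T: complementing a base reverses its channel index.
ALPHABET = "ACGT"


def one_hot(read: str) -> np.ndarray:
    """Encode a DNA read as a (length, 4) array; unknown bases (e.g. N) stay all-zero."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(read), 4), dtype=np.float32)
    for pos, base in enumerate(read.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def reverse_complement(read: str) -> str:
    """Return the reverse complement; characters other than A/C/G/T are left unchanged."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return read.translate(complement)[::-1]


def rc_averaged_score(score_fn, read: str) -> float:
    """Average a (hypothetical) per-read scoring function over both strands.

    This is strand averaging, assumed here as a stand-in for a true RC network.
    """
    forward = score_fn(one_hot(read))
    reverse = score_fn(one_hot(reverse_complement(read)))
    return 0.5 * (forward + reverse)


if __name__ == "__main__":
    read = "AACG"
    # Toy scorer (an assumption): 1.0 if the read starts with A, else 0.0.
    toy_score = lambda x: float(x[0, 0])
    print(reverse_complement(read))
    print(rc_averaged_score(toy_score, read))
    print(rc_averaged_score(toy_score, reverse_complement(read)))
```

    Any real classifier could be substituted for the toy scorer; the averaging step only guarantees that the prediction does not depend on which strand a read was sequenced from.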
