Author: Jakub M Bartoszewicz; Anja Seidel; Bernhard Y Renard
Title: Interpretable detection of novel human viruses from genome sequencing data
  • Document date: 2020_1_30
  • ID: ac00tai9_4
    Document: Several computational, genome-based methods can predict the host range of a bacteriophage (a bacteria-infecting virus). A selection of composition-based and alignment-based approaches has been presented in an extensive review by Edwards et al. (2016). Prediction of eukaryotic host tropism (including humans) based on known protein sequences was shown for the influenza A virus (Eng et al., 2014). Two recent studies employ k-mer-based k-NN classifiers and deep learning (Mock et al., 2019) to predict host range for a small set of three well-studied species directly from viral sequences. While those approaches are limited to those particular species and do not scale to viral host-range prediction in general, the Host Taxon Predictor (HTP) (Gałan et al., 2019) uses logistic regression and support vector machines to predict whether a novel virus infects bacteria, plants, vertebrates or arthropods. Yet, the authors argue that HTP cannot be used in a read-based manner because it requires long sequences of at least 3,000 nucleotides. This is incompatible with modern metagenomic next-generation sequencing workflows, where the DNA reads obtained are at least 10-20 times shorter. Another study used gradient boosting machines to predict reservoir hosts and transmission via arthropod vectors for known human-infecting viruses (Babayan et al., 2018). Zhang et al. (2019) designed several classifiers that explicitly predict whether a new virus can potentially infect humans. Their best model, a k-NN classifier, uses k-mer frequencies as features representing the query sequence and can yield predictions for sequences as short as 500 base pairs (bp). It also worked with 150 bp reads from real DNA sequencing runs, although in this case the reads also originated from viruses present in the training set (and were therefore not "novel").
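
The k-mer-frequency k-NN idea mentioned above can be illustrated with a minimal sketch. This is not the implementation of Zhang et al. (2019): the choice of 3-mers, Euclidean distance, majority voting over three neighbours, the function names, and the toy training sequences with their placeholder labels are all assumptions made purely for illustration. The toy sequences are invented strings, not real viral genomes.

```python
from collections import Counter
from itertools import product
import math

# All 4^k k-mers over the DNA alphabet, in a fixed order, define the feature axes.
def kmer_frequencies(seq, k=3):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    k-mers containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]

def knn_predict(query, training, k_neighbors=3, k=3):
    """Label a query sequence by majority vote among its nearest neighbours.

    `training` is a list of (sequence, label) pairs; distance is Euclidean
    distance between k-mer frequency vectors (an assumed metric).
    """
    q = kmer_frequencies(query, k)
    scored = sorted(
        (math.dist(q, kmer_frequencies(s, k)), label) for s, label in training
    )
    votes = Counter(label for _, label in scored[:k_neighbors])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: invented GC-rich and AT-rich strings with
# placeholder labels, used only to exercise the functions.
TOY_TRAINING = [
    ("GCGCGGCCGCGCGGCC", "class_A"),
    ("CCGGCGCGGCGCCGCG", "class_A"),
    ("ATATTAATATTATAAT", "class_B"),
    ("TTAATATAATATTATA", "class_B"),
]

if __name__ == "__main__":
    print(knn_predict("GCGCCGCGGC", TOY_TRAINING))
    print(knn_predict("TATAATTATA", TOY_TRAINING))
```

Because the frequencies are normalized by the number of counted k-mers, the same feature vector can be computed for a full genome or a short read. Shorter sequences give noisier frequency estimates, which is consistent with the length limits discussed above.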
