Author: Gurjit S. Randhawa; Maximillian P.M. Soltysiak; Hadi El Roz; Camila P.E. de Souza; Kathleen A. Hill; Lila Kari
Title: Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study
  • Document date: 2020-02-04
  • ID: cetdqgff_1
    Document: Sarbecovirus strains, the hypothesis that COVID-19 originated from bats is deemed very likely [12, 34, 36, 39, 42-45].

    All analyses performed thus far have been alignment-based and rely on the annotations of the viral genes. Though alignment-based methods have been successful in finding sequence similarities, their application can be challenging in many cases [46, 47]. It is realistically impossible to analyze thousands of complete genomes using alignment-based methods because of the heavy computation time. Moreover, alignment requires the sequences to be continuously homologous, which is not always the case. Alignment-free methods [48-52] have been proposed as an alternative that addresses these limitations. Comparative genomics beyond alignment-based approaches has benefited from the computational power of machine learning. Machine learning-based alignment-free methods have also been used successfully for a variety of problems, including virus classification [50-52]. An alignment-free approach [50] was proposed for subtype classification of HIV-1 genomes and achieved ∼97% classification accuracy. MLDSP [51], which uses a broad range of 1D numerical representations of DNA sequences, has also achieved very high levels of classification accuracy with viruses. Even rapidly evolving, plastic genomes of viruses such as Influenza and Dengue are classified down to the level of strain and subtype, respectively, with 100% classification accuracy. MLDSP-GUI [52] provides an option to use 2D Chaos Game Representation (CGR) [53] as the numerical representation of DNA sequences. CGRs have a longstanding use in species classification through the identification of biases in sequence composition [49, 52, 53]. MLDSP-GUI has shown 100% classification accuracy for Flavivirus genus-to-species classification using 2D CGR as the numerical representation [52]. MLDSP and MLDSP-GUI have demonstrated the ability to identify genomic signatures (species-specific patterns known to be pervasive throughout the genome) with species-level accuracy, and these signatures can be used for sequence (dis)similarity analyses. In this study, we use MLDSP [51] and MLDSP-GUI [52] with CGR as the numerical representation of DNA sequences to assess the classification of COVID-19 from the perspective of machine learning-based alignment-free whole-genome comparison of genomic signatures. Using MLDSP and MLDSP-GUI, we confirm that COVID-19 belongs to the Betacoronavirus genus, while its genomic similarity to the sub-genus Sarbecovirus supports a possible bat origin.

    This paper shows how machine learning using intrinsic genomic signatures can provide rapid alignment-free taxonomic classification of novel pathogens. Our method delivers accurate classifications of COVID-19 without a priori biological knowledge, by simultaneously processing the geometric space of all relevant viral genomes. The main contributions are:
    • Identifying intrinsic viral genomic signatures, and utilizing them for a real-time and highly accurate machine learning-based classification of novel pathogen sequences, such as COVID-19;
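
    As a rough illustration of the 2D Chaos Game Representation mentioned above, the sketch below counts k-mers on a 2^k x 2^k grid and compares two sequences by their resulting signatures. It is a minimal sketch under stated assumptions, not the MLDSP or MLDSP-GUI implementation: the function names (`cgr_matrix`, `signature`, `signature_distance`), the corner assignment, the default `k=3`, and the Euclidean distance are all illustrative choices that do not come from the excerpt.

```python
from math import sqrt

# Corner bits (x, y) for each base on the unit square. The assignment
# A=(0,0), C=(0,1), G=(1,1), T=(1,0) is an assumed, commonly used convention.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_matrix(seq, k=3):
    """Count k-mers on a 2^k x 2^k grid laid out as a CGR.

    Halving the distance to a corner once per base means the last base of a
    k-mer fixes the coarsest quadrant and earlier bases refine it, so the cell
    index can be built exactly from the corner bits without floating point.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers with ambiguous bases such as N
        row = col = 0
        for j, base in enumerate(kmer):
            x_bit, y_bit = CORNERS[base]
            col |= x_bit << j  # later bases set more significant bits
            row |= y_bit << j
        grid[row][col] += 1
    return grid


def signature(seq, k=3):
    """Flatten the CGR grid into relative k-mer frequencies."""
    flat = [count for row in cgr_matrix(seq, k) for count in row]
    total = sum(flat)
    return [count / total for count in flat] if total else flat


def signature_distance(seq_a, seq_b, k=3):
    """Euclidean distance between two signatures (illustrative choice only)."""
    sig_a, sig_b = signature(seq_a, k), signature(seq_b, k)
    return sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))
```

    Calling `signature_distance(genome_a, genome_b, k=7)` on two genome strings returns a single (dis)similarity value without aligning them. A pairwise matrix of such values is the kind of input a supervised classifier could be trained on, although the distance measure and classifiers actually used by MLDSP are not described in this excerpt.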
