Author: Gurjit S. Randhawa; Maximillian P.M. Soltysiak; Hadi El Roz; Camila P.E. de Souza; Kathleen A. Hill; Lila Kari
Title: Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study
  • Document date: 2020_2_4
  • ID: cetdqgff_22
    Document: Sequences become very similar at lower taxonomic levels (sub-genera and species). [...] sequences as Sarbecovirus. This suggests substantial similarity between COVID-19 and the Sarbecovirus sequences. Test-5 and Test-6 (see Table 3) are designed to verify that COVID-19 sequences can be differentiated from the known species in the Betacoronavirus genus. MLDSP-GUI achieved a maximum classification score of 98.7% for Test-5 and 100% for Test-6 using the Subspace Discriminant classification model. This shows that although COVID-19 and Sarbecovirus are closer on the basis of genomic similarity (Test-4), they are still distinguishable from known species. Therefore, these results suggest that COVID-19 may represent a genetically distinct species of Sarbecovirus. All COVID-19 viruses appear in the MoDMap3D generated from Test-5 (see Figure 2(b)) as a closely packed cluster, which supports the reported 99% similarity among these sequences [12, 31]. The MoDMap3D generated from Test-5 (Figure 2(b)) visually suggests, and the average distances from COVID-19 sequences to all other sequences confirm, that the COVID-19 sequences are most proximal to RaTG13 (distance: 0.0203), followed by bat-SL-CoVZC45 (0.0418) and bat-SL-CoVZX21 (0.0428). To confirm this proximity, a UPGMA phylogenetic tree is computed from the PCC-based pairwise distance matrix of sequences in Test-6 (see Figure 3). The phylogenetic tree placed the RaTG13 sequence closest to the COVID-19 sequences, followed by the bat-SL-CoVZC45 and bat-SL-CoVZX21 sequences. This closer proximity represents the smaller genetic distances between these sequences and aligns with the visual sequence relationships shown in the MoDMap3D of Figure 2(b). We further confirm our results regarding the closeness of COVID [...] dinucleotides in their genomes, otherwise known as CG suppression [81, 82].
This feature is thought to have been due to the accumulation of spontaneous deamination mutations of methyl-cytosines over time [81]. As viruses are obligate parasites, evolution of viral genomes is intimately tied to the biology of their hosts [83]. As host cells develop strategies such as RNA interference and restriction-modification systems to prevent and limit viral infections, viruses will continue to counteract these strategies [82, 83, 84].
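
The distance-based step described above (a UPGMA tree computed from a PCC-based pairwise distance matrix, with sequences ranked by proximity to a query) can be illustrated with a minimal Python sketch. It is not the MLDSP-GUI implementation. The feature vectors, the labels `query`, `near`, `mid`, and `far`, and the helper names are hypothetical. The normalization d = (1 - r) / 2 is an assumption chosen to map the Pearson correlation coefficient r to a distance in [0, 1]. UPGMA is obtained here through SciPy's average-linkage clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def pcc_distance_matrix(features):
    """Pairwise distances (1 - r) / 2, where r is the Pearson correlation between rows."""
    r = np.corrcoef(features)        # rows are sequences, columns are features
    d = (1.0 - r) / 2.0              # assumed normalization: r in [-1, 1] -> d in [0, 1]
    d = (d + d.T) / 2.0              # remove tiny asymmetries caused by rounding
    np.fill_diagonal(d, 0.0)
    return np.clip(d, 0.0, 1.0)


def upgma(dist):
    """Average-linkage (UPGMA) clustering of a square distance matrix."""
    condensed = squareform(dist, checks=False)   # square matrix -> condensed vector
    return linkage(condensed, method="average")


def nearest_neighbours(dist, labels, query):
    """Labels of all other sequences, ordered by increasing distance to `query`."""
    i = labels.index(query)
    order = np.argsort(dist[i])
    return [labels[j] for j in order if j != i]


# Hypothetical toy feature vectors (not real genomic signatures).
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
wiggle = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
labels = ["query", "near", "mid", "far"]
features = np.vstack([base, base + 0.1 * wiggle, base + wiggle, base[::-1]])

dist = pcc_distance_matrix(features)
tree = upgma(dist)                   # each row of `tree` records one merge
ranking = nearest_neighbours(dist, labels, "query")
print(ranking)
```

In this sketch the ranking plays the role of the average-distance comparison in the text, and the first row of the linkage matrix identifies the pair of sequences that the tree joins first.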
