Selected article for: "correlation coefficient and rank correlation coefficient"

Author: Gurjit S. Randhawa; Maximillian P.M. Soltysiak; Hadi El Roz; Camila P.E. de Souza; Kathleen A. Hill; Lila Kari
Title: Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study
  • Document date: 2020_2_4
  • ID: cetdqgff_16
    Snippet: MLDSP-GUI with CGR at k = 7 as the numerical representation was used for the classification of the dataset in Test-3a. The maximum classification accuracy of 98.1% is obtained using the Linear Discriminant model, and the respective MoDMap3D is shown in Figure 1 (c). All six classification models trained on 208 sequences were used to classify (predict the label of) the 29 COVID-19 sequences. All of our machine learning-based mod.....
    Document: MLDSP-GUI with CGR at k = 7 as the numerical representation was used for the classification of the dataset in Test-3a. The maximum classification accuracy of 98.1% is obtained using the Linear Discriminant model, and the respective MoDMap3D is shown in Figure 1 (c). All six classification models trained on 208 sequences were used to classify (predict the label of) the 29 COVID-19 sequences. All of our machine learning-based models predicted the label as Betacoronavirus for all 29 sequences (Table 2). To verify that the correct prediction is not an artifact of possible bias because of the larger Betacoronavirus cluster, we did a secondary Test-3b with cluster size limited to the size of the smallest cluster (after removing the Gammacoronavirus because it just had 9 sequences). The maximum classification accuracy of 100% is obtained using the Linear Discriminant model for Test-3b. All six classification models trained on 60 sequences were used to classify the 29 COVID-19 sequences. All of our machine learning-based models predicted the label as Betacoronavirus for all 29 sequences (Table 2). This [...] Table 3. The maximum classification accuracy of 98.7% with CGR at k = 7 as the numerical representation is obtained using the Subspace Discriminant model. The respective MoDMap3D is shown in Figure 2 (b). In the MoDMap3D plot from Test-5, COVID-19 sequences are placed in a single distinct cluster, see Figure 2 (b). As visually suggested by the MoDMap3D (Figure 2 [...] [bioRxiv preprint, not peer-reviewed; doi: https://doi.org/10.1101/2020.02.03.932350] [...] Betacoronavirus at k = 7 (Table 4), which is consistent with the ML-DSP results in Test-3 (Table 2). COVID-19 was then compared to all sub-genera within the Betacoronavirus genus: Embecovirus, Merbecovirus, Nobecovirus and Sarbecovirus, seen in Figure 6.
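The balancing step described for Test-3b (dropping a cluster that is too small, then limiting every remaining cluster to the size of the smallest one) can be sketched as follows. This is only an illustration under stated assumptions: the function name `balance_clusters`, the `min_size` threshold, the fixed seed, and the use of uniform random sampling are hypothetical choices, since the excerpt does not say how sequences were selected within each cluster.

```python
import random
from collections import defaultdict


def balance_clusters(labels, min_size=10, seed=0):
    """Return sorted indices of a class-balanced subsample.

    Hypothetical helper, not taken from the paper: clusters with fewer
    than `min_size` members are dropped, and every remaining cluster is
    randomly downsampled to the size of the smallest remaining cluster.
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    # Drop clusters that are too small to be worth keeping.
    kept = {lab: idxs for lab, idxs in by_label.items() if len(idxs) >= min_size}
    if not kept:
        raise ValueError("no cluster has at least min_size members")

    # Every surviving cluster is cut down to the smallest surviving size.
    target = min(len(idxs) for idxs in kept.values())
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    selected = []
    for lab in sorted(kept):
        selected.extend(rng.sample(kept[lab], target))
    return sorted(selected)


# Toy labels with made-up cluster sizes (not the paper's data).
toy_labels = ["A"] * 25 + ["B"] * 12 + ["C"] * 3
balanced_idx = balance_clusters(toy_labels, min_size=10)
balanced_labels = [toy_labels[i] for i in balanced_idx]
```

With the toy labels above, cluster "C" is dropped and clusters "A" and "B" each contribute 12 indices. Training on such a subsample is one way to check that a prediction is not driven by one cluster being much larger than the others.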
The Spearman's rank test was again consistent with the ML-DSP results seen in Table 3, as the k-mer frequencies at k = 7 showed the highest correlation to the sub-genus Sarbecovirus ( [...] for the correlation test. In each hexbin scatterplot, the degree of correlation is shown by the spread of the points. Hexagonal points that are closer together and less dispersed, as seen in (d), are more strongly correlated and have less deviation. Table 4. Spearman's rank correlation coefficient (ρ) values from Figures 5 and 6, for which all p-values < 10^−5. The strongest correlation value was found between Betacoronavirus and Sarbecovirus when using the data sets from Test 3a from Table 2 and Test 4 from Table 3 [...] highlighting the need for continued intervention [34, 61, 62, 63]. Still, the early COVID-19 genomes that have been sequenced and uploaded are over 99% similar, suggesting these infections result from a recent cross-species event [12, 31, 41].
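To make the correlation test concrete, the sketch below counts k-mers in two sequences and computes Spearman's rank correlation coefficient (ρ) between the two count vectors. It is a minimal pure-Python illustration, not the authors' pipeline: the function names are hypothetical, windows containing characters other than A, C, G, T are simply skipped (an assumption), tied counts receive average ranks, and no p-value is computed.

```python
from itertools import product
from math import sqrt


def kmer_counts(seq, k):
    """Count every DNA k-mer in `seq`, in fixed lexicographic order.

    Windows containing characters other than A, C, G, T are skipped
    (an assumption made for this sketch).
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    return counts


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for a constant input")
    return cov / sqrt(vx * vy)


# Toy usage with made-up sequences and a small k (the excerpt uses k = 7).
rho = spearman_rho(kmer_counts("ACGTACGTTGCA", 2), kmer_counts("ACGTTCGTAGCA", 2))
```

Average ranks matter here because k-mer count vectors typically contain many ties (for example, many k-mers with a count of zero), so the shortcut formula based on squared rank differences would not be exact.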
