Author: Yu, Chenglong; Liang, Qian; Yin, Changchuan; He, Rong L.; Yau, Stephen S.-T.
Title: A Novel Construction of Genome Space with Biological Geometry
Document date: 2010_4_1
ID: 3c4dttrt_41
Document: In our study of mammalian mitochondrial, lentivirus (including HIV), and coronavirus genomes, we used 60, 12, and 2 moments, respectively, to construct the genome space. We emphasize that it is not necessary to calculate all the moments to capture the biological information of a genome. Recall that, by the central limit theorem in probability and statistics, the limiting distribution is Gaussian, and a Gaussian density is completely determined by its first two moments (the mean and the variance). We therefore use only the first N moments, where N is much smaller than n, the length of the genome. For the coronavirus genomes, we used only the first two components of the moment vector, $(M_1, M_2)$, because these two moments already yield a stable classification: when higher moments are included, the relationships of which genomes are closer together or farther apart remain unchanged.

To make this point clearer, we also used the first 20 components of the moment vector, $(M_1, M_2, \ldots, M_{20})$, to construct a 20-dimensional genome space. By computing the Euclidean distances between the points in this genome space, we reconstructed the phylogenetic tree of these coronaviruses (Fig. 7). Comparing Figs 6 and 7, we found that the classification of these genomes is the same: group 1, group 2, group 3, group 4, group 5, and the outgroups appear as six distinct clusters in both trees. Thus, including the higher moments leaves the relationships of closeness and distance unchanged; in other words, two moments are already enough to give the correct classification of these genomes. For the same reason, we used the first 60 components of the moment vector, $(M_1, M_2, \ldots, M_{60})$, and the first 12 components, $(M_1, M_2, \ldots, M_{12})$, to generate the genome spaces for the mammalian mitochondrial and lentivirus genomes, respectively, and obtained stable phylogenetic results.
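The procedure described above (keep only the first N components of each genome's moment vector, treat each genome as a point in an N-dimensional space, and compare genomes by Euclidean distance) can be sketched in Python. This is a minimal sketch under stated assumptions: the nucleotide encoding `ENCODING`, the raw-moment definition M_j = (1/n) Σ x_i^j, the helper names `moment_vector`, `distance_matrix`, and `nearest_neighbours`, and the toy sequences in `TOY_GENOMES` are all hypothetical illustrations, not the moment definition or data used in the paper. Rather than building full phylogenetic trees, it checks a simpler proxy for stability: whether each genome's nearest neighbour is the same with 2 moments and with 20 moments.

```python
"""Toy sketch: compare genome spaces built from the first N moments.

Assumptions (hypothetical, not from the paper): the nucleotide encoding,
the raw-moment definition, and the toy sequences below.
"""
import itertools
import math

# Hypothetical numeric encoding of nucleotides, scaled into (0, 1] so that
# higher raw moments stay bounded.
ENCODING = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


def moment_vector(seq: str, n_moments: int) -> list[float]:
    """Return (M_1, ..., M_N) with the hypothetical M_j = (1/n) * sum_i x_i**j.

    x_i is the encoded i-th nucleotide and n = len(seq); only the first
    N = n_moments components are kept (N << n in practice).
    """
    xs = [ENCODING[base] for base in seq.upper()]
    n = len(xs)
    return [sum(x ** j for x in xs) / n for j in range(1, n_moments + 1)]


def distance_matrix(genomes: dict[str, str], n_moments: int) -> dict[tuple[str, str], float]:
    """Pairwise Euclidean distances between genomes in the N-dimensional space."""
    points = {name: moment_vector(seq, n_moments) for name, seq in genomes.items()}
    return {
        (a, b): math.dist(points[a], points[b])
        for a, b in itertools.combinations(sorted(points), 2)
    }


def nearest_neighbours(genomes: dict[str, str], n_moments: int) -> dict[str, str]:
    """Map each genome to its closest other genome in the N-dimensional space."""
    dists = distance_matrix(genomes, n_moments)

    def d(a: str, b: str) -> float:
        # Keys are stored with the two names in sorted order.
        return dists[(a, b) if a < b else (b, a)]

    return {
        a: min((b for b in genomes if b != a), key=lambda b: d(a, b))
        for a in genomes
    }


# Toy "genomes": two A/C-rich and two T/G-rich sequences, made up for illustration.
TOY_GENOMES = {
    "a1": "AAAAAAAC",
    "a2": "AAAAAACC",
    "t1": "TTTTTTTG",
    "t2": "TTTTTTGG",
}

if __name__ == "__main__":
    # Compare the nearest-neighbour structure with 2 and with 20 moments.
    for n_moments in (2, 20):
        print(n_moments, nearest_neighbours(TOY_GENOMES, n_moments))
```

On these toy sequences the nearest-neighbour pairs are (a1, a2) and (t1, t2) whether 2 or 20 moments are used. Reproducing the paper's analysis would instead require its own moment definition and building trees from the full distance matrix rather than comparing nearest neighbours.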