Selected article for: "mutation rate and nucleotide substitution"

Author: Yu, Chenglong; Liang, Qian; Yin, Changchuan; He, Rong L.; Yau, Stephen S.-T.
Title: A Novel Construction of Genome Space with Biological Geometry
  • Document date: 2010_4_1
  • ID: 3c4dttrt_33
    Document: A novel construction of genome space with biological geometry [Vol. 17, apply our new genome space to the phylogenetic analysis of organisms. Most existing methods for phylogenetic inference from biological sequences fall into two groups. Algorithms in the first group use various distance measures [12-15], each based on a different model of nucleotide substitution or amino acid replacement, and then transform the distance matrix into a tree. Approaches in the second group do not build a tree directly; instead, they evaluate different topologies to find the tree that best explains the observed sequences under the evolutionary assumption. This category includes parsimony [16-18] and maximum likelihood methods [19-21]. All of these methods require a multiple alignment of the sequences and assume some sort of evolutionary model, which requires human intervention. Thus, the results are usually controversial. Our genome space, however, requires neither sequence alignment nor an evolutionary model. It is generated fully automatically and avoids repeated computation. First, we consider the phylogeny of mammals. Mitochondrial DNA is not highly conserved and has a rapid mutation rate, so it is very useful for studying the evolutionary relationships of organisms [22]. We extracted 35 complete mammalian mitochondrial genome sequences from GenBank, each longer than 16,000 nucleotides. These genomes are double-stranded and circular. As mentioned in the previous section, because the gene content of both strands of these genomes is already known, we treat them as single-stranded circular genomes, using the heavy strand. In this case, we treat every position in the circular sequence of length n as a start point, which gives n linear single-strand genomes. For each linear single-strand genome sequence, we compute its n-dimensional moment vector using the nucleotide vector system shown in Fig. 1. We then average these n moment vectors (that is, sum them and divide by n) to obtain a normalized moment vector (M_1, M_2, ..., M_n). Here, we use the first 60 components of the moment vector, (M_1, M_2, ..., M_60), to characterize the 35 genome graphical curves, which gives 35 points in the 60-dimensional genome space. Computing the Euclidean distances between these points gives the distance matrix for the 35 organisms. The phylogenetic tree shown in Fig. 3 was generated from this matrix using the UPGMA program in the MEGA 4 package [23]. The last 10 mammals are grouped into one cluster because they are primates, and the phylogenetic relationships among them coincide with those found by Raina et al. [24] We also found that the Norway rat, vole, and squirrel are grouped into one cluster because they are rodents.
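
    The pipeline described above (average a feature vector over all n start points of a circular genome, keep the first k components, compute pairwise Euclidean distances, and cluster with UPGMA) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_moments` and its `WEIGHT` table are hypothetical stand-ins for the moment vector and the nucleotide vector system of Fig. 1, which are not defined in this excerpt, and the small `upgma` function is a generic average-linkage routine rather than the MEGA 4 program.

```python
from itertools import combinations
from math import dist

# Hypothetical per-base weights; NOT the nucleotide vector system of Fig. 1.
WEIGHT = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def toy_moments(seq):
    """Placeholder feature vector with len(seq) components (an assumption)."""
    n = len(seq)
    return [
        sum(WEIGHT[b] * ((i + 1) / n) ** j for i, b in enumerate(seq)) / n
        for j in range(1, n + 1)
    ]


def rotations(circular_seq):
    """Yield the n linear sequences obtained by starting at each position."""
    for start in range(len(circular_seq)):
        yield circular_seq[start:] + circular_seq[:start]


def averaged_vector(circular_seq, feature_fn, k):
    """Average the first k components of feature_fn over all n rotations."""
    n = len(circular_seq)
    total = [0.0] * k
    for linear in rotations(circular_seq):
        for i, value in enumerate(feature_fn(linear)[:k]):
            total[i] += value
    return [t / n for t in total]


def distance_matrix(points):
    """Pairwise Euclidean distances, keyed by frozenset of the two names."""
    return {
        frozenset((a, b)): dist(points[a], points[b])
        for a, b in combinations(list(points), 2)
    }


def upgma(dists):
    """Average-linkage clustering; returns a nested-tuple tree.

    Expects distances for at least two distinct, hashable leaf names.
    """
    d = dict(dists)
    size = {leaf: 1 for pair in d for leaf in pair}  # leaves per cluster
    while len(size) > 1:
        a, b = sorted(min(d, key=d.get), key=repr)  # closest pair, stable order
        merged, merged_size = (a, b), size[a] + size[b]
        for other in size:
            if other in (a, b):
                continue
            # New distance is the size-weighted mean of the old distances.
            d[frozenset((merged, other))] = (
                size[a] * d[frozenset((a, other))]
                + size[b] * d[frozenset((b, other))]
            ) / merged_size
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        del size[a], size[b]
        size[merged] = merged_size
    return next(iter(size))


if __name__ == "__main__":
    # Made-up toy sequences, used only to exercise the functions.
    genomes = {"x": "ACGTAC", "y": "ACGTAA", "z": "TTGGCC"}
    points = {name: averaged_vector(s, toy_moments, k=3) for name, s in genomes.items()}
    print(upgma(distance_matrix(points)))
```

    Averaging over every rotation makes the resulting vector independent of where the circular sequence happens to be cut, which is the point of the start-point construction in the text. Enumerating all n rotations explicitly costs at least quadratic time in n, so for genomes of about 16,000 nucleotides this direct form is only a readable reference and a faster formulation would be needed in practice.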
