Selected article for: "ab initio gene prediction and gene prediction"

Author: Wang, Shiliang; Sundaram, Jaideep P; Spiro, David
Title: VIGOR, an annotation program for small viral genomes
  • Document date: 2010_9_7
  • ID: 0lbxvudt_2
    Document: Two major approaches, ab initio gene finding and similarity-based prediction [1], have been commonly applied to gene prediction. The ab initio method, also known as the intrinsic statistical strategy, computes statistics such as the nucleotide frequencies and their ordering from a set of genomic sequences that have already been characterized. It works because nucleotide frequencies and ordering usually differ between the protein-coding and non-coding regions of a genome. However, viral genomes, because of their small size, may not provide enough training data to derive the parameters this approach needs to perform at its best. The heuristic method, which determines the parameters of the necessary models from short sequences, has been adopted by several gene prediction programs, e.g., GeneMarkS [2]. This method needs only a small amount of genomic sequence, provided it is long enough to produce effective Markov models; typically this is a small fraction of a large genome, or a small genome such as a viral genome. Analyzing that small amount of DNA sequence yields a linear function relating the nucleotide frequencies in the three codon positions to the global nucleotide frequencies. The heuristic method then uses these derived data to predict protein-coding genes [3]. The ab initio method uses these trained or self-trained models to select protein-coding regions and predict coding sequences. The similarity-based method takes a different strategy: it identifies coding sequences by aligning them against reference sequences that are closely related evolutionarily. Because the two approaches detect protein-coding sequences in different ways, their performance differs and depends on the training data set and the reference sequence data.
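The statistics that the ab initio strategy relies on can be illustrated with a minimal Python sketch. It only computes the two quantities the text mentions (global nucleotide frequencies and frequencies at each of the three codon positions) and a simple log-odds score built from them. It is not the GeneMarkS heuristic or its Markov models: the function names, the zero-order scoring scheme, and the `floor` parameter are assumptions made for illustration.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"


def global_frequencies(seq):
    """Fraction of each nucleotide over the whole sequence."""
    counts = Counter(base for base in seq.upper() if base in NUCLEOTIDES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return {base: counts[base] / total for base in NUCLEOTIDES}


def codon_position_frequencies(orf):
    """Fraction of each nucleotide at codon positions 1, 2 and 3.

    The input is assumed to start in frame; trailing bases that do not
    fill a complete codon are ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    freqs = []
    for pos in range(3):
        counts = Counter(
            orf[i] for i in range(pos, usable, 3) if orf[i] in NUCLEOTIDES
        )
        total = sum(counts.values())
        if total == 0:
            raise ValueError("no complete codons with A/C/G/T characters")
        freqs.append({base: counts[base] / total for base in NUCLEOTIDES})
    return freqs


def coding_log_odds(candidate, position_freqs, background, floor=1e-6):
    """Illustrative zero-order score: log-odds of codon-position
    frequencies versus background frequencies, summed over the candidate.
    `floor` (an assumption) avoids taking the log of zero.
    """
    score = 0.0
    for i, base in enumerate(candidate.upper()):
        if base not in NUCLEOTIDES:
            continue
        p_coding = max(position_freqs[i % 3][base], floor)
        p_background = max(background[base], floor)
        score += math.log(p_coding / p_background)
    return score
```

A positive score means the candidate looks more like the sequences used to derive the codon-position frequencies than like the background. With a genome as small as a viral one, these frequency tables rest on few observations, which is the training-data limitation the text describes.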
Usually, the ab initio approach is more sensitive than the similarity-based approach, while the similarity-based approach is more specific. This is because ab initio methods predict some false-positive exons and genes in intergenic regions and introns, while similarity-based tools cannot detect a gene if no homologous sequence is included in the reference data.
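The trade-off described above can be made concrete with a small Python sketch. It assumes the convention common in gene-prediction evaluation, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP); the source does not state which definition it uses. The function name and the gene coordinates are hypothetical and chosen only to illustrate the two failure modes.

```python
def sensitivity_specificity(predicted, actual):
    """Gene-level sensitivity and specificity.

    Assumes the gene-prediction convention:
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    Genes are compared by exact equality of their (start, end) tuples.
    """
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # real genes that were predicted
    fp = len(predicted - actual)   # predictions that match no real gene
    fn = len(actual - predicted)   # real genes that were missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


# Hypothetical coordinates, for illustration only.
true_genes = {(100, 400), (500, 900), (1000, 1300)}

# Ab initio style result: finds every gene, plus one false positive
# in an intergenic region.
ab_initio = true_genes | {(1400, 1550)}

# Similarity-based style result: no false positives, but misses the
# gene that has no homolog in the reference data.
similarity_based = {(100, 400), (500, 900)}

print(sensitivity_specificity(ab_initio, true_genes))
print(sensitivity_specificity(similarity_based, true_genes))
```

In this toy case the ab initio result has full sensitivity but reduced specificity, and the similarity-based result shows the reverse pattern.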
