Selected article for: "NCBI database and reference genome"

Author: Schlub, Timothy E; Buchmann, Jan P; Holmes, Edward C
Title: A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences
  • Document date: 2018-08-07
  • ID: yiqdsf9z_15
    Document: We next screened for previously undiscovered overlapping genes using the combined test with a P value cut-off of 0.001. This cut-off was chosen because only an estimated 9.7% of discoveries at this threshold are false positives (table 1). We find evidence for 40 undocumented functional overlapping ORFs across all reference genomes of linear RNA viruses. Of these 40 ORFs, two had previously been described in Synplot2's RNA screening in 2014 (Firth 2014 MBE) [...] not annotated within GenBank, they were not necessarily undiscovered, as some existed within the NCBI protein databases. To remove these already discovered or hypothesized overlapping ORFs, we performed a protein BLAST search of the 38 undocumented overlapping ORFs and found that nine had previously been discovered but were not annotated within the reference genome, leaving 29 newly discovered functional overlapping ORFs from our method (table 3, supplementary materials S2 and S5, Supplementary Material online). Of these newly discovered ORFs, we would expect approximately three (9.7% of 29 ≈ 2.8) to be false discoveries. To test whether we can detect homologs of the 29 newly discovered overlaps in other species, we aligned their protein sequences against the NCBI nt database using tblastn (supplementary material S4, Supplementary Material online). We filtered the results to include only alignments with a similarity of at least 90% and an alignment length of at least 90% of the length of the ORF (Material [...]).

Table 1. Sensitivity, false discovery, and area under the curve for each test across a range of P value cut-offs and overlapping lengths.

The 29 discovered ORFs ranged from 87 to 708 codons in length, with a median (interquartile range) of 195.5 (157–279.2) codons; 13 were transcribed in the same direction as the original gene (sense, frames +1 and +2), while 17 were encoded in the opposite direction on the complementary strand (antisense, frames −c0, −c1, and −c2; supplementary material S1, Supplementary Material online).
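The tblastn homology filter described above (at least 90% similarity, alignment covering at least 90% of the ORF) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes tblastn was run with the BLAST+ default tabular output (`-outfmt 6`), treats the percent-identity column (`pident`) as the "similarity" measure, and measures alignment length as the span of ORF residues covered (`qend - qstart + 1`) relative to the ORF's protein length. The helper names `parse_outfmt6` and `filter_hits`, the ORF names, and the example hit lines are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, NamedTuple

# Default BLAST+ tabular (-outfmt 6) column order.
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    query: str     # ORF protein used as the tblastn query
    subject: str   # nucleotide sequence hit in the database
    pident: float  # percent identity (assumed stand-in for "similarity")
    qstart: int    # first aligned ORF residue (1-based)
    qend: int      # last aligned ORF residue (1-based, inclusive)


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST+ output in the default outfmt 6 column order."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        hits.append(Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                        int(rec["qstart"]), int(rec["qend"])))
    return hits


def filter_hits(hits: Iterable[Hit], orf_len_aa: Dict[str, int],
                min_identity: float = 90.0, min_coverage: float = 0.90) -> List[Hit]:
    """Keep hits with at least min_identity percent identity whose aligned
    query span covers at least min_coverage of the ORF's protein length."""
    kept = []
    for h in hits:
        span = h.qend - h.qstart + 1  # aligned residues on the ORF (query)
        coverage = span / orf_len_aa[h.query]
        if h.pident >= min_identity and coverage >= min_coverage:
            kept.append(h)
    return kept


# Hypothetical example: one hit passes, one is too short, one is too dissimilar.
example = io.StringIO(
    "orfA\tsubj1\t95.0\t190\t9\t0\t1\t190\t100\t669\t1e-80\t350\n"
    "orfA\tsubj2\t97.5\t80\t2\t0\t1\t80\t10\t249\t1e-30\t150\n"
    "orfB\tsubj3\t85.0\t300\t45\t0\t1\t300\t5\t904\t1e-90\t400\n"
)
kept = filter_hits(parse_outfmt6(example), {"orfA": 200, "orfB": 300})
print([(h.query, h.subject) for h in kept])  # [('orfA', 'subj1')]
```

If "similarity" in the original analysis meant positive-scoring matches rather than identity, the `ppos` column could be requested in a custom `-outfmt "6 ..."` string and used in place of `pident`.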
In addition, 18 of the ORFs were located completely within their reference coding region, eight lay on its boundary, and four encompassed the entire reference coding region, suggesting that in these four cases the reference coding region may lie completely within the larger discovered ORF. Several of these discovered ORFs are of particular interest and are discussed in more detail below.
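The three positional categories above (within, on the boundary, encompassing) can be expressed as a simple interval comparison. The sketch below is hypothetical: the function name `classify_overlap` and its labels are illustrative, and it assumes both the ORF and the reference coding region are given as 1-based, inclusive coordinates on the same (forward-strand) genome axis, with "boundary" taken to mean a partial overlap. The paper does not state its exact rule.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def classify_overlap(orf_start: int, orf_end: int,
                     cds_start: int, cds_end: int) -> Optional[str]:
    """Classify an ORF's position relative to a reference coding region."""
    if orf_start > orf_end or cds_start > cds_end:
        raise ValueError("start must not exceed end")
    if orf_end < cds_start or orf_start > cds_end:
        return None  # the intervals do not overlap at all
    if cds_start <= orf_start and orf_end <= cds_end:
        return "within"  # ORF lies completely inside the coding region
    if orf_start <= cds_start and cds_end <= orf_end:
        return "encompassing"  # coding region lies inside the ORF
    return "boundary"  # partial overlap across one end of the coding region


def tally(pairs: Iterable[Tuple[Interval, Interval]]) -> Counter:
    """Count categories over (orf, coding_region) interval pairs."""
    return Counter(classify_overlap(*orf, *cds) for orf, cds in pairs)


# Hypothetical coordinates, one per category.
print(tally([((120, 700), (100, 1500)),
             ((50, 400), (100, 1500)),
             ((50, 1600), (100, 1500))]))
```

An ORF with exactly the same coordinates as the coding region is reported as "within" here, because that check runs first; a real analysis would need to decide how to treat such ties.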
