Author: Huang, Yi; Lau, Susanna K. P.; Woo, Patrick C. Y.; Yuen, Kwok-yung
Title: CoVDB: a comprehensive database for comparative analysis of coronavirus genes and genomes
Document date: 2007_10_2
ID: ujhgb3b0_3
Document: By July 2007, more than 3000 coronavirus sequence records, including 264 complete genomes, were available in GenBank (24). Among the 25 coronavirus species with complete genome sequences available, six were sequenced by our group, including CoV-HKU1 and bat SARS-CoV (13, 16, 18, 19). Furthermore, we defined two novel subgroups of group 2 coronavirus (18). During batch sequence retrieval for comparative analysis of the coronavirus genomes that we sequenced, we encountered several major problems with the coronavirus sequences in GenBank and in other coronavirus databases (Coronaviridae Bioinformatics Resource, http://athena.bioc.uvic.ca/database.php?db=coronaviridae; PATRIC, http://patric.vbi.vt.edu) (25). First, in GenBank, the non-structural proteins in the polyprotein encoded by orf1ab were not annotated. Second, in all databases, the annotations of the non-structural proteins encoded by ORFs downstream of orf1ab are often confusing because they are not annotated using a standardized system. Third, multiple accession numbers are often present for reference sequences (26). These problems often cause confusion during sequence retrieval. Fourth, coronaviruses, especially SARS-CoV, amplified from different specimens may contain the same genome or gene sequences. Analyzing such identical sequences usually results in redundant work.
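The fourth problem, identical sequences deposited under different accession numbers, can be illustrated with a minimal Python sketch. It is not the procedure used by CoVDB. The function name `collapse_identical` and the accession labels are hypothetical, and the sketch assumes that two records count as redundant only when their sequences match exactly after whitespace and case are normalized.

```python
from collections import defaultdict


def collapse_identical(records):
    """Group accessions that share an identical sequence.

    `records` is an iterable of (accession, sequence) pairs.
    Returns a list of (representative, duplicates) tuples, in order of
    first appearance. Exact-match criterion only (an assumption).
    """
    groups = defaultdict(list)
    for accession, sequence in records:
        # Normalize: drop whitespace and line breaks, ignore case.
        key = "".join(sequence.split()).upper()
        groups[key].append(accession)
    # The first accession seen for each sequence is the representative.
    return [(accs[0], accs[1:]) for accs in groups.values()]


# Hypothetical records; the accession labels are placeholders, not GenBank IDs.
example = [
    ("ACC0001", "atgc ttag"),
    ("ACC0002", "ATGCTTAG"),
    ("ACC0003", "ATGCTTAA"),
]
result = collapse_identical(example)
```

Here `ACC0001` and `ACC0002` fall into one group, so only the representative would need to be analyzed. The duplicates stay listed alongside it and no record is lost.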