Author: Alejandro A Schäffer; Eneida Hatcher; Linda Yankie; Lara Shonkwiler; J Rodney Brister; Ilene Karsch-Mizrachi; Eric P Nawrocki
Title: VADR: validation and annotation of virus sequence submissions to GenBank
Document date: 2019_11_22
ID: besvz92f_43
Document: We ran the VADR v1.0 v-annotate.pl script with default parameters on the four sequence datasets. The numbers of sequences that pass and fail VADR from each of the four sets are shown in Table 4. For norovirus, about 92% of partial sequences and about Table 5 lists each type of fatal VADR alert observed in at least one of the four datasets, with counts of the instances reported and of the sequences for which one or more instances were reported. The most common alert, peptrans, occurs 6613 times across 1781 of the 59,127 sequences (approximately 3%). This alert does not itself indicate a distinct problem; it is reported for a mature peptide when the parent CDS that is cleaved to form it has a fatal alert, so it is always redundant with at least one other alert. The next most common alert, noannotn, occurs for 2753 sequences, 2236 of which are in the DP dataset; it indicates that no similar RefSeq was found for these sequences during the classification stage. Other alerts with more than 1000 instances include indf5pst and indf3pst, which occur when the blastx protein-based alignment of a predicted CDS translation in the validation stage does not extend to within 5 nucleotides of the 5' or 3' end, respectively, of the nucleotide-based alignment. Thirty additional fatal alerts occurred for at least one sequence. Four fatal alert types did not occur for any sequence: unexdivg would be reported in the rare case that a sequence was recognized as similar to a RefSeq but was too divergent to align within memory requirements; lowsimis would be reported if a dissimilar region occurred outside all predicted features, which is unlikely for norovirus and dengue virus because their features span nearly their full length; indfstrp would be reported if blastx reported similarity to a CDS region on the negative strand when nucleotide similarity is primarily recognized on the positive strand; and incsbgrp would be reported if a sequence is recognized as not belonging to a specified subgroup (e.g. norovirus genotype), but it was never reported in these tests because we did not specify subgroups. The incsbgrp alert was added to fit the design of the NCBI submission interface for norovirus in which submitters are asked to specify the genogroup,
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.
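The distinction drawn above between alert instances and the sequences that carry them (for example, 6613 peptrans instances across 1781 sequences) can be illustrated with a minimal Python sketch. It assumes alerts are available as simple (sequence ID, alert code) pairs; the function name `tally_alerts`, the record layout, and the toy records are hypothetical and do not reflect VADR's actual output format or the paper's data.

```python
from collections import defaultdict


def tally_alerts(records):
    """Return {alert_code: (n_instances, n_sequences)}.

    `records` is an iterable of (seq_id, alert_code) pairs; this layout is
    an assumption for illustration, not VADR's real output format.
    """
    instances = defaultdict(int)   # alert code -> number of instances
    sequences = defaultdict(set)   # alert code -> distinct sequence IDs
    for seq_id, code in records:
        instances[code] += 1
        sequences[code].add(seq_id)
    return {code: (instances[code], len(sequences[code])) for code in instances}


# Hypothetical toy records, not data from the paper.
records = [
    ("seq1", "peptrans"),
    ("seq1", "peptrans"),   # a second instance on the same sequence
    ("seq1", "indf5pst"),
    ("seq2", "noannotn"),
    ("seq3", "peptrans"),
]
total_sequences = 4  # hypothetical dataset size; one sequence has no alerts

counts = tally_alerts(records)
for code, (n_inst, n_seq) in sorted(counts.items()):
    share = n_seq / total_sequences
    print(f"{code}: {n_inst} instances in {n_seq} sequences ({share:.0%})")
```

The same ratio applied to the reported figures gives 1781 / 59,127 ≈ 0.030, which matches the "approximately 3%" stated for peptrans.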