Author: Bahir, Iris; Fromer, Menachem; Prat, Yosef; Linial, Michal
Title: Viral adaptation to host: a proteome-based analysis of codon usage and amino acid preferences
Document date: 2009_10_13
ID: 629kl04a_9
Document: The huge diversity among viruses encompasses their mode of replication, shape, stability, proteome size, and infectivity. These factors make it inherently difficult to classify viruses into taxonomic groupings. Currently, ~10% of all sequences in the UniProtKB database (Boutet et al, 2007) (release 14.6) are viral proteins (718 000 proteins). However, full-length proteins account for only a third of these and, after sequence redundancy is eliminated (at the level of 90% identity), the number of proteins falls to only ~10% of the original number (72 992 proteins) (Figure 1). In addition, only a small fraction of these proteins are manually reviewed (based on the SwissProt database), which leaves only 1% of the initial collection (7416 proteins). Furthermore, the relevance of specific virus families to human health has led to a strong bias in the quality and reliability of genome annotation. The majority of viral sequences in the public databases are derived from only a few viral families, whereas most families remain poorly represented. HIV illustrates this point: it makes up 36% of all viral protein entries (Figure 1). Half of all viral proteins are from either HIV or hepatitis (Hepadnaviridae) viruses, two families with an indisputable impact on human health. An additional source of bias in analyzing the viral world stems from data that originate from incomplete genomes. The UniProtKB annotation of 'complete proteome' covers only 0.5% of all viral sequences.
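The percentages quoted above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' procedure: it uses only the three counts stated in the text, and the variable and function names are illustrative assumptions.

```python
# Counts quoted in the text (UniProtKB release 14.6).
TOTAL_VIRAL = 718_000    # all viral protein entries
NONREDUNDANT = 72_992    # full-length, after 90% identity filtering
REVIEWED = 7_416         # manually reviewed (SwissProt)


def percent_of_total(count, total=TOTAL_VIRAL):
    """Return count as a percentage of total (hypothetical helper)."""
    return 100.0 * count / total


# Share of the initial collection that survives each filtering step.
fractions = {
    "nonredundant": percent_of_total(NONREDUNDANT),
    "reviewed": percent_of_total(REVIEWED),
}

for name, pct in fractions.items():
    print(f"{name}: {pct:.1f}% of all viral entries")
```

Both results agree with the rounded figures in the text: roughly 10% of the initial collection remains after redundancy removal, and roughly 1% is manually reviewed.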