Author: Christina J. Castro; Rachel L. Marine; Edward Ramos; Terry Fei Fan Ng
Title: The effect of variant interference on de novo assembly for viral deep sequencing
  • Document date: 2019_10_22
  • ID: d5ghy39g_1
    Document: Viruses have high mutation rates and generally exist as a mixture of variants in biological samples. Next-generation sequencing (NGS) has surpassed Sanger sequencing for generating long viral sequences, yet how variants affect NGS de novo assembly remains largely unexplored. Our results from >15,000 simulated experiments showed that the presence of variants can turn an assembly of one genome into tens to thousands of contigs. This "variant interference" (VI) is highly consistent and reproducible across the ten most used de novo assemblers, and occurs independently of genome length, read length, and GC content. The main driver of VI is the pairwise identity between viral variants. These findings were further supported by in silico simulations, where selective removal of minor variant reads from clinical datasets allows the "rescue" of full viral genomes from fragmented contigs. These results call for careful interpretation of contigs and contig numbers from de novo assembly in viral deep sequencing.

    Genomic surveillance of viruses is particularly important in light of their rapid rate of evolution. Viruses have higher mutation rates than cellular-based taxa, with RNA viruses having mutation rates as high as 1.5 × 10⁻³ mutations per nucleotide, per genomic replication cycle. [4] Due to this high mutation rate, it is well established that most RNA viruses exist as a swarm of quasispecies, [5] with each quasispecies containing unique single nucleotide polymorphisms (SNPs). The presence of these variants plays a key role in viral adaptation.

    Due to viruses' rapid evolution, a single clinical sample often contains a mixture of many closely related viruses. Viral quasispecies are mainly derived from intra-host evolution, with RNA viruses such as poliovirus, human immunodeficiency virus (HIV), hepatitis C virus (HCV), influenza, dengue, and West Nile viruses maintaining diverse quasispecies populations within a host. [6, 7, 8, 9, 10, 11, 12, 13] Conversely, the term "viral strains" often refers to different lineages of viruses found in separate hosts, or to a co-infection of viruses in the same host due to multiple infection events. As a result, sequence divergence is usually higher between viral strains than between quasispecies. In this study, we use the term "variant" to encompass both quasispecies and strains regardless of how the variants originated in the biological samples.

    Since many sequencing technologies produce reads that are significantly shorter than the target genome size, a process to construct contigs, scaffolds, and full-length genomes is needed. Reference-mapping and de novo assembly are the two primary bioinformatic strategies for genome assembly. Reference-mapping requires a closely related genome as input to align reads, while de novo assembly generates contigs without
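
The abstract above names pairwise identity between viral variants as the main driver of VI. As a rough illustration of that metric only, the following minimal Python sketch computes percent identity between two sequences that are assumed to be already aligned to the same length. The function name `pairwise_identity`, the toy sequences, and the convention of skipping gap columns are all assumptions made for this example; the source does not state how the study computed identity.

```python
def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are skipped. This is one
    common convention, assumed here; it is not taken from the study.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (non-gap) positions")
    return 100.0 * matches / compared


# Hypothetical toy variants: 20 aligned positions with a single mismatch.
major = "ATGGCGTACGTTAGCCTAGA"
minor = "ATGGCGTACGTTAGCTTAGA"
identity = pairwise_identity(major, minor)
print(f"{identity:.1f}% identity")
```

In practice the two variants would first need to be aligned with a dedicated tool; the sketch only covers the counting step that follows alignment.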
