Selected article for: "average number and observed pattern"

Author: Christina J. Castro; Rachel L. Marine; Edward Ramos; Terry Fei Fan Ng
Title: The effect of variant interference on de novo assembly for viral deep sequencing
  • Document date: 2019_10_22
  • ID: d5ghy39g_6_0
    Snippet: The copyright holder for this preprint (which was not peer-reviewed) is the author/funder (bioRxiv preprint, doi: https://doi.org/10.1101/815480). Effect of variant assembly using popular de novo assemblers. After establishing the growing use of NGS technologies for viral sequencing, we next focused on understanding how the presence of viral variants may influence de novo assembly output. We generated 247 simulated viral NGS dataset.....
    Document: The copyright holder for this preprint (which was not peer-reviewed) is the author/funder (bioRxiv preprint, doi: https://doi.org/10.1101/815480).

    Effect of variant assembly using popular de novo assemblers

    After establishing the growing use of NGS technologies for viral sequencing, we next focused on understanding how the presence of viral variants may influence de novo assembly output. We generated 247 simulated viral NGS datasets representing a continuum of pairwise identity (PID) between two viral variants, from 75% PID (one nucleotide difference every 4 nucleotides) to 99.6% PID (one nucleotide difference every 250 nucleotides) [Figure 2]. For Experiment 1, these datasets were assembled using 10 of the most widely used de novo assembly programs [Figure 2 and Supplement Figure S1a] to evaluate their ability to assemble the two variants into their own respective contigs as the PID between the variants increases.

    One key observation is that the assembly result can change from two (correct) contigs to many (unresolvable) contigs simply because variant reads are present; the presence of viral variants affected the contig assembly output of all 10 assemblers tested. The output of the SPAdes, MetaSPAdes, ABySS, Cap3, and IDBA assemblers shared a few commonalities, illustrated by the conceptual model in Figure 3A. First, below a certain PID, when the viral variants have enough distinct nucleotides to resolve the two variant contigs, the de novo assemblers correctly produced two contigs [Figure 3]. We refer to this as "variant distinction" (VD), and to the highest pairwise identity at which it occurs as the VD threshold. Above this threshold, the assemblers produced tens to thousands of contigs [Figure 3], a phenomenon we define as "variant interference" (VI).
    As the PID between the variants continues to increase, the de novo assemblers could no longer distinguish between the variants and assembled all the reads into a single contig, a phenomenon we define as "variant singularity" (VS) [Figure 3]. The lowest pairwise identity at which a single contig is assembled is the VS threshold.

    Slight differences in the variant interference patterns (relative to the canonical variant interference model) were observed for the 10 assemblers investigated. VD was observed for the SPAdes, MetaSPAdes, and ABySS assemblers. While it was not observed with Cap3 and IDBA under the current simulated data parameters, we speculate that VD may occur for these assemblers at a lower PID than was tested in this study. The PID range where VI was observed was distinct for each de novo assembler [Figure 3]. During VI, SPAdes produced as many as 134 contigs and ABySS produced 3,076 contigs, while MetaSPAdes, Cap3, and IDBA produced up to 10.

    A different pattern was observed for the Mira, Trinity, and SOAPdenovo2 assemblers. The average number of contigs generated by Mira, Trinity, and SOAPdenovo2 was 5, 36, and 283, respectively, across all variant PIDs from 75%-99.96%. Specifically, Mira and Trinity generated fewer contigs at low PID, but produced many contigs once the two variants reached 97.1% PID and 96.0% PID, respectively. For SOAPdenovo2, a larger number of contigs was produced regardless of the PID. This indicates that these assemblers generally have major challenges producing a single genome; this has been observed in previous studies comparing assembly per

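The PID continuum and the three regimes described in the passage (VD, VI, VS) can be illustrated with a short Python sketch. This is not the authors' analysis code. The function names, the toy contig counts, and the rule that any count above two is treated as VI are assumptions made here for illustration; the passage itself only states that VD yields two correct contigs, VS yields a single contig, and VI yields many. Contig count alone also does not confirm that two contigs are the correct variant genomes.

```python
from typing import Dict, Optional, Tuple


def pid_from_spacing(k: int) -> float:
    """Percent pairwise identity when two variants differ at one position in every k.

    The passage gives two anchor points: k = 4 corresponds to 75% PID
    and k = 250 corresponds to 99.6% PID.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 100.0 * (k - 1) / k


def classify_regime(n_contigs: int) -> str:
    """Map a contig count to a regime label (simplified, hypothetical rule).

    1 contig   -> "VS" (variant singularity)
    2 contigs  -> "VD" (variant distinction)
    >2 contigs -> "VI" (variant interference); assumption of this sketch
    """
    if n_contigs == 1:
        return "VS"
    if n_contigs == 2:
        return "VD"
    if n_contigs > 2:
        return "VI"
    return "none"  # no contigs assembled


def regime_thresholds(
    contigs_by_pid: Dict[float, int],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (VD threshold, VS threshold) for one assembler.

    VD threshold: highest PID at which two contigs were produced.
    VS threshold: lowest PID at which a single contig was produced.
    Either value is None if that regime never occurs in the input.
    """
    vd = [pid for pid, n in contigs_by_pid.items() if classify_regime(n) == "VD"]
    vs = [pid for pid, n in contigs_by_pid.items() if classify_regime(n) == "VS"]
    return (max(vd) if vd else None, min(vs) if vs else None)


if __name__ == "__main__":
    # Toy contig counts, invented for illustration only (not data from the study).
    toy_counts = {75.0: 2, 90.0: 2, 95.0: 40, 99.0: 1, 99.6: 1}
    print(pid_from_spacing(4), pid_from_spacing(250))
    print(regime_thresholds(toy_counts))
```

For an assembler where VD is never observed, as reported for Cap3 and IDBA within the tested range, the first element of the returned tuple would be `None`.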