
Author: Rose, Rebecca; Constantinides, Bede; Tapinos, Avraam; Robertson, David L; Prosperi, Mattia
Title: Challenges in the analysis of viral metagenomes
  • Document date: 2016_8_3
  • ID: x3u9i1vq_15_0
    Document: Modern de novo assemblers generally leverage either de Bruijn graphs or read overlap graphs, the latter as part of the approach known as overlap-layout-consensus (OLC). Figure 1 illustrates the differences between the two methods. OLC assemblers use the similarity of whole reads to construct a graph in which each read is represented by a node, and subsequently merge overlapping reads into consensus contigs (Deng et al. 2015). OLC is relatively time and memory intensive, scaling poorly to millions of reads and beyond. However, the fewer, longer reads generated by emerging single-molecule sequencing technologies tend to be well suited to OLC assembly, which can easily be implemented to tolerate long and noisy sequences (Compeau, Pevzner, and Tesler 2011). Notable older de novo assemblers implementing OLC include CAP3 (Huang and Madan 1999) and Celera (http://www.jcvi.org/cms/research/projects/cabog/overview/), while MHAP (Berlin et al. 2015), Canu (Berlin et al. 2015), and Miniasm (Li 2016) represent the current state of the art. There are also a number of OLC assemblers intended for use with viral sequences: VICUNA was designed for short, nonrepetitive and highly variable reads from a single population (Yang et al. 2012), and PRICE (Ruby, Bellare, and DeRisi 2013) iteratively assembles low to moderate complexity metagenomes (e.g. Runckel et al. 2011; Grard et al. 2012) using an algorithm similar to that of the actively developed consensus assembler IVA (Hunt et al. 2015), which, like VICUNA, is designed for single virus populations rather than metagenomes (see Table 1 for additional details on programs). A de Bruijn or k-mer graph represents a set of reads in terms of its k-mer composition, where k-mers are subsequences of length k, with k specified by the user. Each k-mer is assigned to an edge in the graph, and the nodes are the (k-1)-mer prefixes and suffixes of the k-mers.
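
The de Bruijn graph definition above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation of any assembler named in the text: the function names `kmers` and `build_de_bruijn`, the toy reads, and the choice to store each distinct k-mer only once are all assumptions made for the example. It also ignores reverse complements and sequencing errors, which real assemblers must handle.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every subsequence of length k (k-mer) in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph as an adjacency list (hypothetical helper).

    Each distinct k-mer becomes one edge, running from its (k-1)-mer prefix
    node to its (k-1)-mer suffix node. Repeated k-mers are stored once, which
    is why high coverage adds little to the size of the graph.
    """
    seen = set()
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in seen:
                continue  # redundant k-mer: no new edge
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Three overlapping toy reads (illustrative data only).
reads = ["ATGGC", "TGGCG", "GCGTA"]
graph = build_de_bruijn(reads, k=3)
for node, successors in graph.items():
    print(node, "->", successors)
```

Note that the reads themselves are discarded once their k-mers are recorded; only the set of distinct k-mers determines the graph.
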
The assembler identifies the path through the graph in which each edge is visited only once (reviewed in Compeau, Pevzner, and Tesler 2011). De Bruijn graphs are much more efficient to construct than overlap graphs and are suited to large numbers of short reads and to high coverage, since redundant k-mers occupy negligible random access memory (RAM). However, with this efficiency comes a lack of error tolerance in identifying overlaps, less tolerance of repeated sequences in comparison to overlap graphs, and a loss of read coherence, meaning that k-mers originating from different reads may be co-assembled. Examples of assemblers using de Bruijn graphs include SOAPdenovo (Luo et al. 2012), ALLPATHS (Butler et al. 2008), SPAdes (Bankevich et al. 2012), and ABySS (Si

Figure 1. Two widely used methodologies in de novo assembly of short reads. Reads are not represented explicitly within a de Bruijn graph; they are instead decomposed into distinct subsequence 'words' of length k, or k-mers, which can be linked together via overlapping k-mers to create an assembly graph. In OLC, a pairwise comparison of all reads is performed, identifying reads with overlapping regions. These overlaps are used to construct a read graph. Next, overlapping reads are bundled into aligned contigs in what is referred to as the layout step, before finally the most likely nucleotide at each position is determined through consensus. This figure is simplified to demonstrate the theory for the assembly of single genomes; note that the process has additional complexities for the reconstruction of metagenomes.
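
The OLC steps described in the Figure 1 caption (pairwise overlap detection, then layout) can be sketched as a greedy merge. This is a toy example under strong assumptions: overlaps must be exact suffix-prefix matches, so the consensus step becomes trivial, and the names `overlap`, `greedy_assemble`, and `min_len` are hypothetical. Real OLC assemblers use error-tolerant alignment and far more efficient overlap detection than the all-against-all loop shown here.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of a that is a prefix of b.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_len, best_a, best_b = 0, None, None
        # Overlap step: compare every ordered pair of sequences.
        for a, b in permutations(contigs, 2):
            n = overlap(a, b, min_len)
            if n > best_len:
                best_len, best_a, best_b = n, a, b
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        # Layout step: join the best pair, keeping the shared region once.
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_len:])
    return contigs


# Three overlapping toy reads (illustrative data only).
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

The pairwise comparison is quadratic in the number of reads, which illustrates why the text describes OLC as scaling poorly to millions of reads.
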
