Author: Rose, Rebecca; Constantinides, Bede; Tapinos, Avraam; Robertson, David L; Prosperi, Mattia
Title: Challenges in the analysis of viral metagenomes
Document date: 2016_8_3
ID: x3u9i1vq_22
Document: Viral genomes and metagenomes comprising high intraspecific variation can be challenging targets for assembly, giving rise to complex assembly graphs and fragmented assemblies. This is often the case for clinical samples from HIV and Hepatitis C patients, in which high rates of mutation and long durations of infection can contribute to extreme population divergence, but it can also be observed in environmental samples. Where such diversity exists, alignment-based probabilistic population reconstruction approaches can be effective, permitting the reconstruction of individual viral variants into 'haplotypes' exceeding read length. This problem has been well studied, and tools such as ShoRAH, QuRe, and PredictHaplo (Giallonardo et al. 2014) are designed for haplotyping viral populations. Using a model-based probabilistic clustering algorithm, ShoRAH (Zagordi et al. 2011) extracts local alignments of a specified window length, reconstructs haplotypes for each 'cluster' in that window, and removes mutations from sequences in the cluster that do not match the reconstructed haplotype. QuRe (Prosperi and Salemi 2012; Prosperi et al. 2013) removes nucleotide substitutions and indels with a Poisson model and reconstructs haplotypes using a heuristic algorithm based on a multinomial distribution. Both approaches have the advantage of reporting probabilities for the reconstructed haplotypes. PredictHaplo is notable for taking into account the read pairing information in Illumina data. A limitation of all of these approaches, however, is their reliance upon a single reference sequence with which to perform the initial alignment, a process which assumes a degree of sequence similarity that may not always be observed in diverse regions of RNA virus genomes, such as those encoding envelope proteins. This can be mitigated by constructing a data-specific template through iterative reference mapping and consensus refinement strategies (Archer et al. 2010; Břinda, Boeva, and Kucherov 2016). Other possibilities for broader utility of these approaches include the use of multiple viral reference sequences, either through consideration of multiple linear sequences or by direct alignment of sequences to a variation graph [https://github.com/vgteam/vg], an emerging approach for modeling genomic variation.
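To make the idea of Poisson-based error removal more concrete, the following is a minimal sketch of how a single alignment column might be screened for sequencing errors. It is not QuRe's actual implementation: the function names `poisson_sf` and `call_variants`, the flat `error_rate`, and the significance threshold `alpha` are all illustrative assumptions. The sketch treats the number of erroneous observations of a minority base at a position as Poisson distributed with mean equal to depth times error rate, and retains a minority base only if its observed count would be very unlikely under that model.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), via the lower tail sum."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_variants(column, error_rate=0.01, alpha=0.001):
    """Return the set of bases in one alignment column judged to be real.

    column     -- string of bases observed at one reference position
    error_rate -- assumed probability of a given erroneous base per read
                  (hypothetical value, not taken from any tool)
    alpha      -- assumed significance threshold for rejecting 'error'
    """
    counts = Counter(column)
    depth = len(column)
    consensus, _ = counts.most_common(1)[0]
    lam = depth * error_rate  # expected number of erroneous observations
    kept = {consensus}
    for base, k in counts.items():
        # Keep a minority base only if its count is implausible as noise.
        if base != consensus and poisson_sf(k, lam) < alpha:
            kept.add(base)
    return kept


if __name__ == "__main__":
    # 1000 reads: 950 A, 48 G, 2 T at a single position.
    column = "A" * 950 + "G" * 48 + "T" * 2
    print(sorted(call_variants(column)))
```

With the assumed 1% error rate, ten erroneous observations are expected at a depth of 1000, so the 48 G reads are retained as a likely true variant while the 2 T reads are treated as noise. A real tool would additionally model indels, base qualities, and position-specific error rates, none of which are attempted here.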