NIST Coronavirus Research Data DocResults - cc NC ND International license and estimate number

Author: Jiao Chen; Jiayu Shang; Jianrong Wang; Yanni Sun

Title: A binning tool to reconstruct viral haplotypes from assembled contigs

Document date: 2019-07-16

ID: 2basllfv_14

Document: The overall pipeline of our method is shown in Fig. 1. There are two main steps: (1) estimate the number of haplotypes by aligning contigs and identifying windows; (2) calculate relative abundances in each window and apply a clustering algorithm to group clusters of the same haplotype. The underlying algorithm for grouping contigs into haplotypes is prototype-based clustering (Tan et al., 2005). Features such as overlaps and paired-end connections are of limited use in grouping distant contigs from the same haplotype, so the clustering mainly uses features based on the abundance distributions. Although abundance-based clustering has been used for contig binning from multiple samples (Wu et al., 2014; Quince et al., 2017), existing tools are not designed to tackle the key challenges of distinguishing contigs of different haplotypes. First, the observed coverage of each contig depends not only on the abundance of the underlying haplotype but also on whether the contig comes from a region unique to one haplotype or shared by two or more haplotypes. Second, heterogeneous coverage of each haplotype in an RNA viral quasispecies is common; it is caused by sequencing-related biases and compounded by gene expression. Thus, directly applying existing prototype-based clustering models, such as the Gaussian mixture model, to contigs is not expected to produce accurate clustering. Our solution to this problem is to cut the contigs into "windows" and to apply the clustering to sub-contigs that are more likely to represent one haplotype. In addition, instead of assuming a parametric distribution, which usually does not hold for haplotype contigs, we use a non-parametric distribution.

License: bioRxiv preprint (not peer-reviewed), doi: https://doi.org/10.1101/704288. The copyright holder for this preprint is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
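The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the excerpt does not define how windows are identified, so the sketch assumes that window boundaries fall at every contig start and end on a shared alignment axis, that the largest number of contigs stacked in any window serves as the haplotype-count estimate, and that relative abundance is a contig's mean read depth divided by the total depth in its window. The function names, the input layout, and the toy depths are all hypothetical, and the non-parametric clustering step itself is left out.

```python
def identify_windows(contigs):
    """Cut aligned contigs into windows at every contig boundary.

    `contigs` maps a contig name to (start, end, depth): half-open
    coordinates on a shared alignment axis plus mean read depth.
    This input layout is an assumption made for illustration.
    Returns a list of (win_start, win_end, {name: depth}) tuples.
    """
    breakpoints = sorted({p for s, e, _ in contigs.values() for p in (s, e)})
    windows = []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        # A contig belongs to a window only if it spans the whole window.
        members = {name: depth
                   for name, (s, e, depth) in contigs.items()
                   if s <= lo and hi <= e}
        if members:
            windows.append((lo, hi, members))
    return windows


def estimate_haplotype_count(windows):
    """Assumed estimate: the largest number of contigs in any one window."""
    return max((len(members) for _, _, members in windows), default=0)


def relative_abundances(windows):
    """Normalise each contig's depth by the total depth in its window."""
    result = []
    for lo, hi, members in windows:
        total = sum(members.values())
        if total <= 0:
            raise ValueError("window depths must sum to a positive value")
        result.append((lo, hi, {n: d / total for n, d in members.items()}))
    return result


# Toy input: one long contig plus two short, distant contigs.
contigs = {
    "c1": (0, 1000, 80.0),
    "c2": (0, 450, 20.0),
    "c3": (550, 1000, 20.0),
}
windows = identify_windows(contigs)
n_haplotypes = estimate_haplotype_count(windows)
abundances = relative_abundances(windows)
for lo, hi, fractions in abundances:
    print(lo, hi, fractions)
```

In this toy input, `c2` and `c3` do not overlap, so overlap-based features cannot link them, but they have the same relative abundance in their respective windows. That is the kind of signal an abundance-based clustering step could use to place distant sub-contigs in the same haplotype.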
