Author: Shcherbina, Anna; Ricke, Darrell O.; Chiu, Nelson
                    Title: Evaluating performance of metagenomic characterization algorithms using in silico datasets generated with FASTQSim  Cord-id: csokkcqq  Document date: 2016_3_31
                    ID: csokkcqq
                    
                    Snippet: Background In silico bacterial, viral, and human truth datasets were generated to evaluate available metagenomics algorithms. Sequenced datasets include background organisms, creating ambiguity in the true source organism for each read. Bacterial and viral datasets were created with even and staggered coverage to evaluate organism identification, read mapping, and gene identification capabilities of available algorithms. These truth datasets are provided as a resource for the development and ref
                    
                    
                    
                     
                    
                    
                    
                    
                        
                            
                                Document: Background In silico bacterial, viral, and human truth datasets were generated to evaluate available metagenomics algorithms. Sequenced datasets include background organisms, creating ambiguity in the true source organism for each read. Bacterial and viral datasets were created with even and staggered coverage to evaluate organism identification, read mapping, and gene identification capabilities of available algorithms. These truth datasets are provided as a resource for the development and refinement of metagenomic algorithms. Algorithm performance on these truth datasets can inform decision makers on strengths and weaknesses of available algorithms and how the results may be best leveraged for bacterial and viral organism identification and characterization. Source organisms were selected to mirror communities described in the Human Microbiome Project as well as the emerging pathogens listed by the National Institute of Allergy and Infectious Diseases. The six in silico datasets were used to evaluate the performance of six leading metagenomics algorithms: MetaScope, Kraken, LMAT, MetaPhlAn, MetaCV, and MetaPhyler. Results Algorithms were evaluated on runtime, true positive organisms identified to the genus and species levels, false positive organisms identified to genus and species level, read mapping, relative abundance estimation, and gene calling. No algorithm out performed the others in all categories, and the algorithm or algorithms of choice strongly depends on analysis goals. MetaPhlAn excels for bacteria and LMAT for viruses. The algorithms were ranked by overall performance using a normalized weighted sum of the above metrics, and MetaScope emerged as the overall winner, followed by Kraken and LMAT. Conclusions Simulated FASTQ datasets with well-characterized truth data about microbial community composition reveal numerous insights about the relative strengths and weaknesses of the metagenomics algorithms evaluated. The simulated datasets are available to download from the Sequence Read Archive (SRP062063).
 
  Search related documents: 
                                Co phrase  search for related documents- livermore metagenomic lmat analysis toolkit and lmat analysis toolkit: 1
 
                                Co phrase  search for related documents, hyperlinks ordered by date