Author: Rose, Rebecca; Constantinides, Bede; Tapinos, Avraam; Robertson, David L; Prosperi, Mattia
Title: Challenges in the analysis of viral metagenomes
  • Document date: 2016_8_3
  • ID: x3u9i1vq_33
    Document:
    1. The emergence of virus-specific assembly and metagenomic tools is a relatively recent phenomenon, and many of the methodologies in use today repurpose one or more existing algorithms. These tools mostly target a small audience of expert users and, as with most research software, decay after initial release owing to a lack of ongoing funding, poor software development practices and/or changes in the authors' circumstances (Duck et al. 2016). There is a need for a better balance between research software that presents novel methodologies and sustainably developed, documented and tested software distributed through robust and user-friendly channels such as package managers, so as to increase the useful life of viral informatics software. Researchers and granting agencies should consider the importance of this step and allocate resources accordingly.
    2. Democratisation of routine analyses through development of user-friendly, locally installable software and remote web services is critical. Preconfigured cloud virtual machines offer a convenient, low-cost way to run analyses, yet must permit straightforward sequence database and software version updates so as to remain relevant after their initial release.
    3. Maintaining up-to-date indexes of large sequence databases is a problem all classification tools must address, requiring access either to powerful computers for index construction or to a fast connection for downloading prebuilt indexes. Furthermore, classification of viral sequences is critically dependent upon the quality of curated viral databases such as RefSeq, to which submitting newly discovered sequences can be prohibitively time consuming. A solution might involve the creation of a central database containing, for any given sequencing project, both the raw reads and the filtered, assembled and/or annotated reads, all analysed using a single central pipeline. On a regular basis, the database could report sequences and corresponding metadata for unclassified 'dark matter', which is often discarded and yet is likely to contain sequences belonging to novel pathogens. By combining the dark matter from multiple studies, trends within these unclassified reads may be identified that could lead to greater power to identify new biological entities.
    4. Benchmarking of software also remains an open problem within the field, which lacks standardized test datasets that are used across multiple studies. Benchmarking datasets are often chosen to highlight the advantages of the method under study, and may therefore be quite specific to a given application. Thus the field needs to agree upon a set of standard, well-characterized reference datasets for virus-focused studies.
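
The suggestion in point 3, that pooling unclassified 'dark matter' reads from several studies could expose recurring signals, can be illustrated with a minimal sketch. This is not a method described in the article. The function names `kmers` and `recurrent_kmers`, the study labels, the toy reads and the choice of k = 8 are all hypothetical, and a realistic analysis would at least handle reverse complements, sequencing errors and far larger inputs.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-length substring of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def recurrent_kmers(studies, k=8, min_studies=2):
    """Map each k-mer to the set of studies whose unclassified reads contain it,
    keeping only k-mers observed in at least min_studies different studies."""
    seen = defaultdict(set)
    for study, reads in studies.items():
        for read in reads:
            for kmer in kmers(read, k):
                seen[kmer].add(study)
    return {kmer: names for kmer, names in seen.items() if len(names) >= min_studies}


# Toy, made-up unclassified reads grouped by (hypothetical) study label.
studies = {
    "study_a": ["ACGTACGTTTGACC", "GGGGCCCCAAAA"],
    "study_b": ["TTACGTACGTTTGA", "TTTTTTTTTTTT"],
    "study_c": ["CATCATCATCAT"],
}

shared = recurrent_kmers(studies, k=8)
for kmer, names in sorted(shared.items()):
    print(kmer, sorted(names))
```

In this toy input, only the overlapping region between the first reads of `study_a` and `study_b` is reported. The same counting idea would let a central database flag sequences that recur across independent projects even when no reference matches them.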
