Author: Gardner, Paul P.; Daub, Jennifer; Tate, John G.; Nawrocki, Eric P.; Kolbe, Diana L.; Lindgreen, Stinus; Wilkinson, Adam C.; Finn, Robert D.; Griffiths-Jones, Sam; Eddy, Sean R.; Bateman, Alex
Title: Rfam: updates to the RNA families database
  • Document date: 2008_10_25
  • ID: wj7yonjw_5
    *To whom correspondence should be addressed. Tel: +44 1223 494 983; Fax: +44 1223 494 919; Email: pg5@sanger.ac.uk. © 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
    Document: To make it feasible to search more than 120 gigabases of sequence with hundreds of covariance models (CMs) in a reasonable time, we use sequence-based filters to prune the search space before applying the more accurate but more computationally expensive CMs. One of the primary limitations of the Rfam annotation pipeline has been its use of BLAST-based sequence filters, which are likely to compromise search sensitivity. To address this issue at least partially, NCBI-BLAST has been replaced with WU-BLAST, which has been tuned for high sensitivity at low sequence similarity. A benchmark of several homology search tools has shown WU-BLAST to be the more accurate of the two methods on nucleotide data (5). Additionally, to make the BLAST filters more similar to profile HMMs, a sequence mask has been applied to each sequence in the alignment. Any nucleotide in an alignment column that either has a low frequency or is an insert relative to the majority of the other sequences is 'soft masked' and not used for the BLAST word matches. These masked nucleotides do, however, still contribute to alignments that were seeded in the flanking regions. This approach has resulted in many fewer spurious hits with no detectable cost to sensitivity (data not shown), thus allowing E-value thresholds to be relaxed further. Together, these observations mean that the BLAST filters have been improved in both specificity and sensitivity.
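
The column-based soft masking described above can be illustrated with a minimal Python sketch. It is not the Rfam pipeline's code. The function name `soft_mask_alignment`, the `min_freq` threshold, the rule that a column counts as an insert when more than half of the sequences have a gap there, and the use of lowercase letters to mark masked nucleotides are all assumptions made for illustration; the text above does not give the actual cutoffs or implementation.

```python
from collections import Counter

GAP_CHARS = "-."  # assumed gap symbols in the aligned input


def soft_mask_alignment(aligned_seqs, min_freq=0.1):
    """Return ungapped sequences in which nucleotides from low-frequency
    or insert columns are lowercased (soft masked).

    min_freq is a hypothetical cutoff: a nucleotide is masked when the
    fraction of all sequences sharing it in that column is below it.
    """
    if not aligned_seqs:
        return []
    width = len(aligned_seqs[0])
    if any(len(seq) != width for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    n = len(aligned_seqs)
    rows = [seq.upper() for seq in aligned_seqs]
    masked = [[] for _ in rows]

    for col in range(width):
        column = [row[col] for row in rows]
        counts = Counter(ch for ch in column if ch not in GAP_CHARS)
        n_gaps = n - sum(counts.values())
        # Assumed rule: the column is an insert if most sequences have a gap.
        is_insert_column = n_gaps > n / 2
        for i, ch in enumerate(column):
            if ch in GAP_CHARS:
                continue  # gaps are dropped from the output sequence
            low_freq = counts[ch] / n < min_freq
            masked[i].append(ch.lower() if (is_insert_column or low_freq) else ch)

    return ["".join(chars) for chars in masked]


if __name__ == "__main__":
    alignment = ["ACGU-A", "ACGU-A", "ACGUCA", "ACCU-A"]
    for seq in soft_mask_alignment(alignment, min_freq=0.3):
        print(seq)
```

With this toy alignment and `min_freq=0.3`, the third sequence's inserted C and the fourth sequence's minority C come out lowercased, while every other nucleotide stays uppercase. Whether lowercase positions are excluded from word seeding yet still allowed in extensions depends on how the search tool is configured, which this sketch does not cover.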
