Selected article for: "database search and sequence segment"

Author: Lamia Wahba; Nimit Jain; Andrew Z. Fire; Massa J. Shoura; Karen L. Artiles; Matthew J. McCoy; Dae-Eun Jeong
Title: Identification of a Pangolin Niche for a 2019-nCoV-like Coronavirus via an Extensive Meta-metagenomic Search
  • Document date: 2020_2_14
  • ID: emr0eh0i_9
    Document: Search Software: For rapid identification of close matches among large numbers of metagenomic reads, we used a simple dictionary based on the 2019-nCoV sequence (NCBI MN908947.3, Wuhan-Hu-1) and its reverse complement, querying every 8th k-mer along the individual reads for matches to the sequence. As a reference, and to benchmark the workflow further, we included several additional sequences in the query (Vaccinia virus, an arbitrary segment of a flu isolate, the full sequence of bacteriophage P4, and a number of putative polinton sequences from Caenorhabditis briggsae). The relatively small group of k-mers being queried (<10^6) allows a rapid search for homologs. This was implemented in a Python script run using the PyPy accelerated interpreter. We stress that this is by no means the most comprehensive or fastest search for large datasets. However, it is more than sufficient to rapidly find any closely matching sequence (with the downloading and conversion of the data, rather than the search, being rate limiting). Figure S1.

    bioRxiv preprint (not peer-reviewed). doi: https://doi.org/10.1101/2020.02.08.939660

    Results: To identify biological niches that might harbor viruses closely related to 2019-nCoV, we searched through publicly available metaviromic datasets. We were most interested in viruses with highly similar sequences, as these would likely be most useful in forming hypotheses about the origin and pathology of the recent human virus. We thus set a threshold requiring matching of a perfect 32-nucleotide segment with a granularity of 8 nucleotides in the search (i.e., interrogating the complete database of k-mers from the virus with k-mers starting at nucleotide 1, 9, 17, 25, 33 of each read from the metagenomic data for a perfect match). This would catch any perfect match of 39 nucleotides or greater, with some homologies as short as 32 nucleotides captured depending on the precise phasing of the read.
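
The dictionary search described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names (`reverse_complement`, `build_kmer_index`, `find_hits`) and the returned tuple layout are hypothetical, and the defaults k = 32 and step = 8 are taken from the threshold stated in the Results text. It indexes every 32-mer of each reference on both strands, then looks up only the read windows starting at offsets 0, 8, 16, ... (nucleotides 1, 9, 17, ... in the text's 1-based numbering).

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (non-ACGT characters are left unchanged)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def build_kmer_index(references, k=32):
    """Map every k-mer of each reference, on both strands, to (name, strand, position).

    `references` is assumed to be a dict of {name: sequence}.
    Only the first occurrence of a repeated k-mer is recorded.
    """
    index = {}
    for name, seq in references.items():
        seq = seq.upper()
        for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
            for pos in range(len(strand_seq) - k + 1):
                index.setdefault(strand_seq[pos:pos + k], (name, strand, pos))
    return index


def find_hits(read, index, k=32, step=8):
    """Look up every `step`-th k-mer of a read; return a list of (read_offset, index_entry)."""
    read = read.upper()
    hits = []
    for start in range(0, len(read) - k + 1, step):
        entry = index.get(read[start:start + k])
        if entry is not None:
            hits.append((start, entry))
    return hits
```

Because windows are sampled every 8 nucleotides, any perfectly matching stretch of 32 + 8 - 1 = 39 nucleotides must fully contain at least one sampled 32-mer, whatever its position in the read. A stretch of exactly 32 nucleotides is found only when it starts on a sampled offset. Looking up one read costs a handful of dictionary probes regardless of how many reference k-mers are indexed, which is consistent with the text's remark that downloading and conversion, rather than the search, are rate limiting.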
