Author: Hayden C. Metsky; Katherine J. Siddle; Adrianne Gladden-Young; James Qu; David K. Yang; Patrick Brehio; Andrew Goldfarb; Anne Piantadosi; Shirlee Wohl; Amber Carter; Aaron E. Lin; Kayla G. Barnes; Damien C. Tully; Björn Corleis; Scott Hennigan; Giselle Barbosa-Lima; Yasmine R. Vieira; Lauren M. Paul; Amanda L. Tan; Kimberly F. Garcia; Leda A. Parham; Ikponmwonsa Odia; Philomena Eromon; Onikepe A. Folarin; Augustine Goba; Etienne Simon-Lorière; Lisa Hensley; Angel Balmaseda; Eva Harris; Douglas Kwon; Todd M. Allen; Jonathan A. Runstadler; Sandra Smole; Fernando A. Bozza; Thiago M. L. Souza; Sharon Isern; Scott F. Michael; Ivette Lorenzana; Lee Gehrke; Irene Bosch; Gregory Ebel; Donald Grant; Christian Happi; Daniel J. Park; Andreas Gnirke; Pardis C. Sabeti; Christian B. Matranga
Title: Capturing diverse microbial sequence with comprehensive and scalable probe design
  • Document date: 2018_3_12
  • ID: a9lkhayg_42
    Document: CATCH produces a set of "candidate" probes from the input sequences in d by stepping along them according to a specified stride (Fig. 1a). Optionally, CATCH uses locality-sensitive hashing (LSH) [33, 34] to reduce the number of candidate probes, which is particularly useful when the input is a large number of highly similar sequences. CATCH supports two LSH families: one under Hamming distance [33] and another using the MinHash technique [34, 64], which has been used in metagenomic applications [65, 66]. CATCH detects near-duplicate candidate probes by performing approximate near neighbor search [34] using a specified family and distance threshold. It constructs hash tables containing the candidate probes and then queries each probe (in descending order of multiplicity) to find and collapse its near-duplicates. Because LSH reduces the space of candidate probes, it may remove candidate probes that would otherwise be selected in the steps described below, thereby increasing the size of the output probe set. Use of LSH to reduce the number of candidate probes is optional in our implementation of CATCH; we did not use it to produce the probe sets in this work. The approach of detecting near-duplicates among probes (and subsequently mapping them onto sequences, described below) bears some similarity to the use of P-clouds for clustering related oligonucleotides in order to identify diverse repetitive regions in the human genome [67, 68].
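
    The candidate generation and near-duplicate collapsing described above can be sketched roughly as follows. This is a minimal illustration, not CATCH's code: the function names (candidate_probes, collapse_near_duplicates) and parameters (num_tables, positions_per_hash, seed) are hypothetical, and the hash family shown is bit sampling, a standard LSH family for Hamming distance that may differ in detail from what CATCH implements. The MinHash family is not shown.

```python
import random
from collections import defaultdict


def candidate_probes(sequences, probe_length, stride):
    """Step along each sequence by `stride` and count each probe's multiplicity."""
    counts = defaultdict(int)
    for seq in sequences:
        for start in range(0, len(seq) - probe_length + 1, stride):
            counts[seq[start:start + probe_length]] += 1
    return counts


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_near_duplicates(counts, max_dist, num_tables=10,
                             positions_per_hash=8, seed=0):
    """Approximately collapse probes within `max_dist` mismatches of a kept probe.

    Assumes all probes have the same length. Each hash table keys a probe by
    the bases at a random subset of positions (bit-sampling LSH), so similar
    probes are likely, but not guaranteed, to share a bucket in some table.
    """
    if not counts:
        return []
    rng = random.Random(seed)
    probe_length = len(next(iter(counts)))
    k = min(positions_per_hash, probe_length)
    hash_positions = [sorted(rng.sample(range(probe_length), k))
                      for _ in range(num_tables)]

    def key(probe, positions):
        return tuple(probe[i] for i in positions)

    # Build one hash table per sampled set of positions.
    tables = [defaultdict(list) for _ in range(num_tables)]
    for probe in counts:
        for table, positions in zip(tables, hash_positions):
            table[key(probe, positions)].append(probe)

    kept, kept_set, removed = [], set(), set()
    # Query probes in descending order of multiplicity (ties broken by sequence).
    for probe in sorted(counts, key=lambda p: (-counts[p], p)):
        if probe in removed:
            continue
        kept.append(probe)
        kept_set.add(probe)
        for table, positions in zip(tables, hash_positions):
            for other in table[key(probe, positions)]:
                if other in kept_set or other in removed:
                    continue
                # Verify bucket collisions against the true distance threshold.
                if hamming(probe, other) <= max_dist:
                    removed.add(other)
    return kept


counts = candidate_probes(["ACGTACGTACGT", "ACGTACGTACGA"],
                          probe_length=8, stride=4)
kept = collapse_near_duplicates(counts, max_dist=1, positions_per_hash=4)
print(dict(counts), kept)
```

    Because the search is approximate, a near-duplicate that never shares a bucket with a kept probe survives the collapse. In this sketch, using more tables or fewer sampled positions per hash makes such misses less likely, at the cost of more distance comparisons.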
