Author: Marie Hoffmann; Michael T. Monaghan; Knut Reinert
Title: PriSeT: Efficient De Novo Primer Discovery
  • Document date: 2020_4_7
  • ID: 3b3hv53b_84_0
    Document: PriSeT operates batch-wise, i.e. all k-mers with frequencies exceeding the cutoff are collected at once into a single data structure, filtered chemically, and combined reference-wise. For example, Dikarya from the Fungi realm (clade 451864) produces 119.4 million k-mers. The main-memory footprint of the location map received from GenMap is the current bottleneck of PriSeT. Processing libraries beyond 500 MB is currently only feasible when the k-mer frequency cutoff is increased such that no more than roughly 120 million k-mers (≈ 1 gigabyte) are produced (see footnote 16). In a future PriSeT version this can be tackled by interweaving k-mer frequency computation and filtering: a frequent k-mer immediately undergoes filtering, and is only collected when it satisfies both the frequency threshold and the constraint set C_s. This reduces the overall number of temporarily stored k-mers. Input libraries composed of multiple reference sequences would additionally profit from a reference-wise partitioning approach. This strategy can be carried over to the combine step, since k-mers are only combinable if they refer to the same sequence; each reference can be processed in parallel. The current version of PriSeT does not use any thread or process parallelism. The computationally most expensive part of PriSeT is the combine step, whose runtime is quadratic in N and also depends on the window size. It is therefore important to set the target read length range as tightly as possible (see Table 9). Stable binding of primer to template is crucial for the success of a PCR. A single mismatch, especially at the 3'-end, may result in an ineffective PCR. For that reason PriSeT uses the (k, 0)-frequency to gather only k-mer locations with 100% sequence identity. A future version of PriSeT may allow up to four errors e (or mismatches) in primer sequences if there is demand for it.
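The dependence of the combine step on the window size can be illustrated with a minimal Python sketch. It is not PriSeT's implementation: the function name `combine`, the `(ref_id, position)` input layout, and the use of start-position offsets as a stand-in for the target read length range are all assumptions made for illustration. K-mer locations are partitioned by reference, and each forward location is paired only with locations inside the window.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def combine(locations, min_offset, max_offset):
    """Pair k-mer locations that lie on the same reference.

    locations  -- iterable of (ref_id, position) tuples (hypothetical layout)
    min_offset -- smallest allowed distance between the two start positions
    max_offset -- largest allowed distance between the two start positions
    Returns a list of (ref_id, forward_pos, reverse_pos) tuples.
    """
    # Reference-wise partitioning: k-mers on different references
    # can never be combined, so each bucket is independent.
    by_ref = defaultdict(list)
    for ref_id, pos in locations:
        by_ref[ref_id].append(pos)

    pairs = []
    for ref_id, positions in by_ref.items():
        positions.sort()
        for fwd in positions:
            # Binary search for the slice of partners inside the window.
            lo = bisect_left(positions, fwd + min_offset)
            hi = bisect_right(positions, fwd + max_offset)
            pairs.extend((ref_id, fwd, rev) for rev in positions[lo:hi])
    return pairs


if __name__ == "__main__":
    demo = [("r1", 0), ("r1", 100), ("r1", 250), ("r1", 900),
            ("r2", 10), ("r2", 120)]
    print(len(combine(demo, 100, 300)))  # wide window
    print(len(combine(demo, 100, 120)))  # tight window
```

In this sketch the number of emitted pairs grows with the width of the window, which is why a tight target range keeps the combine step small. Because the per-reference buckets do not interact, the outer loop is also the natural place to introduce parallelism.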
When allowing errors, a single k-mer occurrence is counted into all location collections associated with k-mers within Hamming distance ≤ e. This has a huge impact on the collection sizes, so e will have to be chosen carefully. In metabarcoding we are sometimes interested in primer pairs enclosing barcodes that guarantee the distinction of species of clade X from species of another clade Y. Taking the recent SARS-CoV-2 outbreak as an example, one question is: given a sputum sample of a person with flu-like symptoms, does it contain influenza viruses or coronaviruses? We do not want the test to produce false negatives,
Footnote 16: exemplary for a desktop computer with 16 GB RAM.
bioRxiv preprint, doi: https://doi.org/10.1101/2020.04.06.027961. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
Figure 6. Runtimes for all clade data sets broken down into the frequency computation, filter & transform, and combine steps. The k-mer frequency cutoff was set to 5% w.r.t. the number of references per clade. The results for Fungi (clade 451864, 500 MB) are omitted here because the cutoff had to be set to 10%, which results in runtimes comparable to those of clade 6231 (82 MB). Both axes are log2-scaled.
Figure 7. K-mer counts for all clade data sets after the frequency computation, filter & transform, and combine steps. For the combine step, pairs are counted, not k-mers. The settings are the same as f
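The impact of allowing mismatches on collection sizes, as discussed above, can be made concrete with a short Python sketch. It relies on the standard count of sequences within Hamming distance ≤ e of a k-mer over a four-letter alphabet, namely the sum over i from 0 to e of C(k, i) · 3^i. The primer length k = 16 used in the demo is an illustrative assumption, and the function names are hypothetical, not part of PriSeT.

```python
from itertools import combinations, product
from math import comb


def hamming_ball_size(k, e, alphabet_size=4):
    """Number of length-k sequences within Hamming distance <= e of a k-mer."""
    return sum(comb(k, i) * (alphabet_size - 1) ** i for i in range(e + 1))


def neighbours(kmer, e, alphabet="ACGT"):
    """Enumerate all sequences within Hamming distance <= e (small k only)."""
    found = {kmer}
    for d in range(1, e + 1):
        # Choose which d positions are substituted ...
        for sites in combinations(range(len(kmer)), d):
            # ... and which differing letter goes into each of them.
            choices = [[c for c in alphabet if c != kmer[s]] for s in sites]
            for subs in product(*choices):
                seq = list(kmer)
                for s, c in zip(sites, subs):
                    seq[s] = c
                found.add("".join(seq))
    return found


if __name__ == "__main__":
    k = 16  # illustrative primer length (assumption)
    for e in range(5):
        print(e, hamming_ball_size(k, e))
```

Under these assumptions, each k-mer occurrence would be counted into one collection for e = 0 but into more than a hundred thousand collections for e = 4, which illustrates why the error bound has to be chosen with care.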
