Author: Marie Hoffmann; Michael T. Monaghan; Knut Reinert
Title: PriSeT: Efficient De Novo Primer Discovery
Document date: 2020_4_7
ID: 3b3hv53b_82
Document: Although the reference database has low sequence coverage for plankton taxa, we found several new primer pairs that offer higher coverage or greater barcode variation than published ones and that are chemically suitable for paired-end PCR (see Table 2). PriSeT correctly outputs primer pairs known to be present in the library if and only if they pass the constraint sets C_s and C_p (see Table 2).

When complete genomes are available for primer discovery, too many candidates are produced, and it is necessary to tighten the primer sequence constraints or to filter in a post-processing step, e.g., for pairs producing amplicons that are distinctive or span exons. The experiments showed that, when searching for primer pairs for metabarcoding experiments, frequency is an appropriate initial filtering heuristic: only k-mers that occur with a minimum frequency can later achieve sufficient coverage or amplicon variation. The FM index is a transformation that supports frequency queries at lower cost than a seed-and-extend approach, e.g., FastPCR by Kalendar et al. (2017), or an MSA-based approach, which requires data sets of manageable size in order to identify conserved regions serving as primer binding sites.

None of the existing primer search tools that we found can process multi-sequence libraries and optimize for frequent primer pairs at the same time. PriSeT is built to fill this gap. Its heuristic approach additionally removes the need to curate an existing library and makes it robust against mislabeled or poor-quality references. This in turn leaves users more resources to focus on the actual analysis. With the falling cost of NGS, databases are growing daily, which makes curation infeasible, and given the sparseness of some clades, we cannot afford to exclude resources. Since GenBank has no standard specification for labeling sequences by their origin upon upload (e.g., as 18S or COI), our sampling approach also collected non-18S sequences, which explains the relatively low values for coverage and amplicon variation. When evaluating PriSeT's computed primer pairs and their statistics, users have to take the heterogeneity of the database into consideration.

The current version of PriSeT does not include coverage or amplicon variation as filtering criteria, so as not to limit the options: higher coverage is in many cases paid for with a lower number of distinct reads (see results in Table 8). We leave it up to the user to decide when coverage is favored over amplicon variation and vice versa. However, the recent coronavirus outbreak demonstrated the importance of scenarios where the goal is to obtain barcodes that discriminate between clades.
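The frequency heuristic and the trade-off between coverage and amplicon variation discussed above can be illustrated with a small sketch. It is not PriSeT's implementation: it swaps the FM index for a plain Python dictionary, matches primers exactly, and treats the reverse primer as a downstream binding site on the same strand rather than as a reverse complement. The function names, the parameters `min_refs` and `max_len`, and the toy sequences used with them are all hypothetical.

```python
from collections import defaultdict


def kmer_reference_frequency(references, k):
    """Map each k-mer to the number of references containing it at least once."""
    freq = defaultdict(int)
    for seq in references:
        # A set ensures each k-mer is counted once per reference.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in seen:
            freq[kmer] += 1
    return dict(freq)


def frequent_kmers(references, k, min_refs):
    """Initial filter: keep only k-mers present in at least min_refs references."""
    freq = kmer_reference_frequency(references, k)
    return {kmer: n for kmer, n in freq.items() if n >= min_refs}


def pair_statistics(references, fwd, rev_site, max_len):
    """Return (coverage, amplicon variation) for one candidate pair.

    coverage  = number of references in which both sites match
    variation = number of distinct amplicon sequences among those references
    Simplifications (assumptions of this sketch): exact matching, first
    occurrence only, rev_site given on the same strand as fwd.
    """
    amplicons = []
    for seq in references:
        start = seq.find(fwd)
        if start < 0:
            continue
        site = seq.find(rev_site, start + len(fwd))
        if site < 0:
            continue
        end = site + len(rev_site)
        if end - start <= max_len:
            amplicons.append(seq[start:end])
    return len(amplicons), len(set(amplicons))
```

For a toy library such as `["ACGTAAAATTAC", "ACGTAAAATTAC", "ACGTCCCCTTAC", "GGGGGGGGGGGG"]`, calling `frequent_kmers(refs, 4, 3)` keeps only `ACGT` and `TTAC`, and `pair_statistics(refs, "ACGT", "TTAC", 50)` reports a coverage of 3 with 2 distinct amplicons. The unrelated fourth sequence lowers relative coverage without affecting the pair itself, which mirrors the effect of non-18S sequences in a heterogeneous library.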