Selected article for: "FM index and total runtime"

Author: Marie Hoffmann; Michael T. Monaghan; Knut Reinert
Title: PriSeT: Efficient De Novo Primer Discovery
  • Document date: 2020_4_7
  • ID: 3b3hv53b_79_1
    Snippet: anzini, 2000) . Taking into account at most N − k + 1 different k-mers, the upper limit is O(N (k + occ)) (see Table 9 ). All chemical filters in the filter & transform step analyse the encoded sequences in a single-or multi-pass fashion. Some require simple counting (Tm, CG content) or pattern match (dinucleotide runs, (A|T) 3 tails). The most complex one is the self-annnealing filter in which a k-mer is shifted at most 2k times and XORed agai.....
    Document: anzini, 2000) . Taking into account at most N − k + 1 different k-mers, the upper limit is O(N (k + occ)) (see Table 9 ). All chemical filters in the filter & transform step analyse the encoded sequences in a single-or multi-pass fashion. Some require simple counting (Tm, CG content) or pattern match (dinucleotide runs, (A|T) 3 tails). The most complex one is the self-annnealing filter in which a k-mer is shifted at most 2k times and XORed against its complement or reverse complement. The identification of a connected self-annealing pattern of size four (corresponding to a 0b11111111 pattern) can be done in constant time by using bit parallelism (see code on GitHub for details). Filtering is done on at most N k-mers, giving us a total runtime of O(kN ). For storing the bit vector transformed references, and the ranked TKmerIDs we need O(N ) space. During the combine step we additionally store matching k-mers. A reference has an expected length of N n bases. For each forward k-mer we search with an offset of τ min a candidates' window of size τ max − τ min . Hence, we have at most O(( N n − τ min )(τ max − τ min )) pairs per reference (κ min , κ max are dropped here). For each pair a matchability check is done linear in the expected length of the two k-mers, i.e. in O(k) (see C p in Table 2 ). Altogether we have n references, giving us a total combination runtime of O(nk( N n − τ min )(τ max − τ min )). Space occupancy is linear for the FM index, frequency, and filter & transform steps. Per text position we use at most one 64 bit unsigned integer independent of how many k-mers with varying lengths occur at a specific position (see compression scheme described in Section 3.3.1) and

    Search related documents: