Author: Maximilian Krause; Adnan M. Niazi; Kornel Labun; Yamila N. Torres Cleuren; Florian S. Müller; Eivind Valen
Title: tailfindr: Alignment-free poly(A) length measurement for Oxford Nanopore RNA and DNA sequencing
  • Document date: 2019-03-25
  • ID: cq7g8azh_35
    Document: Unlike RNA, DNA is double-stranded, so both poly(A) and poly(T) homopolymer stretches can occur. To determine the read orientation, the Nanopore-specific Front and End Primer sequences (Table 2) are aligned against the first 100 bases extracted from the FAST5 files. A read is considered poly(T)-containing if the normalised alignment score of the End Primer sequence is greater than that of the Front Primer sequence and above the threshold of 0.6. Conversely, a read is considered poly(A)-containing if the normalised alignment score of the Front Primer sequence is greater than that of the End Primer sequence and above the threshold of 0.6. To ensure that the full poly(A) tail is present in the raw data, a mean signal is generated by applying a sliding window (window size 10; stride 10) to the processed raw signal. Next, the slope of this mean signal is calculated between every two consecutive points. The precise start of the respective tail is considered to be the first location after the rough start site where the calculated slope is between -0.2 and 0.2 and the mean signal is between 0 and 0.3. To identify the precise tail end, the slope and the mean signal downstream of the precise tail start site are tested for violations of these thresholds. Since short non-tail-like signal spikes can occur randomly, we test the signal downstream of this tentative tail end for tail-like signal within the thresholds until we either reach the end of the search window of 3000 sample points or find another stretch of tail-like signal at least 60 sample points long. In the latter case, the tentative tail end is updated to the downstream tail end to account for the spike. The maximum allowable length of threshold-exceeding signal located between two tail-like signals has been set to 120 nt (i.e. 120 × the read-specific nucleotide translocation rate).
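
    The orientation rule and the boundary refinement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tailfindr implementation: the function names, the `samples_per_nt` parameter (standing in for the read-specific translocation rate), the "unclassified" label, the inclusive threshold bounds, the slope measured per window step, and the search window being counted from the tentative tail end are all hypothetical choices made for the example. The primer alignment itself is not shown; the normalised scores are taken as given inputs.

```python
# Hypothetical sketch; names and edge-case handling are assumptions.

def classify_read(front_score, end_score, threshold=0.6):
    """Assign read orientation from normalised primer alignment scores."""
    if end_score > front_score and end_score > threshold:
        return "polyT"
    if front_score > end_score and front_score > threshold:
        return "polyA"
    return "unclassified"  # assumed label; the text does not name this case


def windowed_means(signal, window=10, stride=10):
    """Mean of the signal in consecutive windows (size 10, stride 10)."""
    return [sum(signal[i:i + window]) / window
            for i in range(0, len(signal) - window + 1, stride)]


def refine_tail(signal, rough_start, samples_per_nt, window=10, stride=10,
                search_window=3000, min_stretch=60, max_gap_nt=120):
    """Return (start, end) of the tail in sample points, or None.

    `end` is exclusive: the first sample of the first non-tail window.
    """
    means = windowed_means(signal, window, stride)
    # A window is tail-like if the slope to the previous window mean lies in
    # [-0.2, 0.2] and the mean itself lies in [0, 0.3].
    tail_like = []
    for k, m in enumerate(means):
        slope = m - means[k - 1] if k > 0 else 0.0
        tail_like.append(-0.2 <= slope <= 0.2 and 0.0 <= m <= 0.3)

    # Precise start: first tail-like window at or after the rough start.
    first = rough_start // stride
    start = next((k for k in range(first, len(tail_like)) if tail_like[k]),
                 None)
    if start is None:
        return None

    # Tentative end: first window after the start that violates a threshold.
    end = start
    while end < len(tail_like) and tail_like[end]:
        end += 1

    # Look past a short spike for another sufficiently long tail-like stretch.
    max_gap = max_gap_nt * samples_per_nt  # allowed spike length in samples
    limit = min(len(tail_like), end + search_window // stride)
    i = end
    while i < limit:
        if not tail_like[i]:
            i += 1
            continue
        if (i - end) * stride > max_gap:
            break  # spike too long to bridge
        j = i
        while j < len(tail_like) and tail_like[j]:
            j += 1
        if (j - i) * stride >= min_stretch:
            end = j  # extend the tail across the spike
            break
        i = j
    return start * stride, end * stride
```

    Because the slope test fails on the window where the signal drops into the tail, this sketch places the precise start one window after the transition; whether the original tool resolves the boundary the same way is not stated in the text.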
