Author: Pandya, Gagan A.; Holmes, Michael H.; Sunkara, Sirisha; Sparks, Andrew; Bai, Yun; Verratti, Kathleen; Saeed, Kelly; Venepally, Pratap; Jarrahi, Behnam; Fleischmann, Robert D.; Peterson, Scott N.
Title: A bioinformatic filter for improved base-call accuracy and polymorphism detection using the Affymetrix GeneChip® whole-genome resequencing platform
Document date: 2007_11_15
ID: 16tii0ha_18
Document: The first filter applied, the low-homology filter (mask_low_homology.pl), identifies regions that performed poorly because of deletions in the sample relative to the reference sequence. It scans the base calls from the CHP files for runs of adjacent positions that are rich in no-calls and SNP calls. It uses a sliding-window approach: it first examines windows of 50 bases (user-specified) for regions in which no-calls plus SNP calls make up 60% or more of the window size. On encountering such a region, the algorithm switches to a 10-base window to examine the sequence at higher resolution, so that the proximal breakpoint of the low-homology region (generally a deletion) is properly defined. The extent of the region is then determined by extending it with the 50-base window as far as possible. Once the region limits are determined, the algorithm again uses a 10-base window to map the distal breakpoint of the deletion. SNP calls that fall within the defined low-homology region are removed from the list of high-confidence SNP calls. (The coordinates of the low-homology regions are also of interest in themselves, as they mark areas of the reference sequence that are not represented in the sample.) The window sizes and the required percentage of no-calls plus SNP calls are parameterized and controlled by command-line options.
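
The coarse-then-fine window scan described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' mask_low_homology.pl. The per-position call encoding ("N" for a no-call, "S" for a SNP call, anything else for a reference call), the function names, and the half-open (start, end) intervals are all hypothetical. The final step that trims each boundary to the nearest flagged position is an added assumption, not something the description specifies.

```python
from itertools import accumulate

# Hypothetical encoding: "N" = no-call, "S" = SNP call, anything else = reference call.
FLAGGED = {"N", "S"}


def find_low_homology_regions(calls, coarse=50, fine=10, min_fraction=0.6):
    """Return half-open (start, end) intervals of putative low-homology regions."""
    flags = [1 if c in FLAGGED else 0 for c in calls]
    n = len(flags)
    prefix = [0] + list(accumulate(flags))  # prefix[k] = flagged calls in calls[:k]

    def dense(start, size):
        # A window qualifies when flagged calls reach min_fraction of its size.
        return prefix[start + size] - prefix[start] >= min_fraction * size

    regions = []
    i = 0
    while i + coarse <= n:
        if not dense(i, coarse):
            i += 1
            continue
        # Extend the region: keep sliding the coarse window while it stays dense.
        j = i
        while j + 1 + coarse <= n and dense(j + 1, coarse):
            j += 1
        coarse_end = j + coarse

        # Proximal breakpoint: first dense fine window inside the coarse region.
        start = i
        while start + fine <= coarse_end and not dense(start, fine):
            start += 1
        if start + fine > coarse_end:
            start = i  # no dense fine window found; keep the coarse boundary

        # Distal breakpoint: last dense fine window inside the coarse region.
        end = coarse_end
        while end - fine >= start and not dense(end - fine, fine):
            end -= 1
        if end - fine < start:
            end = coarse_end  # no dense fine window found; keep the coarse boundary

        # Assumption: trim both boundaries so they sit on flagged positions.
        while start < end and not flags[start]:
            start += 1
        while end > start and not flags[end - 1]:
            end -= 1

        regions.append((start, end))
        i = coarse_end  # resume scanning after the coarse extent of this region
    return regions


def filter_snps(snp_positions, regions):
    """Drop SNP positions that fall inside any low-homology region."""
    return [p for p in snp_positions
            if not any(s <= p < e for s, e in regions)]
```

For example, a 280-position call string with no-calls at positions 100 to 179 and isolated SNP calls at positions 20 and 250 yields the single region (100, 180). Passing the SNP positions [20, 120, 250] through filter_snps with that region keeps 20 and 250 and drops 120.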