Author: Pandya, Gagan A.; Holmes, Michael H.; Sunkara, Sirisha; Sparks, Andrew; Bai, Yun; Verratti, Kathleen; Saeed, Kelly; Venepally, Pratap; Jarrahi, Behnam; Fleischmann, Robert D.; Peterson, Scott N.
Title: A bioinformatic filter for improved base-call accuracy and polymorphism detection using the Affymetrix GeneChip® whole-genome resequencing platform
Document date: 2007_11_15
ID: 16tii0ha_38
Document: genome rearrangements. We found that the majority of the false-positive SNP calls in the SCHU S4 sample fell into one of two categories: (i) those that lie within 12 bases of a rearrangement boundary and (ii) those that lie within 12 bases of a predicted SNP. These results are summarized in Table 5. In spite of the larger number of false positives in the SCHU S4 data set, they represent only 2.04% of the SNP calls that remained after filtering. Table 6 shows the comparison of raw and filtered data for LVS and SCHU S4. The raw call rate and accuracy take into account all base positions on the resequencing chips and report the results prior to any filtering steps. The genome-adjusted results take into account only those portions of the chips that have high sequence homology with the hybridized sample. The data indicated a false-negative SNP rate in the range of 0-17.31% and a false-positive rate in the range of 0.001-0.007%. The false-positive SNP rate is the number of false positives divided by the number of bases at which a genuine SNP call was not expected. The false-negative SNP rate is the number of expected SNPs that were not identified divided by the total number of expected SNPs. The false-negative rate can be misleading, since it counts all expected SNPs that were not detected: those that were never called in the raw data set as well as those that were removed by our filters. Although the false-negative SNP rate for the SCHU S4 sample was 17.31%, it is important to note that the filters eliminated less than 11% of the true-positive SNPs that were in the raw data set (see Table 3). There is an inevitable tradeoff between the rejection of false positives and the retention of true-positive SNPs. In general, an increase in the stringency of filtering will cause a reduction in both false positives and true positives.
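The two rate definitions above, and the distinction between SNPs that were never called and SNPs that were removed by filtering, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the position sets and the genome length are hypothetical and are not taken from the study's scripts or data.

```python
def false_positive_rate(false_positives: int, non_snp_positions: int) -> float:
    """False positives divided by the bases at which no genuine SNP was expected."""
    if non_snp_positions <= 0:
        raise ValueError("non_snp_positions must be positive")
    return false_positives / non_snp_positions


def false_negative_rate(missed_snps: int, expected_snps: int) -> float:
    """Expected SNPs that were not identified divided by all expected SNPs."""
    if expected_snps <= 0:
        raise ValueError("expected_snps must be positive")
    return missed_snps / expected_snps


def split_false_negatives(expected, raw_calls, filtered_calls):
    """Separate expected SNPs absent from the raw calls from those the filters removed."""
    expected, raw, kept = set(expected), set(raw_calls), set(filtered_calls)
    never_called = expected - raw
    removed_by_filter = (expected & raw) - kept
    return never_called, removed_by_filter


# Hypothetical toy data: base positions on a 1000-base reference.
genome_length = 1000
expected = {10, 20, 30, 40, 50}        # SNPs known from the reference comparison
raw_calls = {10, 20, 30, 40, 99}       # SNP calls before filtering
filtered_calls = {10, 20, 30, 99}      # SNP calls that survive filtering

never_called, removed_by_filter = split_false_negatives(expected, raw_calls, filtered_calls)
missed = len(never_called) + len(removed_by_filter)
false_positives = len(set(filtered_calls) - expected)

print("false-negative rate:", false_negative_rate(missed, len(expected)))
print("false-positive rate:", false_positive_rate(false_positives, genome_length - len(expected)))
print("never called:", sorted(never_called), "removed by filter:", sorted(removed_by_filter))
```

In this toy case half of the missed SNPs were absent from the raw calls, so the false-negative rate alone would overstate what the filtering step discarded.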
The filtering scripts can be parameterized by the user for an appropriate tradeoff between sensitivity (retention of true positives) and specificity (rejection of false positives). Since LVS was the primary reference whose sequence is represented most fully on the chips, the results for the LVS samples were better than we would expect to achieve with a sample of unknown composition. The efficiency of the platform cannot be expressed as a single number, because it varies with the extent of the difference between the sample DNA and the reference sequence.
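The stringency tradeoff described above can be illustrated with a minimal sketch. The excerpt does not describe the actual filter parameters, so the per-call `score`, the `min_score` threshold and all data values below are hypothetical stand-ins for whatever quantity a user would tune.

```python
from typing import Iterable, NamedTuple, Tuple


class Call(NamedTuple):
    position: int
    score: float       # hypothetical per-call quality score (higher is better)
    is_true_snp: bool   # known from comparison with the reference sequence


def filter_counts(calls: Iterable[Call], min_score: float) -> Tuple[int, int]:
    """Return (true positives kept, false positives kept) at a given stringency."""
    kept = [c for c in calls if c.score >= min_score]
    true_positives = sum(1 for c in kept if c.is_true_snp)
    return true_positives, len(kept) - true_positives


# Hypothetical SNP calls; not data from the study.
calls = [
    Call(101, 0.95, True),
    Call(250, 0.80, True),
    Call(377, 0.55, True),
    Call(412, 0.60, False),
    Call(598, 0.30, False),
    Call(733, 0.40, True),
]

# Raising the threshold removes false positives but also costs true positives.
for min_score in (0.0, 0.5, 0.7, 0.9):
    tp, fp = filter_counts(calls, min_score)
    print(f"min_score={min_score:.1f}  true positives kept={tp}  false positives kept={fp}")
```

A user would pick the threshold at which the remaining false positives are acceptable for the loss of true positives they are willing to tolerate.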