Author: Pandya, Gagan A.; Holmes, Michael H.; Sunkara, Sirisha; Sparks, Andrew; Bai, Yun; Verratti, Kathleen; Saeed, Kelly; Venepally, Pratap; Jarrahi, Behnam; Fleischmann, Robert D.; Peterson, Scott N.
Title: A bioinformatic filter for improved base-call accuracy and polymorphism detection using the Affymetrix GeneChip® whole-genome resequencing platform
Document date: 2007-11-15
ID: 16tii0ha_8
Document: The F. tularensis GeneChip® set was designed on the basis of the DNA sequences of strains LVS (GenBank Accession: AM233362) and SCHU S4 (GenBank Accession: AJ749949), available at http://cmr.tigr.org. Sequences of the plasmids pOM1 (GenBank Accession: NC_002109) and pFNL10 (GenBank Accession: NC_004952) were obtained from the NCBI database (http://www.ncbi.nlm.nih.gov/). The LVS sequence used in this study was obtained from The Microbial Genomics group, Lawrence Livermore National Laboratory, Los Alamos, prior to its submission to NCBI. This sequence differs from the submitted sequence by 13 insertions and deletions (indels; 12 single-base and 1 two-base) and 12 variant base calls. All but four of these differences lie in repeat regions that were excluded from our design. The remaining four differences are a single-base insertion in the final sequence near the start of one of the fragments (or "instructions") on the array and three single-base call changes. A merged sequence was constructed from these genomic and plasmid sequences for GeneChip® design. The F. tularensis LVS and SCHU S4 genomes are 1 895 998 and 1 892 819 bp, respectively. An in silico analysis identified sequences unique to SCHU S4 (ranging from 1 to 11 086 bp), which were appended to the LVS sequence together with the pOM1 plasmid sequence and the regions unique to pFNL10. There are 12 869 bp of sequence unique to LVS relative to SCHU S4 and 42 369 bp present in SCHU S4 but not in LVS. In total, this analysis defined 1 943 751 bp of F. tularensis sequence. We used the MUMmer tool set (http://mummer.sourceforge.net/) and repeatFinder [based on REPuter (c) Copyright University of Bielefeld, Germany (http://www.genomes.de/)] to identify 170 356 and 139 560 bp of repetitive sequence in LVS and SCHU S4, respectively. A total of 179 193 bp (9.22%) of repetitive sequence was excluded from the design, leaving 1 764 558 queryable bases (91% of the 1 943 751 bp merged F. tularensis sequence) for resequencing by hybridization. Adding back 5137 bp from the immediate flanks of the excluded repeats as padding bases brought the total submitted for chip production to 1 769 695 bp. Affymetrix, Inc. (Santa Clara, CA) tiled this sequence onto a set of six CustomSeq 300 K GeneChips® comprising 14 125 688 individual probes. A maximum of 303 366 bases of double-stranded DNA can be resequenced on each 300 K array.
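The base accounting above can be checked with a few lines of arithmetic. This is a minimal sketch: the constant and function names are illustrative, and it assumes the number of arrays is simply the submitted length divided by the per-array capacity, rounded up. It does not model how fragments were actually packed onto each chip.

```python
import math

# Figures reported in the text (bp); the names are illustrative only.
MERGED_TOTAL_BP = 1_943_751    # LVS + unique SCHU S4 + pOM1 + unique pFNL10
REPEAT_EXCLUDED_BP = 179_193   # repetitive sequence excluded from the design
PADDED_FLANK_BP = 5_137        # flanking bases added back around excluded repeats
ARRAY_CAPACITY_BP = 303_366    # max double-stranded bases per CustomSeq 300 K array


def design_summary(total, excluded, padding, capacity):
    """Derive queryable, submitted, and array-count figures for a tiling design."""
    queryable = total - excluded
    submitted = queryable + padding
    return {
        "queryable_bp": queryable,
        "excluded_pct": 100 * excluded / total,
        "queryable_pct": 100 * queryable / total,
        "submitted_bp": submitted,
        # Assumption: arrays needed = ceil(submitted / capacity), ignoring packing losses.
        "arrays_needed": math.ceil(submitted / capacity),
    }


summary = design_summary(MERGED_TOTAL_BP, REPEAT_EXCLUDED_BP,
                         PADDED_FLANK_BP, ARRAY_CAPACITY_BP)
for key, value in summary.items():
    print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```

Run as-is, this gives 1 764 558 queryable bases (about 9.22% excluded, 90.78% retained, which the text rounds to 91%), 1 769 695 submitted bases, and a minimum of six arrays, since five arrays hold only 1 516 830 bases.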
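The per-genome repeat totals (170 356 + 139 560 bp) sum to more than the 179 193 bp actually excluded, so repeat calls cannot simply be added together. One way to count excluded bases without double-counting overlapping calls is to merge the repeat intervals first. The sketch below is hypothetical throughout: `merge_intervals`, `mask_repeats`, the `flank` parameter, the padding rule, and the toy coordinates are all illustrative assumptions. It does not read MUMmer or repeatFinder output, and it is not the authors' pipeline.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based coordinates


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or abutting repeat intervals into disjoint blocks."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])  # start a new block
    return [(s, e) for s, e in merged]


def mask_repeats(seq_len: int, repeats: Iterable[Interval],
                 flank: int) -> Tuple[int, int, int]:
    """Return (excluded_bp, queryable_bp, padded_bp) for one sequence.

    Hypothetical padding rule: up to `flank` bases at each edge of every
    excluded block are added back; a block shorter than 2 * flank is
    added back whole.
    """
    blocks = merge_intervals(repeats)
    excluded = sum(e - s for s, e in blocks)
    padded = sum(min(2 * flank, e - s) for s, e in blocks)
    return excluded, seq_len - excluded, padded


# Toy example with invented coordinates (not from the F. tularensis design).
excluded, queryable, padded = mask_repeats(
    10_000, [(100, 400), (350, 600), (5_000, 5_020), (9_000, 9_500)], flank=12)
print(excluded, queryable, padded, queryable + padded)  # prints: 1020 8980 68 9048
```

Merging before summing means the overlapping calls at 100 to 400 and 350 to 600 count as one 500 bp block rather than 550 bp. The last printed value mirrors how the submitted length in the text equals queryable bases plus padding bases.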