Author: Stenglein, Mark D.; Jacobson, Elliott R.; Wozniak, Edward J.; Wellehan, James F. X.; Kincaid, Anne; Gordon, Marcus; Porter, Brian F.; Baumgartner, Wes; Stahl, Scott; Kelley, Karen; Towner, Jonathan S.; DeRisi, Joseph L.
Title: Ball Python Nidovirus: a Candidate Etiologic Agent for Severe Respiratory Disease in Python regius
  • Document date: 2014_9_9
  • ID: rb3qdunj_50
    Document: Sequence analysis. The sequence analysis pipeline was similar to that employed previously (59, 76). Low-quality sequences and low-complexity sequences (defined as having a ratio of the length of the Lempel-Ziv-Welch compressed sequence to the uncompressed length of less than 0.46) were removed from further analysis. The first 5 bases of each sequence were trimmed, as was the last base. The CD-HIT sequence clustering tool was then used to collapse reads with >98% global pairwise identity (77). Host-derived sequences were then filtered, first by using the BLASTn alignment tool (version 2.2.25+ [30]) to query a database of snake ribosomal and mitochondrial sequences and then by using the Bowtie2 alignment tool (version 2.0.0-beta7 [26]) to query databases composed of draft assemblies of the Burmese python and boa constrictor genomes (20, 21). Sequences aligning with an expect value of less than 10^-12 (BLASTn) or with --local mode alignment scores greater than 86 (Bowtie2) were filtered. Similarly, sequences that aligned to the Illumina adapter sequences (see Table S1 in the supplemental material) or to the PhiX-174 control sequence were removed. This filtering removed on average 98% of sequences (see Table S2). The remaining sequences were searched against custom databases of viral protein sequences using the BLASTx alignment tool. Sequences aligning to any viral protein sequence with an expect value of less than 0.25 were further examined. False positives were reduced by using BLAST to align putative viral sequences to the NCBI nonredundant nucleotide (nt) and protein (nr) databases. Only sequences whose best hit and whose pair's best hit were still to viral sequences were retained. The PRICE de novo targeted genome assembler was used to generate initial contiguous virus sequences (22). The reference assembly (NCBI accession no. KJ541759) was assembled using reads from a single snake (no. 2); other animals lacked sufficient coverage to assemble the complete genome. To validate this assembly and generate coverage and pairwise identity data, reads were aligned to Sanger-validated assemblies using the BLASTn algorithm. Sequencing data have been deposited in the UCSF Integrated Data Repository.
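
    The low-complexity filter and the trimming step described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. It assumes that the "compressed length" is the number of codes emitted by a textbook LZW encoder, which the text does not specify. The function names (lzw_code_count, lzw_ratio, is_low_complexity, trim_read, preprocess) are hypothetical, and the quality filter is left out because its criteria are not given.

```python
from typing import Iterable, List

LZW_RATIO_CUTOFF = 0.46  # threshold quoted in the text


def lzw_code_count(seq: str) -> int:
    """Count the codes a textbook LZW encoder would emit for seq.

    Assumption: compressed length is measured in output codes,
    not in bits or bytes.
    """
    dictionary = set(seq)  # start with every single character present
    current = ""
    codes = 0
    for ch in seq:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate        # keep extending the known phrase
        else:
            codes += 1                 # emit the code for `current`
            dictionary.add(candidate)  # learn the new phrase
            current = ch
    if current:
        codes += 1                     # flush the final phrase
    return codes


def lzw_ratio(seq: str) -> float:
    """Compressed length divided by uncompressed length (0.0 if empty)."""
    if not seq:
        return 0.0
    return lzw_code_count(seq) / len(seq)


def is_low_complexity(seq: str, cutoff: float = LZW_RATIO_CUTOFF) -> bool:
    """A read is low complexity when its ratio falls below the cutoff."""
    return lzw_ratio(seq) < cutoff


def trim_read(seq: str) -> str:
    """Drop the first 5 bases and the last base, as described in the text."""
    return seq[5:-1]


def preprocess(reads: Iterable[str]) -> List[str]:
    """Remove low-complexity reads, then trim the survivors."""
    return [trim_read(r) for r in reads if not is_low_complexity(r)]
```

    Under this assumption a homopolymer run compresses very well and is discarded, while a read with mixed base composition is kept and trimmed. The ratio depends on read length, so a cutoff such as 0.46 is meaningful only for reads of a similar length to those it was tuned on.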
