Author: Niccolo Alfano; Anisha Dayaram; Jan Axtner; Kyriakos Tsangaras; Marie-Louise Kampmann; Azlan Mohamed; Seth Timothy Wong; M. Thomas P. Gilbert; Andreas Wilting; Alex David Greenwood
Title: Non-invasive surveys of mammalian viruses using environmental DNA
Document date: 2020_3_29
ID: nil1vv6h_39
Document: A) Leech reads were removed from the dataset by aligning the reads with Bowtie2 v2.3.5.1 [37] to the Helobdella robusta genome v1.0 (assembly GCA_000326865.1), which is the only complete Hirudinea genome available in GenBank, and to all leech sequences in GenBank (4,957 sequences returned by a "Hirudinea" search). This left 81% of the original reads (Suppl. Tab. 1). The filtered reads were then searched by BLAST against a database generated from the capture bait sequences. Reads that matched baits were extracted and screened against the entire NCBI nucleotide database (nt) using BLASTn to find the best viral match. To generate a consensus sequence, the filtered reads were mapped both to the corresponding bait sequence and to the genome sequence of the best hit from the BLAST search against the complete nt database. This consensus sequence was again searched against the NCBI nt database using BLASTn to obtain a viral assignment.

B) Leech reads were removed as in method A. In addition, rRNA reads were removed using SortMeRNA [38], leaving 75% of the original reads (Suppl. Tab. 1). The filtered reads were de novo assembled using both the SPAdes v3.11.1 [39] and Trinity v2.6.6 [40] assemblers. The contigs obtained from SPAdes and Trinity were pooled and clustered with USEARCH v11.0.667 [41] at a 90% identity threshold to remove duplicated or highly similar sequences. The centroids were then subjected to sequential BLAST searches against the NCBI nucleotide database and the NCBI RefSeq viral protein database using BLASTn and BLASTx, respectively.

C) The adaptor- and quality-trimmed data were uploaded to Genome Detective [42], web-based software that assembles viral genomes from NGS data. The software first groups reads into buckets based on protein similarity to different viral hits. Genome Detective then de novo assembles the reads of each bucket into a longer consensus sequence, which is searched against the NCBI RefSeq viral database using the BLASTx and BLASTn algorithms. The results of the amino acid and nucleotide searches are combined, and a viral hit is assigned based on the best combined score.
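
Methods A and B both depend on picking the best match for each read or contig from BLAST results. The text does not say how "best" was defined, so the following Python sketch is only an illustration under stated assumptions: it assumes the searches were exported in the standard 12-column BLAST+ tabular format (`-outfmt 6`) and that the best hit is the one with the highest bit score, with ties broken by the lowest E-value. The function name `best_hits` and the optional `min_pident` filter are hypothetical and do not come from the paper.

```python
import csv
import io

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, min_pident=0.0):
    """Return {query_id: row} holding the best hit for each query.

    Assumption (not stated in the source): "best" means the highest
    bit score, with ties broken by the lowest E-value.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for fields in reader:
        # Skip blank lines, comment lines, and malformed rows.
        if len(fields) < len(COLUMNS) or fields[0].startswith("#"):
            continue
        row = dict(zip(COLUMNS, fields))
        row["pident"] = float(row["pident"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        # Hypothetical filter: drop hits below a percent-identity cutoff.
        if row["pident"] < min_pident:
            continue
        current = best.get(row["qseqid"])
        if current is None or (
            (row["bitscore"], -row["evalue"])
            > (current["bitscore"], -current["evalue"])
        ):
            best[row["qseqid"]] = row
    return best
```

To use it, read a BLAST output file into a string and pass it to `best_hits`. The returned dictionary maps each query ID to the subject ID and scores of its top hit. That subject could then serve as the mapping reference for consensus generation as described in method A.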