Author: Jakub M Bartoszewicz; Anja Seidel; Bernhard Y Renard
Title: Interpretable detection of novel human viruses from genome sequencing data
Document date: 2020_1_30
ID: ac00tai9_12
Document: We accessed the Virus-Host Database (VHDB; Mihara et al., 2016) on July 31, 2019 and downloaded all the available data. The original dataset contained 14,380 records comprising RefSeq IDs for viral sequences and associated metadata. Some viruses are divided into discontiguous segments, which are represented as separate records in VHDB; in those cases the segments were treated as contigs of a single genome in the further analysis. We removed records with unspecified host information and records in which the highly pathogenic Variola virus was confused with a similarly named genus of fish. Further, we filtered out viroids and satellites. Human-infecting viruses were extracted by searching for records containing "Homo sapiens" in the "host name" field. Note that VHDB contains information about multiple possible hosts for a given virus where appropriate. Any virus infecting humans was assigned to the positive class, even if other, non-human hosts exist. In total, the dataset contained 9,496 viruses, including 1,309 human viruses. We considered both DNA and RNA viruses; RNA sequences were encoded in the DNA alphabet, as in RefSeq.
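The labeling rule described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each VHDB record has already been parsed into a dictionary, and the keys "virus name" and "refseq id" are assumptions made for illustration (only the "host name" field is named in the text). Grouping segments by virus name is likewise an assumption, and the removal of viroids, satellites, and the mislabeled Variola records is omitted. The helper names label_viruses and to_dna_alphabet are hypothetical.

```python
from collections import defaultdict

HUMAN = "Homo sapiens"


def label_viruses(records):
    """Group records by virus and assign a binary human-host label.

    `records` is an iterable of dicts with the (assumed) keys
    "virus name", "refseq id" and "host name".
    """
    hosts = defaultdict(set)     # virus -> all host names seen
    contigs = defaultdict(list)  # virus -> RefSeq IDs, in input order

    for rec in records:
        host = (rec.get("host name") or "").strip()
        if not host:
            continue  # drop records with unspecified host information
        name = rec["virus name"]
        hosts[name].add(host)
        # Segments of one virus are kept together as contigs of one genome.
        if rec["refseq id"] not in contigs[name]:
            contigs[name].append(rec["refseq id"])

    # Positive class: any host entry contains "Homo sapiens",
    # even if other, non-human hosts are also listed.
    return {
        name: {
            "contigs": contigs[name],
            "human": any(HUMAN in h for h in hosts[name]),
        }
        for name in hosts
    }


def to_dna_alphabet(seq):
    """Encode an RNA sequence in the DNA alphabet (U -> T)."""
    return seq.upper().replace("U", "T")
```

Keeping the host names in a set per virus makes the "any human host" rule a single check, independent of how many host records a virus has.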