Author: Jakub M Bartoszewicz; Anja Seidel; Bernhard Y Renard
Title: Interpretable detection of novel human viruses from genome sequencing data
Document date: 2020_1_30
ID: ac00tai9_25
Document: We wanted the networks to yield accurate predictions for both 250bp reads (our data, modelling a sequencing run of an Illumina MiSeq device) and 150bp reads (as in the Human Blood Virome dataset). Because shorter reads are padded with zeros, we expected CNNs trained with average pooling to misclassify many of them. We therefore prepared a modified version of the "Stratified" dataset in which the last 100bp of each read were set to zeros, mimicking a shorter sequencing run while preserving the error model. We then retrained the CNN that had performed best on the original dataset. Since the Human Blood Virome dataset should, in principle, not contain viruses infecting non-human Chordata, a "Chordata"-trained classifier was not used in this setting.
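
The masking step described above can be illustrated with a minimal sketch. It assumes reads are one-hot encoded as arrays of shape (length, 4) with the channel order A, C, G, T; the encoding, the function names `one_hot_encode` and `mask_tail`, and the random toy reads are hypothetical and are not taken from the paper. Averaging the raw input over positions is used here only as a crude stand-in for global average pooling over convolutional feature maps.

```python
import numpy as np

READ_LENGTH = 250   # full read length, as stated in the text
MASKED_TAIL = 100   # trailing positions set to zero, as stated in the text
ALPHABET = "ACGT"   # assumed channel order (hypothetical, not from the paper)


def one_hot_encode(read: str, length: int = READ_LENGTH) -> np.ndarray:
    """Encode a read as a (length, 4) array; unknown bases and padding stay zero."""
    encoded = np.zeros((length, len(ALPHABET)), dtype=np.float32)
    for position, base in enumerate(read[:length]):
        index = ALPHABET.find(base)
        if index >= 0:
            encoded[position, index] = 1.0
    return encoded


def mask_tail(encoded_reads: np.ndarray, n_masked: int = MASKED_TAIL) -> np.ndarray:
    """Return a copy of a (batch, length, 4) array with the last n_masked positions zeroed."""
    masked = encoded_reads.copy()
    masked[:, -n_masked:, :] = 0.0
    return masked


# Toy data: four random 250bp reads (illustrative only).
rng = np.random.default_rng(0)
reads = ["".join(rng.choice(list(ALPHABET), size=READ_LENGTH)) for _ in range(4)]
batch = np.stack([one_hot_encode(read) for read in reads])
short_batch = mask_tail(batch)

# Averaging over positions: zero padding scales the average by 150/250.
avg_full = batch.mean(axis=1)
avg_short = short_batch.mean(axis=1)
print(avg_full.sum(axis=1), avg_short.sum(axis=1))
```

In this toy setting the position-wise average of a masked read is scaled by 150/250 relative to a full-length read, which is one intuition for why a network trained only on full-length reads with average pooling might respond differently to zero-padded shorter reads.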