Selected article for: "different virus and sequence feature"

Author: Alejandro Lopez-Rincon; Alberto Tonda; Lucero Mendoza-Maldonado; Eric Claassen; Johan Garssen; Aletta D. Kraneveld
Title: Accurate Identification of SARS-CoV-2 from Viral Genome Sequences using Deep Learning
  • Document date: 2020_3_14
  • ID: c2lljdi7_31
    Snippet: The convolutional layers of CNNs de facto learn new features to characterize the problem, directly from the data. In this specific case, the new features are specific sequences of base pairs that can more easily separate different virus strains (Fig. 12). By analyzing the result of each filter in a convolutional layer, and how its output interacts with the corresponding max pooling layer, it is possible to detect human-readable sequences of .....
    Document: The convolutional layers of CNNs de facto learn new features to characterize the problem, directly from the data. In this specific case, the new features are specific sequences of base pairs that can more easily separate different virus strains (Fig. 12). By analyzing the result of each filter in a convolutional layer, and how its output interacts with the corresponding max pooling layer, it is possible to detect human-readable sequences of base pairs that might provide domain experts with important information. It is important to note that these sequences are not bound to specific locations of the genome; thanks to its structure, the CNN is able to detect them and recognize their importance even if their position is displaced in different samples. For this purpose, we use the trained CNN described in Subsection 2.2, which obtained an accuracy of 98.75% in a 10-fold cross-validation. As a first step, we plot the inputs and outputs of the convolutional layer, to visually inspect them for patterns. As an example, in Fig. 13 we report the visualization of the first [...] promising, as it seems to focus on a few relevant points in the genome, and it is thus most likely able to identify meaningful sequences. [...] 21-bps sequence that obtained the highest value from the convolutional filter, in a specific 148-position interval of the original genome: the first max pooling feature will cover positions 1-148, the second will cover positions 149-296, and so on. We graph the whole set of max pooling features for the complete data, [...] to 148 positions). As some samples might present sequences that are displaced even more, in the next experiments we decided to consider only the relative frequency of the 21-bps sequences identified at the previous step, creating a sequence feature space, to verify whether the appearance of specific sequences could be enough to differentiate between virus strains.
    This text is from a bioRxiv preprint that was not peer-reviewed: https://doi.org/10.1101/2020.03.13.990242
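
The two steps described in the text (mapping each max pooling feature back to a 148-position interval and its strongest 21-bps sequence, then using the relative frequency of those sequences as features) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the convention that `activations[i]` is the filter response for the sequence starting at 0-based position `i`, and the normalization by the number of sequence start positions in the genome are all assumptions made for illustration; only the window width of 148 and the sequence length of 21 come from the text.

```python
from collections import Counter

POOL_WIDTH = 148  # positions covered by one max pooling feature (from the text)
KMER_LEN = 21     # length of the sequences picked out by a filter (from the text)


def pooling_window(feature_index, width=POOL_WIDTH):
    """Return the 1-based, inclusive genome interval of a pooling feature."""
    start = (feature_index - 1) * width + 1
    return start, start + width - 1


def strongest_kmers(genome, activations, width=POOL_WIDTH, k=KMER_LEN):
    """For each pooling window, return the k-mer with the highest activation.

    Assumption: activations[i] is the response of one convolutional filter
    to the k-mer that starts at 0-based position i of the genome.
    """
    kmers = []
    for start in range(0, len(activations), width):
        window = activations[start:start + width]
        # Index of the first maximum inside this window.
        best = start + max(range(len(window)), key=window.__getitem__)
        kmer = genome[best:best + k]
        if len(kmer) == k:  # skip k-mers truncated by the end of the genome
            kmers.append(kmer)
    return kmers


def relative_frequencies(genome, kmers, k=KMER_LEN):
    """Relative frequency of each k-mer, independent of its position.

    Assumption: the frequency is normalized by the number of k-mer start
    positions in the genome; the source does not state the normalization.
    """
    total = len(genome) - k + 1
    counts = Counter(genome[i:i + k] for i in range(total))
    return [counts[kmer] / total for kmer in kmers]


# Toy example with a shortened window (4) and sequence length (3).
toy_genome = "ACGTACGTAC"
toy_activations = [0.1, 0.9, 0.2, 0.0, 0.3, 0.1, 0.8, 0.2]
toy_kmers = strongest_kmers(toy_genome, toy_activations, width=4, k=3)
toy_features = relative_frequencies(toy_genome, toy_kmers, k=3)
```

Because the frequency features ignore where a sequence occurs, a sample in which the same sequence is displaced by more than one pooling window still produces the same feature value, which is the motivation given in the text for moving to a sequence feature space.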
