Author: Phillip Davis; John Bagnoli; David Yarmosh; Alan Shteyman; Lance Presser; Sharon Altmann; Shelton Bradrick; Joseph A. Russell
Title: Vorpal: A novel RNA virus feature-extraction algorithm demonstrated through interpretable genotype-to-phenotype linear models
  • Document date: 2020_3_2
  • ID: 48mtdwuv_7
    Document: In the quasispecies model, the virus organism is represented by the "cloud" of genotypes that can be maintained by the virus within the allowable fitness parameters [12]. In the method proposed here, the frame of reference for the quasispecies "cloud" is reduced to the level of K-length motifs. To estimate the connectedness of these K-mers across the input assemblies, a distance matrix between all of the unique K-mers observed across the designated virus genome assemblies is established using Hamming distance. Hierarchical clustering is then performed on the resulting distance matrix using an average linkage function, corresponding to the ultrametric assumption used in Unweighted Pair Group Method with Arithmetic Mean (UPGMA) phylogenies, and flat clusters are extracted using a hyperparameter for the distance cutoff of cluster membership. The constituents of these clusters are then aligned and their positional variants represented using the International Union of Pure and Applied Chemistry (IUPAC) nucleic acid notation with degenerate base symbols. These degenerate motifs are mapped back to their respective assemblies. This approach facilitates interpretation of model features in a functional profiling and hypothesis-generating context. To demonstrate the effectiveness of this new feature extraction technique, genotype-to-phenotype linear models were trained on various RNA virus groups. A description of the Python implementation of the algorithm is detailed in
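
The steps described above (K-mer enumeration, Hamming distance, average-linkage clustering with a distance cutoff, IUPAC consensus, and mapping back to assemblies) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. All function names, the DNA alphabet (A, C, G, T), and the toy inputs are assumptions made for illustration, and the naive clustering loop is only suitable for very small inputs. Because all K-mers share the same length, "alignment" here is assumed to be simple position-by-position comparison.

```python
from itertools import combinations

# IUPAC nucleotide codes keyed by the set of bases observed at one position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
# Reverse lookup: degenerate symbol -> set of bases it permits.
CODE_TO_BASES = {code: bases for bases, code in IUPAC.items()}


def unique_kmers(sequences, k):
    """Collect every distinct K-length substring across the input sequences."""
    return sorted({s[i:i + k] for s in sequences for i in range(len(s) - k + 1)})


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage_clusters(kmers, cutoff):
    """Greedy average-linkage (UPGMA-style) agglomeration.

    Repeatedly merges the two clusters with the smallest mean pairwise
    Hamming distance and stops once that distance exceeds `cutoff`
    (the hypothetical distance-cutoff hyperparameter).
    """
    clusters = [[m] for m in kmers]

    def avg_dist(c1, c2):
        return sum(hamming(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), avg_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters


def degenerate_motif(cluster):
    """Collapse equal-length K-mers into one motif of IUPAC degenerate symbols."""
    return "".join(IUPAC[frozenset(column)] for column in zip(*cluster))


def motif_positions(motif, sequence):
    """Start indices in `sequence` where each base is permitted by the motif."""
    k = len(motif)
    return [
        i for i in range(len(sequence) - k + 1)
        if all(base in CODE_TO_BASES[code]
               for code, base in zip(motif, sequence[i:i + k]))
    ]


# Toy input: two short, made-up "assemblies" that differ at one position.
assemblies = ["ACGTAC", "ACGTTC"]
kmers = unique_kmers(assemblies, k=4)
clusters = average_linkage_clusters(kmers, cutoff=1)
motifs = sorted(degenerate_motif(c) for c in clusters)
print(motifs)
print({m: [motif_positions(m, s) for s in assemblies] for m in motifs})
```

With this toy input, the K-mers that differ at a single position (for example CGTA and CGTT) fall into the same cluster and collapse to a degenerate motif such as CGTW, which then matches both assemblies. For realistic data sizes, a library routine would normally replace the quadratic loop; SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` followed by `fcluster` with `criterion="distance"` is one option, though whether it matches the authors' exact configuration is not stated in this excerpt.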
