Author: Markus Luczak-Roesch
Title: Networks of information token recurrences derived from genomic sequences may reveal hidden patterns in epidemic outbreaks: A case study of the 2019-nCoV coronavirus. Document date: 2020_2_11
ID: kevrp8rg_19
Document: We first perform a cluster analysis using the random walk approach by Blondel et al. [3] on the weighted network that we construct from our original TIC network by collapsing all edges between the same vertices into unique edges weighted by the sum of the collapsed edges (see Figure 2 for a schematic example and Figure 4 for the actual networks we construct following this approach). Afterwards, we analyse the network visually using the open source software Gephi. In Gephi we scale vertices by their degree and edges by their weight. Furthermore, we colour vertices by their cluster membership. We then run the Yifan Hu [12] layout algorithm (optimal distance: 10,000; relative strength: 0.2; initial step size: 20; step ratio: 0.95; quadtree max level: 50; theta: 0.8; convergence threshold: 1 × 10⁻⁴; adaptive cooling: enabled) to visualise the network. We repeat the exact same process to construct and visualise three further networks (cf.

Cluster evaluation

In order to evaluate the meaningfulness of the TIC network against some baseline, we compute the intra-cluster and inter-cluster similarity of the raw nucleotide sequences. This is done by computing all pairwise comparisons of sequences within individual clusters (intra-cluster similarity) and comparisons between all sequences from a cluster and all sequences that are not within that cluster (inter-cluster similarity) using the compareStrings function provided by the Biostrings R package [20]. The similarity of two sequences is then simply defined as follows:
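To illustrate the edge collapsing and the intra-cluster and inter-cluster comparisons described in this section, the following is a minimal Python sketch rather than the author's implementation (the paper uses the Biostrings R package). The function names `collapse_edges`, `cluster_similarities` and `match_fraction` are hypothetical. The sketch assumes that every original TIC edge carries unit weight, so that the sum of the collapsed edges equals their count, and that edges are directed unless stated otherwise. `match_fraction` is only a stand-in scoring function for equal-length, already aligned sequences; it is not the similarity definition of the paper, which is not reproduced in this excerpt.

```python
from collections import Counter
from itertools import combinations


def collapse_edges(edges, directed=True):
    """Collapse parallel edges into unique edges weighted by their count.

    Assumption: each original edge has unit weight, so summing the
    collapsed edges is the same as counting them.
    """
    weights = Counter()
    for u, v in edges:
        # For an undirected reading, (u, v) and (v, u) are the same edge.
        key = (u, v) if directed else tuple(sorted((u, v)))
        weights[key] += 1
    return dict(weights)


def match_fraction(a, b):
    """Hypothetical stand-in similarity: share of identical positions.

    Assumption: both sequences are aligned and have the same length.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_similarities(sequences, clusters, similarity=match_fraction):
    """Return {cluster label: (intra scores, inter scores)}.

    sequences: dict mapping sequence id -> nucleotide string
    clusters:  dict mapping sequence id -> cluster label
    """
    result = {}
    for label in set(clusters.values()):
        inside = [s for s in sequences if clusters[s] == label]
        outside = [s for s in sequences if clusters[s] != label]
        # All pairwise comparisons within the cluster.
        intra = [similarity(sequences[a], sequences[b])
                 for a, b in combinations(inside, 2)]
        # Every sequence in the cluster against every sequence outside it.
        inter = [similarity(sequences[a], sequences[b])
                 for a in inside for b in outside]
        result[label] = (intra, inter)
    return result


# Toy data, invented for illustration only.
toy_edges = [("a", "b"), ("a", "b"), ("b", "c")]
toy_sequences = {"s1": "ACGT", "s2": "ACGA", "s3": "TTTT"}
toy_clusters = {"s1": 0, "s2": 0, "s3": 1}
weighted = collapse_edges(toy_edges)
scores = cluster_similarities(toy_sequences, toy_clusters)
```

Passing the similarity as a parameter keeps the pairing logic separate from the scoring rule, so the stand-in can be replaced by whatever definition is actually used.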