Author: Zhengqiao Zhao; Bahrad A. Sokhansanj; Gail L. Rosen
Title: Characterizing geographical and temporal dynamics of novel coronavirus SARS-CoV-2 using informative subtype markers
Document date: 2020_4_9
ID: 9sk11214_3
Document: The focus of the error correction method is to correct an ISM that contains ambiguous symbols, i.e., a subject ISM. If the generated nucleotide symbol identifies a smaller set of bases (e.g., Y, representing C or T, rather than N, which may be any base), we use the generated symbol to correct the original one. At the highest level, we assess the geographic distribution of SARS-CoV-2 subtypes: we count the frequency of unique ISMs per location and build charts and tables to visualize the ISMs, including the pie charts, graphs, and tables shown in this paper. All visualizations in this paper and our pipeline are generated using Matplotlib [12]. To improve visualization, ISMs that occur with a frequency of less than 5% in a given location are collapsed into an "OTHER" category for that location. Our pipeline then creates pie charts for different locations to show the geographical distribution of subtypes. Each subtype is also labeled with the earliest date associated with sequences from a given location in the dataset.

To study the progression of SARS-CoV-2 viral subtypes in the time domain, we group together all sequences in a given location that were obtained no later than a certain date (as provided in the sequence metadata) and compute the relative abundance (i.e., frequency) of the corresponding subtypes. Any subtype whose relative abundance never goes above 2.5% on any date is collapsed into an "OTHER" category for that location. The following formula illustrates this calculation:
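
The formula itself does not appear in this extract. As a minimal sketch of the calculation described above (not the authors' pipeline code), the cumulative relative abundance and the 2.5% collapse could be computed as follows. The function names, the `(date, ISM)` record layout, and the toy ISM labels are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def cumulative_abundance(records, cutoff):
    """Relative abundance of each ISM among sequences dated on or before `cutoff`.

    `records` is a list of (collection_date, ism) pairs for ONE location.
    This layout is an assumption, not the paper's data format.
    """
    counts = Counter(ism for day, ism in records if day <= cutoff)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ism: n / total for ism, n in counts.items()}


def abundance_over_time(records, threshold=0.025):
    """Cumulative abundance per date, with rare subtypes collapsed into "OTHER".

    A subtype is kept only if its abundance goes above `threshold`
    on at least one date; all remaining subtypes are summed into "OTHER".
    """
    dates = sorted({day for day, _ in records})
    series = {day: cumulative_abundance(records, day) for day in dates}
    keep = {
        ism
        for freqs in series.values()
        for ism, freq in freqs.items()
        if freq > threshold
    }
    collapsed = {}
    for day, freqs in series.items():
        row = {ism: freq for ism, freq in freqs.items() if ism in keep}
        other = sum(freq for ism, freq in freqs.items() if ism not in keep)
        if other > 0:
            row["OTHER"] = other
        collapsed[day] = row
    return collapsed


# Toy data for a single location; the ISM labels are made up.
records = [
    (date(2020, 3, 1), "ISM-A"),
    (date(2020, 3, 1), "ISM-A"),
    (date(2020, 3, 2), "ISM-B"),
    (date(2020, 3, 3), "ISM-A"),
]
# A larger threshold than 2.5% is used here only so the toy data shows a collapse.
timeline = abundance_over_time(records, threshold=0.4)
```

In this sketch the abundance on each date is computed over all sequences collected up to and including that date, so the fractions on any given date sum to 1, with or without the "OTHER" row.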