Author: Carla Mavian; Simone Marini; Mattia Prosperi; Marco Salemi
Title: A snapshot of SARS-CoV-2 genome availability up to 30th March, 2020 and its implications
Document date: 2020_4_5
ID: 8vl0okiv_4
Snippet: Before carrying out any phylogeny-based analysis of virus evolution and spatiotemporal spread, it is crucial to test the quality of sequence data, since uneven sampling, presence of phylogenetic noise, and absence of temporal signal can affect reliability of the results (e.g. ancestral state reconstructions, molecular clock calibrations) (8). SARS-CoV-2 full genome sequences were obtained from GISAID (https://www.gisaid.org/) (9) at different t.....
Document: Before carrying out any phylogeny-based analysis of virus evolution and spatiotemporal spread, it is crucial to test the quality of sequence data, since uneven sampling, presence of phylogenetic noise, and absence of temporal signal can affect the reliability of the results (e.g. ancestral state reconstructions, molecular clock calibrations) (8). SARS-CoV-2 full genome sequences were obtained from GISAID (https://www.gisaid.org/) (9) at different timepoints. As of March 30th, we compared the number of full genomes sampled per country with the number of confirmed cases at the time of sampling, as well as with the country's total population (Figure 1). We obtained 2608 full genomes from 55 countries (Figure 1). During the past month, the number of genomes has [...] correlation between confirmed cases and genomes per country to be 0.49 on March 30th, and we considered it as a proxy for sampling homogeneity. However, correlation could only be investigated with confirmed cases (again as a proxy), since not all affected countries have made publicly available the total number of coronavirus tests performed. Moreover, even within the same country, sequenced genomes were usually sampled from a few hotspots, not necessarily representative of the whole epidemic in that country. It is worrisome that, as of March 30th, 2020, the two top countries in terms of confirmed cases do not show sufficiently large and representative sampling. SARS-CoV-2 full genome sequences available from patients in the US, the country with the highest number of confirmed cases, have mainly been sampled in Washington state (66%) during the early epidemic, while less than one third (32%) are available from the epicenter of the US epidemic, the state of New York.
Italy, the second country by confirmed cases, uploaded 26 genomes: one from the Marche region, four from Friuli Venezia Giulia, seven from Abruzzo, nine from Lazio, and only five from Lombardy, the epicenter of the Italian epidemic (Table S1).
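
The per-country comparison of confirmed cases and sequenced genomes described above can be illustrated with a short sketch. The text does not state which correlation coefficient was used, so the Spearman rank correlation below is an assumption, chosen because case counts differ by orders of magnitude between countries. The country labels and counts are hypothetical placeholders, not the GISAID or case data analysed here, and the function names are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical (confirmed cases, full genomes) per country; NOT real data.
counts = {
    "Country A": (1000, 50),
    "Country B": (5000, 20),
    "Country C": (200, 5),
    "Country D": (80000, 30),
    "Country E": (1200, 60),
}

cases = [c for c, _ in counts.values()]
genomes = [g for _, g in counts.values()]
rho = spearman(cases, genomes)
print(f"Spearman correlation, cases vs genomes: {rho:.2f}")
```

A value near 1 would indicate that countries with more confirmed cases also contributed more genomes, while a lower value points to uneven sampling. If testing totals were available for every country, they could replace `cases` in the same calculation.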