Author: Rahman, Mohammad Shaminur; Islam, Mohammad Rafiul; Hoque, Mohammad Nazmul; Alam, Abu Sayed Mohammad Rubayet Ul; Akther, Masuda; Puspo, Joynob Akter; Akter, Salma; Anwar, Azraf; Sultana, Munawar; Hossain, Mohammad Anwar
Title: Comprehensive annotations of the mutational spectra of SARS-CoV-2 spike protein: a fast and accurate pipeline
Cord-id: occgc3v2
Document date: 2020_10_6
ID: occgc3v2
Document: Having infected millions of people, SARS-CoV-2 is evolving at an unprecedented rate, demanding an advanced and specialized analytic pipeline to capture its mutational spectra. To explore mutations and deletions in the spike (S) protein, the most-discussed protein of SARS-CoV-2, we comprehensively analyzed 35,750 complete S protein-coding sequences through a custom Python-based pipeline. The dataset, collected from GISAID up to 24 June 2020, covered six continents and five major climate zones. We identified 27,801 mutated strains (77.77% of sequences) relative to the Wuhan-Hu-1 reference, of which 84.40% differed by only a single amino acid (aa). An outlier strain (EPI_ISL_463893) from Bosnia and Herzegovina carried six aa substitutions. We also identified 11 residues with high aa mutation frequency, each showing four types of aa variation. The infamous D614G variant has spread worldwide with ever-rising dominance and across regions with different climatic conditions, alongside the L5F and D936Y mutants, which have been documented throughout all regions and climate zones, respectively. We also found 988 unique aa substitutions spanning 660 residues, which differed significantly among continents (p = .003) and climatic zones (p = .021) as inferred with the Kruskal–Wallis test. In addition, we identified 17 in-frame deletions at four sites adjacent to the receptor-binding domain that may affect attenuation. This study provides a fast and accurate pipeline for identifying mutations and deletions in large datasets of coding as well as non-coding sequences, as evidenced by the representative analysis of existing S protein data. By using separate multi-sequence alignment, removing ambiguous sequences and in-frame stop codons, and utilizing pairwise alignment, the method can derive both synonymous and non-synonymous mutations, reported as (strain_ID reference aa:mutation position:strain aa).
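The filtering and mutation-calling steps described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `is_clean` and `call_mutations`, the toy sequences, and the assumption that the two coding sequences are already aligned, gap-free, and of equal length are all hypothetical choices made for illustration. The output string follows the record format given in the abstract.

```python
from itertools import product

# Standard genetic code, built in TCAG order (an illustrative helper table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_clean(cds):
    """Return True if the sequence has only A/C/G/T, whole codons,
    and no stop codon before the final codon."""
    cds = cds.upper()
    if not cds or len(cds) % 3 or set(cds) - set(BASES):
        return False
    # Every codon except the last one must encode an amino acid.
    inner_codons = (cds[i:i + 3] for i in range(0, len(cds) - 3, 3))
    return all(CODON_TABLE[codon] != "*" for codon in inner_codons)


def call_mutations(strain_id, ref_cds, strain_cds):
    """Compare two aligned, gap-free coding sequences codon by codon.

    Returns (non_synonymous, synonymous), each a list of records shaped
    like "strain_ID reference aa:mutation position:strain aa".
    """
    ref_cds, strain_cds = ref_cds.upper(), strain_cds.upper()
    if len(ref_cds) != len(strain_cds):
        raise ValueError("sequences must be aligned to equal length")
    non_synonymous, synonymous = [], []
    for i in range(0, len(ref_cds), 3):
        ref_codon, alt_codon = ref_cds[i:i + 3], strain_cds[i:i + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        record = f"{strain_id} {ref_aa}:{i // 3 + 1}:{alt_aa}"
        (synonymous if ref_aa == alt_aa else non_synonymous).append(record)
    return non_synonymous, synonymous


# Toy sequences (not real spike data): codon 2 changes D to G,
# codon 3 changes CTT to CTC, which both encode L.
non_syn, syn = call_mutations("toy_strain", "ATGGATCTT", "ATGGGTCTC")
print(non_syn)  # ['toy_strain D:2:G']
print(syn)      # ['toy_strain L:3:L']
```

Working codon by codon keeps the synonymous changes visible, which a comparison of translated proteins alone would miss.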
We suggest that the pipeline will aid in the evolutionary surveillance of any SARS-CoV-2-encoded protein and will prove crucial in tracking the ever-increasing variation of many other divergent RNA viruses in the future. The code is available at https://github.com/SShaminur/Mutation-Analysis.
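The in-frame deletions reported in the abstract could, in principle, be read off a pairwise alignment as runs of gaps in the strain row. The sketch below is one hypothetical way to do that and makes no claim about how the linked repository implements it; `find_deletions`, the `-` gap character, and the toy alignments are assumptions.

```python
import re


def find_deletions(ref_aln, strain_aln):
    """List deletions in a strain relative to the reference.

    Both arguments are rows of one pairwise alignment (equal length,
    '-' marks a gap). Each result is (start, length, in_frame), where
    start is the 1-based reference nucleotide position and in_frame is
    True when the deleted length is a multiple of three.
    """
    if len(ref_aln) != len(strain_aln):
        raise ValueError("alignment rows must have equal length")
    deletions = []
    for gap in re.finditer(r"-+", strain_aln):
        # Count only real reference bases, so gaps in the reference row
        # (insertions in the strain) do not shift the coordinates.
        start = len(ref_aln[:gap.start()].replace("-", "")) + 1
        length = len(ref_aln[gap.start():gap.end()].replace("-", ""))
        if length:
            deletions.append((start, length, length % 3 == 0))
    return deletions


# Toy alignments (not real spike data).
print(find_deletions("ATGGATCTTGCA", "ATG---CTTGCA"))  # [(4, 3, True)]
print(find_deletions("ATGGATCTTGCA", "ATGG--CTTGCA"))  # [(5, 2, False)]
```

A deletion whose length is a multiple of three removes whole amino acids without shifting the downstream reading frame, which is why the length check is enough to flag it as in-frame.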