Author: David A Wilkinson; Lea Joffrin; Camille Lebarbenchon; Patrick Mavingui
Title: Partial RdRp sequences offer a robust method for Coronavirus subgenus classification
Document date: 2020_3_6
ID: jjraqr85_6
Document: Sequence data, curation and alignment: Sequence data were obtained from the NCBI nucleotide database on the 5th of July 2019, using the search term "coronavir*". This resulted in the identification of 30,249 sequences. A preliminary set of representative partial RdRp sequences was compiled with reference to recent publications describing Coronavirus diversity across the Orthocoronavirinae subfamily (21), in order to include starting reference sequences from the largest possible diversity of coronaviruses. This preliminary list was then used to identify partial RdRp sequences from retrieved NCBI records by annotating regions that had at least 70% identity to any reference sequence in the Geneious software package (version 9.4.1). Annotated regions and 200 bp of flanking sequence data were then extracted. Data containing incomplete sequences in the form of strings of Ns or significant numbers of ambiguities (>5) were removed. Open reading frames with a minimum length of 300 bp were identified and extracted from the remaining sequences. Where the correct reading frame was ambiguous, pairwise alignment to reference sequence data was used to determine the reading frame. Remaining sequences were then aligned in-frame using MAFFT, and the resulting alignment was further curated by visual inspection. Retained sequences were then trimmed to include only the most frequently sequenced partial region of RdRp, such that each sequence contained a minimum of 300 gap-free bases. The final alignment was 387 bp in length with 7,544 individual sequences, of which 3,155 were unique. The relevant 387 bp region corresponds to nucleotide positions 15287:15673 in Merbecovirus holotype reference sequence JX869059.2.
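The quality and length filters described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used Geneious and MAFFT; it only restates the stated thresholds (more than 5 ambiguities, at least 300 gap-free bases) in code. All function names are hypothetical, and several details are assumptions: that a "string of Ns" means two or more consecutive N characters, that ambiguities are counted as IUPAC ambiguity codes, that uniqueness means exact string identity, and that the quoted coordinates are 1-based and inclusive.

```python
import re

# IUPAC nucleotide ambiguity codes (everything other than A, C, G, T/U).
AMBIGUITY_CODES = set("RYSWKMBDHVN")


def count_ambiguities(seq):
    """Count IUPAC ambiguity characters in a nucleotide string."""
    return sum(1 for base in seq.upper() if base in AMBIGUITY_CODES)


def has_n_run(seq, min_run=2):
    """True if seq contains a run of at least min_run consecutive Ns.

    min_run=2 is an assumption; the source does not define a "string of Ns".
    """
    return re.search("N{%d,}" % min_run, seq.upper()) is not None


def passes_quality_filter(seq, max_ambiguities=5, min_n_run=2):
    """Keep sequences with no N run and at most max_ambiguities ambiguities."""
    if has_n_run(seq, min_n_run):
        return False
    return count_ambiguities(seq) <= max_ambiguities


def gap_free_length(aligned_seq):
    """Number of non-gap characters in an aligned sequence."""
    return sum(1 for char in aligned_seq if char != "-")


def trim_and_filter(alignment, start, end, min_bases=300):
    """Trim aligned sequences to columns [start, end) and keep those with
    at least min_bases gap-free bases.

    alignment maps a sequence identifier to its aligned sequence string;
    start and end are 0-based, half-open column indices.
    """
    kept = {}
    for name, seq in alignment.items():
        region = seq[start:end]
        if gap_free_length(region) >= min_bases:
            kept[name] = region
    return kept


def unique_sequences(sequences):
    """Distinct sequences by exact, case-insensitive string identity (assumed)."""
    return {seq.upper() for seq in sequences}


def region_length(first, last):
    """Length of a region given 1-based inclusive coordinates (assumed)."""
    return last - first + 1


# Under the inclusive-coordinate assumption, positions 15287:15673 span 387 bp,
# which matches the alignment length reported in the text.
assert region_length(15287, 15673) == 387
```

In this sketch the quality filter would run on the extracted, unaligned sequences, while `trim_and_filter` would run on the alignment afterwards, mirroring the order of steps in the text.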