Selected article for: "genomic sequence and host receptor"

Author: Andrea Vandelli; Michele Monti; Edoardo Milanetti; Riccardo Delli Ponti; Gian Gaetano Tartaglia
Title: Structural analysis of SARS-CoV-2 and prediction of the human interactome
  • Document date: 2020_3_31
  • ID: ewvdl06h_1
    Snippet: To analyze SARS-CoV-2 structure (reference Wuhan strain MN908947.3), we employed CROSS [12] to predict the double- and single-stranded content of RNA genomes such as HIV-1 [13]. We found the highest density of double-stranded regions in the 5' (nucleotides 1-253), membrane M protein (nucleotides 26523-27191), spike S protein (nucleotides 23000-24000), and nucleocapsid N protein (nucleotides 28274-29533; Fig. 1) [26]. The lowest density.....
    Document: To analyze SARS-CoV-2 structure (reference Wuhan strain MN908947.3), we employed CROSS [12] to predict the double- and single-stranded content of RNA genomes such as HIV-1 [13]. We found the highest density of double-stranded regions in the 5' (nucleotides 1-253), membrane M protein (nucleotides 26523-27191), spike S protein (nucleotides 23000-24000), and nucleocapsid N protein (nucleotides 28274-29533; Fig. 1) [26]. The lowest density of double-stranded regions was observed at nucleotides 6000-6250 and 20000-21500, corresponding to the regions between the non-structural proteins nsp14 and nsp15 and the upstream region of the spike surface protein S (Fig. 1) [26]. In addition to the maximum corresponding to nucleotides 23000-24000, the structural content of spike S protein shows minima at around nucleotides 20500 and 24500 (Fig. 1).

    We used the Vienna method [27] to further investigate the RNA secondary structure of specific regions identified with CROSS [13]. Employing a 100-nucleotide window centered around CROSS maxima and minima, we found a good match between CROSS scores and Vienna free energies (Fig. 1). Strong agreement is also observed between CROSS and Vienna positional entropy, indicating that regions with the highest structural content also have the lowest structural diversity.

    Our analysis suggests the presence of structural elements in SARS-CoV-2 that have evolved to interact with specific human proteins [11]. Our observation is based on the assumption that structured regions have an intrinsic propensity to recruit proteins [14], which is supported by the fact that structured transcripts act as scaffolds for protein assembly [15, 16].

    Using Clustal W for multiple sequence alignments [28], we observed general conservation of the coding regions (Fig. 3A). The 5' and 3' ends show high variability owing to the experimental procedures used in sequencing and are discarded in this analysis [29]. One highly conserved region lies between nucleotides 23000-24000 in the spike S genomic locus, while sequences up- and downstream are variable (red bars in Fig. 3A). We then used CROSSalign [13] to compare the structural content (Materials and Methods). High variability of structure is observed for both the 5' and 3' ends and for nucleotides 21000-22000 as well as 24000-25000, associated with the S region (red bars in Fig. 3A). The rest of the regions are significantly conserved at a structural level (p-value < 0.0001; Fisher's test).

    We then compared protein sequences coded by the spike S genomic locus (NCBI reference QHD43416) and found that both sequence (Fig. 3A) and structure (Fig. 2) of nucleotides 23000-24000 are highly conserved. The region corresponds to amino acids 330-500 that contact the host receptor angiotensin-converting enzyme 2 (ACE2) [30], promoting infection and provoking lung injury [24, 31]. By contrast, the region upstream of the binding site receptor ACE2 and located in correspondence to the minimum of the structural profile at around nucleotides 22500-23000 (Fig. 1)
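
The windowing step described above (a 100-nucleotide window centered on the maxima and minima of a structural profile) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy score list, and the clipping behaviour at the genome ends are assumptions made for the example. The genome length of 29903 nucleotides for MN908947.3 is passed in as a plain parameter, and the resulting windows would still need to be handed to a folding tool to obtain free energies.

```python
def centered_window(center, width, genome_length):
    """Return 1-based inclusive (start, end) of a `width`-nt window centered on `center`.

    Windows that would run past either end of the genome are shifted inward
    so that they keep the full width (an assumption of this sketch).
    """
    half = width // 2
    start = max(1, center - half)
    end = min(genome_length, start + width - 1)
    start = max(1, end - width + 1)
    return start, end


def profile_extrema(scores):
    """Return the 1-based positions of the maximum and minimum of a score profile."""
    indices = range(len(scores))
    max_pos = max(indices, key=scores.__getitem__) + 1
    min_pos = min(indices, key=scores.__getitem__) + 1
    return max_pos, min_pos


def extract(sequence, start, end):
    """Slice a sequence using 1-based inclusive coordinates."""
    return sequence[start - 1:end]


if __name__ == "__main__":
    # Hypothetical per-nucleotide scores; real values would come from a predictor such as CROSS.
    toy_scores = [0.1, 0.9, -0.5, 0.3]
    print(profile_extrema(toy_scores))
    # Window around the middle of the conserved spike region discussed in the text.
    print(centered_window(23500, 100, 29903))
```

Each window returned by `centered_window` spans exactly 100 positions, so the sequences cut out with `extract` are directly comparable when their folding energies are set against the profile scores.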
