Author: Ramya Rangan; Ivan N. Zheludev; Rhiju Das
Title: RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses
Document date: 2020_3_28
ID: kjeqdse5_10
Document: We used two approaches to predict conserved structured regions in SARS-CoV-2. First, we predicted RNA structures centered on the most sequence-conserved regions of SARS-related betacoronavirus genomes (alignment SARSr-MSA-1). For each conserved stretch (at least 15 nucleotides long, 100% sequence conservation) along with 20-nucleotide flanking windows, we predicted maximum expected accuracy (MEA) secondary structures using Contrafold 2.0 [18]. We then sought to rank sequences based on the predicted probability that the RNA folds into the MEA structure and not other structures. For this ranking, we used the estimated Matthews correlation coefficient (MCC) from each construct's base-pairing probability matrix [19]. We note that while MCC is often used in the RNA structure modeling literature to assess agreement of a prediction with a reference structure, here we use the metric to assess how tightly the ensemble of predicted secondary structures is concentrated around a single predicted secondary structure, the MEA structure. An MEA structure with a higher estimated MCC is expected to have unpaired and paired bases that better align with the construct's predicted ensemble base-pairing probabilities, lending support to the single-structure MEA prediction. In Fig. 2 ... [20]

We also sought independent methods to identify thermodynamically stable and conserved RNA structures, without initially guiding the search to focus on extremely sequence-conserved genome regions. We made predictions for structured regions using RNAz [21], beginning with the betacoronavirus alignment SARSr-MSA-1. RNAz predicts structured regions that are more thermodynamically stable than expected by comparison to random sequences of the same length and sequence composition (z-score), and additionally assesses regions by the support of compensatory and consistent mutations in the sequence alignment (structure conservation index, SCI).
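For the first approach described above, the two computational steps (selecting conserved stretches with flanks, then scoring how well a single structure summarizes the ensemble) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `conserved_constructs` and `estimated_mcc` are hypothetical, coordinates are alignment columns with exclusive ends, and the MCC estimate is assumed to come from an expected confusion matrix over all base pairs (i, j), which may differ in detail from the definition in [19]. The MEA structure and the base-pairing probability matrix would come from a folding package such as Contrafold and are simply passed in here.

```python
from math import sqrt


def conserved_constructs(alignment, min_len=15, flank=20):
    """Return (start, end) column ranges, end exclusive, for each stretch of
    at least `min_len` fully conserved, ungapped columns, padded by `flank`
    columns on both sides (clipped to the alignment bounds)."""
    n_cols = len(alignment[0])
    conserved = [
        alignment[0][c] != "-" and len({row[c] for row in alignment}) == 1
        for c in range(n_cols)
    ]
    constructs, run_start = [], None
    # Iterate one column past the end so a run touching the last column closes.
    for c in range(n_cols + 1):
        if c < n_cols and conserved[c]:
            if run_start is None:
                run_start = c
        elif run_start is not None:
            if c - run_start >= min_len:
                constructs.append(
                    (max(0, run_start - flank), min(n_cols, c + flank))
                )
            run_start = None
    return constructs


def estimated_mcc(bpp, pairs):
    """Assumed estimator: expected MCC of a single structure against the
    ensemble. `bpp` is an N x N base-pairing probability matrix (nested
    lists); `pairs` is a set of (i, j) tuples with i < j in the structure."""
    n = len(bpp)
    tp = fp = fn = tn = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = bpp[i][j]
            if (i, j) in pairs:
                tp += p          # predicted pair, expected to be paired
                fp += 1.0 - p    # predicted pair, expected to be unpaired
            else:
                fn += p          # pair missed by the structure
                tn += 1.0 - p    # correctly left unpaired
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0
```

With this estimator, a structure whose pairs all have probability 1 (and no other pairs have probability mass) scores 1.0, and the score drops as the ensemble spreads probability over alternative pairings, which matches the ranking rationale given in the text.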
RNAz combines these two criteria (z-score and SCI) into a single P-score; when tested empirically on a set of ncRNAs, this score produced a false-positive rate of 4% at a P>0.5 cutoff and 1% at a P>0.9 cutoff. To predict structured regions across the full viral genome, we scanned the SARSr-MSA-1 alignment in 120-nucleotide windows sliding by 40 nucleotides, predicted all RNAz hits in the plus strand at a P>0.5 cutoff, clustered the resulting hits to generate maximally contiguous loci of the genome with predicted structure, and filtered the results to include only loci with at least one window with a P>0.9 structure prediction.
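The scan, cluster, and filter steps above can be illustrated with a short sketch. It assumes each window has already been scored by RNAz and is represented as a `(start, end, p)` tuple with an exclusive end; the helper names `window_starts` and `cluster_hits` are hypothetical, and merging windows that overlap or abut is an assumed reading of "maximally contiguous loci" rather than a description of RNAz's own clustering tool.

```python
def window_starts(aln_len, size=120, step=40):
    """Start columns for windows of `size` columns sliding by `step`.
    In this sketch, trailing columns that do not fill a whole window
    are not covered by an extra partial window."""
    if aln_len <= size:
        return [0]
    return list(range(0, aln_len - size + 1, step))


def cluster_hits(hits, low=0.5, high=0.9):
    """Merge scored windows into loci.

    hits: iterable of (start, end, p) tuples, end exclusive, where p is the
    window's P-score. Windows with p > low are kept, overlapping or abutting
    windows are merged, and only loci containing at least one window with
    p > high are returned as (start, end) tuples."""
    kept = sorted(h for h in hits if h[2] > low)
    loci = []
    for start, end, p in kept:
        if loci and start <= loci[-1][1]:
            prev_start, prev_end, prev_best = loci[-1]
            loci[-1] = (prev_start, max(prev_end, end), max(prev_best, p))
        else:
            loci.append((start, end, p))
    return [(s, e) for s, e, best in loci if best > high]
```

For example, three consecutive windows scoring 0.6, 0.95, and 0.3 would yield one locus spanning the first two windows, while a separate pair of windows scoring 0.7 and 0.8 would be merged but then discarded because neither exceeds 0.9.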