Author: Zhang, He; Zhang, Liang; Lin, Ang; Xu, Congcong; Li, Ziyu; Liu, Kaibo; Liu, Boxiang; Ma, Xiaopin; Zhao, Fanfan; Yao, Weiguo; Li, Hangwen; Mathews, David H.; Zhang, Yujian; Huang, Liang
Title: LinearDesign: Efficient Algorithms for Optimized mRNA Sequence Design
Cord-id: v0m90h3n
Document date: 2020_4_21
ID: v0m90h3n
Document: Messenger RNA (mRNA) vaccines have been successful for COVID-19, but they still suffer from chemical instability and degradation of the mRNA molecule, which is a major obstacle in the storage, distribution, and efficiency of the vaccine. Previous work established a correlation between the chemical stability of an mRNA and its thermodynamic folding stability, and a longer half-life also leads to greater protein expression. Therefore, we aim to design mRNAs with optimal folding stability for more efficient mRNA vaccines. However, because synonymous codons cause a combinatorial explosion, the mRNA design space is prohibitively large: for example, there are ~$2.4 \times 10^{632}$ valid mRNAs for the SARS-CoV-2 Spike protein, while common practice such as codon optimization can only explore candidates far from optimal folding stability. Here we provide a surprisingly simple solution to this hard problem by reducing it to a classical problem in theoretical computer science and computational linguistics. We formulate the mRNA design space as a finite-state automaton, and the optimal mRNA can be found via lattice parsing on that automaton with a context-free grammar encoding the energy model. This reduction enables a cubic-time algorithm that in practice scales quadratically, taking only 11 minutes for the Spike protein, and it can also be extended to jointly optimize stability and translation efficiency. We also develop a beam search variant that runs in linear time and provides suboptimal designs for experimentation. Finally, compared to the codon-optimized benchmark, our designs substantially improve chemical stability and protein expression in vitro, and dramatically increase neutralizing antibody titers by up to 20x in vivo. Our algorithm makes it possible to explore the previously inaccessible region of high-stability designs, and can also be used in gene therapy and synthetic biology.
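The combinatorial explosion described in the abstract can be illustrated with a minimal Python sketch. It assumes the standard genetic code and counts the synonymous mRNA coding sequences for a protein as the product of the number of codons available at each residue. The function names `design_space_size` and `enumerate_mrnas` are hypothetical and are not taken from LinearDesign; the sketch does not model folding energy, the automaton, or lattice parsing, and it makes no attempt to reproduce the Spike protein figure, which may use a different counting convention (for example, whether the stop codon is included).

```python
from itertools import product
from math import prod

# Standard genetic code, laid out with the first base varying slowest and
# the third base varying fastest, each in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (or '*' for stop) to its list of synonymous codons.
CODONS_FOR = {}
for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(bases))


def design_space_size(protein):
    """Count the mRNA coding sequences that translate to `protein`."""
    return prod(len(CODONS_FOR[aa]) for aa in protein)


def enumerate_mrnas(protein):
    """Yield every synonymous coding sequence (feasible only for tiny inputs)."""
    for codons in product(*(CODONS_FOR[aa] for aa in protein)):
        yield "".join(codons)


if __name__ == "__main__":
    peptide = "MFL"  # Met has 1 codon, Phe has 2, Leu has 6
    print(design_space_size(peptide))
    for mrna in enumerate_mrnas(peptide):
        print(mrna)
```

Because the count is a product over residues, it grows exponentially with protein length, so exhaustive enumeration like `enumerate_mrnas` is only usable for very short peptides. This is the motivation for representing the design space compactly rather than listing candidates.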