Poster Presentation Australasian RNA Biology and Biotechnology Association 2024 Conference

Design and optimization of messenger RNA with large language models for enhanced protein translation. (#110)

Sai Surya Teja Bhogaraju 1 2 3 , Ziwei Liu 1 2 3 , Denis Bauer 4 5 , Laurence Wilson 4 5 , Eduardo Eyras 1 2 3
  1. The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, ACT, Canberra, Australia
  2. The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, ACT, Canberra, Australia
  3. EMBL Australia Partner Laboratory Network at the Australian National University, Australian National University, ACT, Canberra, Australia
  4. Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), NSW, Sydney, Australia
  5. Applied Biosciences, Faculty of Science and Engineering, Macquarie University, NSW, Sydney, Australia

Optimising mRNA sequences for stability and high-level expression of the encoded protein is a key challenge in developing mRNA therapies. This is traditionally done through codon optimisation, in which codons are replaced with synonymous ones according to the host organism's codon usage bias and tRNA abundances. However, this approach does not capture the full complexity of codon combinations, nor does it consider key properties of the target protein, such as abundance and function. Here we propose leveraging Large Language Models and generative AI to address these challenges and optimise mRNA for high protein expression more effectively. Our approach explores the context of all codons in the reference mRNA within the embedding space. During this exploration, it performs two simultaneous optimisations: one to improve the translation output and one to maximise the structural similarity with the target protein.
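For contrast with the proposed method, the conventional codon-optimisation baseline described above can be sketched as follows. This is a minimal illustration only: the `CODON_USAGE` table holds hypothetical placeholder frequencies for a handful of amino acids (not measured values for any organism), and `codon_optimise` is a name introduced here for the example.

```python
# Hypothetical codon-usage frequencies per amino acid.
# These numbers are illustrative placeholders, NOT measured data.
CODON_USAGE = {
    "M": {"AUG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "L": {"CUG": 0.5, "CUC": 0.3, "UUA": 0.2},
    "*": {"UAA": 0.4, "UGA": 0.6},  # stop codons
}

# Reverse lookup: codon -> amino acid, derived from the table above.
CODON_TO_AA = {
    codon: aa for aa, codons in CODON_USAGE.items() for codon in codons
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon using the toy table."""
    return "".join(CODON_TO_AA[mrna[i:i + 3]] for i in range(0, len(mrna), 3))


def codon_optimise(mrna: str) -> str:
    """Replace each codon with the most frequent synonymous codon."""
    if len(mrna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    optimised = []
    for i in range(0, len(mrna), 3):
        aa = CODON_TO_AA[mrna[i:i + 3]]
        usage = CODON_USAGE[aa]
        # Each codon is chosen independently of its neighbours.
        optimised.append(max(usage, key=usage.get))
    return "".join(optimised)


original = "AUGAAAUUAUAA"
optimised = codon_optimise(original)
```

Because each codon is chosen independently, the encoded protein is unchanged, but any dependence between neighbouring codons is ignored, which is the limitation the abstract points to.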

The translation output is predicted by a regression model whose input is the embedding of the mRNA sequence and whose output is the protein abundance. By optimising protein abundance rather than translation efficiency based on ribosome occupancy, we focus on a metric that is more relevant to the efficiency of the mRNA therapy. The structural similarity is estimated using a deep learning approach that compares the structure of the target protein with that of the protein encoded by the generated mRNA. This optimisation ensures that the target function is preserved while allowing a broader range of mRNA sequences to be tested. The dual optimisation strategy allows a more comprehensive exploration of the sequence space, considering both codon usage and the function of the encoded protein. By integrating these computational techniques, our method aims to enhance the design of mRNA sequences for therapeutic applications, potentially improving the efficacy and stability of mRNA-based treatments.
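The abstract does not state how the two objectives are combined, so the following is only a sketch under explicit assumptions: each candidate mRNA already carries a predicted abundance (from some regressor) and a structural similarity score (from some structure comparison), both scaled to [0, 1], and the two are combined by a weighted sum after a similarity threshold. The `Candidate` class, `combined_score`, `select_best`, the weight, the threshold, and all numeric values are hypothetical and are not taken from the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    sequence: str
    predicted_abundance: float    # assumed regressor output, scaled to [0, 1]
    structural_similarity: float  # assumed similarity to target, in [0, 1]


def combined_score(c: Candidate, weight: float = 0.5) -> float:
    """Weighted sum of the two objectives (an assumed combination rule)."""
    return weight * c.predicted_abundance + (1 - weight) * c.structural_similarity


def select_best(
    candidates: List[Candidate],
    min_similarity: float = 0.9,
    weight: float = 0.5,
) -> Optional[Candidate]:
    """Pick the highest-scoring candidate that keeps the structure close."""
    # Discard candidates whose encoded protein drifts too far from the target.
    viable = [c for c in candidates if c.structural_similarity >= min_similarity]
    if not viable:
        return None
    return max(viable, key=lambda c: combined_score(c, weight))


# Placeholder scores for illustration only.
candidates = [
    Candidate("AUGAAA", predicted_abundance=0.9, structural_similarity=0.50),
    Candidate("AUGAAG", predicted_abundance=0.7, structural_similarity=0.95),
    Candidate("AUGCUG", predicted_abundance=0.6, structural_similarity=1.00),
]
best = select_best(candidates)
```

In this toy example the first candidate has the highest predicted abundance but is filtered out for low structural similarity, illustrating why both objectives are needed.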