Understanding protein synthesis via analysis of ribosome profiling data

Author: Mike Jackson
Posted: 8 Aug 2019 | 10:51

The molecular structure of a yeast ribosome, composed of 79 proteins

A multi-disciplinary team of biologists, bioinformaticians and research software engineers based at EPCC and The Wallace Lab at the University of Edinburgh, The Shah Lab at Rutgers University, and The Lareau Lab at the University of California, Berkeley, will enhance and extend a software suite called RiboViz to extract biological insight from "ribosome profiling" data and drive forward understanding of protein synthesis.

All cells make proteins by using molecular machines called ribosomes, which read a messenger RNA template and "translate" the RNA code into the protein code. Signals, also encoded in the RNA, control what proteins are made by cells, when they are made and in what quantities. These signals are complex and only just beginning to be understood because there are thousands of different RNA sequences in a cell and each is hundreds to thousands of nucleotides ("letters") long.

Recent advances in DNA and RNA sequencing technology mean that, using a technique called ribosome profiling, we can now measure all the subsequences of RNA that are translated into protein and the quantity of protein produced. Although this technique is impressive, it is not perfect, and statistical tools are needed to separate the interesting biological signals in the data from the unwanted biases of the experimental measurement. These tools need to be implemented in usable and reliable software so that all scientists studying protein synthesis can get the maximum possible information from ribosome profiling data, which is expensive and time-consuming to collect.

The RiboViz software suite, written in Python and R, takes raw data from sequencing machines and passes it through a series of processing steps. RiboViz estimates how much each part of RNA is translated, and how the amount of translation is controlled by the code of that RNA. RiboViz produces tables, figures and graphs that can be published online in a form useful for both experts and non-experts. Sharing data in this way can help to make science both more reproducible and more accessible. In this spirit RiboViz itself is open source software, hosted on GitHub, and free to use by anyone in the world.
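As a rough illustration of the kind of estimate described above, the following Python sketch counts ribosome footprint reads per gene and normalizes them by gene length and sequencing depth (a "transcripts per million" style measure). It is not RiboViz's code or API: the function names, the input format and the example data are all made up for illustration, and the normalization shown is just one common choice.

```python
from collections import Counter


def count_reads_per_gene(aligned_reads):
    """Count footprint reads assigned to each gene.

    aligned_reads: iterable of (gene_name, position) pairs.
    This input format is a hypothetical simplification.
    """
    return Counter(gene for gene, _position in aligned_reads)


def transcripts_per_million(read_counts, gene_lengths):
    """Divide counts by gene length, then rescale so the values sum to one million."""
    rates = {
        gene: read_counts.get(gene, 0) / length
        for gene, length in gene_lengths.items()
    }
    total = sum(rates.values())
    if total == 0:
        # No reads at all: report zero for every gene rather than dividing by zero.
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Made-up example: two genes with equal read counts but different lengths.
gene_lengths = {"geneA": 300, "geneB": 600}
aligned_reads = [
    ("geneA", 12), ("geneA", 45), ("geneA", 45),
    ("geneB", 100), ("geneB", 250), ("geneB", 251),
]

counts = count_reads_per_gene(aligned_reads)
tpm = transcripts_per_million(counts, gene_lengths)
for gene, value in sorted(tpm.items()):
    print(f"{gene}: {value:.1f}")
```

In this toy example both genes receive three reads, but the shorter gene ends up with twice the normalized value, because the same number of reads is spread over half the length. Real ribosome profiling analysis also has to deal with the experimental biases mentioned earlier, which this sketch ignores.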

The first iteration of RiboViz was developed by Premal Shah and Tongji Xing of Rutgers University, and Oana Carja and Joshua Plotkin of the University of Pennsylvania. Edward Wallace at the Institute of Cell Biology, School of Biological Sciences, University of Edinburgh and Premal have developed successive versions.

For our current project, EPCC's Kostas Kavoussanakis and I will work with Edward and Felicity Anderson at the University of Edinburgh, Premal at Rutgers University, and Liana Lareau at the University of California, Berkeley. We will make the RiboViz code more reliable, easier to use, and future-proof, and add features that quantify protein synthesis more accurately. We will develop statistical models that take account of both biological signals and unwanted biases. We will apply these to understand two interesting features of how protein synthesis is regulated. The first is how production of a short ("upstream") protein from an RNA can control production of another protein later ("downstream") on the same RNA. The second is how synonymous parts of the RNA code affect how ribosomes move and how much protein they produce.

Our work will help to develop fundamental knowledge about how cells work, and has several applications. Companies that genetically engineer cells to express proteins, for example to make therapeutic drugs or artificial silk, will have better tools to engineer those cells to produce the right amount of protein at the right time. Scientists studying evolution will have better tools to understand how coding sequences evolve, allowing deeper understanding of the tree of life. Lastly, we will be able to better understand human genetic diseases caused by defects in protein synthesis, which in the long run could lead to better treatments.

Our collaboration is funded by the BBSRC in the UK and the NSF/BIO in the USA as a BBSRC-NSF/BIO Lead Agency collaboration. Input from The Software Sustainability Institute was essential in developing our proposal to BBSRC and NSF/BIO. The Institute provides advice and guidance to researchers on all aspects of the use, development and funding of software within research. In 2018, the Institute completed a software and sustainability review of RiboViz, which included recommendations on how the software and its supporting documentation and resources could be improved, and a development plan that formed the basis for our proposal.

Our collaboration started in May 2019 and runs until April 2022. We look forward to reporting on our progress.

Image: Marat Yusupov, Roland Beckmann, and Anthony Schuller, from HHMI.

Authors

Mike Jackson and Kostas Kavoussanakis, EPCC
Edward Wallace, School of Biological Sciences, The University of Edinburgh