A researcher's perspective on working with the Software Sustainability Institute
Posted: 11 Dec 2020 | 08:50
By Edward Wallace, School of Biological Sciences, University of Edinburgh.
Why I need sustainable software for my research
I run a lab, or research group, in the School of Biological Sciences at Edinburgh. My group is funded by the Wellcome Trust, the Royal Society and BBSRC. We're interested in how cells decide which proteins to make and when, and in how cells change which proteins they make when they learn something from the environment and need to change what they're doing. In the 21st century we collect some very large datasets to measure this. Some datasets are based on sequencing the RNA, which encodes protein, and others measure all the proteins in cells at the same time. These datasets measure thousands of different things in many samples, often dozens. Each dataset is gigabytes in size, so it's quite hard work to dig into them and get the simplest and most relevant answers about what cells are doing.
That's why our work in biology relies on computing and mathematical modelling to make sense of cells at all. The datasets get so big that you can't attack the problem by hand. You can't even just write a little "script" of data analysis code that works on a small dataset. Whatever you do has to work again and again on every piece of a large dataset. Ideally, you want it to work not just on the dataset you're working with today, but also on the dataset that you collect next month or next year, or on a dataset that another scientist on the other side of the world collected. So you want to make your analysis "portable": something you can move and apply to similar data.
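To make the idea of a reusable, portable analysis concrete, here is a minimal sketch in Python. It is purely illustrative and is not code from my lab or from any real pipeline: the column names (gene, sample, count), the gene names and the tiny datasets are all made up. The point is only that the analysis is written once as a function and then applied unchanged to every dataset.

```python
import csv
import io
from collections import defaultdict


def total_counts_per_gene(handle):
    """Sum read counts per gene from CSV text with (hypothetical) columns gene, sample, count."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        totals[row["gene"]] += int(row["count"])
    return dict(totals)


def analyse_all(datasets):
    """Apply the same analysis to every named dataset."""
    return {name: total_counts_per_gene(handle) for name, handle in datasets.items()}


# Two tiny made-up datasets standing in for files collected at different times.
this_month = io.StringIO("gene,sample,count\nYFG1,s1,10\nYFG1,s2,5\nYFG2,s1,3\n")
next_month = io.StringIO("gene,sample,count\nYFG1,s1,7\nYFG2,s1,4\nYFG2,s2,6\n")

results = analyse_all({"this_month": this_month, "next_month": next_month})
print(results)
```

In real work the in-memory text would be replaced by files opened from disk, and the function would do something far more involved, but the structure is the same: one analysis, many datasets.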
To analyse big data in biology, we have to work with people who really understand computers and software, as well as people who really understand statistics. That is why we are working with the Software Sustainability Institute (SSI). It's been incredibly useful for my group to work with the SSI in lots of ways, including the fact that I am learning how to write better software. I am then able to teach better software practice to people in my lab in the School of Biological Sciences, and to other colleagues.
My project that SSI transformed - RiboViz
The project where we're working most closely with SSI and EPCC aims to write better software for analysing protein synthesis in cells using big datasets. The project is called RiboViz, and we wrote about it in a previous blogpost. In collaboration with Premal Shah's group at Rutgers University in America, we had written a version of the software and applied it to data from one kind of cell - baker's yeast. We then came to the SSI, who do code reviews: EPCC's Dr Mike Jackson took a close look at our software and told us what we could do to make it work better now and be easier to maintain in the future.
Then, we used that code review to apply for funding to improve our software. Writing software is hard and takes a long time, which means you need enough money to pay somebody to do it. So we applied to BBSRC - the UK's Biotechnology and Biological Sciences Research Council - in collaboration with the USA's National Science Foundation. Our funding application was successful, so we are still working very closely with Dr Mike Jackson at EPCC to actually make these improvements to the software. This is a collaboration with two labs in America: Premal Shah's lab at Rutgers and Liana Lareau's lab at the University of California, Berkeley.
We've just published a pre-print about what we learned while rewriting our software, so that other people doing complicated multi-step analyses can benefit from our experience: Using rapid prototyping to choose a bioinformatics workflow management system (Michael J. Jackson, Edward Wallace, Kostas Kavoussanakis).
How I first got involved with the SSI
I first heard about SSI because of a community teaching initiative called Software Carpentry. The Carpentries curate a series of open-source lessons and initiatives that teach researchers to use software better in their research, write better code, learn better practices and learn to teach other people. During my postdoctoral training in the United States I'd taken some Software Carpentry courses and found them really useful. I liked the community ethos. When I moved to Edinburgh I got an email from SSI's Giacomo Peru advertising a Software Carpentry workshop, and I emailed back to say 'Can I help with this?'.
Now I'm one of the people who helps to organise software and data workshops across the university with Edinburgh Carpentries. That's part of giving back to the community: some of us have been lucky enough to learn good practices in our computational research. By working with Carpentries and SSI, I can help other people improve what they do in turn.
How can the SSI help you?
There are lots of ways that the SSI can help people who rely on computation for their research, including people who are taking their very first steps in learning how to code in order to attack the next dataset. Some of the Carpentries training that SSI organises is a good way into computational analysis skills. Other work in the Bayes Centre (where the SSI is based in Edinburgh), such as the organisation of data science courses, can also be really helpful.
One place where the SSI is particularly useful is if you're at a stage where you have written some code to do an analysis and you know that it needs to be better. For example, your code keeps breaking, or it works on your computer but now runs too slowly, so you need to move it to a bigger computer and you're not sure how to do that. The SSI has experts who can advise you on what to do. They can do code reviews, where they'll actually look over your code and see what can be done to improve it. Then they can help you to write grants and make other practical progress towards improving your code. They can also help connect you with other people, at the university or elsewhere, who might be able to help.
The Software Sustainability Institute is an amazing resource for people like me who rely on computation for our work.
EPCC is a founding member of the Software Sustainability Institute.