SPRINT: Parallel statistics for genetic research

Gene analysis is becoming increasingly complex and can be greatly enhanced by exploiting the power of high-performance computing. EPCC and the Division of Pathway Medicine at the University of Edinburgh developed a prototype framework called SPRINT, which allows biostatisticians to more easily exploit HPC systems.

SPRINT (Simple Parallel R INTerface) is an easy-to-use parallel version of R, a statistical language widely used to process the data gleaned from microarray analysis. Microarray analysis is a technique that allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples.

Processing the data produced by microarray analysis tests the limits of existing bioinformatics computing infrastructure. One solution is to use HPC systems, which offer more processors and memory than desktop computers. However, R must be able to use multiple processors if it is to fully exploit the power of HPC systems to analyse genomic data. Existing modules enable R to do this, but they are either difficult for HPC novices to use or cannot solve certain classes of problem. SPRINT allows parallelised functions to be added to R without the need to master parallel programming methods, making HPC systems easy to exploit.

SPRINT has been ported to ARCHER, the UK national supercomputing service. The code has been analysed and optimised to enable it to scale to 512 slave processes and beyond.

Find out more

SPRINT project website

To run SPRINT on ARCHER, contact the ARCHER Helpdesk or the SPRINT team.

Read about SPRINT on the EPCC blog.