Genomics, the study of the structure and function of an organism's DNA, relies heavily on computational power.
An individual's genetic makeup is known to strongly influence a wide range of characteristics, such as susceptibility to disease. Several traits in humans and animals are controlled by a single gene, but the vast majority are so-called “complex traits”. These are determined by large numbers of genetic and environmental factors and by the interactions between them, with many genes across the whole genome each making a contribution. Identifying these genes and quantifying their effects, in one or several environments, is key to understanding a specific trait. Given data that combine genetic information with trait occurrence for many individuals, we can use statistical analysis to identify and quantify the genetic contribution, and in turn estimate an individual's risk of susceptibility from their genome. However, this analysis is very computationally demanding.
EPCC has teamed up with Dr Albert Tenesa at the Roslin Institute at the University of Edinburgh to restructure such statistical algorithms so that they run much more efficiently on modern computational resources. The work focuses on optimising, parallelising and restructuring the open-source “Genome-Wide Complex Trait Analysis” (GCTA) software package, including the adaptations needed to run it on Graphics Processing Units (GPUs).
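To give a sense of where the computational cost comes from, the sketch below computes a genomic relationship matrix (GRM), a standard quantity in GCTA-style analyses that measures how genetically similar each pair of individuals is. It is a simplified illustration, not GCTA's own code. It assumes genotypes coded as allele counts (0, 1 or 2), estimates allele frequencies from the sample, drops monomorphic SNPs, and applies the off-diagonal formula to every entry (GCTA uses a slightly different estimator on the diagonal). The function name `genomic_relationship_matrix` is hypothetical.

```python
import math


def genomic_relationship_matrix(genotypes):
    """Simplified, hypothetical GRM sketch (not GCTA's implementation).

    genotypes: list of N individuals, each a list of allele counts (0, 1 or 2)
    at the same M SNPs. Returns an N x N matrix as a list of lists.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Estimate each SNP's allele frequency p from the sample.
    freqs = [sum(ind[s] for ind in genotypes) / (2 * n) for s in range(m)]
    # Keep only polymorphic SNPs (0 < p < 1) to avoid dividing by zero.
    keep = [s for s in range(m) if 0 < freqs[s] < 1]
    # Standardise each genotype: z = (x - 2p) / sqrt(2p(1 - p)).
    z = [
        [(ind[s] - 2 * freqs[s]) / math.sqrt(2 * freqs[s] * (1 - freqs[s]))
         for s in keep]
        for ind in genotypes
    ]
    m_used = len(keep)
    # A = Z Z^T / M: each of the N^2 entries is a sum over M SNPs, so O(N^2 M).
    return [
        [sum(a * b for a, b in zip(z[j], z[k])) / m_used for k in range(n)]
        for j in range(n)
    ]


if __name__ == "__main__":
    # Three individuals genotyped at three SNPs (toy data).
    grm = genomic_relationship_matrix([[0, 1, 2], [1, 1, 0], [2, 0, 1]])
    for row in grm:
        print([round(v, 3) for v in row])
```

Because each entry is a sum over all SNPs, the cost grows with the square of the number of individuals times the number of SNPs, which quickly becomes very large for big cohorts. The computation is essentially the dense matrix product ZZᵀ/M, the kind of highly parallel kernel that tends to map well onto GPUs.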