Oncology: Genetic markers for bowel cancer

According to figures from Cancer Research UK, some 35,000 people are diagnosed with colorectal cancer (cancer of the bowel and rectum) each year. This makes it one of the most common forms of cancer in the UK in both men and women. While the development of effective treatments is clearly important, early identification of patients at risk would be extremely useful in preventing the disease.

The Oncology Project is a collaboration between EPCC and the Colon Cancer Genetics Group (CCGG) of the MRC Human Genetics Unit at the Western General Hospital with the aim of investigating the relationship between genetic markers and colorectal cancer. The ultimate goal is to identify individuals at risk of the disease and take appropriate preventative measures.

The researchers at CCGG have access to a unique and extensive dataset consisting of 565,000 genetic markers, with real data from 1,000 cancer cases and 1,000 matched controls. The first phase of the project involved porting a serial FORTRAN code, which investigates the effect of each genetic marker individually, to the BlueGene computer. Initial estimates suggested that the code would take around 10,800 days to run on a standard desktop machine. After optimisation and parallelisation, the code ran in 6.5 hours on 128 BlueGene/L processors.

The second phase of the project investigated the interactions between pairs of genetic markers, with the final goal being to obtain a ranked list of the pairs which show the greatest interaction. As each of the 565,000 markers must be tested against all other markers, this results in a truly vast problem: 565,000 × 564,999 / 2 distinct pairs, or roughly 160 billion (about 1.6 × 10^11), must be tested and ranked. A calculation of this size is simply not feasible on a desktop PC, so access to a parallel computer is essential. EPCC devised a complex 2-dimensional decomposition strategy and parallelised the researchers' code to enable it to run on a large number of processors. We also devised a parallel sorting strategy to produce the final ranked list of marker pairs.
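
As an illustration only, the pair count and the general idea of a 2-dimensional decomposition can be sketched in a few lines of Python. This is not the project's FORTRAN code, whose details are not given here. It assumes the marker-by-marker grid is cut into square tiles, that only tiles on or above the diagonal are processed, and that each tile keeps a short list of its best pairs which are merged at the end. The function names, the tile scheme and toy_score (a stand-in for the real interaction statistic) are all hypothetical.

```python
import heapq
from itertools import combinations_with_replacement


def pair_count(n_markers):
    """Number of unordered pairs of distinct markers: n * (n - 1) / 2."""
    return n_markers * (n_markers - 1) // 2


def block_bounds(n, n_blocks, b):
    """Half-open range [start, stop) of block b when n markers are split
    into n_blocks contiguous, near-equal blocks."""
    base, extra = divmod(n, n_blocks)
    start = b * base + min(b, extra)
    stop = start + base + (1 if b < extra else 0)
    return start, stop


def tiles(n_blocks):
    """Tiles (bi, bj) with bi <= bj covering the upper triangle of the
    marker-by-marker grid; each tile is an independent unit of work."""
    return list(combinations_with_replacement(range(n_blocks), 2))


def pairs_in_tile(n, n_blocks, bi, bj):
    """Yield every marker pair (i, j) with i < j that falls in tile (bi, bj)."""
    i0, i1 = block_bounds(n, n_blocks, bi)
    j0, j1 = block_bounds(n, n_blocks, bj)
    for i in range(i0, i1):
        # On diagonal tiles, start at i + 1 so no pair is tested twice.
        for j in range(max(j0, i + 1), j1):
            yield i, j


def toy_score(i, j):
    """Hypothetical stand-in for the real interaction statistic."""
    return (i * 31 + j * 17) % 101


def ranked_top_k(n, n_blocks, k):
    """Each tile keeps its own k best pairs; the short lists are then merged."""
    local_lists = []
    for bi, bj in tiles(n_blocks):
        scored = ((toy_score(i, j), i, j)
                  for i, j in pairs_in_tile(n, n_blocks, bi, bj))
        local_lists.append(heapq.nlargest(k, scored))
    return heapq.nlargest(k, (item for lst in local_lists for item in lst))


if __name__ == "__main__":
    print(pair_count(565_000))  # 159612217500
    print(ranked_top_k(n=10, n_blocks=3, k=5))
```

In a real parallel run each tile would be assigned to a different processor, and only the short per-tile lists would need to be communicated for the final ranking. A full ranked list of every pair, as described above, would need a distributed sort rather than this top-k merge.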

The collaboration took advantage of EPCC's considerable expertise in parallel and high-performance computing, freeing the CCGG researchers to focus on the algorithm for analysing the results.

For further information, visit the Oncology project website.