Greenhouse gases and GPUs

Author: Iain Bethune
Posted: 17 Dec 2014 | 16:32

We have just reached the end of a short project collaborating with atmospheric geochemists at the universities of Edinburgh and Bristol. After they purchased two machines, one each, both with dual 12-core Intel Xeon Ivy Bridge CPUs and NVIDIA Tesla K20X GPUs, EPCC was tasked with investigating the feasibility of using the GPUs to improve the performance of their software.

First we spoke with Dr Matt Rigby and Dr Anita Ganesan from Bristol. They have developed a Fortran code that implements a Hierarchical Markov Chain Monte Carlo model of greenhouse gas emissions, essentially a probabilistic approach to solving the 'inverse problem' of inferring a set of parameters describing emission processes from a set of trace atmospheric gas measurements taken over time at local or regional scale. In common with all Monte Carlo methods, the algorithm alternates between a proposal step, in which a random change is made to one of the state variables, and a probabilistic accept-or-reject step, which depends on the calculated likelihood of the change.
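To make the proposal and accept-or-reject alternation concrete, here is a minimal, self-contained sketch of a Metropolis-Hastings step in plain C. It uses a toy one-dimensional Gaussian target rather than the real hierarchical emissions model; all names, sizes and values are illustrative and not taken from the project code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Toy log-likelihood: a standard normal target. In the real code this is
   the expensive part, involving matrix and vector operations. */
static double log_likelihood(double x) { return -0.5 * x * x; }

int main(void)
{
    double x = 0.0;                      /* current state */
    double loglik = log_likelihood(x);
    srand(42);

    for (int step = 0; step < 10000; ++step) {
        /* Proposal: a random perturbation of the state variable */
        double prop = x + 0.5 * ((double)rand() / RAND_MAX - 0.5);
        double prop_loglik = log_likelihood(prop);

        /* Accept with probability min(1, exp(prop_loglik - loglik)) */
        double u = (double)rand() / RAND_MAX;
        if (log(u) < prop_loglik - loglik) {
            x = prop;
            loglik = prop_loglik;
        }
    }
    printf("final state: %f\n", x);
    return 0;
}
```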

At first glance this might not seem amenable to parallelism, since every step must happen in strict sequence. However, there is significant computation involved in calculating the acceptance probability of a change, involving matrix and vector operations on thousands of elements, which has potential for GPU acceleration. To use CUDA alongside Fortran without being tied to the PGI compiler's CUDA Fortran language extensions, I refactored the code so that a simple C layer could be interposed between the main Fortran code and the CUDA kernels. In the end, the refactoring and associated optimisation delivered a speedup of 2.8x on the CPU alone, rising to 3.3x when the GPU was used, despite only one of the six main computational kernels being ported to CUDA. There is certainly scope for further gains if the rest of the code is modified to use CUDA, not least because the need to move data back and forth between CPU and GPU at every iteration would be reduced.
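As an illustration of this kind of interposed layer (not the project code itself), the sketch below shows a hypothetical extern "C" wrapper, gpu_scale_vector(), which a Fortran routine could call via ISO_C_BINDING to launch a simple CUDA kernel. It also makes visible the host-to-device and device-to-host copies that have to happen on every call when only part of the computation lives on the GPU.

```cuda
#include <cuda_runtime.h>

// Trivial illustrative kernel: scale a vector in place.
__global__ void scale_kernel(double *v, double alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= alpha;
}

// C wrapper callable from Fortran via ISO_C_BINDING.
extern "C" void gpu_scale_vector(double *v, double alpha, int n)
{
    double *d_v;
    size_t bytes = (size_t)n * sizeof(double);

    cudaMalloc(&d_v, bytes);
    cudaMemcpy(d_v, v, bytes, cudaMemcpyHostToDevice);   // host -> device each call

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(d_v, alpha, n);

    cudaMemcpy(v, d_v, bytes, cudaMemcpyDeviceToHost);   // device -> host each call
    cudaFree(d_v);
}
```

Porting the remaining kernels would allow the data to stay resident on the GPU across iterations, removing most of these per-call transfers.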

It was really fun to spend some time working with CUDA in the context of a real code, and it was also a good learning experience to use the cuBLAS library alongside standard CUDA C kernels. We intend to continue the development of the GPU port via an MSc in HPC student project during the summer of 2015.
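For anyone curious what calling cuBLAS looks like, here is a minimal example of a double-precision matrix-vector product (y = A*x) at roughly the scale mentioned above. The sizes and data are placeholders; it would be compiled with nvcc and linked against -lcublas.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int n = 1000;                       // order of magnitude of the real problem
    const double alpha = 1.0, beta = 0.0;

    // Host data: A is n x n (column-major, as cuBLAS expects), x and y length n.
    double *A = (double *)malloc(n * n * sizeof(double));
    double *x = (double *)malloc(n * sizeof(double));
    double *y = (double *)malloc(n * sizeof(double));
    for (int i = 0; i < n * n; ++i) A[i] = 1.0;
    for (int i = 0; i < n; ++i)     x[i] = 1.0;

    double *dA, *dx, *dy;
    cudaMalloc(&dA, n * n * sizeof(double));
    cudaMalloc(&dx, n * sizeof(double));
    cudaMalloc(&dy, n * sizeof(double));
    cudaMemcpy(dA, A, n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x, n * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    // y = alpha * A * x + beta * y
    cublasDgemv(handle, CUBLAS_OP_N, n, n, &alpha, dA, n, dx, 1, &beta, dy, 1);
    cublasDestroy(handle);

    cudaMemcpy(y, dy, n * sizeof(double), cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);              // expect 1000.0 for all-ones inputs

    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    free(A); free(x); free(y);
    return 0;
}
```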

GEOS-Chem: modelling global 3D chemical transport

In the second part of the project, we investigated the GEOS-Chem code, used by Prof. Paul Palmer's group in Edinburgh. GEOS-Chem is a large and complex program which implements a global 3D chemical transport model, with a large user and developer community worldwide. It models the dynamics and chemical processes of the atmosphere, as well as interactions with land and ocean processes. Although we only got access to the code and a test case near the end of the project, we were able to carry out detailed profiling of the code, which is currently parallelised using OpenMP. We identified the key routines for performance and made recommendations for how these could be ported to GPU. However, unlike the first code, GEOS-Chem has quite a flat profile, so to gain the full benefit of GPUs a large amount of code would need to be ported - not an easy task!

Images: (Top) Sulphur hexafluoride emissions for selected regions, from Ganesan et al., Atmos. Chem. Phys., 2014. (Above) Snapshot from the GEOS-Chem model, from www.palmergroup.org