Preparing programming models for Exascale

Author: Daniel Holmes
Posted: 18 Jun 2019 | 15:03

To make future heterogeneous systems easier to use efficiently, the EPiGRAM-HS project is improving and extending existing programming models with Exascale potential. We are working primarily with the MPI and GASPI programming models, and are also applying our changes to HPC applications such as Nek5000, OpenIFS and iPIC3D, and to AI frameworks such as TensorFlow and Caffe. We expect the trend towards hardware specialisation to continue, so large machines will become more and more heterogeneous.

EPCC is leading the work on heterogeneous network challenges. Having driven the effort to get persistent collective operations into the MPI Standard, we are now focused on exploring the full potential of that interface. 

The HPC applications in EPiGRAM-HS need the halo-exchange and global reduction communication patterns. We will be working within the open-source Open MPI library to optimise and streamline the implementation of key communication operations: persistent neighbourhood collectives (for halo-exchange) and persistent all-reduce (for global reduction).

The additional “planning” step in this new interface offers an opportunity to remove much of the middleware code inside MPI and thereby shorten the critical path for starting and completing communication. In addition, we will look for ways to allocate dedicated resources to improve performance even further, and ways to offload the remaining software overhead into smart hardware (e.g. FPGAs or programmable network devices) whenever possible.
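
The idea behind the planning step can be illustrated with a toy model in plain Python. This is only a sketch and not MPI code: the function names plan_allreduce and start_allreduce are hypothetical, the ranks are simulated inside a single process, and the recursive-doubling schedule is just one possible all-reduce algorithm, assumed here for illustration. What it shows is the split between work that is done once (deciding who talks to whom) and work that is repeated every time the operation is started.

```python
from typing import List


def plan_allreduce(num_ranks: int) -> List[List[int]]:
    """Planning step: precompute, once, each rank's partner for every round.

    Uses recursive doubling, so this sketch assumes a power-of-two rank count.
    """
    if num_ranks < 1 or num_ranks & (num_ranks - 1):
        raise ValueError("this sketch assumes a power-of-two rank count")
    schedule = []
    distance = 1
    while distance < num_ranks:
        # In round k, rank r exchanges with rank r XOR 2**k.
        schedule.append([rank ^ distance for rank in range(num_ranks)])
        distance *= 2
    return schedule


def start_allreduce(schedule: List[List[int]], values: List[float]) -> List[float]:
    """Start step: replay the precomputed rounds; no decisions are made here."""
    current = list(values)
    for partners in schedule:
        # Every simulated rank adds its partner's partial sum to its own.
        current = [current[rank] + current[partners[rank]]
                   for rank in range(len(current))]
    return current


schedule = plan_allreduce(4)                 # setup cost paid once
for step in range(3):                        # plan reused on every iteration
    local = [float(rank + step) for rank in range(4)]
    print(start_allreduce(schedule, local))  # every rank holds the global sum
```

In a real library the plan would also hold buffers, routes and hardware resources, which is where dedicated allocation and offload could come in; the toy model captures only the plan-once, start-many structure.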

Beyond this implementation effort, EPCC will also research and develop extensions to communication interfaces that overcome more of the limitations of existing MPI. Specifically, we are working on three extensions: partitioned communication, to provide more flexible notification mechanisms targeting multi-core and many-core hybrid programming; message channels, to bring the benefit of persistence to point-to-point communication; and message streams, to address the problem of variable-length messages.
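
As a rough illustration of the partitioned communication idea, the following Python sketch models a single message whose parts are filled in by different threads, each of which notifies independently when its part is final. It is a toy stand-in under stated assumptions, not an MPI interface: the class PartitionedSend and its method mark_ready are hypothetical names, and no data actually leaves the process.

```python
import threading
from typing import List


class PartitionedSend:
    """Toy model of one message split into independently notified partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.buffer: List[int] = [0] * num_partitions
        self._pending = num_partitions
        self._lock = threading.Lock()
        self.complete = threading.Event()

    def mark_ready(self, index: int, value: int) -> None:
        """Called by the thread that owns partition `index` once its data is final."""
        with self._lock:
            self.buffer[index] = value
            self._pending -= 1
            if self._pending == 0:
                self.complete.set()  # all partitions ready: whole message is ready


def worker(message: PartitionedSend, index: int) -> None:
    # Each thread produces its own slice of the message, here a single number.
    message.mark_ready(index, index * index)


message = PartitionedSend(num_partitions=4)
threads = [threading.Thread(target=worker, args=(message, i)) for i in range(4)]
for t in threads:
    t.start()
message.complete.wait()  # blocks until every partition has been marked ready
for t in threads:
    t.join()
print(message.buffer)
```

The point of the design is that the threads never have to synchronise with each other before sending; each one only reports on its own partition, and the message layer works out when the whole message is ready.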

EPCC will also contribute to the other aspects of EPiGRAM-HS: heterogeneous memory and compute, integration with applications, and engagement with standardisation bodies. We are working with our project partners on ideas such as running MPI natively on GPGPU devices to improve communication performance; auto-tuning data placement, layout, and movement using performance modelling combined with compiler and runtime adjustments; and offloading computation to GPGPUs, FPGAs, and smart in-network devices.

The next generations of supercomputers will be much more complex and specialised than today’s machines, but we will make sure our programming models are ready for the new challenge!

Project website:

EPiGRAM-HS project partners are Cray UK Ltd, ECMWF, ETH Zurich, Fraunhofer ITWM, and KTH (project lead).