Making complex machines easier to use efficiently

24 October 2018

Supercomputers are getting more complex. Simply making components faster would leave them impossible to cool but, by doing more with less, we can still solve bigger problems faster than ever before.

Hardware designers are experimenting with innovative designs and novel hardware options such as the Tensor Cores in NVIDIA Volta GPUs, compute-in-network capabilities, and several new memory technologies: HBM, NVM, storage class memory and others. Future supercomputers will combine all of these specialised hardware components to create general-purpose computing resources, but how much of this complexity should be exposed to, and controlled by, programmers? What will that new functionality look like? How can we achieve high application performance and efficient use of resources at the same time?

The vision of the EPiGRAM-HS project is to enable extreme-scale applications on heterogeneous hardware. This means figuring out how to use new hardware capabilities and how to combine different components to get the best result. Together, the partners will tackle the challenge from four directions: network, memory, compute, and applications.

EPCC will focus on exploiting heterogeneity for high-performance communication, building on proven programming models. We will use the newly standardised interface for persistent collective operations in MPI to implement efficient high-level communication patterns. The aim is to hide as much of the hardware complexity as possible behind a high-level abstraction. Users should get high performance without needing to know about, or deal with, the intimate details of each piece of novel hardware in each machine. Other partners will investigate how MPI and GPI can be used directly on GPUs and FPGAs.

In collaboration with other partners, EPCC will also help to define suitable abstractions for memory usage and create a unified interface applicable to all memory and storage devices. The intention is to make programming easier by making code more portable, even between hardware devices and components with vastly different capabilities.

A further technical challenge in the project is how to integrate novel compute hardware such as FPGAs. This is partly a scheduling problem: on which compute device should each piece of code execute? It is also partly an API design choice: how should novel compute capabilities be exposed to programmers?

All of the work at the programming-model level and on prototype implementations will be validated using real applications. The partners have expertise in traditional HPC applications such as Nek5000, iPIC3D and IFS, and also in data science applications such as lung cancer detection using TensorFlow. The project will also push for changes to international standards to support heterogeneous systems.

To stay up to date with the project, please subscribe to our quarterly newsletter by visiting our website: https://epigram-hs.eu

The EPiGRAM-HS project is funded under the EU's Horizon 2020 programme. The project partners are: KTH, ETH, EPCC, Fraunhofer, Cray, and ECMWF.