Benchmarking MPI implementations on ARM

Author: Nick Brown
Posted: 30 Aug 2019 | 11:32

The recent installation of Fulhame, the ARM HPC machine hosted here at EPCC as part of the Catalyst UK programme, raises plenty of interesting opportunities for exploring the HPC software ecosystem for ARM. One such aspect is the relative performance of different MPI implementations on these machines, and this is what I talked about last week at the MVAPICH User Group (MUG) workshop.

There are three implementations of MPI installed on Fulhame: MVAPICH2, OpenMPI, and HPE's MPT. I used four common HPC applications to explore the relative performance properties of each of them, along with the OSU benchmarks to try to explain some of the reasons for the performance behaviour that we saw. 

We have quite a few HPC machines here at EPCC, so in addition to Fulhame I also compared MPI implementations on Cirrus (MVAPICH2 vs HPE's MPT) and ARCHER (Cray's MPICH). It was really interesting to see the differences in relative performance between these technologies on ARM vs x86, as well as how the different technologies and interconnects fared in the OSU benchmarks. Instead of describing the results in detail here, I refer you to the video of my talk, "A Performance Comparison of Different MPI Implementations on an ARM HPC System".

The OSU team is in the process of gaining access to Fulhame so that MVAPICH can be tuned on this system and on ARM in general. It is clear that MVAPICH is a really good choice, and the use of ARM in HPC is a very exciting prospect. Throughout my work with this system I found that not only did it work well with our codes, but it also provided a stable and rather mature HPC ecosystem. As such, I suspect we will see significant adoption of ARM-based systems (likely with MVAPICH installed!) across the Top500 in the next few years.

Video: A Performance Comparison of Different MPI Implementations on an ARM HPC System
Image: Fulhame system by Craig Manzi.


Nick Brown, EPCC
