MPI

EPCC’s ARM system: comparing the performance of MPI implementations

Author: Nick Brown
Posted: 9 Dec 2019 | 12:48

MVAPICH is a high-performance implementation of MPI. It is specialised for InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE communication technologies, but in practice most people simply use whichever MPI module is loaded by default on their system. This matters because, as HPC programmers, we often optimise our codes but overlook the potential performance gains from a better choice of MPI implementation.
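
A quick way to check which implementation a given module actually provides is to print the library's version string from a small MPI-3 program. This is only a minimal sketch, not code from the post; the compiler wrapper and launch command will depend on the system.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Reports, for example, "MVAPICH2 Version ..." or "Open MPI v..." */
    MPI_Get_library_version(version, &len);
    if (rank == 0) {
        printf("MPI library in use: %s\n", version);
    }

    MPI_Finalize();
    return 0;
}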

Multi-network MPI on Intel Omni-Path

Author: Adrian Jackson
Posted: 17 Jul 2019 | 14:11

Networks

As part of the NEXTGenIO project we have a prototype HPC system with two Intel Omni-Path networks attached to each node. The aim of this dual-rail setup is to investigate the performance and functionality benefits of having separate networks for MPI communications and for I/O storage communications: either directing Lustre traffic and MPI traffic over separate networks, or using a separate network to access NVDIMMs over RDMA. We were also interested in the performance benefits, if and where possible, of general applications exploiting multiple networks for MPI traffic.

What is MPI “nonblocking” for? Correctness and performance

Author: Daniel Holmes
Posted: 27 Feb 2019 | 15:53

The MPI Standard states that nonblocking communication operations can be used to “improve performance… by overlapping communication with computation”. This is an important performance optimisation in many parallel programs, especially when scaling up to large systems with lots of inter-process communication.

However, nonblocking operations can also help with making a code correct – without introducing additional dependencies that can degrade performance.
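
As a rough illustration of the overlap idea, here is a minimal sketch, not code from the post; the buffers, neighbour ranks, and the compute_interior/compute_halo split are hypothetical stand-ins for application code.

#include <mpi.h>

/* Hypothetical stand-ins for the application's real work. */
static void compute_interior(void) { /* work that needs no halo data */ }
static void compute_halo(void)     { /* work that uses the received data */ }

void exchange_and_compute(double *send_buf, double *recv_buf, int count,
                          int left, int right, MPI_Comm comm) {
    MPI_Request reqs[2];

    /* Start the communication, then do the work that does not depend on it. */
    MPI_Irecv(recv_buf, count, MPI_DOUBLE, left, 0, comm, &reqs[0]);
    MPI_Isend(send_buf, count, MPI_DOUBLE, right, 0, comm, &reqs[1]);

    compute_interior();   /* overlapped with the transfers, hardware permitting */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    compute_halo();       /* safe: the receive has completed */
}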

March 2018 meeting of the MPI Forum

Author: Daniel Holmes
Posted: 21 Apr 2018 | 16:21

In the March 2018 meeting of the MPI Forum, the “Persistent Collectives” proposal began the formal ratification procedure and the “Sessions” proposal took a step forward, but the “Fault Tolerance” saga took a step sideways.

The proposal to add persistent collective operations to MPI was formally read at the March meeting, and was well-received by all those present. The first vote for this proposal will happen in June and the second vote in September. If all goes well, this addition to MPI will be announced at SC18.
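
For context, the setup-once, start-many-times pattern already exists in MPI for point-to-point messages; the proposal extends it to collectives such as allreduce. Below is a minimal sketch of the existing persistent point-to-point pattern, not the proposed collective API; the buffer, count, and partner rank are illustrative, and the matching persistent receive on the partner rank is omitted.

#include <mpi.h>

void persistent_send_loop(double *buf, int count, int partner, MPI_Comm comm) {
    MPI_Request req;

    /* Set the operation up once... */
    MPI_Send_init(buf, count, MPI_DOUBLE, partner, 0, comm, &req);

    for (int iter = 0; iter < 1000; iter++) {
        /* ...then start and complete it on every iteration. */
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Request_free(&req);
}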

Planning for high performance in MPI

Author: Daniel Holmes
Posted: 25 Jan 2018 | 14:36

Many HPC applications contain some sort of iterative algorithm: the same steps are done over and over again, with the data gradually converging to a stable solution. There are examples of this archetype in structural engineering, fluid flow, and all manner of other physical simulation codes.
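
A minimal sketch of that archetype follows; the update_local routine and the tolerance are hypothetical stand-ins, not taken from the post. Each iteration does the same local work and the processes agree on a global convergence measure.

#include <mpi.h>

/* Hypothetical stand-in for one iteration of the application's real work,
   returning a local measure of how much the solution changed. */
static double update_local(void) { return 0.0; }

void iterate_until_converged(MPI_Comm comm) {
    const double tolerance = 1.0e-6;
    double global_residual = tolerance + 1.0;

    while (global_residual > tolerance) {
        double local_residual = update_local();   /* same steps every iteration */

        /* Agree on the worst residual across all processes. */
        MPI_Allreduce(&local_residual, &global_residual, 1,
                      MPI_DOUBLE, MPI_MAX, comm);
    }
}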

The Message Passing Interface: On the Road to MPI 4.0 and Beyond (SC17 event)

Author: Daniel Holmes
Posted: 8 Nov 2017 | 10:23

This year’s MPI Birds-of-a-Feather meeting at SC17 will be held on Wednesday 15th November. I’ll be talking about the Sessions proposal – and explaining why it’s no longer called Sessions!

Spoiler: the working group has been looking at how Teams might interact with Endpoints.

Apple vs oranges: performance comparisons

Author: Adrian Jackson
Posted: 11 Apr 2017 | 17:59

Shall I compare thee...

Performance comparisons are always tricky to get exactly right. We need them to demonstrate the impact that optimisations, new hardware, new algorithms, and so on have had on an application or benchmark, but there is a lot of latitude in what can be compared. That latitude makes it easy to get a comparison wrong and fail to demonstrate whatever it is you're trying to show.

MPI performance on KNL

Author: Adrian Jackson
Posted: 30 Aug 2016 | 12:22

Knights Landing MPI performance

Following on from our recent post on early experiences with KNL performance, we have been looking at MPI performance on Intel's latest many-core processor.

Figure 1: MPI ping-pong latency on KNC and IvyBridge

The MPI performance on the first generation of Xeon Phi processor (KNC) was one of the reasons that some of the applications we ported to KNC had poor performance.  Figures 1 and 2 show the latency and bandwidth of an MPI ping-pong benchmark running on a single KNC and on a 2x8-core IvyBridge node.
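
For reference, here is a minimal ping-pong sketch of the kind of benchmark behind those figures; the message size, repetition count, and timing details are illustrative, not the exact benchmark we ran. It needs at least two ranks.

#include <stdio.h>
#include <mpi.h>

#define NREPS 1000
#define COUNT 1024   /* illustrative message size, in doubles */

int main(int argc, char **argv) {
    int rank;
    double buf[COUNT] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Ranks 0 and 1 bounce a message back and forth NREPS times. */
    double start = MPI_Wtime();
    for (int i = 0; i < NREPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, COUNT, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, COUNT, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, COUNT, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, COUNT, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0) {
        /* Half the average round-trip time approximates the one-way latency. */
        printf("One-way time per message: %g s\n", elapsed / (2.0 * NREPS));
    }

    MPI_Finalize();
    return 0;
}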

Early experiences with KNL

Author: Adrian Jackson
Posted: 29 Jul 2016 | 16:45

Initial experiences on early KNL

Updated 1st August 2016 to add a sentence describing the MPI configurations of the benchmarks run.
Updated 30th August 2016 to add CASTEP performance numbers on Broadwell, with some discussion.

EPCC was lucky enough to be allowed access to Intel's early KNL (Knights Landing, Intel's new Xeon Phi processor) cluster, through our IPCC project.

Image: KNL processor die

KNL is a many-core processor, the successor to KNC, with up to 72 cores, each of which can run 4 threads, and 16 GB of high-bandwidth memory stacked directly onto the chip.

Debugging in 5D

Author: Adrian Jackson
Posted: 24 Feb 2016 | 16:41

Or why debugging is hard and parallel debugging doubly so

Computing bug: Grace Hopper's famous bug, found in 1947 in a relay of the Mark II computer and taped into the operations logbook with the annotation "First actual case of bug being found". Image courtesy of the Naval Surface Warfare Center, Dahlgren, VA, 1988 (U.S. Naval Historical Center Online Library Photograph).

Debugging programs is hard. I give a lecture on debugging for the Programming Skills module of EPCC's MScs in HPC and in HPC with Data Science, where we try to point out common programming mistakes, programming strategies that make bugs less likely, and the skills and tools required for investigating, identifying, and fixing bugs.
