Posted: 9 Dec 2019 | 12:48
MVAPICH is a high-performance implementation of MPI. It is specialised for InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE communication technologies, but people generally use whatever default module is loaded on their system. This matters because, as HPC programmers, we often optimise our codes but overlook the potential performance gains from a better choice of MPI implementation.
Posted: 17 Jul 2019 | 14:11
As part of the NEXTGenIO project we have a prototype HPC system with two Intel Omni-Path networks attached to each node. The aim of this dual-rail setup is to investigate the performance and functionality benefits of separating MPI communications from I/O storage communications: either directing Lustre traffic and MPI traffic over separate networks, or using a separate network to access NVDIMMs over RDMA. We were also interested in the performance benefits, where possible, of general applications exploiting multiple networks for MPI traffic.
Posted: 27 Feb 2019 | 15:53
The MPI Standard states that nonblocking communication operations can be used to “improve performance… by overlapping communication with computation”. This is an important performance optimisation in many parallel programs, especially when scaling up to large systems with lots of inter-process communication.
However, nonblocking operations can also help to make a code correct, for example by avoiding the deadlocks that can arise when blocking sends must match blocking receives, without introducing additional dependencies that can degrade performance.
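The classic overlap pattern looks something like the sketch below: start a halo exchange with MPI_Irecv/MPI_Isend, do whatever computation does not depend on the incoming data, and only wait when that data is actually needed. This is a minimal illustration rather than code from any particular application; the halo buffers and the two compute routines are hypothetical placeholders.

```c
/* Minimal sketch of communication/computation overlap with nonblocking
 * MPI.  The halo buffers and the two compute routines are hypothetical
 * placeholders standing in for real application work. */
#include <mpi.h>

void compute_interior(void) { /* work that needs no halo data */ }
void compute_boundary(void) { /* work that depends on the halo */ }

void exchange_and_compute(double *halo_send, double *halo_recv, int n,
                          int left, int right, MPI_Comm comm)
{
    MPI_Request reqs[2];

    /* Start the halo exchange without blocking. */
    MPI_Irecv(halo_recv, n, MPI_DOUBLE, left, 0, comm, &reqs[0]);
    MPI_Isend(halo_send, n, MPI_DOUBLE, right, 0, comm, &reqs[1]);

    /* Overlap: work on data that does not depend on the exchange. */
    compute_interior();

    /* Block only at the point where the halo data is actually needed. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    compute_boundary();
}
```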
Posted: 21 Apr 2018 | 16:21
In the March 2018 meeting of the MPI Forum, the “Persistent Collectives” proposal began the formal ratification procedure, the “Sessions” proposal took a step forward, and the “Fault Tolerance” saga took a step sideways.
The proposal to add persistent collective operations to MPI was formally read at the March meeting, and was well-received by all those present. The first vote for this proposal will happen in June and the second vote in September. If all goes well, this addition to MPI will be announced at SC18.
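For readers unfamiliar with the proposal, the idea is to hoist the setup cost of a collective out of the iteration loop: initialise it once, then start and complete it cheaply on every iteration. Below is a minimal sketch of the pattern using the interface that was eventually standardised in MPI 4.0; at the time of this post the names could still have changed, and the buffers and iteration count here are illustrative.

```c
/* Sketch of the persistent-collective pattern as standardised in
 * MPI 4.0 (still a proposal when this was posted); buffer names and
 * the iteration count are illustrative. */
#include <mpi.h>

void iterate(double *local, double *global, int count, int niters,
             MPI_Comm comm)
{
    MPI_Request req;

    /* Set up the collective once, outside the loop... */
    MPI_Allreduce_init(local, global, count, MPI_DOUBLE, MPI_SUM,
                       comm, MPI_INFO_NULL, &req);

    /* ...then start and complete it on every iteration. */
    for (int i = 0; i < niters; i++) {
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Request_free(&req);
}
```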
Posted: 25 Jan 2018 | 14:36
Many HPC applications contain some sort of iterative algorithm, doing the same steps over and over again with the data gradually converging to a stable solution. There are examples of this archetype in structural engineering, fluid flow, and all manner of other physical simulation codes.
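As a concrete (and deliberately trivial) illustration of the archetype, here is a serial one-dimensional Jacobi-style sweep repeated until the update falls below a tolerance. The array names and convergence test are illustrative, not taken from any particular application.

```c
/* Minimal sketch of the iterative archetype: repeat the same update
 * until the solution stops changing.  A 1D Jacobi smoother is used
 * purely as an illustration. */
#include <math.h>
#include <string.h>

void solve(double *u, double *unew, int n, double tol)
{
    double diff;
    do {
        diff = 0.0;
        for (int i = 1; i < n - 1; i++) {
            unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
            diff = fmax(diff, fabs(unew[i] - u[i]));
        }
        /* Copy the updated interior back; boundaries stay fixed. */
        memcpy(&u[1], &unew[1], (n - 2) * sizeof(double));
    } while (diff > tol);   /* stop once the change is small enough */
}
```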
Posted: 8 Nov 2017 | 10:23
This year’s MPI Birds-of-a-Feather meeting at SC17 will be held on Wednesday 15th November. I’ll be talking about the Sessions proposal – and explaining why it’s no longer called Sessions!
Spoiler: the working group has been looking at how Teams might interact with Endpoints.
Posted: 11 Apr 2017 | 17:59
Shall I compare thee...
Performance comparisons are always tricky to get exactly right. They are needed to demonstrate the performance improvements that optimisations, new hardware, new algorithms, and so on have had on an application or benchmark. However, there is a lot of latitude in what can be compared, which makes it easy to get a comparison wrong and fail to demonstrate whatever it is you are trying to show.
Posted: 30 Aug 2016 | 12:22
Knights Landing MPI performance
Following on from our recent post on early experiences with KNL performance, we have been looking at MPI performance on Intel's latest many-core processor.
The MPI performance of the first-generation Xeon Phi processor (KNC) was one of the reasons that some of the applications we ported to it performed poorly. Figures 1 and 2 show the latency and bandwidth of an MPI ping-pong benchmark running on a single KNC and on a 2x8-core Ivy Bridge node.
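For reference, the ping-pong pattern behind such measurements is very simple: two ranks bounce a message back and forth and time the round trips. The sketch below is illustrative only (fixed message size, no warm-up phase) and is not the exact benchmark used for the figures.

```c
/* Minimal two-rank ping-pong sketch; the message size and repeat
 * count are illustrative.  Run with at least two MPI ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const int nbytes = 1024, reps = 1000;
    char *buf = malloc(nbytes);
    int rank;

    memset(buf, 0, nbytes);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)   /* one-way latency is half the round-trip time */
        printf("latency: %g us\n", 0.5e6 * (t1 - t0) / reps);

    MPI_Finalize();
    free(buf);
    return 0;
}
```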
Posted: 29 Jul 2016 | 16:45
Initial experiences on early KNL
Updated 1st August 2016 to add a sentence describing the MPI configurations of the benchmarks run.
Updated 30th August 2016 to add CASTEP performance numbers on Broadwell, with some discussion.
KNL is a many-core processor, the successor to KNC, with up to 72 cores, each able to run 4 hardware threads, and 16 GB of high-bandwidth memory (MCDRAM) stacked directly on to the processor package.
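In flat or hybrid mode the MCDRAM appears as separately allocatable memory, and one way to place data in it explicitly is the memkind library's hbw interface. A minimal sketch, assuming memkind is installed and the program is linked with -lmemkind:

```c
/* Sketch of placing an array in KNL's high-bandwidth memory via the
 * memkind library (flat or hybrid mode); assumes memkind is installed
 * and linked with -lmemkind. */
#include <hbwmalloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 1 << 20;

    /* hbw_check_available() returns 0 when MCDRAM can be allocated. */
    if (hbw_check_available() != 0) {
        fprintf(stderr, "no high-bandwidth memory available\n");
        return EXIT_FAILURE;
    }

    double *a = hbw_malloc(n * sizeof(double));  /* lives in MCDRAM */
    for (size_t i = 0; i < n; i++)
        a[i] = (double)i;

    hbw_free(a);
    return EXIT_SUCCESS;
}
```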
Posted: 24 Feb 2016 | 16:41
Or why debugging is hard and parallel debugging doubly so
Debugging programs is hard. I give a lecture on debugging for the Programming Skills module of EPCC's MScs in HPC and HPC with Data Science, where we try to point out common programming mistakes, strategies for making bugs less likely, and the skills and tools required for investigating, identifying, and fixing bugs.