Adrian Jackson's blog
Posted: 23 Mar 2020 | 10:45
I was recently working with a colleague to investigate performance issues on a login node for one of our HPC systems. I should say upfront that looking at performance on a login node is generally not advisable, they are shared resources not optimised for performance.
We always tell our students not to run performance benchmarking on login nodes, because it's hard to ensure the results are reproducible. However, in this case we were just running a very small (serial) test program on the login node to ensure it worked before submitting it to the batch systems and my colleague noticed a performance variation across login nodes that was unusual.
Posted: 30 Oct 2019 | 12:48
Blog post updated 8th November 2019 to add Figure 6 highlighting PMDK vs fsdax performance for a range of node counts.
Following on from the recent blog post on our initial performance experiences when using byte-addressable persistent memory (B-APM) in the form of Intel's Optane DCPMM memory modules for data storage and access within compute nodes, we have been exploring performance and programming such memory beyond simple filesystem functionality.
For our previous performance results we used what is known as a fsdax (Filesystem Direct Access) filesystem, which enables bypassing the operating system (O/S) page cache and associated extra memory copies for I/O operations. We were using an ext4 filesystem on fsdax, although ext2 and xfs filesystems are also supported.
Posted: 9 Oct 2019 | 17:30
Sharing of resources has challenges for the performance and scaling of large parallel applications. In the NEXTGenIO project we have been focusing specifically on I/O and data management/storage costs, working from the realisation that current filesystems will struggle to efficiently load and store data from millions of processes or tasks all requesting different data sets or bits of information.
Posted: 17 Jul 2019 | 14:11
As part of the NEXTGenIO project we have a prototype HPC system that has two Intel Omni-Path networks attached to each node. The aim of having a dual-rail network setup for that system is to investigate the performance and functionality benefits of having separate networks for MPI communications and for I/O storage communications, either directing Lustre traffic and MPI traffic over separate networks, or using a separate network to access NVDIMMs over RDMA. We were also interested in the performance benefits for general applications exploiting multiple networks for MPI traffic, if and where possible.
Posted: 12 Dec 2017 | 11:16
November 2017 Top500
My initial impression of the latest Top500 list, released last month at the SC17 conference in Denver, was that little has changed. This might not be the conclusion that many will have reached, and indeed we will come on to consider some big changes (or perceived big changes) that have been widely discussed, but looking at the Top 10 entries there has been little movement since the previous list (released in June).
Posted: 16 Aug 2017 | 15:59
I recently attended the 2017 Flash Memory Summit, a conference primarily aimed at storage technology and originally based around flash memory, although it has expanded to cover all forms of non-volatile storage technology.
Non-volatile memory is a big deal nowadays. It is memory that stores data even when it has no power (unlike the volatile memory in computers that lose data when power is switched off). Flash memory is a particular form for non-volatile memory, it's been used for a long time, and has had a massive impact on consumer technology, from the storage in your cameras and phones, to SSD hard drives routinely installed in laptop and desktop systems.
Posted: 10 Aug 2017 | 20:22
Distributed ledgers, the core technology underlying digital currencies such as BitCoin, offer some interesting functionality for constructing distributed data infrastructures.
Ledgers can be considered to be simple data stores. They are styled on accounting ledgers, books where transactions are recorded one after the other, and the overall state of the accounts can be evaluated by working through the recorded transactions to calculate how much money has flowed in and out of the accounts.
Posted: 15 Jun 2017 | 13:41
We are entering the fourth year of the Intel Parallel Computing Centre (IPCC). This collaboration on code porting and optimisation has focussed on improving the performance of scientific applications on Intel hardware, specifically its Xeon and Xeon Phi processors.
Posted: 30 May 2017 | 11:01
Posted: 24 May 2017 | 19:30
When we parallelise and optimise computational simulation codes we always have choices to make. Choices about the type of parallel model to use (distributed memory, shared memory, PGAS, single sided, etc), whether the algorithm used needs to be changed, what parallel functionality to use (loop parallelisation, blocking or non-blocking communications, collective or point-to-point messages, etc).