Adrian Jackson's blog

Precision persistent programming

Author: Adrian Jackson
Posted: 30 Oct 2019 | 12:48

Blog post updated 8th November 2019 to add Figure 6 highlighting PMDK vs fsdax performance for a range of node counts.

Following on from the recent blog post on our initial performance experiences when using byte-addressable persistent memory (B-APM), in the form of Intel's Optane DCPMM memory modules, for data storage and access within compute nodes, we have been exploring the performance of, and approaches to programming, such memory beyond simple filesystem functionality.
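The essence of this programming model is that a file on persistent memory can be memory-mapped and then updated with ordinary loads and stores, rather than read()/write() system calls. A minimal sketch of that pattern is below; the file path is a placeholder, and an ordinary file stands in for a DAX-mounted one so the sketch runs anywhere (on real fsdax, the same calls bypass the page cache entirely).

```python
import mmap

# Placeholder path: on a real system this would sit on a DAX-mounted
# filesystem (e.g. under /mnt/pmem) so that loads and stores go
# straight to the persistent media.
PATH = "data.bin"
SIZE = 4096

# Create a fixed-size backing file to map.
with open(PATH, "wb") as f:
    f.write(b"\0" * SIZE)

# Map it and update it with plain stores -- no read()/write() syscalls.
with open(PATH, "r+b") as f:
    mem = mmap.mmap(f.fileno(), SIZE)
    mem[0:5] = b"hello"
    mem.flush()   # on fsdax this flushes CPU caches, not the page cache
    mem.close()

with open(PATH, "rb") as f:
    print(f.read(5).decode())  # hello
```

Libraries such as PMDK wrap exactly this pattern, adding the cache flushing and fencing needed to guarantee that stores actually reach the persistent media in a crash-consistent order.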

For our previous performance results we used what is known as a fsdax (Filesystem Direct Access) filesystem, which enables bypassing the operating system (O/S) page cache and associated extra memory copies for I/O operations. We were using an ext4 filesystem on fsdax, although ext2 and xfs filesystems are also supported.
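For reference, setting up such a filesystem typically looks like the sketch below; the device and mount-point names are assumptions that will vary between systems, and the commands require root.

```shell
# Carve an fsdax namespace out of the persistent-memory region
# (device names such as /dev/pmem0 will vary between systems).
ndctl create-namespace --mode=fsdax

# Build an ext4 filesystem on it and mount it with the dax option so
# that memory-mapped access bypasses the O/S page cache.
mkfs.ext4 /dev/pmem0
mount -o dax /dev/pmem0 /mnt/pmem
```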

Global or local - which is best?

Author: Adrian Jackson
Posted: 9 Oct 2019 | 17:30

Sharing of resources has challenges for the performance and scaling of large parallel applications. In the NEXTGenIO project we have been focusing specifically on I/O and data management/storage costs, working from the realisation that current filesystems will struggle to efficiently load and store data from millions of processes or tasks all requesting different data sets or bits of information.

Multi-network MPI on Intel Omni-Path

Author: Adrian Jackson
Posted: 17 Jul 2019 | 14:11

As part of the NEXTGenIO project we have a prototype HPC system that has two Intel Omni-Path networks attached to each node. The aim of having a dual-rail network setup for that system is to investigate the performance and functionality benefits of having separate networks for MPI communications and for I/O storage communications, either directing Lustre traffic and MPI traffic over separate networks, or using a separate network to access NVDIMMs over RDMA. We were also interested in the performance benefits for general applications exploiting multiple networks for MPI traffic, if and where possible.
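On Omni-Path, striping a single rank's MPI traffic across both rails can be requested through the PSM2 layer. A minimal sketch, assuming an MPI library running over PSM2 (the process count and binary name are placeholders):

```shell
# Ask the PSM2 library to stripe each rank's traffic across all
# available Omni-Path HFIs rather than using a single rail.
export PSM2_MULTIRAIL=1   # 0 = single rail (default)

mpirun -np 256 ./my_mpi_app   # my_mpi_app is a placeholder binary
```

Separating classes of traffic instead (e.g. Lustre on one rail, MPI on the other) is done at the fabric and filesystem configuration level rather than per job.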

Top500: Change or no change?

Author: Adrian Jackson
Posted: 12 Dec 2017 | 11:16

My initial impression of the latest Top500 list, released last month at the SC17 conference in Denver, was that little has changed. This might not be the conclusion that many will have reached, and indeed we will come on to consider some big changes (or perceived big changes) that have been widely discussed, but looking at the Top 10 entries there has been little movement since the previous list (released in June).

View from the storage side

Author: Adrian Jackson
Posted: 16 Aug 2017 | 15:59

I recently attended the 2017 Flash Memory Summit, a conference primarily aimed at storage technology and originally based around flash memory, although it has expanded to cover all forms of non-volatile storage technology.

Non-volatile memory is a big deal nowadays. It is memory that retains data even when it has no power (unlike the volatile memory in computers, which loses its data when the power is switched off). Flash memory is a particular form of non-volatile memory; it has been used for a long time and has had a massive impact on consumer technology, from the storage in your cameras and phones to the SSDs routinely installed in laptop and desktop systems.

Distributed ledgers for carbon markets

Author: Adrian Jackson
Posted: 10 Aug 2017 | 20:22

Distributed ledgers, the core technology underlying digital currencies such as Bitcoin, offer some interesting functionality for constructing distributed data infrastructures.

Ledgers can be considered to be simple data stores. They are styled on accounting ledgers, books where transactions are recorded one after the other, and the overall state of the accounts can be evaluated by working through the recorded transactions to calculate how much money has flowed in and out of the accounts.
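The append-only model described above can be sketched in a few lines of Python; the account names and amounts are purely illustrative:

```python
# Minimal sketch of a ledger: an append-only list of transactions.
ledger = []

def record(ledger, debit, credit, amount):
    """Append a transaction; existing entries are never rewritten."""
    ledger.append({"from": debit, "to": credit, "amount": amount})

def balance(ledger, account):
    """Derive an account's state by replaying every recorded transaction."""
    total = 0
    for tx in ledger:
        if tx["to"] == account:
            total += tx["amount"]
        if tx["from"] == account:
            total -= tx["amount"]
    return total

record(ledger, "alice", "bob", 50)
record(ledger, "bob", "carol", 20)
print(balance(ledger, "bob"))  # 30
```

The key property is that state is never stored directly: it is always recomputed from the transaction history, which is what makes the history itself worth protecting and replicating.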

The Intel Parallel Computing Centre at EPCC

Author: Adrian Jackson
Posted: 15 Jun 2017 | 13:41

We are entering the fourth year of the Intel Parallel Computing Centre (IPCC). This collaboration on code porting and optimisation has focussed on improving the performance of scientific applications on Intel hardware, specifically its Xeon and Xeon Phi processors.  

X windows and XQuartz on ARCHER

Author: Adrian Jackson
Posted: 30 May 2017 | 11:01

Every so often we get an ARCHER query because Paraview isn't working for somebody. As Paraview requires remote windowing functionality (an X server), and can also do offscreen rendering and all sorts of other things, it can be complicated to get it working properly and efficiently.
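The first step is usually just getting X forwarding itself working. A minimal sketch, with a placeholder username (on a Mac, XQuartz must be installed and running locally first):

```shell
# Trusted X11 forwarding: windows opened on the remote machine are
# rendered by the local X server (XQuartz on macOS).
ssh -Y user@login.archer.ac.uk

# Quick check once logged in: this should pop up a clock window locally.
xclock
```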

Balancing act: optimise for scaling or efficiency?

Author: Adrian Jackson
Posted: 24 May 2017 | 19:30

When we parallelise and optimise computational simulation codes we always have choices to make: the type of parallel model to use (distributed memory, shared memory, PGAS, single-sided, etc), whether the algorithm needs to be changed, and what parallel functionality to use (loop parallelisation, blocking or non-blocking communications, collective or point-to-point messages, etc).

ARCHER code developers and presenting performance

Author: Adrian Jackson
Posted: 11 May 2017 | 00:06

As part of the ARCHER Knights Landing (KNL) processor testbed, we have produced and collected a set of benchmark reports on the performance of various scientific applications on the system. This has involved the ARCHER CSE team, EPCC's Intel Parallel Computing Center (IPCC) team, and various users of the system all benchmarking and documenting the performance they have experienced. 
