NEXTGenIO at ISC High Performance 2019

6 June 2019

The highly successful NEXTGenIO project is now drawing to a close after nearly four years. EPCC colleagues will be at ISC19 presenting the results of the project at a booth presentation, a BoF, and a workshop presentation. Come along and find out more!

Booth Presentation: Intel Datacenter Persistent Memory Modules for Efficient HPC Workflows

Tuesday 18th June, 1:40pm – 2:00pm, Intel booth (F-930)

For more information about this presentation by EPCC's Adrian Jackson, see abstract below.

BoF: Multi-Level Memory and Storage for HPC and Data Analytics & AI

Tuesday 18th June, 1:45pm – 2:45pm, Kontrast

Join EPCC's Michèle Weiland, along with Hans-Christian Hoppe of the Intel Datacenter Group and Kathryn Mohror of Lawrence Livermore National Laboratory, for a BoF discussing use cases and requirements for next-generation multi-level storage/memory systems, proof-of-concept prototype results, and system software and tools development.

For more information, see the BoF webpage.

Workshop presentation: HPC-IODC - HPC I/O in the Data Center

Thursday 20th June, 9:00am – 6:00pm, Basalt

EPCC's Adrian Jackson will present results from NEXTGenIO in his talk, “An Architecture for High Performance Computing and Data Systems using Byte-Addressable Persistent Memory”, a research paper co-authored with EPCC colleagues Michèle Weiland and Mark Parsons, and Bernhard Homölle of Fujitsu.

For more information, see the workshop webpage.

Abstract for Booth Presentation: Intel Datacenter Persistent Memory Modules for Efficient HPC Workflows

The NEXTGenIO project, which started in 2015 and is co-funded under the European Horizon 2020 R&D funding scheme, was one of the very first projects to investigate the use of Intel Datacenter Persistent Memory Modules (DC PMM) for the HPC segment in detail. Fujitsu have built a 32-node prototype cluster at EPCC using Intel Xeon Scalable CPUs (Cascade Lake generation), DC PMM (3 TBytes per dual-socket node), and Intel Omni-Path Architecture (a dual-rail fabric across the 32 nodes). A selection of eight pilot applications, ranging from an industrial OpenFOAM use case to the Halvade genomic processing workflow, was studied in detail, and suitable middleware components for the effective use of DC PMM by these applications were created. Actual benchmarking with DC PMM is now possible, and this talk will discuss the architecture and the use of memory and app-direct DC PMM modes, and present first results on achieved performance.
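To give a flavour of what app-direct mode looks like from an application's point of view, the minimal sketch below memory-maps a file on a DAX-mounted DC PMM region using PMDK's libpmem and persists intermediate workflow data directly, bypassing the page cache. This is a generic illustration rather than the NEXTGenIO middleware itself; the /mnt/pmem mount point and file name are hypothetical.

/* Sketch only: writing intermediate workflow data to DC PMM in
 * app-direct mode via a DAX-mounted file and PMDK's libpmem. */
#include <libpmem.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    size_t mapped_len;
    int is_pmem;

    /* /mnt/pmem is a hypothetical fsdax mount point on the node's DC PMM. */
    char *buf = pmem_map_file("/mnt/pmem/stage1_output.dat", 4096,
                              PMEM_FILE_CREATE, 0600, &mapped_len, &is_pmem);
    if (buf == NULL) {
        perror("pmem_map_file");
        return 1;
    }

    /* Store data directly in persistent memory via ordinary memory writes. */
    const char *data = "intermediate result passed to the next workflow step";
    memcpy(buf, data, strlen(data) + 1);

    /* Flush the stores to the persistence domain (msync as a fallback
     * if the mapping is not genuine persistent memory). */
    if (is_pmem)
        pmem_persist(buf, strlen(data) + 1);
    else
        pmem_msync(buf, strlen(data) + 1);

    pmem_unmap(buf, mapped_len);
    return 0;
}

Memory mode, by contrast, requires no code changes at all: the DC PMM capacity is presented to the application as ordinary (volatile) system memory, with the node's DRAM acting as a cache in front of it.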

As poster children for the use of DC PMM as an extremely fast local storage target, the OpenFOAM and Halvade workflows show a very significant reduction in the I/O time required to pass data between workflow steps, and consequently significantly reduced runtimes and improved strong scaling. Taking this further, a prototype setup of ECMWF's IFS forecasting system, which combines the actual weather forecast with several dozen post-processing steps, shows the vast potential of DC PMM: forecast data is stored in DC PMM on the nodes running the forecast, post-processing steps can quickly access this data via the OPA network fabric, and a meteorological archive pulls the data into long-term storage. Compared to traditional system configurations, this scheme brings significant savings in time to completion for the full workflow.

Both of the above use app-direct mode; the impact and value of memory mode is shown by a key materials science application (CASTEP), whose memory requirements far exceed the usual HPC system configuration of approx. 4 GByte/core. In current EPCC practice, CASTEP uses only a fraction of the cores on each cluster node. DC PMM in memory mode, with its capacity of up to 3 TBytes per node on the NEXTGenIO prototype, enables the use of all cores, and even with the unavoidable slowdown of execution compared to a DRAM-only configuration, the cost of running a CASTEP simulation is reduced and the scientific throughput of a given number of nodes is increased commensurately.