EPCC activities at SC23: Novel architectures and how to program them
9 November 2023
A variety of new hardware architectures are becoming available for HPC and Machine Learning workloads. These architectures, and the challenge of programming them effectively, form a major part of my research interests.
Ultimately, much of this novel hardware can offer impressive performance and energy efficiency when compared to common HPC technologies, but it is crucial that we develop a deep understanding of how to leverage such hardware and enable users to easily and effectively exploit it. These topics underpin many of the activities that I am involved in at SC23 this month.
RISC-V for HPC workshop
The Second International Workshop on RISC-V for HPC, whose organisation I have led, will run on the afternoon of Monday 13 November between 2pm and 5:30pm.
In collaboration with the RISC-V International HPC Special Interest Group (SIG), this event follows a very successful first run of the workshop earlier in the year at ISC. The purpose of the session is to encourage popularisation of RISC-V for HPC workloads and, because the field is moving so rapidly, many technological updates and advances have been reported since ISC.
The workshop will begin with an invited talk by the CTO of RISC-V International, Mark Himelstein, who will describe how and why RISC-V is growing and its benefits to the HPC community. This will be followed by four lightning vendor talks, where major RISC-V HPC vendors will give five-minute talks about their technology and its effectiveness for HPC workloads, and then by six research paper presentations. The workshop schedule can be found here.
Test driving the 64-core RISC-V SG2042
One of the papers in the RISC-V workshop is from us in EPCC, as part of the ExCALIBUR H&ES RISC-V testbed, and reports on benchmarking the SG2042.
The SG2042 is the world’s first high-performance, high-core-count, commodity-available RISC-V CPU. Developed by SOPHGO, it has 64 cores and we have a couple of nodes of this technology in our testbed for users to experiment with. Titled Is RISC-V Ready for HPC Prime-Time: Evaluating the 64-Core Sophon SG2042 RISC-V CPU, our paper reports benchmarking results measured using RAJAPerf, a common HPC benchmarking suite, and compares the SG2042 against both other RISC-V CPUs and a range of more traditional x86 CPUs. More details about this can be found in the preprint of the paper here.
The image above is a Milk-V Pioneer RISC-V workstation we have in the testbed. This contains a 64-core SG2042 CPU and 128GB of RAM, providing a serious proposition for HPC workloads.
HPC Next: The RISC-V Ecosystem BoF
On the Tuesday evening, from 5:15pm to 6:45pm, I will be speaking at a Birds-of-a-Feather (BoF) session on RISC-V for HPC entitled HPC Next: The RISC-V Ecosystem, which has been organised by the RISC-V HPC SIG. As part of this I will be describing the RISC-V testbed that we have in EPCC. This will be followed by a panel discussion with the audience to explore challenges and opportunities around adopting RISC-V in HPC. More details on the session, including the other speakers, can be found here.
Advanced Architecture Testbeds BoF
I am also talking at the Advanced Architecture "Playgrounds" - Past Lessons and Future Accesses of Testbeds BoF, which runs on the Thursday between 12:15pm and 1:15pm. In this session I will describe some of the ExCALIBUR H&ES architecture testbeds and how scientific software developers are leveraging them to experiment with new architectures for their codes, as well as some of the lessons learned. More details can be found here.
Auto optimisation of stencil codes for FPGAs
One of my PhD students, Gabriel Rodriguez-Canal, will present his paper entitled Stencil-HMLS: A Multi-Layered Approach to the Automatic Optimization of Stencil Codes on FPGAs at the H2RC workshop on the Friday morning. This paper reports research around leveraging MLIR to enable auto-optimisation of code written in the PSyclone Fortran DSL when targeting FPGAs.
By progressively lowering between different MLIR dialects, we were able to show significant performance gains across the board, and an improvement of up to 100 times, compared to the existing state of the art. Leveraging xDSL, which has been developed as part of the xDSL ExCALIBUR project, the research findings demonstrate that it is possible to encode FPGA-specific optimisations within MLIR dialects and transformations, and for these to operate automatically on Von Neumann codes. For more information you can see the preprint of the paper here.
Fortran performance optimisation and auto-parallelisation
Continuing the compiler theme, I am presenting the paper Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in Flang in the LLVM workshop on the Sunday morning.
This describes work we have done in the xDSL ExCALIBUR project using MLIR to extract stencil patterns from serial Fortran code and to then optimise and parallelise them automatically. This results in efficient execution on the CPU (both multi-threaded and distributed memory parallelism), as well as good performance on the GPU, all with unmodified serial Fortran code.
For applicable benchmarks we demonstrate, on average, a performance improvement of around three to five times when compared to the standard Flang flow. Most impressively, users are able to automatically leverage GPUs and distributed memory parallelism without any code modifications. More details can be found in the preprint of the paper here.
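For readers unfamiliar with the term, a stencil is a loop in which each output point is computed from a fixed pattern of neighbouring input points. The sketch below is purely illustrative and is not taken from the paper: it is written in Python rather than Fortran, and the function name and the choice of a three-point average are my own assumptions. It simply shows the shape of serial loop that a stencil-aware compiler flow would aim to recognise.

```python
# Illustrative only: a 1D three-point averaging stencil written as a plain serial loop.
# The function name and the choice of stencil are assumptions made for this sketch,
# not code from the paper or from Flang.

def three_point_stencil(values):
    """Return a new list in which each interior point is the mean of itself
    and its two immediate neighbours; the two boundary points are copied unchanged."""
    n = len(values)
    result = list(values)  # start from a copy so the boundaries keep their input values
    for i in range(1, n - 1):
        # Every interior point reads the same fixed neighbourhood: i-1, i, i+1.
        result[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    return result


if __name__ == "__main__":
    print(three_point_stencil([0.0, 0.0, 9.0, 0.0, 0.0]))
```

Because each iteration reads only from the input and writes to a separate output, the iterations are independent of one another, which is the property that makes loops of this kind natural candidates for automatic parallelisation.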
ExCALIBUR UK research programme
SC promises to be a busy week! In this article I have mentioned the ExCALIBUR programme a few times. ExCALIBUR is a UK research programme that aims to deliver the next generation of high-performance simulation software for the highest-priority fields, enabling them to seize the opportunities of computing at the Exascale. It currently funds many of my research activities.
I hope to see you at SC23!