ExCALIBUR: preparing for Exascale
15 May 2023
ExCALIBUR is a UK research programme based around five themes that were identified as crucial for the UK’s progress towards Exascale computing: RSE Knowledge Integration; High Priority Use Cases; Emerging Requirements for High Performance Algorithms; Cross-cutting Research; and Hardware & Enabling Software (H&ES). EPCC is involved in a number of projects that address these themes.
The UNIVERSE-HPC project will create a comprehensive collection of training materials to train the next generation of research software engineers (RSEs), with the aim of increasing both the skill and diversity of people working in research software engineering.
The project is a collaboration between the Universities of Edinburgh, Oxford, Southampton, and University College London.
The first phase investigated existing training and sought to understand what prerequisite learning is required for each course. This has allowed us to create a map of existing training and develop multiple “learning pathways” to guide potential learners through the material. Secondly, we ran a short survey to understand the factors that inhibit RSE learning on the job; the most important was found to be lack of time. To address this, we have developed a new training series called “Byte Sized RSE”, which delivers focused training in concise, one-hour sessions. Learning is then backed up with an accompanying podcast. The series has proved extremely popular; five sessions have been run so far on topics including licensing and code review.
SiMLInt is a cross-cutting project bringing the speed of machine learning (ML) to large-scale physics simulation, with a particular focus on plasma modelling in tokamak reactors, which is one of the high priority use cases.
The project aims to provide the infrastructure to enable efficient communication between simulation codes and data-driven models. This has the potential to allow scientists to run simulations at a coarser resolution, using considerably fewer computational resources, while employing pre-trained ML models to supply the unresolved, sub-grid scale information.
The technical implementation is based on Cray Labs’ SmartSim technology and ensures that both the physics and data-driven codes run in a synchronised manner and make efficient use of the HPC resources. The usefulness of the ML models is determined by the quality of the data they are trained on and by how the training is carried out, both of which require a sound understanding of the underlying ML techniques. Therefore, in the cross-cutting spirit of the project, SiMLInt also provides guidance and support for generating suitable training data, developing and training the ML model, and assessing its properties, while highlighting questions and considerations related to the reliability of the ML model and the whole workflow.
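The coupling pattern described above can be illustrated with a minimal, self-contained sketch. It is not SiMLInt's code and does not use SmartSim's API: `InMemoryStore`, `coarse_step`, `placeholder_model` and `run_coupled` are hypothetical names, the "simulation" is a toy one-dimensional diffusion step, and the "ML model" is a stand-in that returns a zero correction. The sketch only shows the synchronised exchange: the simulation advances on a coarse grid, publishes its state, and then applies the correction the model writes back.

```python
from typing import Callable, Dict, List

Field = List[float]


class InMemoryStore:
    """Hypothetical stand-in for a shared data store sitting between the two codes."""

    def __init__(self) -> None:
        self._data: Dict[str, Field] = {}

    def put(self, key: str, value: Field) -> None:
        self._data[key] = list(value)

    def get(self, key: str) -> Field:
        return list(self._data[key])


def coarse_step(u: Field, alpha: float) -> Field:
    """One explicit diffusion step on a periodic coarse grid (toy physics)."""
    n = len(u)
    return [
        u[i] + alpha * (u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n])
        for i in range(n)
    ]


def placeholder_model(u: Field) -> Field:
    """Stand-in for a pre-trained sub-grid model; here it returns a zero correction."""
    return [0.0 for _ in u]


def run_coupled(
    u0: Field,
    steps: int,
    alpha: float,
    model: Callable[[Field], Field],
    store: InMemoryStore,
) -> Field:
    """Alternate coarse simulation steps with model corrections, in lockstep."""
    u = list(u0)
    for _ in range(steps):
        u = coarse_step(u, alpha)
        store.put("state", u)                    # simulation publishes its state
        correction = model(store.get("state"))   # model reads the state and infers
        store.put("correction", correction)      # model publishes its correction
        u = [a + b for a, b in zip(u, store.get("correction"))]
    return u
```

In a real workflow the store would be a service shared by separately running codes, and the placeholder would be replaced by a trained model; the quality of that model, as noted above, depends on the training data and procedure.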
xDSL is working to develop a common ecosystem for Domain Specific Languages (DSLs). DSLs significantly raise the abstraction level when programming supercomputers, and this is especially important as we move towards much more complex architectures in the Exascale era. However, they currently share very little underlying infrastructure, resulting in a high barrier to entry for DSLs and long-term support challenges. By exposing a Python interface to the widely used MLIR and LLVM compiler frameworks, xDSL provides the necessary building blocks for the low-overhead development of HPC DSLs.
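To give a flavour of what shared DSL infrastructure means, the toy sketch below builds a small expression tree in Python and flattens it into a list of simple, numbered operations, loosely in the spirit of a compiler intermediate representation. It is purely illustrative: `Var`, `Const`, `BinOp` and `lower` are hypothetical names, the textual output format is invented for this example, and none of it is xDSL's or MLIR's actual API.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class BinOp:
    op: str  # e.g. "add" or "mul"
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, Const, BinOp]


def lower(expr: Expr) -> List[str]:
    """Flatten an expression tree into a list of numbered textual operations."""
    ops: List[str] = []

    def visit(e: Expr) -> str:
        if isinstance(e, Var):
            return f"%{e.name}"
        if isinstance(e, Const):
            name = f"%{len(ops)}"
            ops.append(f"{name} = const {e.value}")
            return name
        # BinOp: lower both operands first, then emit the operation itself.
        lhs = visit(e.lhs)
        rhs = visit(e.rhs)
        name = f"%{len(ops)}"
        ops.append(f"{name} = {e.op} {lhs}, {rhs}")
        return name

    visit(expr)
    return ops


if __name__ == "__main__":
    # x + 2.0 * y, written at the "DSL" level and lowered to flat operations.
    for line in lower(BinOp("add", Var("x"), BinOp("mul", Const(2.0), Var("y")))):
        print(line)
```

The point of a common ecosystem is that a flat representation like this, once standardised, can be optimised and compiled by tools shared across many DSLs rather than rebuilt for each one.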
ExCALIBUR H&ES RISC-V testbed
At EPCC we host the ExCALIBUR H&ES RISC-V testbed, enabling ExCALIBUR projects and UK HPC developers more widely to experiment with their codes on RISC-V. RISC-V, an open source Instruction Set Architecture (ISA) first developed over ten years ago and managed by RISC-V International, has the potential to transform computing. With over 10 billion RISC-V CPU cores already produced, the technology is enjoying phenomenal growth. In addition to providing the hardware itself, the project is undertaking software development to improve the ecosystem for HPC and benchmarking.
ExCALIBUR H&ES FPGA testbed
EPCC also hosts the ExCALIBUR H&ES FPGA testbed, which aims to provide access to state-of-the-art Field Programmable Gate Arrays (FPGAs) for scientists to port their codes to. FPGAs are configurable chips, enabling the electronics of the circuit to directly represent an application. This bespoke tailoring of the electronics to an algorithm can deliver significant performance and energy efficiency gains compared to fixed architectures such as CPUs or GPUs. However, a major challenge is how best to exploit this technology, and the testbed aims to help with this.
The ExCALIBUR programme (Exascale Computing ALgorithms & Infrastructures Benefiting UK Research) is supported by the UKRI Strategic Priorities Fund. The programme is led by the Met Office and the Engineering and Physical Sciences Research Council (EPSRC) along with the Public Sector Research Establishment, the UK Atomic Energy Authority (UKAEA) and UK Research and Innovation (UKRI) research councils.
Cray Labs' SmartSim technology: https://www.craylabs.org/docs/overview.html
Nick Brown, EPCC
Kirsty Pringle, EPCC
Anna Roubíčková, EPCC
Plasma modelling in tokamak reactors is one of the high priority use cases to be addressed by ExCALIBUR. Image of ITER tokamak reactor by Filipp Borshch/Getty Images.