Extracting the SpMV kernel from PDE solver implementations
Project Description
The solution of Partial Differential Equations (PDEs) underlies many workloads running on the world's supercomputers. This research will explore how to optimise this computation, potentially delivering significant improvements in accuracy at reduced runtime.
Primary Supervisor: Dr Paul Bartholomew
Project Overview
The numerical solution of PDEs has enabled greater understanding of many real-world problems and plays a vital role in modern design processes – for example, CFD and computational stress analysis are both employed in industries ranging from power generation to transport, as well as in fundamental scientific research. To solve a problem numerically, the PDE governing it must first be discretised. The solution methods employed involve evaluating the discretised PDE, which can be expressed as a Sparse Matrix-Vector product (SpMV). Due to the complexity of the problems being considered, and of developing high-performance software, the implementation of the discretisation and components such as the SpMV often end up entwined, hampering progress.
The goal of this project is to separate the underlying SpMV kernel from the numerical description of the physics. This will allow a true “separation of concerns” in a core component of PDE solvers. Doing so will enable direct comparisons of the performance of different discretisation methods when running on real hardware using the same underlying software. By separating the application/evaluation of the discretisation from its definition, the effort to port between different hardware (including CPUs, GPUs and novel hardware such as the Cerebras Wafer-Scale Engine (WSE)) and different parallel programming models will be reduced, allowing a range of HPC technologies to be explored.
Overview of the research area
One of the core tasks in developing a PDE solver is to translate the PDE from the continuous domain, where it is solved by analytical methods, to the discrete domain, where its solution may be approximated by numerical methods. Following a numerical approach enables the solution of these problems on computers. Put simply, this means developing algebraic approximations of the continuous operators. By applying these approximations to each term in the PDE, a discrete approximation can be written at every point in the solution domain; this results in a large (perhaps nonlinear) system of equations to solve.
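As an illustration (not part of the project description itself), a minimal NumPy sketch of this process for the 1D operator u''(x), discretised with second-order central differences, shows how the continuous operator becomes a sparse, banded matrix:

```python
import numpy as np

# Discretise u''(x) on [0, 1] with n interior points and spacing h using
# the central-difference approximation (u[i-1] - 2 u[i] + u[i+1]) / h^2.
n = 5
h = 1.0 / (n + 1)
A = (np.diag(np.full(n, -2.0))
     + np.diag(np.ones(n - 1), k=1)
     + np.diag(np.ones(n - 1), k=-1)) / h**2

# Each row holds at most three nonzeros: the resulting matrix is sparse
# and banded (tridiagonal), with 3n - 2 nonzeros out of n^2 entries.
print(np.count_nonzero(A))  # 13 for n = 5
```

Applying this matrix to a vector of nodal values evaluates the discretised operator at every interior point at once, which is exactly the matrix-vector product discussed below.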
These equation systems are generally sparse, with many zero entries, although their particular properties depend on the chosen discretisation scheme. Some examples include: finite difference methods, which yield sparse, regular matrices; finite element and finite volume methods, which give sparse, irregular matrices; and spectral methods, whose discrete operators produce dense matrices. Each method has different advantages and disadvantages, making it challenging to select a single optimal choice for all problems. Furthermore, once a method is chosen, the constraints it imposes and their impact on the implementation may render exploring alternatives impractical due to the effort required to change the selection.
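A hypothetical sketch of the simplest sparse format, COOrdinate (COO) storage, illustrates why irregular sparsity patterns (such as those arising from finite element or finite volume meshes) are easy to represent: each nonzero is just an independent (row, column, value) triple.

```python
import numpy as np

# COO storage for the 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]]:
# one (row, col, value) triple per nonzero, in any order.
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 2])
vals = np.array([2.0, 1.0, 3.0, 4.0, 5.0])

# A COO matrix-vector product is a scatter-add of one term per nonzero.
x = np.array([1.0, 2.0, 3.0])
y = np.zeros(3)
np.add.at(y, rows, vals * x[cols])
print(y)  # [ 5.  6. 19.]
```

The triples impose no structure at all, which is what makes COO general; the more structured formats discussed below trade that generality for performance.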
Once discretised, the solution is advanced using either explicit or implicit methods. In either case, evaluating the matrix-vector product of the discretised system is generally a core kernel of the solver, making optimising this operation vital to achieving high performance on HPC machines. Due to its ubiquity throughout scientific computing, there have been decades of research into optimising the SpMV: improving data layouts, exploiting parallelism and increasing vectorisation. Consequently, there are many SpMV implementations to choose from, and linear algebra packages such as MORPHEUS or PETSc abstract these from the user so that applications can benefit from the optimised implementations in the package.
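For reference, a minimal sketch of the kernel in question, an SpMV over the widely used Compressed Sparse Row (CSR) format, might look as follows (a naive serial version; production libraries apply the layout, parallelism and vectorisation optimisations mentioned above):

```python
import numpy as np

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for A stored in CSR format: `indptr[row]` points at the
    first nonzero of each row, `indices` holds column indices and `data`
    holds the corresponding values."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

# 3x3 example: A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
indptr  = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data    = [2.0, 1.0, 3.0, 4.0, 5.0]
x = np.array([1.0, 1.0, 1.0])
print(spmv_csr(indptr, indices, data, x))  # [3. 3. 9.]
```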
As noted above, evaluating these discretised equations can be represented as an SpMV, and optimisations based on SpMV research have been exploited. For example, NASA's FUN3D solver was optimised for large, distributed-memory supercomputers with cache-based node architectures. That work considered the ordering of the computation graph to maximise cache reuse, applying a bandwidth-reducing ordering of the nodes followed by sorting the edges by their originating node; similar optimisations are applied to this day. This ordering mirrors an optimisation found in COOrdinate storage (COO) SpMV implementations, although the relationship is not made explicit in the paper. Indeed, the complexity of these programs may obscure the relationship between the PDE solver and the underlying SpMV the developers are implementing, even though they are likely aware that such a relationship exists. Due to this disconnect there is also the potential for developments in SpMV methods to be overlooked by PDE solvers, or for PDE optimisations to follow a parallel path, resulting in duplication of effort.
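A small SciPy sketch shows the kind of bandwidth-reducing reordering described above, using the classic reverse Cuthill-McKee algorithm (the FUN3D paper's exact scheme may differ; this is an illustrative stand-in):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# A small symmetric sparsity pattern with a scattered off-diagonal entry.
A = csr_matrix(np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
], dtype=float))

# Compute a bandwidth-reducing permutation and apply it symmetrically.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
B = A[perm][:, perm]

def bandwidth(M):
    """Maximum distance of any nonzero from the main diagonal."""
    rows, cols = M.nonzero()
    return int(np.max(np.abs(rows - cols)))

# After reordering, nonzeros cluster near the diagonal, so successive
# rows of the SpMV touch nearby entries of x, improving cache reuse.
print(bandwidth(A), bandwidth(B))
```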
Potential research questions
Initial stages of the work will involve proving the basic concept:
- Can we divorce the underlying execution of the SpMV from the discretisation of physics?
- Does this facilitate implementing different discretisations within a single solver framework, and are the optimal choices determined by the class of method (FDM, FVM, FEM), by arithmetic intensity (high- vs low-order), by something else, or are they largely insensitive to these factors?
- Does special-casing the SpMV to exploit runtime knowledge, e.g. using the DIA format for structured mesh problems, prove beneficial over more general implementations (COO, CSR)? Can we apply previous work on selecting optimal SpMV methods based on matrix properties [Stylianou & Weiland (2023)] in these cases? Are newer formats such as SELL-C-σ optimal for PDE evaluation?
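To make the DIA special case concrete, a minimal sketch of DIAgonal-format SpMV follows (using a row-indexed layout for clarity; library conventions such as SciPy's `dia_matrix` index the diagonals slightly differently). A structured-mesh stencil has nonzeros on only a few fixed diagonals, so DIA stores one dense array per diagonal and needs no column indices at all:

```python
import numpy as np

def spmv_dia(offsets, data, x):
    """y = A @ x for A in DIA storage: data[d][i] holds A[i, i + offsets[d]],
    one dense row per stored diagonal."""
    n = len(x)
    y = np.zeros(n)
    for off, diag in zip(offsets, data):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y

# Tridiagonal 1D Laplacian stencil (-1, 2, -1) on 4 points.
offsets = [-1, 0, 1]
data = np.array([
    [0.0, -1.0, -1.0, -1.0],   # sub-diagonal (first entry unused)
    [2.0,  2.0,  2.0,  2.0],   # main diagonal
    [-1.0, -1.0, -1.0,  0.0],  # super-diagonal (last entry unused)
])
x = np.ones(4)
print(spmv_dia(offsets, data, x))  # [1. 0. 0. 1.]
```

Because the inner loop strides contiguously through both the diagonal and x, this layout vectorises well on structured problems, which is precisely the runtime knowledge the research question asks about exploiting.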
By separating the discretised physics from its parallel implementation, we can explore different parallel programming methods and simplify porting to different architectures, which might otherwise require a significant effort:
- Explore stdpar vs OpenMP (vs CUDA/HIP) vs ?
- Explore MPI vs SHMEM vs ?
- Alternative backends (OP2/OPS)?
This will allow investigating the implementation of PDE solvers on a range of current (CPU and GPU) and novel (WSE, FPGA) hardware and assessing how to optimise for upcoming HPC architectures.
Student Requirements
Minimum requirement of a UK 2:1 honours degree, or its international equivalent, in a relevant subject such as computer science and informatics, physics, mathematics, or engineering.
English Language requirements as set by University of Edinburgh
Recommended/Desirable Skills
- Experience in the application and/or development of numerical methods, e.g. computational fluid dynamics
- A good understanding of mathematical modelling in the context of scientific computing
- Experience in compiling and running codes on high-performance computing systems
How to apply
Applications should be made via the University application form, available via the degree finder. Please note the proposed supervisor and project title from this page and include them in your application. You may also find this page a useful starting point for a research proposal, and we would strongly recommend discussing this further with the potential supervisor.
Further Information
- C.Stylianou & M.Weiland; “Exploiting dynamic sparse matrices for performance portable linear algebra operations”; IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (2022)
- S.Balay et al.; “PETSc web page”; https://petsc.org, accessed 31-Aug-2025
- W.K.Anderson et al.; “Achieving high sustained performance in an unstructured mesh CFD application”; ACM/IEEE SC 1999 Conference (1999)
- C.Stylianou & M.Weiland; “Optimizing sparse linear algebra through automatic format selection and machine learning”; IEEE International Parallel and Distributed Processing Symposium Workshops (2023)