Studying for a PhD at EPCC: NVIDIA internship

15 November 2023

Ricardo Jesus, a PhD student at EPCC, writes about the six-month internship he undertook with NVIDIA this year.


I work mostly on software optimisations for Arm-based supercomputers. My main research interests are performance analysis and optimisation, high-performance parallel programming languages and runtimes, novel HPC programming models, and computational maths. My work so far includes a study of the performance of atomic operations on AArch64 CPUs, which uncovered some issues with the scaling of these instructions on a few processors, and the development of optimised routines for number-theoretic transforms, which enabled us to carry out a world-record computation of the number of Goldbach partitions of the even integers.

Industry experience

For a while I had been looking for an internship that would give me some industry experience before wrapping up my PhD and also let me work on top-of-the-line AArch64 hardware. This internship was perfect: besides combining these two things, it allowed me to delve into compilers, an area I was very keen to explore during my PhD.

Optimisations for AArch64 hardware

The broad topic of the internship was to drive optimisations in LLVM for AArch64 hardware (in particular NVIDIA's Grace CPU). I authored a few LLVM commits that improve codegen for AArch64 targets, but the main focus of my work was developing the scheduling model for the Neoverse V2 core (which Grace is based on) and enabling LLVM-MCA, a static performance analysis tool that is part of LLVM, to make accurate predictions for it. This resulted in the Neoverse V2 scheduling model that LLVM now ships, which received some attention on social media. It also led to a talk on the status of LLVM-MCA for AArch64 hardware, “LLVM-MCA correlation for AArch64”, which has been accepted for the US LLVM Developers' Meeting.

The internship was immensely useful! I learned a lot, in particular about the LLVM AArch64 backend and scheduling models. I also made great connections, and overall had a fantastic and very enriching experience. Praise is due to NVIDIA for making a real effort to ensure its internships were successful. I mostly worked remotely from EPCC, but I also visited the NVIDIA office in Cambridge and met the team there. We interns were supported the whole time: each of us had a dedicated mentor and constant points of contact should we need any help (whether related to our internship or our technical work).

Next steps

During this final stretch of my PhD, I plan to revisit the work I did on LLVM-MCA to see if I can leverage it to develop compiler-guided optimised routines for CFD kernels and other such codes. The idea is to have a way of assessing the impact of changing compiler flags on performance without actually having to run the code.
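To give a flavour of what this could look like, here is a minimal sketch in Python, not the tooling I actually use. It assumes `clang` and `llvm-mca` are on the `PATH` and that the installed LLVM includes the Neoverse V2 model; the helper names (`clang_command`, `mca_command`, `parse_summary`, `predict`) and the file name `kernel.c` are hypothetical. The sketch compiles a source file to AArch64 assembly with a given set of flags, pipes the assembly into `llvm-mca`, and reads the numbers from its summary view.

```python
import re
import subprocess


def clang_command(source, flags, mcpu="neoverse-v2"):
    """Build a clang command that prints AArch64 assembly to stdout."""
    return ["clang", "--target=aarch64-linux-gnu", f"-mcpu={mcpu}",
            *flags, "-S", "-o", "-", source]


def mca_command(mcpu="neoverse-v2"):
    """Build an llvm-mca command; with no input file it reads stdin."""
    return ["llvm-mca", "-mtriple=aarch64", f"-mcpu={mcpu}"]


def parse_summary(report):
    """Collect the numeric 'Key: value' lines of llvm-mca's summary view.

    If the report covers several code regions, later values overwrite
    earlier ones, so this sketch is only meaningful for a single region.
    """
    summary = {}
    for line in report.splitlines():
        match = re.match(r"^([A-Za-z ]+):\s+([0-9.]+)\s*$", line)
        if match:
            summary[match.group(1).strip()] = float(match.group(2))
    return summary


def predict(source, flags):
    """Compile `source` with `flags` and return llvm-mca's predictions.

    Nothing is executed on the target: only the compiler and the static
    analyser run, so this also works on a non-Arm development machine.
    """
    asm = subprocess.run(clang_command(source, flags), check=True,
                         capture_output=True, text=True).stdout
    report = subprocess.run(mca_command(), input=asm, check=True,
                            capture_output=True, text=True).stdout
    return parse_summary(report)
```

With this in place, something like `predict("kernel.c", ["-O2"])` and `predict("kernel.c", ["-O3"])` could be compared on their `"Total Cycles"` entries. Note that by default `llvm-mca` treats the whole assembly file as one block executed repeatedly, so in practice the kernel of interest would need to be isolated, for example with `LLVM-MCA-BEGIN` and `LLVM-MCA-END` comments in the assembly.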

Papers

A Study on the Performance Implications of AArch64 Atomics

Vectorizing and distributing number-theoretic transform to count Goldbach partitions on Arm-based supercomputers

Author

Ricardo Jesus, EPCC PhD student

Ricardo is supervised by Michèle Weiland and Adrian Jackson at EPCC.