GPU Programming Workshop
Graphics Processing Units (GPUs) were originally designed to display computer graphics, but they have developed into extremely powerful chips capable of handling demanding, general-purpose calculations. The GPU architecture is inherently better suited than the traditional CPU to many types of intensive parallel computation, so computationally demanding sections of code can be accelerated to significantly increase overall performance. This is true not just for small-scale applications run on desktop-sized machines, but also for the largest-scale applications on massively parallel architectures. For example, the newly announced Cray XK6 supercomputer allows thousands of NVIDIA GPUs to be exploited in parallel to tackle grand-challenge problems.
Applications must be adapted to utilise GPUs: most lines of application source code are executed on the CPU, while key computational kernels are distributed to the GPU cores. Currently, for NVIDIA GPUs, the most popular programming method is the CUDA API, which is extremely powerful but requires significant development effort. OpenCL is an alternative API, which is less mature than CUDA but has portability advantages. Recently, a new higher-level standard, OpenACC, has emerged, which promises to offer higher productivity. With OpenACC, the programmer uses “directives” in the code to provide the compiler with the information required to automatically offload code to the GPU.
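As a rough illustration of the directive-based approach, the following minimal sketch in C marks a single loop for offload. It is not taken from the course material: the function name `saxpy`, its parameters and the choice of data clauses are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example kernel: computes y[i] = a * x[i] + y[i]
 * for i in [0, n). The function name and signature are illustrative.
 */
void saxpy(int n, float a, const float *x, float *y)
{
    /* Basic argument checks, performed on the CPU (host). */
    assert(n >= 0);
    assert(x != NULL && y != NULL);

    /*
     * OpenACC directive: asks the compiler to run the loop that follows
     * in parallel on the accelerator.
     *   copyin(x[0:n]) - copy x to the device before the loop
     *   copy(y[0:n])   - copy y to the device, and back afterwards
     * A compiler without OpenACC support ignores the pragma, and the
     * loop then runs serially on the CPU with the same result.
     */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The rest of the program stays ordinary CPU code and simply calls `saxpy(n, a, x, y)`. Whether the loop is actually offloaded depends on the compiler and its options (for example, `-fopenacc` with GCC or `-acc` with the NVIDIA HPC compilers); without them, the same source still builds and runs on the CPU.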
In this 3-day course we will introduce and provide hands-on experience of CUDA, OpenCL (with more emphasis on the former) and OpenACC. In many cases it is relatively straightforward to port a code to the GPU, but much harder to obtain good performance: we will cover a range of common GPU optimisation techniques.
No prior HPC or parallel programming knowledge is assumed, but attendees must already be able to program in C, C++ or Fortran. Access will be given to appropriate hardware for all the exercises.
This course is free to all academics.
Prerequisite Programming Languages
Fortran, C or C++. It is not possible to complete the exercises in Java.
Practical Templates and Documentation
CUDA and OpenCL
Day 1
09:30 Lecture: Introduction and GPU Architecture
10:15 Lecture: Programming with CUDA
11:30 Practical: Getting started with CUDA
13:30 Lecture: GPU Optimisation
14:00 Practical: Optimising a CUDA application
15:30 Case study: Scaling an Application to a Thousand GPUs and Beyond
Day 2
09:00 Lecture: Programming with OpenCL
09:45 Practical: OpenCL programming *or* continue CUDA practical
11:30 Practical (cont.)
13:00 OpenACC Welcome and overview
13:15 OpenACC Session 1: An Introduction to OpenACC
13:15 Lecture: The OpenACC programming model
14:15 Practical: compiling and running a sample OpenACC code
15:15 OpenACC Session 2: Accelerating a simple code
15:15 Worked example: OpenACC-ing a simple code
15:45 Practical: accelerating the simple code
Day 3
09:00 OpenACC Session 3: Accelerating a larger code
09:00 Lecture: Preparing to OpenACC a code
09:45 Worked example: OpenACC-ing a larger code
10:15 Practical: preparing and accelerating a larger application
11:15 NVIDIA Roadmap Update (Timothy Lanfear)
13:30 OpenACC Session 4: Improving OpenACC performance
13:30 Lecture: OpenACC performance tuning and interoperability
14:15 Practical: continuing to accelerate a larger code
15:30 OpenACC Session 6: OpenACC for parallel applications
15:30 Case study: the parallel Multigrid and Himeno codes
16:15 Summary and outlook