Posted: 19 Apr 2013 | 10:57
The programmability of GPUs (and accelerators in general) has improved since the days of OpenGL shaders. First CUDA, and later OpenCL, have evolved to offer a reasonable way of implementing efficient algorithms on GPUs. Despite this improvement, developing code for accelerators still takes a lot of effort. Sometimes this is unavoidable: if you have a particular algorithm, you want the maximum possible performance on a particular accelerator architecture, and you have the time to do it, you can immerse yourself in the marvellous world of low-level CUDA/OpenCL optimisation and stop reading. If time is critical for you, as it is for me, then you will love the latest advance in accelerator programmability: OpenACC.
At SC11 in Seattle, a group of vendors and some universities presented a directive-based programming model for GPUs called OpenACC. In a similar fashion to OpenMP, the programmer adds directives to portions of the code, giving the compiler extra information about them (for instance, which loops can run in parallel and which data they use). The compiler can then automatically generate code for the target accelerator.
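To give a flavour of what this looks like in practice, here is a minimal sketch of a directive-annotated loop in C. The function name `saxpy` and its parameters are my own choices for illustration, not something taken from the OpenACC announcement, and the data clauses shown are just one reasonable way of describing the arrays to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (hypothetical name): computes y[i] = a * x[i] + y[i]. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);

    /* The directive asks an OpenACC compiler to offload the loop below:
     * copyin(x[0:n]) - x is only read, so copy it to the accelerator;
     * copy(y[0:n])   - y is read and written, so copy it in and back out.
     * A compiler without OpenACC support ignores the pragma and runs
     * the loop sequentially on the host. */
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Note that the loop body is untouched: the same source still builds with an ordinary C compiler, which is a large part of the appeal of the directive-based approach.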
There are three main vendors offering support for OpenACC at the moment: PGI, CAPS and Cray. PGI and CAPS offer trial versions of their products, and if you are a happy Cray customer, OpenACC support has been added to their compiler toolchain. All of them are, however, closed-source and protected by various licences.
If you're feeling adventurous, the people at the University of La Laguna have very recently released version 0.2 of their open-source OpenACC compiler, accULL. accULL is a research-oriented implementation of the standard, built on two software packages: a Python-based source-to-source compiler called yacf and a C++ runtime named Frangollo. Both were originally developed, before OpenACC existed, for a different language, but given their flexibility it was quite easy to produce an OpenACC implementation in a short period of time. I can tell you this first-hand, since that was the outcome of my PhD thesis!
Details about the current release can be found here. It supports both CUDA and OpenCL platforms, and results on various platforms have been published in several journals (MICPRO-2011, JoS-2011, JoS-2013), including results for running OpenACC without a GPU.
Bear in mind that this is ongoing research, and although the majority of the standard is implemented, some combinations of directives may produce wrong results. Feel free to fix any problem you find, but don't forget to send the patch to the developers' mailing list.
Online CUDA training at EPCC: Learn CUDA In An Afternoon.