Second day of collaborating with Colfax
Posted: 10 Jun 2015 | 00:08
Day 2: profiling and the start of optimising
After a first day spent getting codes set up and systems running, we got into the profiling of CP2K in anger today and have made some good progress.
In general we have been focussing on the pure MPI version of CP2K when doing our optimisation work in the IPCC project. In this collaboration with Colfax, however, we have been looking in more depth at the OpenMP version. This is not the version of the code that users will typically run for simulations, as the MPI code generally performs better; the OpenMP functionality is primarily used to provide access to more memory when performing large-scale simulations (see this web page for more details).
The Xeon Phi has around 60 physical cores, each of which can run 4 threads efficiently, but its performance suffers when large numbers of MPI tasks are run on a single card. The OpenMP code is therefore a sensible target for optimisation work aimed at exploiting the Xeon Phi.
The profiling that we've been doing with Intel's VTune profiler and Allinea's MAP profiler has identified areas of the code that perform a lot of memory allocations and deallocations. Our first attempt at optimisation is therefore going to be replacing these memory operations with static arrays that are large enough to cope with any situation our benchmark requires.
Whilst this is not a clean or functional coding solution, it looks like we could cut the runtime of the OpenMP code by around 20% if this optimisation can be carried out (although it should be noted that even with this optimisation the OpenMP code will still be slower than the MPI code).
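To illustrate the kind of change we have in mind, here is a minimal sketch in C rather than CP2K's own Fortran. It is not taken from CP2K: the function names, the stand-in kernel and the MAX_SCRATCH bound are all hypothetical, chosen only to show a per-call allocation being replaced with a fixed-size static buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical upper bound, assumed large enough for the benchmark. */
#define MAX_SCRATCH 1024

/* Stand-in kernel (not CP2K code): fills a scratch array and sums it. */
static double kernel(double *scratch, size_t n, size_t step)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        scratch[i] = (double)(i + step);
        total += scratch[i];
    }
    return total;
}

/* Before: the scratch array is allocated and freed on every step. */
double run_dynamic(size_t n, size_t steps)
{
    double total = 0.0;
    for (size_t s = 0; s < steps; ++s) {
        double *scratch = malloc(n * sizeof *scratch);
        if (scratch == NULL && n > 0) {
            abort(); /* out of memory */
        }
        total += kernel(scratch, n, s);
        free(scratch);
    }
    return total;
}

/* After: one fixed-size static buffer is reused for every step.
 * The size check is what makes this inflexible: any problem larger
 * than MAX_SCRATCH is rejected. In threaded code each thread would
 * need its own buffer, since a shared one would be a data race. */
double run_static(size_t n, size_t steps)
{
    static double scratch[MAX_SCRATCH];
    assert(n <= MAX_SCRATCH);

    double total = 0.0;
    for (size_t s = 0; s < steps; ++s) {
        total += kernel(scratch, n, s);
    }
    return total;
}
```

Both functions return the same result for any n up to MAX_SCRATCH; the only difference is that the second never touches the allocator inside the loop. The hard-coded bound is exactly why this approach is not a clean solution: it only works for problem sizes known in advance.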
We have also identified some operations in the inner loops of the code that it may be possible to remove or replace with alternatives, which may give some performance improvements in the serial code.
We will be investigating both these optimisations tomorrow and will publish a new blog post with our findings after the work is done!