Task placement on Intel Xeon Phi

Author: Fiona Reid
Posted: 12 Jul 2013 | 09:52

For the last few months I've been investigating the performance and scaling of the CP2K computational chemistry code on the Xeon Phi.

Our first hurdle was getting the code to compile, which proved a bit harder than I'd anticipated due to a mix of code bugs, library bugs and compiler bugs. Having fixed these, I started to investigate the performance on the Xeon Phi card. The initial results were disappointing: the MPI/OpenMP version of the code ran a lot slower (around 9 times) than I'd hoped based on the performance of the MPI-only and OpenMP-only versions.

After a lot of head scratching and testing, I discovered the reason for the poor performance was the way in which the Xeon Phi card was placing tasks. The default placement resulted in over-subscription of the virtual threads, meaning that my benchmarks were running on far fewer resources than they were supposed to.

The figure shows the performance of the MPI, OpenMP and MPI+OpenMP versions of CP2K. The blue diamonds show the original performance with poor task placement. The green line shows the final result with optimal placement. This gave better performance than both the MPI and OpenMP versions and enabled more virtual threads to be used. The best placement was found to be a balanced approach where each of the 60 physical cores has as few threads as possible, whilst the threads belonging to a particular MPI process are kept physically close to one another. An example of how to place tasks for the MPI/OpenMP version with 2 MPI processes each running 2 OpenMP threads is as follows:

    export OMP_NUM_THREADS=2
    mpirun -prepend-rank -genv LD_LIBRARY_PATH path_to_the_mic_libs \
      -np 1 -env KMP_AFFINITY verbose,granularity=fine,proclist=[1,5],explicit -env OMP_NUM_THREADS ${OMP_NUM_THREADS} $CP2K_BIN/cp2k.psmp H2O-64.inp : \
      -np 1 -env KMP_AFFINITY verbose,granularity=fine,proclist=[9,13],explicit -env OMP_NUM_THREADS ${OMP_NUM_THREADS} $CP2K_BIN/cp2k.psmp H2O-64.inp &> psmp_2_procs_2_threads.out
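Typing the proclists by hand gets tedious for larger runs, so here is a rough sketch of how they might be generated instead. It is an illustration only, not the script used for the benchmarks. The helper name proclist_for_rank and the variable THREADS_PER_CORE are made up for this example. It assumes each physical core owns 4 consecutive logical CPU IDs starting at 1, which is consistent with the [1,5] and [9,13] lists in the command above, and it places one thread per core on adjacent cores.

```shell
#!/bin/bash
# Sketch: build one-thread-per-core proclists for each MPI rank.
# ASSUMPTION: each physical core owns 4 consecutive logical CPU IDs
# starting at 1 (core 0 -> 1-4, core 1 -> 5-8, ...), which matches the
# proclists used in the mpirun example.
THREADS_PER_CORE=4   # hardware threads per physical core (assumed)

# proclist_for_rank RANK NTHREADS   (hypothetical helper)
# Prints a list such as [1,5] that puts NTHREADS threads on adjacent cores,
# one thread per core, so a rank's threads stay physically close together.
proclist_for_rank() {
  local rank=$1 nthreads=$2 list="" t core cpu
  for ((t = 0; t < nthreads; t++)); do
    core=$((rank * nthreads + t))          # next unused physical core
    cpu=$((1 + THREADS_PER_CORE * core))   # first logical CPU on that core
    list+="${list:+,}${cpu}"               # comma-separate after the first entry
  done
  echo "[${list}]"
}

# Print the lists for 2 MPI processes with 2 OpenMP threads each.
for rank in 0 1; do
  echo "rank ${rank}: proclist=$(proclist_for_rank "${rank}" 2)"
done
```

For the 2 x 2 case this prints [1,5] for rank 0 and [9,13] for rank 1, the same lists as in the command above. The sketch only covers runs with at most one thread per core; once the total thread count exceeds the number of physical cores, the mapping would need extending to put further threads on each core.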


Fiona Reid, EPCC
