Optimizing CPU oversubscribed reductions

Author: Guest blogger
Posted: 6 Sep 2019 | 11:17

Gladys Utrera was an HPC-Europa3 visitor to EPCC from 1 July to 3 August 2019. She has been an HPC-Europa visitor before. In this brief blog article she tells us what she did this time.

Hi! My name is Gladys Utrera and I currently work at the Computer Architecture Department of the Universitat Politècnica de Catalunya in Barcelona, Spain. There I combine teaching parallel computing and operating systems with research on HPC topics and parallel programming models. Thanks to the HPC-Europa3 programme, I was able to spend four weeks at the EPCC at the University of Edinburgh with my host Mark Bull, who is a very well-known expert in my research areas. This is my third research visit to EPCC, and the second as an HPC-Europa visitor.

My research involves executions on large clusters of multi-core nodes. In addition, performance evaluations on machines with different architectural characteristics, such as ARCHER and Cirrus, greatly enrich my results.

During my visit, I worked on the algorithm of the MPI_Allreduce collective operation. In particular, I wanted to optimize the operation when it is executed by processes oversubscribing CPUs, that is, when more than one process is attached to a single CPU.

The motivation for this headache comes from a previous project, in which a strategy was applied to unbalanced MPI parallel applications to concentrate idle CPU cycles, with the objective of freeing CPUs. To that end, processes within a node are migrated and CPUs are oversubscribed.

The optimizations applied to the MPI_Allreduce collective exploit the fact that processes attached to the same CPU share all cache levels, so we added a new step to the algorithm at the CPU level, where a memory region is defined as shared. In this way, the reduction is first performed locally and then combined with the other local results following the native algorithm of the MPI library in use. The local calculation is pipelined by splitting the original message into as many pieces as there are processes attached to a CPU.
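
To make the two-step idea concrete, here is a small pure-Python simulation. It is only a sketch under my own assumptions, not the implementation developed during the visit: it uses no MPI at all, plain lists stand in for the shared memory region, a simple element-wise sum stands in for the library's native allreduce, and the function names (split_bounds, local_reduce, simulated_allreduce) are hypothetical. It also assumes one plausible reading of the message splitting, namely that each of the k processes on a CPU takes charge of one of the k pieces.

```python
from typing import List


def split_bounds(n: int, parts: int) -> List[range]:
    """Split indices 0..n-1 into `parts` contiguous, near-equal chunks."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        bounds.append(range(start, start + size))
        start += size
    return bounds


def local_reduce(buffers: List[List[int]]) -> List[int]:
    """Step 1 (CPU level): sum the buffers of the processes sharing one CPU.

    `shared` stands in for the shared memory region. The message is cut
    into as many pieces as there are local processes; in this sketch,
    (simulated) process p reduces piece p across all local buffers.
    """
    k = len(buffers)
    n = len(buffers[0])
    shared = [0] * n
    for p, piece in enumerate(split_bounds(n, k)):
        # Work done by local process p on its own piece of the message.
        for i in piece:
            shared[i] = sum(buf[i] for buf in buffers)
    return shared


def simulated_allreduce(cpus: List[List[List[int]]]) -> List[int]:
    """Step 2: combine the per-CPU results.

    A real run would hand the local results to the MPI library's native
    algorithm; here an element-wise sum is used as a stand-in.
    """
    local_results = [local_reduce(buffers) for buffers in cpus]
    n = len(local_results[0])
    return [sum(result[i] for result in local_results) for i in range(n)]


if __name__ == "__main__":
    # Two CPUs, each oversubscribed with two processes, messages of length 5.
    cpus = [
        [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]],
        [[100, 200, 300, 400, 500], [1000, 2000, 3000, 4000, 5000]],
    ]
    print(simulated_allreduce(cpus))
```

In this toy version the pieces are processed one after another, so it shows only how the work is divided, not the overlap that pipelining would give on real hardware.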

The experimental results at the EPCC were very promising and we are planning to write a scientific publication on this research.

Apart from working hard, I really enjoyed the magic city of Edinburgh, its topography, architecture, parks and environment: I loved my 45-minute walk to and from the EPCC each day!

I also visited the Highlands once more, and would return whenever I have the opportunity.

Finally, I highly recommend the HPC-Europa experience at the EPCC. I would also like to thank Catherine Inglis, Mario Antonioletti and, of course, Mark Bull for all their support, which, as in my previous stays, helped make this an unforgettable experience.