TeamEPCC & the ISC'14 Student Cluster Competition

Author: Xu Guo
Posted: 1 Jul 2014 | 14:35

This post was written by TeamEPCC, four MSc in HPC students who achieved the highest-ever LINPACK score in the Student Cluster Competition at ISC14. 

In the photo (left to right): Chenhui Quan, Konstantinos Mouzakitis, Emmanouil Farsarakis (Manos) and Georgios Iniatis. 

The Student Cluster Competition (SCC) takes place at supercomputing conferences around the world. Students from some of the world’s most renowned universities form teams to design, assemble and configure a cluster to achieve top performance on a series of benchmark tests. The main rules are that you must not exceed a power cap of 3kW and you are not allowed to change your system’s hardware configuration once any benchmark result has been submitted.

The HPCC and Linpack benchmarks are assessed every year. A few months before the competition begins, students are also told which three further applications they will be running, so that they can adapt their design, investigate how the applications are used and optimise them to the best of their ability. Two further “surprise” challenges are set during the competition itself.

Awards are granted in three categories: “Highest Linpack”; best overall performance on the applications and HPCC; and “Fan favourite”.

Our cluster and sponsors

TeamEPCC’s cluster was definitely the star cluster of this year’s competition. Thanks to our amazing sponsors, Boston Limited and CoolIT Systems, we were able to use first-class hardware, in some cases only recently available to the public. In addition to their hardware support, they were also incredibly generous in allowing us to make any alterations we wanted to the hardware.

Boston Limited was impeccable in their support from day one, always doing their best to provide anything the team needed as soon as possible. We even organised a training trip to their London headquarters a few weeks prior to the competition, for hardware training as well as software support; given their 20 years of experience in HPC, this was truly invaluable. Boston also arranged for additional support from NVIDIA.

The cluster used for the competition had 4 nodes, each incorporating 2 Intel Xeon E5-2680 v2 CPUs, 2 NVIDIA K40 GPUs and 64GB of DDR3 Registered ECC memory, plus Intel 510 Series SSDs (7 across the cluster). For the interconnect we used a Mellanox 12-port 40/56GbE switch.

Most of the system's cooling was handled by liquid cooling. Boston Limited was able to secure the CoolIT Rack DCLC AHx cooling system, which mounts directly onto the Intel Xeon E5-2680 v2 CPUs and NVIDIA K40 GPUs. This system allows the heat output of both the processors and the GPU accelerators to be absorbed directly into circulating liquid, which then efficiently transports the heat to a liquid-to-air heat exchanger mounted on top of the rack.

Our Linpack results... and how we did it

We had spectacular results on the Linpack benchmark. We achieved a score of 10.14TFLOP/s (or 3.38TFLOP/s per kW), with the system ranking at an estimated #4 on the Green500 list. This was the first time a team had broken the 10TFLOP/s barrier within the competition’s 3kW power limit. In fact, the previous record, set at the ASC competition just a few months earlier by China’s Sun Yat-sen University, was 9.27TFLOP/s, also using NVIDIA K40 GPUs.

Our system was designed with the Linpack benchmark as a first priority, as the “Highest Linpack” award is the only one where results can be compared from year to year. We chose to incorporate NVIDIA K40 accelerators in our system, as the benchmark data we collected showed that they provide very high FLOP/s per watt. We decided that since the majority of computation would take place on the GPUs, we would eliminate as much overhead as possible, with an equal number of CPUs and GPUs in the final configuration.

Secret weapon: liquid cooling

That power overhead was also significantly reduced thanks to Boston Limited’s success in securing our “secret weapon”: liquid cooling. By cooling the CPUs and GPUs with the CoolIT AHx we were able to remove multiple fans from the system; in the end, only 4 of the original 20 fans were left on each server. The heat exchanger consumed only about 90W in total, and this was further reduced by deactivating some of its fans, after carefully investigating the feasibility of doing so without putting the system at risk.

Last but not least, the record-setting score was accomplished by devoting many hours to testing. The Linpack benchmark has numerous parameters which can be used to tune it for maximum performance on a wide variety of systems and architectures. We soon discovered, however, that tweaking these parameters alone would not be sufficient. In order to find the best configuration for our very specific requirements (our cluster design and power limit) we had to experiment with both the benchmark’s parameters and our hardware’s settings. By keeping detailed documentation of every test performed, we were able to adapt quickly to changes in our hardware configuration and drain every flop possible out of every watt the system consumed.
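For readers unfamiliar with the benchmark, the parameters in question live in HPL’s input file, HPL.dat. The excerpt below is only an illustrative sketch with made-up values, not our actual competition configuration; the problem size N, block size NB and the P x Q process grid are the main knobs referred to above.

    1            # of problems sizes (N)
    160000       Ns        (hypothetical; chosen so the matrix fills the nodes' memory)
    1            # of NBs
    1024         NBs       (hypothetical block size; GPU runs generally favour a large NB)
    1            # of process grids (P x Q)
    2            Ps
    4            Qs        (2 x 4 = 8 ranks, one per GPU)

Finding a good combination of these values, and of the many other options in the file, is largely a matter of systematic experimentation of exactly the kind described above.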

In the end, we used all 80 of the system’s CPU cores as well as all 8 GPUs at their base clock of 745MHz. With the HPL binary provided by NVIDIA, the GPUs were already highly utilised, so higher clocks were not especially beneficial: 810MHz clocks did provide higher performance, but the cost in power consumption did not justify the change.
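For context, application clocks on Kepler-class cards such as the K40 are selected with the nvidia-smi tool. The commands below are a sketch of the kind of steps involved rather than a record of our exact setup; the 3004MHz value is the K40’s memory clock, paired here with the two graphics clocks mentioned above.

    nvidia-smi -q -d SUPPORTED_CLOCKS   # list the memory/graphics clock pairs the card supports
    nvidia-smi -ac 3004,745             # set application clocks to <memory>,<graphics> in MHz
    nvidia-smi -ac 3004,810             # the higher graphics clock we also tested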

An experience of a lifetime

Participating in the ISC'14 Student Cluster Competition has been an experience of a lifetime. First of all, during the ISC exhibition we had the chance to meet some very important people from different companies and learn a lot about HPC innovations and future technologies. In addition, meeting other students is always interesting and fun, especially when they come from different parts of the world; we discussed a wide range of topics with them and learned a lot. The event's competitive nature helped push us to our limits. The friendly atmosphere, however, allowed us to enjoy the experience and get the most out of what the competition, ISC and Leipzig had to offer.

Apart from the event itself, participating in the cluster competition was incredibly educational. We gained first-hand experience in setting up a system at all levels: from communicating with vendors, to choosing between architectures, to installing software packages and drivers, to implementing optimisations and much, much more. It has been a learning experience well worth all the hard work.

Last but not least, participation in the Student Cluster Competition was beautifully complemented by our course of studies at the University of Edinburgh and EPCC. The MSc programme in High Performance Computing has provided us with the know-how and experience needed to understand HPC from all perspectives. With courses covering material on HPC architectures, programming models, performance programming and even trends in HPC, as well as the strong support and insight of everyone at EPCC, we were able to tackle optimisation and power efficiency from multiple angles, ultimately leading to our success at the competition.

Authors

TeamEPCC: MSc in HPC students Emmanouil Farsarakis, Konstantinos Mouzakitis, Georgios Iniatis and Chenhui Quan; and EPCC mentor Xu Guo.