Xeon Phi

The Intel Parallel Computing Centre at EPCC

Author: Adrian Jackson
Posted: 15 Jun 2017 | 13:41

We are entering the fourth year of the Intel Parallel Computing Centre (IPCC). This collaboration on code porting and optimisation has focussed on improving the performance of scientific applications on Intel hardware, specifically its Xeon and Xeon Phi processors.  

ARCHER code developers presenting performance

Author: Adrian Jackson
Posted: 11 May 2017 | 00:06

Application performance

As part of the ARCHER Knights Landing (KNL) processor testbed, we have produced and collected a set of benchmark reports on the performance of various scientific applications on the system. This has involved the ARCHER CSE team, EPCC's Intel Parallel Computing Centre (IPCC) team, and various users of the system all benchmarking and documenting the performance they have experienced.

Spreading the love

Author: Adrian Jackson
Posted: 10 Mar 2017 | 13:54

Thread and process binding

Note: this post was updated on 23 March 2017 to include how to bind threads correctly on Cray systems (aprun -cc rather than taskset).

Making sure threads and processes are correctly placed, or bound, on cores or processors is essential to ensure good performance for a range of parallel applications. 

This is not a new topic, and has been covered well by others before, e.g. http://www.glennklockwood.com/hpc-howtos/process-affinity.html. Generally this is handled for you: if you're running an MPI program, your mpirun/mpiexec/aprun job launcher will do sensible process binding to cores.
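A quick way to check where threads actually end up is a small OpenMP test like the sketch below (illustrative only: sched_getcpu() is Linux-specific, and the launcher options in the comments are examples rather than a recipe for any particular system):

    /* bind_check.c: report which core each OpenMP thread is running on.
     * Build (GNU):   gcc -fopenmp bind_check.c -o bind_check
     * Build (Intel): icc -qopenmp bind_check.c -o bind_check
     * On a Cray system launch with aprun and its -cc option; elsewhere
     * taskset or OMP_PROC_BIND/OMP_PLACES control the binding. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sched.h>   /* sched_getcpu(), Linux-specific */
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel
        {
            /* Each thread reports the core it is currently executing on. */
            printf("Thread %d of %d is on core %d\n",
                   omp_get_thread_num(), omp_get_num_threads(),
                   sched_getcpu());
        }
        return 0;
    }

Running this before and after changing the binding options makes it obvious whether threads are packed onto a single core or spread across the node as intended.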

MPI performance on KNL

Author: Adrian Jackson
Posted: 30 Aug 2016 | 12:22

Knights Landing MPI performance

Following on from our recent post on early experiences with KNL performance, we have been looking at MPI performance on Intel's latest many-core processor.

Figure 1: MPI ping-pong latency on KNC and IvyBridge.

Poor MPI performance on the first generation of Xeon Phi processor (KNC) was one of the reasons that some of the applications we ported to it performed badly. Figures 1 and 2 show the latency and bandwidth of an MPI ping-pong benchmark running on a single KNC and on a 2x8-core IvyBridge node.
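A benchmark of this kind is straightforward to reproduce. The sketch below is a minimal ping-pong in C; the message size, repeat count and reported metric are illustrative choices, not the exact settings used for the figures:

    /* pingpong.c: minimal MPI ping-pong between ranks 0 and 1.
     * Build: mpicc pingpong.c -o pingpong   Run: mpirun -n 2 ./pingpong */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int nbytes = 1024;   /* message size; sweep this for bandwidth */
        const int nreps  = 1000;
        char *buf = malloc(nbytes);
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        for (int i = 0; i < nreps; i++) {
            if (rank == 0) {
                MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("average one-way latency: %g us\n",
                   1e6 * (t1 - t0) / (2.0 * nreps));

        free(buf);
        MPI_Finalize();
        return 0;
    }

Running the two ranks on a single KNC, and then on the IvyBridge node, gives the kind of comparison shown in the figures.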

ParCo Symposium on Xeon Phi experiences

Author: Adrian Jackson
Posted: 20 Jul 2015 | 17:12

ParCo Symposium

Experiences of porting and optimising code for Xeon Phi processors

EPCC is jointly organising a symposium at the ParCo conference for those working on porting and optimising codes for this architecture, covering the challenges and successes they have experienced with the Xeon Phi and how these also apply to standard parallel computing hardware.

Next Generation Computational Modelling Summer School

Author: Adrian Jackson
Posted: 15 Jul 2015 | 15:06

Discussions on computing

Fiona Reid and I recently presented a two-day course on porting and optimising for the Xeon Phi at the NGCM (Next Generation Computational Modelling) summer academy in Southampton.

This one-week academy is designed to give PhD students some of the skills they need to undertake the range of computational simulations and data analysis tasks that their work requires.

Day 5 - Wrapping up the week

Author: Adrian Jackson
Posted: 21 Jun 2015 | 20:02

The final analysis and future plans

A week ago we finished our 5 days of intensive work optimising CP2K (and to a lesser extent GS2) for Xeon Phi processors. As discussed in previous blog posts (Day 4, Day 3, Day 2, Day 1), this was done in conjunction with research engineers from Colfax, and built on the previous year's work on these codes by EPCC staff through the Intel-funded IPCC project.

Day 4 of IPCC-Colfax work at EPCC

Author: Adrian Jackson
Posted: 12 Jun 2015 | 15:41

MPI and vectorisation: Two ends of the optimisation spectrum

Day four of this week of intensive work optimising codes for the Xeon Phi saw a range of activity. The majority of the effort focussed on the vectorisation performance of CP2K and GS2: looking at the low-level details of the computationally intensive parts of these codes, checking whether the compiler is producing vectorised code and, if not, whether anything can be done to make the code vectorise.
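For readers unfamiliar with that workflow, the Intel compiler can emit a per-loop vectorisation report, so a kernel such as the illustrative one below (not taken from CP2K or GS2) can be checked without even running it:

    /* vec_check.c: a simple kernel for checking compiler vectorisation.
     * Build with a vectorisation report, e.g. with the Intel compiler:
     *   icc -O2 -qopt-report=2 -qopt-report-phase=vec -c vec_check.c
     * The report states, per loop, whether it was vectorised and, if not,
     * why not. */
    void daxpy(int n, double a, const double *restrict x, double *restrict y)
    {
        /* restrict asserts that x and y do not alias; missing aliasing
         * information is a common reason loops fail to vectorise. */
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }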

Day 3 of optimising for the Xeon Phi, moving on to vectorisation

Author: Adrian Jackson
Posted: 11 Jun 2015 | 16:01

Moving from OpenMP to vectorisation and MPI

Reality hit home a bit on the third day of our intensive week working with Colfax to optimise codes for the Xeon Phi.

After further implementation and analysis work, it appears that removing the allocation and deallocation calls from some of the low-level routines in CP2K will improve the OpenMP performance on Xeon and Xeon Phi, but only because an issue with the Intel compiler is causing poor performance. The optimisation reduces the runtime of the OpenMP code by around 20-30%, but only with versions 15 and 16 of the Intel compiler; with v14 the improvement is much smaller.
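The general shape of that change is sketched below in C with illustrative names (the real routines are Fortran and part of CP2K): stop allocating and freeing scratch space inside a hot routine and instead have the caller pass in a workspace that is allocated once, for example once per OpenMP thread, and reused.

    #include <stdlib.h>

    /* Before: a scratch buffer is allocated and freed on every call,
     * which is expensive when the routine is called many times from
     * inside a parallel region. */
    void kernel_alloc(int n, const double *in, double *out)
    {
        double *work = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) work[i] = 2.0 * in[i];
        for (int i = 0; i < n; i++) out[i] = work[i] + in[i];
        free(work);
    }

    /* After: the caller allocates 'work' once (per thread) and reuses it,
     * so no allocator calls occur on the hot path. */
    void kernel_noalloc(int n, const double *in, double *out, double *work)
    {
        for (int i = 0; i < n; i++) work[i] = 2.0 * in[i];
        for (int i = 0; i < n; i++) out[i] = work[i] + in[i];
    }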

Second day of collaborating with Colfax

Author: Adrian Jackson
Posted: 10 Jun 2015 | 00:08

Day 2: profiling and the start of optimising

After a first day spent getting codes set up and systems running, today we got into profiling CP2K in earnest and have made some good progress.
