Balancing act: optimise for scaling or efficiency?

Author: Adrian Jackson
Posted: 24 May 2017 | 19:30

When we parallelise and optimise computational simulation codes we always have choices to make: the type of parallel model to use (distributed memory, shared memory, PGAS, single-sided, etc.), whether the algorithm used needs to be changed, and what parallel functionality to use (loop parallelisation, blocking or non-blocking communications, collective or point-to-point messages, etc.).

Apple vs oranges: performance comparisons

Author: Adrian Jackson
Posted: 11 Apr 2017 | 17:59

Shall I compare thee...

Performance comparisons are always tricky to get exactly right. They are needed to demonstrate the performance improvements that optimisations, new hardware, new algorithms, etc. have had on an application or benchmark, but there is a lot of latitude in what can be compared, which makes it easy to get a comparison wrong and fail to demonstrate whatever it is you're trying to show.
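As a rough illustration of that latitude, the sketch below compares the same two runs in two different ways. The `Run` type, the helper names, and every number in it are hypothetical and chosen only to show how the choice of metric can change the conclusion; they are not measurements from any real system.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark measurement (all values used below are made up)."""
    name: str
    runtime_s: float      # wall-clock time in seconds
    nodes: int            # number of nodes used
    cores_per_node: int   # cores available on each node


def speedup(baseline: Run, candidate: Run) -> float:
    """Raw runtime ratio: a value above 1 means the candidate finished sooner."""
    return baseline.runtime_s / candidate.runtime_s


def core_seconds(run: Run) -> float:
    """Total core-seconds consumed by the run."""
    return run.runtime_s * run.nodes * run.cores_per_node


def per_core_ratio(baseline: Run, candidate: Run) -> float:
    """Ratio of core-seconds: a value above 1 means the candidate used fewer."""
    return core_seconds(baseline) / core_seconds(candidate)


# Hypothetical node-to-node comparison between two systems.
old = Run("system A", runtime_s=120.0, nodes=1, cores_per_node=24)
new = Run("system B", runtime_s=80.0, nodes=1, cores_per_node=64)

print(speedup(old, new))         # 1.5: system B is faster node for node
print(per_core_ratio(old, new))  # 0.5625: system B uses more core-seconds
```

With these made-up figures, the node-for-node comparison favours system B while the core-for-core comparison favours system A, so stating which of the two is being reported matters.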

Optimised tidal modelling

Author: Adrian Jackson
Posted: 2 Feb 2017 | 11:37

Fluidity for tidal modelling

Figure 1: Mesh for the Sound of Islay tidal simulation. Courtesy Dr Creech.

We were recently involved in a project to optimise the CFD modelling package Fluidity for tidal modelling. This ARCHER eCSE project was primarily carried out by Dr Angus Creech from the Institute of Energy Systems in Edinburgh.

Early experiences with KNL

Author: Adrian Jackson
Posted: 29 Jul 2016 | 16:45

Initial experiences on early KNL

Updated 1st August 2016 to add a sentence describing the MPI configurations of the benchmarks run.
Updated 30th August 2016 to add CASTEP performance numbers on Broadwell, with some discussion.

EPCC was lucky enough to be allowed access to Intel's early KNL (Knights Landing, Intel's new Xeon Phi processor) cluster, through our IPCC project.

KNL is a many-core processor, successor to the KNC, that has up to 72 cores, each of which can run 4 threads, and 16 GB of high bandwidth memory stacked directly on to the chip.
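To put those figures together, here is a small back-of-the-envelope calculation for the largest configuration quoted above. The even per-thread split of the high bandwidth memory is purely illustrative (it assumes 1 GB = 1024 MB and is not how the memory has to be used).

```python
# Figures quoted above for the largest KNL part.
CORES = 72
THREADS_PER_CORE = 4
MCDRAM_GB = 16

# Total hardware threads available on one processor.
hardware_threads = CORES * THREADS_PER_CORE

# Illustrative only: an even share of the high bandwidth memory per thread.
mcdram_mb_per_thread = MCDRAM_GB * 1024 / hardware_threads

print(hardware_threads)                # 288
print(round(mcdram_mb_per_thread, 1))  # 56.9
```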

HPCG: benchmarking supercomputers

Author: Adrian Jackson
Posted: 30 Jul 2015 | 14:40


The LINPACK library (often known as HPL) has been used to benchmark large-scale computers for over 20 years, with the results being published in the Top500 list. But does it accurately reflect the performance of real applications?

Day 4 of IPCC-Colfax work at EPCC

Author: Adrian Jackson
Posted: 12 Jun 2015 | 15:41

MPI and vectorisation: Two ends of the optimisation spectrum

Day four of this week of intensive work optimising codes for Xeon Phi saw a range of work. The majority of the effort focussed on the vectorisation performance of CP2K and GS2: looking at the low-level details of the computationally intensive parts of these codes, seeing whether the compiler is producing vectorised code and, if not, whether anything can be done to make the code vectorise.

Day 3 of optimising for the Xeon Phi, moving on to vectorisation

Author: Adrian Jackson
Posted: 11 Jun 2015 | 16:01

Moving from OpenMP to vectorisation and MPI

Reality hit home a bit on the third day of our intensive week working with Colfax to optimise codes for the Xeon Phi.

After further implementation and analysis work it appears that removing the allocation and deallocation calls from some of the low-level routines in CP2K will improve the OpenMP performance on Xeon and Xeon Phi, but only because an issue with the Intel compiler is causing poor performance. The optimisation can reduce the runtime of the OpenMP code by around 20-30%, but only with versions 15 and 16 of the Intel compiler; with v14 the performance improvement is much smaller.
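The general pattern being described, moving a work-buffer allocation out of a frequently called routine, can be sketched as follows. This is a Python analogue with hypothetical function names, not the CP2K Fortran code, and it does not reproduce the compiler behaviour discussed above; it only shows the shape of the change.

```python
def scale_alloc_each_call(values, factor):
    """Low-level routine that allocates a fresh work buffer on every call."""
    work = [0.0] * len(values)  # allocated (and later freed) per call
    total = 0.0
    for i, v in enumerate(values):
        work[i] = v * factor
        total += work[i]
    return total


def scale_reuse_buffer(values, factor, work):
    """Same routine, but the caller supplies a buffer that is reused."""
    total = 0.0
    for i, v in enumerate(values):
        work[i] = v * factor
        total += work[i]
    return total


values = [1.0, 2.0, 3.0]

# Allocate the work buffer once, outside the frequently called routine.
work = [0.0] * len(values)

for _ in range(3):
    # Both versions compute the same result; only the allocation pattern differs.
    assert scale_alloc_each_call(values, 2.0) == scale_reuse_buffer(values, 2.0, work)

print(scale_reuse_buffer(values, 2.0, work))  # 12.0
```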

Second day of collaborating with Colfax

Author: Adrian Jackson
Posted: 10 Jun 2015 | 00:08

Day 2: profiling and the start of optimising

After a first day spent getting codes set up and systems running, we got into the profiling of CP2K in anger today and have made some good progress.

Working on the Xeon Phi

Author: Adrian Jackson
Posted: 8 Jun 2015 | 17:48

Intel Parallel Computing Center collaboration with Colfax

We're just kicking off a week's collaboration with Colfax, a US technology company that works closely with Intel on Xeon Phi optimisation and training.

Intel Parallel Computing Centre: progress report

Author: Adrian Jackson
Posted: 21 Nov 2014 | 10:29

EPCC's Grand Challenges Optimisation Centre, an Intel Parallel Computing Centre which we announced earlier in the year, has made significant progress over recent months. 

The collaboration was created to optimise codes for Intel processors, particularly to port and optimise scientific simulation codes for Intel Xeon Phi co-processors. As EPCC also runs the ARCHER supercomputer, which contains a large number of Intel Xeon processors (although no accelerators or co-processors), for EPSRC and other UK research funding councils, we also have a strong focus on ensuring that scientific simulation codes are highly optimised for these processors. Therefore, the IPCC work at EPCC has been concentrating on improving the performance of a range of codes that are heavily used for computational simulation in the UK on both Intel Xeon and Intel Xeon Phi processors.

