Posted: 24 May 2017 | 19:30
When we parallelise and optimise computational simulation codes we always have choices to make. Choices about the type of parallel model to use (distributed memory, shared memory, PGAS, single sided, etc), whether the algorithm used needs to be changed, what parallel functionality to use (loop parallelisation, blocking or non-blocking communications, collective or point-to-point messages, etc).
Posted: 11 Apr 2017 | 17:59
Shall I compare thee...
Performance comparisons are always tricky to get exactly right. They are needed to ensure that we can demonstrate the performance improvements that optimisations, new hardware, new algorithms, etc... have had on an application or benchmark, but there is a lot of latitude in what can be compared, which makes it easy to get a performance comparison wrong and not properly demonstrate whatever it is you're trying to show.
Posted: 2 Feb 2017 | 11:37
Fluidity for tidal modelling
Figure 1: Mesh for the Sound of Islay tidal simulation. Courtesy Dr Creech.
We were recently involved in a project to optimise the CFD modelling package Fluidity for tidal modelling. This ARCHER eCSE project was primarily carried out by Dr Angus Creech from the Institute of Energy Systems in Edinburgh.
Posted: 29 Jul 2016 | 16:45
Initial experiences on early KNL
Updated 1st August 2016 to add a sentence describing the MPI configurations of the benchmarks run.
Updated 30th August 2016 to add CASTEP performance numbers on Broadwell with some discussion
KNL is a many-core processor, successor to the KNC, that has up to 72 cores, each of which can run 4 threads, and 16 GB of high bandwidth memory stacked directly on to the chip.
Posted: 30 Jul 2015 | 14:40
Posted: 12 Jun 2015 | 15:41
MPI and vectorisation: Two ends of the optimisation spectrum
Day four of this week of intensive work optimising codes for Xeon Phi saw a range of work. The majority of the effort focussed on the vectorisation performance of CP2K and GS2; looking at the low level details of the computationally-intensive parts of these codes and seeing whether the compiler is producing vectorised codes, and if not is there anything that can be done to make the code vectorise.
Posted: 11 Jun 2015 | 16:01
Moving from OpenMP to vectorisation and MPI
Reality hit home a bit on the third day of our intensive week working with Colfax to optimise codes for the Xeon Phi.
After further implementation and analysis work it appears that the removal of the allocation and deallocation calls from some of the low level routines in CP2K will improve the OpenMP performance on Xeon and Xeon Phi, but only because there is an issue with the Intel compiler that is causing poor performance. The optimisation can see a reduction in runtime of around 20-30% for the OpenMP code, but only with versions 15 and 16 of the Intel compiler, on v14 there is a much smaller performance improvement.
Posted: 10 Jun 2015 | 00:08
Day 2: profiling and the start of optimising
After a first day spent getting codes set up and systems running, we got into the profiling of CP2K in anger today and have made some good progress.
Posted: 8 Jun 2015 | 17:48
Intel Parallel Computing Center collaboration with Colfax
We're just kicking off a week's collaboration with Colfax, a US technology company that collaborates heavily with Intel on Xeon Phi optimisation and training for the Xeon Phi.
Posted: 21 Nov 2014 | 10:29
EPCC's Grand Challenges Optimisation Centre, an Intel Parallel Computing Centre which we announced earlier in the year, has made significant progress over recent months.
The collaboration was created to optimise codes for Intel processors, particularly to port and optimise scientific simulation codes for Intel Xeon Phi co-processors. As EPCC also runs the ARCHER supercomputer, which contains a large number of Intel Xeon processors (although no accelerators or co-processors), for EPSRC and other UK research funding councils, we also have a strong focus on ensuring that scientific simulation codes are highly optimised for these processors. Therefore, the IPCC work at EPCC has been concentrating on improving the performance of a range of codes that are heavily used for computational simulation in the UK on both Intel Xeon and Intel Xeon Phi processors.