Performance

Under pressure

Author: Adrian Jackson
Posted: 23 Mar 2020 | 10:45

Squeezed performance

Memory under pressure

I was recently working with a colleague to investigate performance issues on a login node for one of our HPC systems. I should say upfront that looking at performance on a login node is generally not advisable: login nodes are shared resources that are not optimised for performance.

We always tell our students not to run performance benchmarking on login nodes, because it's hard to ensure the results are reproducible. However, in this case we were just running a very small (serial) test program on the login node to ensure it worked before submitting it to the batch system, and my colleague noticed an unusual variation in performance across login nodes.
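
As a rough illustration of why single timings on a shared node are hard to trust, here is a minimal Python sketch that repeats a small serial test and reports the spread between runs. The function names and the workload (`small_serial_test`) are hypothetical stand-ins, not the program discussed in the post.

```python
import statistics
import time


def time_repeats(func, repeats=5):
    """Run func several times and return the wall-clock time of each run in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def summarise(timings):
    """Return the minimum, median and relative spread ((max - min) / min) of the timings."""
    fastest = min(timings)
    return {
        "min": fastest,
        "median": statistics.median(timings),
        "spread": (max(timings) - fastest) / fastest,
    }


def small_serial_test():
    # Hypothetical stand-in for the small serial test program.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    summary = summarise(time_repeats(small_serial_test))
    print(
        f"min {summary['min']:.4f}s  "
        f"median {summary['median']:.4f}s  "
        f"spread {summary['spread']:.1%}"
    )
```

A large spread on one node, or very different minimum times on different nodes, is a hint that something other than the code itself is affecting the measurement.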

Global or local - which is best?

Author: Adrian Jackson
Posted: 9 Oct 2019 | 17:30

Selfish performance

Sharing of resources has challenges for the performance and scaling of large parallel applications. In the NEXTGenIO project we have been focusing specifically on I/O and data management/storage costs, working from the realisation that current filesystems will struggle to efficiently load and store data from millions of processes or tasks all requesting different data sets or bits of information.

What is MPI “nonblocking” for? Correctness and performance

Author: Daniel Holmes
Posted: 27 Feb 2019 | 15:53

The MPI Standard states that nonblocking communication operations can be used to “improve performance… by overlapping communication with computation”. This is an important performance optimisation in many parallel programs, especially when scaling up to large systems with lots of inter-process communication.

However, nonblocking operations can also help with making a code correct – without introducing additional dependencies that can degrade performance.

Top500: Change or no change?

Author: Adrian Jackson
Posted: 12 Dec 2017 | 11:16

November 2017 Top500

My initial impression of the latest Top500 list, released last month at the SC17 conference in Denver, was that little has changed. This might not be the conclusion that many will have reached, and indeed we will come on to consider some big changes (or perceived big changes) that have been widely discussed, but looking at the Top 10 entries there has been little movement since the previous list (released in June).

Balancing act: optimise for scaling or efficiency?

Author: Adrian Jackson
Posted: 24 May 2017 | 19:30

When we parallelise and optimise computational simulation codes we always have choices to make: the type of parallel model to use (distributed memory, shared memory, PGAS, single-sided, etc.), whether the algorithm used needs to be changed, and what parallel functionality to use (loop parallelisation, blocking or non-blocking communications, collective or point-to-point messages, etc.).

ARCHER code developers and presenting performance

Author: Adrian Jackson
Posted: 11 May 2017 | 00:06

Application performance

As part of the ARCHER Knights Landing (KNL) processor testbed, we have produced and collected a set of benchmark reports on the performance of various scientific applications on the system. This has involved the ARCHER CSE team, EPCC's Intel Parallel Computing Center (IPCC) team, and various users of the system all benchmarking and documenting the performance they have experienced. 

Apple vs oranges: performance comparisons

Author: Adrian Jackson
Posted: 11 Apr 2017 | 17:59

Shall I compare thee...

Performance comparisons are always tricky to get exactly right. They are needed to demonstrate the performance improvements that optimisations, new hardware, new algorithms, etc. have delivered for an application or benchmark, but there is a lot of latitude in what can be compared, which makes it easy to get a performance comparison wrong and not properly demonstrate whatever it is you're trying to show.

The tyranny of 100x

Author: Adrian Jackson
Posted: 10 Mar 2017 | 15:39

Reporting Performance

Measuring performance is a key part of any code optimisation or parallelisation process. Without knowing the baseline performance, and what has been achieved after the work, it's impossible to judge how successful any intervention has been. However, it's something that we, as a community, get wrong all the time, at least when we present our results in papers, presentations, blog posts, etc. I'm not suggesting that people aren't measuring performance correctly, or are deliberately falsifying performance improvements, but the incentives to make your work look as impressive as possible cause people to present results in a way that really isn't justified.
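
To make the point about baselines concrete, here is a small Python sketch of one way a headline speedup can mislead, assuming the choice of baseline is what differs. All timings below are invented for illustration, and `speedup` and `report` are hypothetical helper names.

```python
def speedup(baseline_seconds, optimised_seconds):
    """Speedup factor: baseline runtime divided by optimised runtime."""
    return baseline_seconds / optimised_seconds


def report(label, baseline_seconds, optimised_seconds):
    """Format a result so that both runtimes are shown alongside the factor."""
    factor = speedup(baseline_seconds, optimised_seconds)
    return f"{label}: {baseline_seconds:.1f}s -> {optimised_seconds:.1f}s ({factor:.1f}x)"


if __name__ == "__main__":
    optimised = 2.0  # invented runtime of the optimised code, in seconds

    # The same optimised code compared against two different (invented) baselines.
    print(report("vs untuned serial baseline", 200.0, optimised))
    print(report("vs tuned baseline", 8.0, optimised))
```

Quoting the two runtimes next to the factor lets the reader see which baseline was used, rather than just the most flattering number.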

Early experiences with KNL

Author: Adrian Jackson
Posted: 29 Jul 2016 | 16:45

Initial experiences on early KNL

Updated 1st August 2016 to add a sentence describing the MPI configurations of the benchmarks run.
Updated 30th August 2016 to add CASTEP performance numbers on Broadwell, with some discussion.

EPCC was lucky enough to be allowed access to Intel's early KNL (Knights Landing, Intel's new Xeon Phi processor) cluster, through our IPCC project.

KNL is a many-core processor, successor to the KNC, that has up to 72 cores, each of which can run 4 threads, and 16 GB of high bandwidth memory stacked directly on to the chip.

Latest Top500 list, looking beyond the number 1

Author: Adrian Jackson
Posted: 21 Jun 2016 | 17:13

There's been a lot of discussion about the latest Top500 list, released this week at ISC16. Most of the interest has been in the whopping new Chinese system, Sunway TaihuLight, which has come in at number 1 on the list with a massive 93 PFlop/s rmax Linpack performance, and 125 PFlop/s rpeak theoretical peak performance (3 times bigger than the previous number 1 system).

Whilst this is a very interesting system, and much bigger than is currently planned elsewhere, it's not unknown for very large systems to come in and dominate the list like this. Back in 2002, the Japanese Earth Simulator system became the number 1 machine with an rmax of ~5x that of the previous number 1 system, and it stayed as the top machine for a number of years.
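
For a sense of scale, the two TaihuLight figures quoted above (93 PFlop/s measured on Linpack against 125 PFlop/s theoretical peak) can be turned into a Linpack efficiency. This is a minimal Python sketch using only those two numbers; the function name is a hypothetical helper.

```python
def linpack_efficiency(linpack_pflops, peak_pflops):
    """Fraction of theoretical peak performance achieved on the Linpack benchmark."""
    return linpack_pflops / peak_pflops


if __name__ == "__main__":
    # Figures as quoted in the text for Sunway TaihuLight.
    efficiency = linpack_efficiency(93.0, 125.0)
    print(f"Linpack efficiency: {efficiency:.1%}")
```

That works out at roughly three quarters of the theoretical peak.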
