Posted: 23 Mar 2020 | 10:45
I was recently working with a colleague to investigate performance issues on a login node for one of our HPC systems. I should say upfront that looking at performance on a login node is generally not advisable: login nodes are shared resources and are not optimised for performance.
We always tell our students not to run performance benchmarks on login nodes, because it is hard to ensure the results are reproducible. However, in this case we were just running a very small (serial) test program on the login node to check that it worked before submitting it to the batch system, and my colleague noticed an unusual variation in performance across the login nodes.
Posted: 22 Nov 2019 | 12:10
Developed by EPCC, the Edinburgh International Data Facility (EIDF) will facilitate new products, services, and research by bringing together regional, national and international datasets.
Posted: 7 Nov 2019 | 14:55
After four years of hard work, the NEXTGenIO project has now come to an end. It has been an extremely enjoyable and successful collaboration with a dedicated group of HPC users, software and tools developers, and hardware providers from across Europe.
Posted: 30 Oct 2019 | 12:48
Blog post updated 8th November 2019 to add Figure 6 highlighting PMDK vs fsdax performance for a range of node counts.
Following on from our recent blog post on our initial performance experiences using byte-addressable persistent memory (B-APM), in the form of Intel's Optane DC persistent memory modules (DCPMM), for data storage and access within compute nodes, we have been exploring the performance and programming of such memory beyond simple filesystem functionality.
For our previous performance results we used what is known as an fsdax (Filesystem Direct Access) filesystem, which allows I/O operations to bypass the operating system (O/S) page cache and the extra memory copies associated with it. We were using an ext4 filesystem on fsdax, although ext2 and xfs filesystems are also supported.
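To illustrate the kind of direct load/store access that fsdax makes possible, here is a minimal C sketch. It assumes a file on a filesystem mounted with the `dax` option (the path, for example `/mnt/pmem0/...`, is hypothetical), and it is not the code used for the benchmarks described in the post. On such a mount, `mmap` maps the persistent memory itself, so the `memcpy` below turns into plain CPU stores with no page cache copy; on an ordinary filesystem the same code still runs, but the data passes through the page cache. The helper names `write_via_mmap` and `read_back` are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes from `msg` into the file at `path`
 * using ordinary CPU stores through a shared memory mapping.
 * On an fsdax-mounted filesystem the mapping refers directly to persistent
 * memory (no page cache); on a normal filesystem it goes via the page cache.
 * Returns 0 on success, -1 on error. `len` must be greater than zero. */
int write_via_mmap(const char *path, const char *msg, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The file must be at least as large as the region we map. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (region == MAP_FAILED)
        return -1;

    memcpy(region, msg, len); /* plain stores: no write() system call */

    /* Ask the kernel to make the stores durable before unmapping. */
    int rc = msync(region, len, MS_SYNC);
    munmap(region, len);
    return rc == 0 ? 0 : -1;
}

/* Hypothetical helper: read back up to `len` bytes with a normal read(). */
ssize_t read_back(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```

Using `msync` keeps the sketch portable across DAX and non-DAX filesystems; on a DAX mapping, libraries such as PMDK can instead flush CPU caches directly from user space, which is the kind of programming beyond simple filesystem functionality the post goes on to explore.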
Posted: 9 Oct 2019 | 17:30
Sharing resources poses challenges for the performance and scaling of large parallel applications. In the NEXTGenIO project we have been focusing specifically on I/O and data management/storage costs, starting from the realisation that current filesystems will struggle to load and store data efficiently for millions of processes or tasks, all requesting different datasets or pieces of information.
Posted: 17 Jul 2019 | 14:11
As part of the NEXTGenIO project we have a prototype HPC system with two Intel Omni-Path networks attached to each node. The aim of this dual-rail network setup is to investigate the performance and functionality benefits of having separate networks for MPI communications and for I/O storage communications, either by directing Lustre traffic and MPI traffic over separate networks, or by using a separate network to access NVDIMMs over RDMA. We were also interested in the performance benefits for general applications of exploiting multiple networks for MPI traffic, where possible.
Posted: 5 Jul 2019 | 11:13
The EU VESTEC research project is focused on the use of HPC for urgent decision-making and the project team will be running a workshop at SC’19.
VESTEC will build a flexible toolchain to combine multiple data sources, efficiently extract essential features, enable flexible scheduling and interactive supercomputing, and realise 3D visualisation environments for interactive explorations.
Posted: 6 Jun 2019 | 14:34
The highly successful NEXTGenIO project is now drawing to a close after nearly four years. EPCC colleagues will be at ISC19 presenting the results of the project at a booth presentation, a BoF, and a workshop presentation. Come along and find out more!
Posted: 8 Jan 2019 | 15:08
Earlier this year, HPE announced the Catalyst UK programme: a collaboration with Arm, SUSE and three UK universities to deploy one of the largest Arm-based high performance computing (HPC) installations in the world. EPCC was chosen as the site for one of these systems; the other two are the Universities of Bristol and Leicester.
EPCC's system (called 'Fulhame' after pioneering chemist Elizabeth Fulhame) was delivered and installed in early December. This HPE Apollo 70-based system consists of 64 compute nodes, each with two 32-core Cavium ThunderX2 processors (ie 4096 cores in total) and 128GB of memory composed of 16 DDR4 DIMMs, connected by Mellanox InfiniBand interconnects. It will be made available to both industry and academia, with the aim of building applications that drive economic growth and productivity as outlined in the UK government’s Industrial Strategy.
Posted: 24 Oct 2018 | 16:48
Supercomputers are getting more complex. Faster components would be impossible to cool but, by doing more with less, we can still solve bigger problems faster than ever before.