ARCHER: the next national HPC service for academic research

Author: Andrew Turner
Posted: 29 Nov 2013 | 11:00

ARCHER (Advanced Research Computing High End Resource) is the next national HPC service for academic research. The service comprises a number of components: accommodation provided by the University of Edinburgh; hardware by Cray; systems support by EPCC and Daresbury Laboratory; and user and computational science and engineering support by EPCC.

In autumn 2011, the Minister for Science announced a new capital investment in e-infrastructure, which included £43m for ARCHER, the next national HPC facility for academic research. After a brief overlap, ARCHER will take over from HECToR as the UK’s primary academic research supercomputer. HECToR has been hosted in Edinburgh since 2007.

What is ARCHER?

The new Cray XC30 architecture is the latest development in Cray’s long history of massively parallel processing (MPP) architectures, which have supported fundamental global scientific research for over two decades.

The Cray XC30 incorporates major upgrades to two fundamental components of any MPP supercomputer: Cray’s newest network interconnect, Aries, and Intel’s latest Ivy Bridge Xeon multi-core processors. Both offer enhanced capabilities over their predecessors. Aries uses the novel dragonfly network topology, which provides multi-tier all-to-all connectivity. This network gives all applications, even those that rely on all-to-all communication, the potential to scale to the full size of the system, allowing scientists to tackle problems that might have been considered impossible on previous systems.

The latest Intel Xeon Ivy Bridge processors used in ARCHER provide the next generation of computational muscle, with best-in-class floating-point performance, memory bandwidth and energy efficiency. Each ARCHER compute node comprises two 12-core 2.7 GHz Ivy Bridge processors and at least 64 GB of DDR3-1833 MHz main memory, and all compute nodes are connected through the Aries interconnect. ARCHER has 3008 such nodes, i.e. 72,192 cores, in only 16 cabinets, providing a total theoretical peak performance of 1.56 petaflops. Scratch storage is provided by 20 Cray Sonexion Scalable Storage Units, giving 4.4 PB of accessible space with sustained read-write bandwidth of over 100 GB per second.
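As a quick sanity check on these headline figures, the sketch below recomputes the core count and theoretical peak from the node specification. It assumes each Ivy Bridge core can complete 8 double-precision floating-point operations per clock cycle (one 4-wide AVX add plus one 4-wide AVX multiply); that per-cycle rate is not stated in this post, and the constant and function names are illustrative only.

```python
# Back-of-envelope check of ARCHER's quoted core count and peak performance.
# ASSUMPTION: 8 double-precision FLOPs per core per cycle on Ivy Bridge
# (one 4-wide AVX add and one 4-wide AVX multiply issued each cycle).
NODES = 3008
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 12
CLOCK_HZ = 2.7e9
FLOPS_PER_CYCLE = 8  # assumed AVX double-precision rate, not from the post


def total_cores(nodes: int) -> int:
    """Cores across the given number of dual-socket, 12-core-per-socket nodes."""
    return nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET


def peak_flops(nodes: int) -> float:
    """Theoretical peak in FLOP/s: cores x clock rate x FLOPs per cycle."""
    return total_cores(nodes) * CLOCK_HZ * FLOPS_PER_CYCLE


if __name__ == "__main__":
    cores = total_cores(NODES)
    petaflops = peak_flops(NODES) / 1e15
    print(f"{cores:,} cores, {petaflops:.2f} PFLOP/s theoretical peak")
```

With these inputs the script reproduces the 72,192 cores and roughly 1.56 petaflops quoted above, which suggests the headline peak figure counts full AVX throughput on every core.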

ARCHER is also directly connected to the UK Research Data Facility, easing the movement of large data sets between high-performance scratch space and long-term archival storage, and between successive HPC services.

The newest versions of the Cray Compilation Environment fully support generating highly optimised executables that exploit the “Ivy Bridge” processors. Users will also have access to the latest Intel Composer Suite of compilers and the industry-standard GNU Compiler Collection, all of which are fully integrated with the feature-rich Cray Programming Environment suite that is familiar to existing HECToR users.

Advanced Computing Facility: building for the future

ARCHER’s Accommodation and Management function is provided by the University of Edinburgh. ARCHER is housed at the University’s Advanced Computing Facility (ACF). The University has a long-term commitment to ensuring that the ACF is capable of hosting top-end facilities and delivers excellent levels of energy efficiency.

In readiness for ARCHER, the ACF was extended with an additional 500 m² of computer room floor space and a new 760 m² plant room to house the additional electrical and mechanical infrastructure. This included a new high-efficiency cooling system with 4 MW capacity and an upgrade to the site’s private high-voltage network that increases its capacity to around 7 MW.

The project to extend the facility commenced in May 2012, and the building was ready for the installation of plant in September (despite the wettest summer on record). The high-voltage switch room was commissioned in November 2012, and the plant was fully commissioned by the end of the year.

The facility was fit for purpose two months ahead of schedule, exceeded its specification and came in under budget. The Cray XC30 and associated storage systems were delivered in September 2013. The installation went very smoothly: all power and cooling connections were made and the system was powered up within a few days.

From HECToR to ARCHER

The acceptance tests of the ARCHER hardware were successfully completed in late October. Usage of ARCHER ramped up from mid-November, with core research consortia online since 13 November. Remaining grant holders will begin to transfer from HECToR to ARCHER in December 2013, and the HECToR service will cease in March 2014.

The ARCHER service

The Service Provision function for ARCHER is provided by UoE HPCX Ltd. This includes responsibilities such as systems administration, helpdesk and website provision, and administrative support. The work is subcontracted to EPCC at the University of Edinburgh and to the STFC’s Daresbury Laboratory.

Service Provision will be delivered by two sub-teams: the Operations and Systems Group led by Mr Michael Brown, and the User Support and Liaison Team led by Dr Alan Simpson.

Enabling a smooth transition for users from the HECToR to ARCHER services is one of our key aims. For ARCHER, we will utilise SAFE (Service Administration from EPCC) for both the ARCHER Helpdesk and Project Administration & Reporting functions.

The ARCHER website provided by EPCC contains supporting documentation for the service and will also showcase the research that uses the system. The configuration of the ARCHER service will evolve over time to stay in line with users’ needs. Continual Service Improvement is a key goal, so the service will be delivered following the ITIL Best Practice Framework.

Computational Science and Engineering (CSE) Support

Computational Science and Engineering (CSE) support on ARCHER is provided by EPCC. Its responsibilities include helping users port, optimise and develop their codes for ARCHER; ensuring that the correct scientific software is installed on the system to meet user requirements; providing advice and best practice to help users exploit ARCHER resources; and providing training to develop scientific software development expertise in the UK research community.

Our goal for the CSE support is to be as open and inclusive as possible, allowing ARCHER users to draw on the full wealth of expertise available in the UK HPC and computational science community. We will use a mix of established, successful activities and innovative ideas to realise this goal.

Embedded CSE programme

The Embedded CSE (eCSE) programme expands and refines the successful HECToR dCSE programme to allow software development expertise to be placed in academic research groups where it can provide the most benefit and have the greatest impact. The first eCSE call has already opened (deadline: 14 January, 4pm). Details can be found on the ARCHER website.

The in-depth CSE support will be fully integrated into the SAFE Helpdesk, providing a seamless service to users that gives direct access to ARCHER expertise and a rapid response to any queries.

Technical Forum

The ARCHER Technical Forum is open to all users, as well as to anyone else interested in technical discussion of HPC. The Forum consists of a series of monthly webinar meetings, with a wide range of technical experts invited to speak and attend, and a public mailing list for technical discussion.

Consortium Contacts

We have established a set of Consortium Contacts: HPC experts who will provide a direct link between the research communities using ARCHER and the service itself. These Contacts will help the research communities use ARCHER more effectively, give them a role in driving the development of the service to meet their needs, and provide a simple way to feed back to the CSE support team and to the service in general.

Training

Training will be provided at locations all over the UK through links with the HPC-SIG, HPC Wales and the STFC's Daresbury Laboratory. We are already consulting people around the UK about the training requirements of different research communities. The first ARCHER course was recently run successfully in Edinburgh, and webinar technology allowed anyone in the world to attend, even if they could not travel to the physical location. The lectures from the course have been recorded and will be made publicly available on the ARCHER website in the near future.

Get in touch!

We welcome ideas and opportunities for enhancing the CSE support. If you want to be involved or have any thoughts, please do not hesitate to get in touch via the ARCHER Helpdesk.

EPSRC is the managing agent for the HPC facility on behalf of all of the Research Councils.

Access

The first EPSRC call for access to ARCHER has already opened, and details can be found on both the EPSRC and ARCHER websites.

The first call for eCSE projects has opened, and details can be found on the ARCHER website.

Images (top to bottom): ARCHER’s water cooling system; Aries Interconnect; new high-voltage Switch Room; ARCHER cabinet artwork.

Author

This post was jointly written by Tom Edwards (Cray) and Mike Brown, Liz Sim, Alan Simpson and Andy Turner (all EPCC).
