Hosting and operating the ARCHER2 service

27 January 2023

Many of the articles about ARCHER2 contain a statement such as “ARCHER2 is hosted and operated by EPCC at the University of Edinburgh”. But what does this actually mean?

ARCHER2 is a world-class advanced computing resource for UK researchers. It is an HPE Cray EX supercomputing system with an estimated peak performance of 28 Pflop/s. This performance is used by academics from across the UK (and beyond) to perform world-class science, with wide-ranging benefits to society and the economy. The service has over 2700 active user accounts and regularly sits at around 85-90% utilisation.

Hosting ARCHER2

The system is hosted at the Advanced Computing Facility, our world-class data centre. ARCHER2 is housed securely at the site, with the team ensuring that the appropriate power, cooling, and monitoring are in place for it to run efficiently.

A recent blog post ("Ensuring continuity of service at the ACF") by my colleague Calum Muir is an interesting read, giving insight into the facilities and activities required to host this sort of system. For example, ARCHER2 is hosted in our state-of-the-art Computer Room 3, which has recently undergone an upgrade of the main power distribution units' (PDUs) supply cabling and sub-floor power supply cables to ensure that the electrical infrastructure has additional resilience when ARCHER2 is running at maximum capability.

Service and support provision

In addition to hosting the ARCHER2 service, EPCC is contracted to provide the Service Provision (SP) and Computational Science and Engineering (CSE) support.

The SP team is responsible for maintaining the system: completing software upgrades, managing security, customising job management, monitoring compute and storage availability, and monitoring jobs. This group also runs the Service Desk, offering front-line support to all our users, and develops and maintains the ARCHER2 website and documentation. The team coordinates interactions between different parts of the service, for example working closely with our colleagues at HPE.

The CSE team works more closely with our user community, providing in-depth support. For example, we work with users to help port codes, debug problems, and investigate optimisations. We run an extensive training programme for users and the wider UK community, delivering both online and in-person courses, and we aim to provide the best possible user environment through tools and best practice guides. The CSE team also coordinates the "embedded Computational Science and Engineering" (eCSE) programme, which allocates funding to Research Software Engineers across the UK: staff embedded in user communities, working to enhance the application software on ARCHER2.

Outreach

Looking outward, we engage with different science communities to share knowledge, research, and best practice. We also enjoy delivering activities to school children and the general public, providing hands-on activities at science festivals, demonstrating the societal benefit of supercomputing, and highlighting the opportunities available to young people considering a career in computational science.

So all in all, the phrase “hosted and operated” covers a multitude of activities, from electricity supplies, through software management, queries, training, and outreach to schools. It all makes for an interesting and varied job! 

[Image: ARCHER2 supercomputer with bright graphics along the cabinet doors]

Author

Dr Lorna Smith