ARCHER2: Challenges for an RSE team supporting the UK national supercomputing service

9 February 2023

The ARCHER2 service is designed to enable world-leading research by over 4,000 users, spanning a wide range of research areas and scientific software. To assist these users, the ARCHER2 CSE service provides comprehensive support from expert research software engineers at EPCC.

The role of a research software engineer (RSE) typically involves maintaining a scientific software portfolio, improving documentation, and undertaking service-improvement activities. At EPCC, however, it also includes responding rapidly to user enquiries and service incidents so that ARCHER2 users can continue to work. This reactive work can be disruptive for CSE team members, interrupting the more rewarding aspects of the RSE role: contributing to improvements and preparing for the long-term success of ARCHER2. Balancing reactive and proactive tasks, and absorbing unpredictable fluctuations in workload while pursuing continuous quality improvement, is therefore a real challenge!

The challenge is best met by estimating the time required for reactive work, then scheduling additional time to cover fluctuations alongside core planned (proactive) work. Individual reactive tasks can be distributed evenly amongst RSEs using a round-robin list. However, this approach assumes that all tasks are uniform and require similar effort, which can lead to a workload imbalance across the team. Nor does it take advantage of individual expertise or promote knowledge sharing, so we risk constantly reinventing the wheel!

To address these issues, the ARCHER2 CSE service team is currently trialling a pilot system in which the service is restructured into three teams, each delivering one aspect of the CSE service: query handling, service improvement, and training. We hope this system will better address the challenges of a mixed workload, and that rotating staff between the teams will give them the opportunity to contribute to different parts of the CSE service. Each team combines a range of expertise, experience, and interests, and has an accountable function lead who oversees progress and assists with escalation. The query handling team now shares responsibility for all queries, working together to find solutions rather than as individuals, which promotes the sharing of knowledge and expertise within the team. Furthermore, because query handling now has a dedicated team, the other teams can focus on training and service improvements with minimal interruption from reactive tasks.

We are continually working to improve our service and look forward to the results of this pilot, which should help us better understand the balance of our tasks and provide the best support possible.

Further reading

This article is based on a talk given at SC'22 by EPCC's George Beckett, Eleanor Broadway, William Lucas, and Andy Turner. You can read about the ARCHER2 CSE team's time at SC'22 on the ARCHER2 website: ARCHER2 CSE team at SC22.

ARCHER2 is hosted and managed by EPCC. This article explains what is involved: Hosting and operating the ARCHER2 service.

Author

Eleanor Broadway