Protecting research data

Author: Anne Whiting
Posted: 14 Nov 2019 | 11:38

We are delighted to announce our success in passing and retaining ISO 9001 quality certification and ISO 27001 information security certification for the delivery of National HPC Services and Data Services.

EPCC delivers National HPC Services including ARCHER, the UK National Science Supercomputer funded by UKRI, which is used by more than 5000 researchers from across the UK and beyond. It also operates an increasingly high-profile portfolio of data services, including the National Safe Haven on behalf of NHS Scotland. This will expand further with the launch of the World-Class Data Infrastructure in 2020. EPCC uses best practice to ensure that the data for which it is responsible is processed, managed and stored securely for the benefit of local, national and international researchers. ISO certification is awarded on an annual basis subject to a satisfactory and thorough external audit. EPCC has held ISO 9001 certification since 2017 and ISO 27001 since 2018.

To help ensure the resilience of the ARCHER service and to identify areas for improvement, EPCC carries out a business continuity and disaster recovery test every two years. The first such test was carried out two years ago, when the service was subjected to the test scenario of an office fire which closed down all office space at JCMB, where EPCC was then housed. The timing and nature of the test were known only to the team who organised it.

On the day, staff running the ARCHER service, including the helpdesk with its associated service level requirements for user response, were informed that the building was on fire and that they had to evacuate it whilst continuing to run the service. Notes were kept during the exercise and a "lessons learned" review was carried out to identify improvements, which were then implemented. The success of this approach was seen during a subsequent genuine red weather warning for snow, which closed the University of Edinburgh for several days. During this period the ARCHER service ran without interruption, using the processes that had been improved after the resilience test.

In October this year a second test was carried out, again planned by an independent test team and with staff unaware of the timing or the nature of the test. On the day, selected staff were informed that they were sick and must leave the office. It became apparent as the scenario unfolded that these staff had attended a catered party and subsequently come down with food poisoning. Staff based at the Bayes Centre and the ACF Datacentre were involved. The service was kept running uninterrupted using processes that define chains of command and operations during major incidents. A "lessons learned" review has again been carried out and the identified improvements are being implemented. We will continue to test our readiness for major incidents in this way, to ensure that EPCC is as prepared as possible to meet future challenges and to continue to provide services to our user communities under testing circumstances.

Author

Anne Whiting, EPCC