EPCC’s Advanced Computing Facility: how it all began
Posted: 29 Jan 2021 | 11:03
EPCC's Advanced Computing Facility (ACF) was born through necessity – but the opportunity was taken to create an expandable and highly efficient facility that attracted future business including national computing services. Mike Brown, EPCC’s Director of HPC Operations until 2017, gives an overview of the development of this unique building.
In 2000 all of the large-scale systems operated by EPCC (Cray T3E plus a number of SUN SPARC-based SMP systems) were installed in a computer room on the University of Edinburgh’s King’s Buildings campus.
After a series of thefts across the UK of SUN CPU and memory parts, it became clear that a radical re-think of computer room provision for HPC-class systems was required – especially as we were planning to undertake a series of significant acquisitions such as the new system for UKQCD (which became QCDOC), the University's first-generation Storage Area Network (SAN), and the future IBM Blue Gene system. We were also starting to position ourselves for the next EPSRC-funded UK national service (which became HECToR).
The existing site could not offer the space, power and cooling these systems were likely to need, nor meet the enhanced security and fire-suppression requirements. New-build was out of the question but fortunately the University's 1970s-era former computer centre was available.
Phase 1: Computer Room 2
Funding for refurbishment of the building was secured in early 2003, with work completed in August 2004. The building had one 285m² computer room (CR2) fitted out, a second (CR1) empty in reserve, and a 250m² plant room for mechanical and electrical services. CR2 was fitted out principally for air-cooling, but with dedicated cooling water supplies for QCDOC (installed by the end of 2004). By year end about 50% of the space was occupied – with the SUN SAN, QCDOC, SUN 15K SMP system and the dedicated storage systems for QCDOC, while the Blue Gene/L system arrived shortly afterwards.
The facility was officially opened by the Duke of Edinburgh in the summer of 2005.
Phase 2: Computer Room 1
Occupancy continued to increase, and by 2006 planning for the potential arrival of the EPSRC-funded HECToR national service was underway. Installation at Edinburgh was confirmed in November and we then had a very short period to undertake a significant expansion of the facility.
Due to its physical size, and power and cooling requirements, the initial phase 1 system (60-cabinet Cray XT4) required the fit-out and commissioning of the reserve CR1 – and the construction of the 475m² Plant Room B to service it. All infrastructure was commissioned in July 2007 – just six months after the ground was cut in January.
In August the Cray XT4 was installed in CR1 while most of the peripheral equipment (front-ends and support nodes, discs plus a stand-alone XT4 test and development system) was installed in CR2.
The HECToR service was opened by the then Chancellor of the Exchequer, Alistair Darling, in January 2009.
The HECToR service went through two principal upgrades: the first in Q2/2010, which reduced the XT4 to a 30-cabinet system while 20×XE6 liquid-cooled cabinets were installed alongside, and the second in Q4/2011 when the XT4s were removed and the XE6 expanded to 30 cabinets. The installation of the fully liquid-cooled system enabled the plant to exceed its planned levels of operating efficiency.
The University of Edinburgh’s Information Services began to install some of its large-scale systems (the ECDF cluster and follow-on generations of the original SAN) at the ACF, and by 2009 CR2 was full.
Phase 3: Computer Room 3
The second major expansion occurred in 2012 after the ACF had been selected to house the EPSRC ARCHER service. Groundworks started in April 2012 for an expansion that included the 550m² Computer Room 3 and 750m² Plant Room C.
Despite one of the wettest summers on record, the build ran ahead of schedule, with full completion in February 2013.
The Cray XC30 was not installed until autumn 2013, but the plant and systems were functional from handover in February, with the first installation (the move of the Research Data Facility (RDF) and associated hardware from CR2) taking place in the spring.
The ACF is in a pleasant rural location, which can present operational challenges in winter. The severe winter of 2009 brought a record low temperature of -14°C, while in 2010 more than 450mm of snow fell in the car park. However, operations were maintained throughout, with the site closed early for bad weather only twice – once for snow and once when 90mph+ winds caused widespread damage in the area, including blocking the access road with fallen trees.
The building is not an architectural gem – it has been likened to a concrete bunker in the middle of a sheep-field – but it is functional, secure and efficient, and has proven to be a highly cost-effective investment by the University and the Research Councils.
Mike Brown was Director of HPC Operations until he left the University in 2017. He was associated with the support and delivery of HPC services at the University from the ICL DAP in 1982 until the Cray XC30. As a Chartered Engineer, his principal interest was in the provision and operation of the support infrastructure, with an emphasis on maximising operational efficiency.
Images from top: IBM Blue Gene/L; Linda Dewar (now EPCC's HPC Systems Programme Manager) with QCDOC; HECToR Phase 1.