World-Class Data Infrastructure
The World-Class Data Infrastructure (WCDI), which is being developed by EPCC, will facilitate new products, services, and scientific studies by bringing together regional, national and international datasets. It can be thought of as a layer of storage and computing services presented as a private cloud and hosting a rich and growing collection of data. WCDI supports the long-term storage and curation of data assets, and their cataloguing, preparation and presentation as analytic-ready datasets for research and innovation. It offers a range of computing services, from web-based notebooks to rich desktop environments and seamless access to high-performance computing.
WCDI supports learners, researchers and innovators across the spectrum, with services from basic data download through simple learn-as-you-play-with-data notebooks to GPU-enabled machine-learning platforms for driving AI application development.
Safe haven services
WCDI also provides safe haven services to health and government users, following best practice in independent governance and supporting the linkage of complex personal data for public benefit research and policy-making under national and regional safeguards. Safe haven services can be created for organisations wishing to host and govern access to their data assets in a highly secure environment. Safe havens are isolated from the rest of WCDI, with user approvals, data ingress and egress, and permitted software all controlled by information governance bodies independent of the infrastructure itself.
WCDI will grow and mature with the DDI Programme, expanding in capacity and capability, responding to the needs of the innovation Hubs and, through them, to learners, researchers, innovators and entrepreneurs from across the region and beyond.
WCDI Phase 1 (2019-2020)
WCDI Phase 1 focuses on development and co-design. Working with a number of key stakeholders and early adopters, EPCC is putting together the core elements of WCDI, including building its new home at our Advanced Computing Facility (ACF).
Phase 1: facility. WCDI will be housed in Computer Room 4 at the ACF, which is due for completion in June 2020. This will be a high-resilience computer room with enhanced power and network.
Phase 1: storage. Disk capacity is currently 10 PB, split approximately equally between the safe haven and non-safe haven sides, and will rise to 12 PB by July 2020.
Phase 1: compute. A small slice of cloud with around 30 virtual machines plus four NVIDIA V100 GPUs and access to the Cirrus HPC service.
Phase 1: early adopters. As well as internal development work, we are working with a number of stakeholders to help shape the service:
- The iCAIRD digital pathology archive
- The National Collection of Aerial Photography
- Health Data Research UK
- The Administrative Data Research Partnership
- The Paracrawl Internet Archive project
- Albyn Housing Society
- The NHS Scotland National Safe Haven
- Scottish Government
- Regional Local Authorities
- The Festival Fringe
- Global Open Finance Centre of Excellence
- CityScope open data learning environment
- The Data Loch
Data catalogue (Q1 2020)
- An open metadata repository
- Easy access to open data
- An approvals system supporting access to restricted data
Browser-based “notebook” services (Q1 2020)
Desktop virtual machine services (Q2 2020)
- Data analysis, for statistical analysis with R, Python etc.
- Data science, for machine learning and data modelling
- Both with and without attached GPU capability
- Data engineering for data flow software builders, with Spark, Scala, Kafka etc.
SAS Viya service (Q2 2020)
- Browser-based access to the Bayes Centre’s SAS platform