EPCC works with clients from across industry – finance, telecoms, construction, biotech and engineering – and with academic collaborators in disciplines ranging from economics to the life sciences, to provide insights into data. We use a range of methods, from tried-and-trusted data mining to modern machine learning; in Python, R or Scala, in Java, C or Fortran; on plain old computers, on massive HPC systems or in secure data safe havens. Our interest is seldom in the technology, always in solving the problem.
Data science and data analytics are, at bottom, just about good scientific method. Data science brings new computational tools to bear on good old-fashioned problems of “messy” real-world data – uncertain measurements, missing values, systematic and statistical errors. EPCC’s staff have an eclectic mix of skills and talents from both computer and natural sciences, and consequently a very good understanding of the complex, uncertain – and yes, “messy” – nature of real data. We recognise the commonly cited estimate that 80–90% of the time spent on a data analytics project goes into cleaning and wrangling data into the right form.
High-performance data analytics
When big data arrive as many millions of small, independent objects, map/reduce on Hadoop is a great way to analyse them. But when “big data” involves massive, highly correlated data sets – systems biology pathways, climate forecasting, engineering simulation – Hadoop just isn’t enough. EPCC’s staff are experts at bringing the power of massively parallel high-performance computers to bear on complex data analytics tasks.
Case study: MONC
Secure data environments
A lot of our work involves “sensitive” data – medical records, for example, or financial customer data – and we have developed significant expertise in the construction, operation and use of secure data safe havens like the National Safe Haven at the Farr Institute Scotland. Safe havens have heavily controlled access, restrictions on data ingress and egress, and locked-down virtual desktop environments. These controls allow researchers to conduct vital public health research, for instance, on sensitive data while maintaining the safeguards rightly demanded by the public. EPCC is a trusted linkage agent for the National Safe Haven, authorised by the NHS to connect de-identified sensitive datasets together for complex research.
Case study: Alzheimer Scotland Dementia Research Centre