Enhancing the efficiency of the DataLoch data service
4 March 2026
January 2026 marked the latest rollout of enhancements to DataLoch, the health and social care data service run by NHS Lothian and the University of Edinburgh. EPCC's Ally Hume, who has worked on the service for over five years as its Principal Technical and Data Architect, leads some of the challenging and exciting behind-the-scenes work that improves service-delivery efficiency and expands the data offering.
In over two decades at EPCC, I have worked on many data-related projects, including the Scottish Medical Imaging data repository, mathematical modelling of plant biology, distributed data integration systems, and financial data mining. I started working on DataLoch in 2019. When the pandemic arrived in early 2020, the DataLoch team set aside the planned initial data offering and instead built a Covid-19-centric data offering to meet the urgent need. This offering included a simple checkbox-style data selection form that could be processed automatically to extract the relevant data for any research project.
Evolving research requirements
Post-pandemic, DataLoch began adding more datasets, particularly general practice data and many additional secondary care datasets covering areas such as cardiology and critical care. As more datasets were added, researchers' requirements became increasingly complex. As a result, DataLoch's analysts began writing bespoke database query code to extract the absolute minimum data necessary for each project, and the use of automation declined.
With less automation, the effort required to deliver data for each project grew, and it became clear that this level of effort would not be sustainable. In addition, the bespoke nature of each project led to inconsistencies in data style and to more time-consuming amendments to the data extracts made available for each project.
A new data management solution
Clearly, data extraction needed to be more standardised and automated. How this was achieved is, I believe, an interesting case study in changing procedures while continuing to run a service. Although my colleagues and I had a broad vision for the overall architecture, the real breakthrough came from small teams of one or two people developing prototypes of improved ways of working, and using these prototypes to inspire change and challenge conventions.
We prototyped a metadata management solution and showed how it could automatically generate the documentation for the DataLoch metadata catalogue. We also prototyped a cohort-generation tool, which in turn inspired a much-simplified approach to cohort specification that eventually made any such tool virtually redundant.
Working with governance experts, we prototyped simplified, default views of our data that would be proportionate for most research studies, while still offering bespoke data to projects with a justified need. Finally, we prototyped automated data extraction that combined the metadata solution, simplified project data specifications and default data views to extract and pseudonymise the data automatically.
Agile improvement
This highly agile method of improving DataLoch's processes has worked very well, and the January rollout of DataLoch's enhancements sees the simplified data specification adopted for the first time. Data requirements specified in this way will be extracted automatically, efficiently and consistently. This approach is expected to cut several days of effort from the production of every DataLoch data extract, reducing the time taken to deliver data to researchers in the secure environment.
Additionally, the improved consistency and accuracy of automation is expected to reduce the number of time-consuming project amendments. Kathy Harrison, DataLoch’s Programme Lead, commented: “Ally’s lead on our automation improvements will greatly reduce the effort required to deliver data to researchers and is an essential development on DataLoch’s path of long-term self-sufficiency. His leadership and experience are much valued assets to DataLoch.”
Links
DataLoch is a data service that has been developed in partnership by the University of Edinburgh and NHS Lothian. Further NHS Boards and Local Authorities in South-East Scotland will join our partnership in the future to ensure the benefits of our work extend to the entire region.
Read more about these recent key DataLoch enhancements on the DataLoch website.