Administrative Data Research (ADR)
Posted: 30 Jul 2021 | 10:01
ADR Scotland (Administrative Data Research Scotland) is a partnership combining specialists in the Scottish Government’s Data Sharing and Linkage Unit with the expertise of academic researchers at the Scottish Centre for Administrative Data Research. Together they are transforming how public sector data in Scotland are curated, accessed and explored, so that the data can deliver their full potential for policymakers and for the public.
EPCC is working with National Records of Scotland and Public Health Scotland to deliver the linkage service for the partnership. The ability to link records that relate to the same individuals adds a new dimension to data analysis, allowing researchers to carry out studies across datasets relating to different aspects of public services (for example, combining health and education data). The richness of the results of such research can have a huge positive impact on the wellbeing of Scotland’s population. There are, however, stringent legal and ethical constraints that must be met to ensure that the privacy of individuals is preserved and that research carried out using linked datasets does not lead to disclosure of any individual’s personal data.
The partners have devised a linkage mechanism that draws on established best practice and cyber-security principles to protect personal data. The first ‘research ready’ datasets have been imported into the ADR infrastructure from the Scottish Government’s Education Analytic Services division, following the conclusion of the agreements required to allow legal processing of the data. Defining the information governance and putting it into force has been a lengthy process, but an essential one to ensure that all risks are understood and that a sound legal basis is in place.
ADR works on the principle of storing research-ready datasets, which can be linked to others in ADR or from other sources to create a dataset for an approved study. Linkage is enabled by keys that the Data Controller generates for each record at the time of ingest. The Data Controller splits each source table into two derived tables: the data that will be stored in the Safe Haven and the data used for indexing. The record keys are retained in both derived tables. The tables are then sent by secure file transfer to the Safe Haven and to National Records of Scotland (NRS) respectively. NRS carries out indexing on the table it receives, which produces a mapping of the keys to the spine. EPCC stores the table it receives in a part of the Safe Haven dedicated to ADR data storage.
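The split at ingest can be illustrated with a minimal Python sketch. It is not the Data Controller's actual procedure: the field names, the choice of which fields count as identifiers, and the use of random hexadecimal keys are all assumptions made for illustration, and the sample records are fictitious.

```python
import secrets

# Assumption: these hypothetical fields are the ones needed for indexing.
IDENTIFIER_FIELDS = {"name", "date_of_birth", "postcode"}


def split_for_ingest(records):
    """Split source records into an indexing table and a payload table.

    Each record gets one randomly generated key, which is kept in both
    derived tables so that the two halves can be related later.
    """
    indexing_table, payload_table = [], []
    for record in records:
        key = secrets.token_hex(16)  # random key that carries no identity
        identifiers = {f: v for f, v in record.items() if f in IDENTIFIER_FIELDS}
        payload = {f: v for f, v in record.items() if f not in IDENTIFIER_FIELDS}
        indexing_table.append({"record_key": key, **identifiers})
        payload_table.append({"record_key": key, **payload})
    return indexing_table, payload_table


# Fictitious source table, for illustration only.
source = [
    {"name": "A. Example", "date_of_birth": "2010-01-01", "postcode": "EH1 1AA",
     "school": "Example High", "attendance": 0.93},
    {"name": "B. Sample", "date_of_birth": "2011-05-17", "postcode": "G1 1AA",
     "school": "Sample Academy", "attendance": 0.81},
]

for_indexing, for_safe_haven = split_for_ingest(source)
```

In this sketch `for_indexing` holds only identifiers plus keys and `for_safe_haven` holds only payload plus keys, so neither table alone contains both identity and payload.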
When an approved linkage request comes in, the operators of the Safe Haven extract the keys, from all the relevant datasets, of the records that belong to the cohort defined in the study specification. These sets of keys are passed to NRS, which creates a table, the Master Index File (MIF), that contains the mapping between the records in different datasets that correspond to the same individual. The MIF is passed to the Safe Haven, which converts the keys in each extract so that the researcher can join the corresponding records when they receive the data.
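The key conversion step can be sketched in Python as follows. The layout of the MIF is not specified here, so this example assumes a hypothetical form in which each row names a dataset, a record key, and a study-specific `link_id` shared by all records of the same individual. It also assumes at most one record per individual in each extract. All names and values are illustrative.

```python
def apply_mif(extract, dataset_name, mif):
    """Replace each record key in an extract with the link ID from the MIF.

    Records whose key does not appear in the MIF are dropped, since they
    are not part of the cohort.
    """
    lookup = {
        row["record_key"]: row["link_id"]
        for row in mif
        if row["dataset"] == dataset_name
    }
    converted = []
    for record in extract:
        link_id = lookup.get(record["record_key"])
        if link_id is None:
            continue
        payload = {k: v for k, v in record.items() if k != "record_key"}
        converted.append({"link_id": link_id, **payload})
    return converted


def join_on_link_id(left, right):
    """Inner join of two converted extracts on the shared link ID."""
    right_by_id = {r["link_id"]: r for r in right}
    return [
        {**l, **right_by_id[l["link_id"]]}
        for l in left
        if l["link_id"] in right_by_id
    ]


# Fictitious payload extracts held in the Safe Haven.
health = [{"record_key": "h1", "admissions": 2},
          {"record_key": "h2", "admissions": 0}]
education = [{"record_key": "e7", "attendance": 0.93},
             {"record_key": "e9", "attendance": 0.81}]

# Hypothetical MIF: records h1 and e7 belong to the same individual.
mif = [
    {"dataset": "health", "record_key": "h1", "link_id": "S001"},
    {"dataset": "education", "record_key": "e7", "link_id": "S001"},
    {"dataset": "health", "record_key": "h2", "link_id": "S002"},
]

linked = join_on_link_id(
    apply_mif(health, "health", mif),
    apply_mif(education, "education", mif),
)
```

Here `linked` contains a single joined record for `S001`. The original record keys do not appear in the output, so the researcher sees only the study-specific identifier and the payload.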
The important points to note about the process are:
- Separation of data is maintained. The Safe Haven contains the ‘payload’ data but does not have information about identity.
- NRS has the information about identity, and hence the ability to determine which records across different data sources correspond to an individual, but it does not have any of the payload data.
- All linkage requests are subject to approval.
- Linked data are subject to a disclosure risk assessment before being passed to the researcher.
The next stage of ADR will be to demonstrate the ADR process with a real study example, which we intend to complete by the end of the summer.