Data management: hosting an invaluable research dataset
16 October 2025
Following many years of collaborating with the National Collection of Aerial Photography, EPCC has been chosen as the hosting site for the research copy of its historic dataset of over 30 million aerial photographs.

The National Collection of Aerial Photography (NCAP), part of Historic Environment Scotland, is the custodian of one of the world’s most notable datasets. It holds over 30 million aerial photographs of locations throughout the world, dating from the 1920s to the early 21st century. These photographs have many provenances: most come from declassified military reconnaissance sorties, whilst others are from surveys undertaken by now-defunct private companies.
Most of these images are stored as physical copies – either as photographic prints or on large-format film. NCAP is in the process of digitising every image to ensure it remains available to future generations. When NCAP started this work, a technician scanned each print and each frame of film by hand. NCAP has since automated as much of the digitisation process as possible, moving from scanning 100-200 images per day to scanning many thousands of images per day and reducing the expected time to digitise the collection from centuries to much more manageable decades.
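The "centuries to decades" claim can be checked with some rough arithmetic. The sketch below is illustrative only: it assumes all 30 million images still had to be scanned, that scanning runs every day of the year, and that "many thousands" means 5,000 images per day. That last figure is a hypothetical placeholder, not one quoted by NCAP.

```python
# Rough estimate of how long digitising the whole collection would take.
# Assumptions (not from NCAP): scanning runs 365 days a year, and the
# automated rate of 5,000 images/day is a hypothetical stand-in for
# "many thousands of images per day".
TOTAL_IMAGES = 30_000_000
DAYS_PER_YEAR = 365


def years_to_digitise(total_images: int, images_per_day: int) -> float:
    """Return the number of years needed at a constant daily scan rate."""
    return total_images / images_per_day / DAYS_PER_YEAR


manual_slow_years = years_to_digitise(TOTAL_IMAGES, 100)   # manual, lower rate
manual_fast_years = years_to_digitise(TOTAL_IMAGES, 200)   # manual, upper rate
automated_years = years_to_digitise(TOTAL_IMAGES, 5_000)   # hypothetical rate

print(f"Manual scanning: {manual_fast_years:.0f} to {manual_slow_years:.0f} years")
print(f"Automated scanning (assumed rate): {automated_years:.0f} years")
```

Under these assumptions, manual scanning works out at roughly four to eight centuries, while the assumed automated rate brings it down to under two decades.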
NCAP’s success with automating digitisation has led to an unforeseen problem: NCAP is producing more data than can be easily stored. Each scanned image is about 0.3-1.2 GB in size (roughly the same as a standard-definition movie). Once all 30 million images are digitised, NCAP will require in the region of 15 petabytes (15,000 terabytes) of storage! NCAP already has mechanisms in place for storing this data for posterity but, realising the value of its datasets to the research community, wants a copy of the data to be accessible to researchers.
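To see how the 15 petabyte figure relates to the per-image sizes, here is a minimal back-of-the-envelope sketch. It assumes decimal units (1 PB = 1,000,000 GB), consistent with the "15,000 terabytes" quoted above. The function and variable names are illustrative only.

```python
# Back-of-the-envelope storage estimate for the fully digitised collection.
# Assumption: decimal units, i.e. 1 PB = 1,000 TB = 1,000,000 GB.
TOTAL_IMAGES = 30_000_000
GB_PER_PB = 1_000_000


def total_petabytes(total_images: int, gb_per_image: float) -> float:
    """Return the total storage in petabytes for a given average image size."""
    return total_images * gb_per_image / GB_PER_PB


low_pb = total_petabytes(TOTAL_IMAGES, 0.3)    # every image at the small end
high_pb = total_petabytes(TOTAL_IMAGES, 1.2)   # every image at the large end

# Average image size implied by the quoted 15 PB total.
implied_average_gb = 15 * GB_PER_PB / TOTAL_IMAGES

print(f"Range: {low_pb:.0f} to {high_pb:.0f} PB")
print(f"Implied average image size: {implied_average_gb:.1f} GB")
```

The quoted size range gives bounds of roughly 9 to 36 PB, and the 15 PB estimate corresponds to an average of about 0.5 GB per image, towards the lower end of that range.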
EIDF: flexible data storage
This is where EPCC comes in. As host of the Edinburgh International Data Facility (EIDF), we can provide NCAP with the petabytes of data storage it requires. EIDF is a world-class private cloud infrastructure that has been developed to facilitate data science research. Users can access tens of petabytes of storage in a variety of flavours – whether you’re looking for S3 buckets hosted securely within the UK or backed-up resilient storage, EIDF can provide you with the resources you require. Additionally, EIDF is designed to ensure you can access both the traditional and AI compute necessary to extract valuable insights from rich datasets like NCAP’s.
Future research opportunities
NCAP and EPCC have together developed a framework to enable researchers to make use of the NCAP dataset for research projects. Academics can leverage access to these datasets when proposing projects, thereby ensuring the NCAP collections are actively used to provide information of societal value.
Some of the NCAP datasets, such as the “Joint Air Reconnaissance Intelligence Centre” (JARIC) dataset, contain aerial photographic images from across the globe that were taken in a regular, periodic pattern over 50 years from the 1940s. Each sortie was shot so that all of its photographs could be used to create a mosaic, showing a single image of large swathes of land.
The JARIC dataset, for instance, contains regular surveys of large parts of Africa, East Asia, and the islands of the Caribbean Sea and the Atlantic and Pacific Oceans. By itself, this can be used to provide a significantly higher-resolution version of Google Maps for much of the world, dating back to the 1940s (before satellites were flying, let alone collecting Earth Observation data) and with the photography of each location regularly updated.
It is difficult to overstate the importance of the research that this dataset could underpin. Having access to regular high-resolution images of the Pacific Ocean, for instance, could help identify and understand early indicators of climate change, or track its effects on coastlines from long before satellites could be used. Africa and East Asia have changed drastically from the 1940s to the early 2000s – with this dataset, we can track how land use has evolved over this critical time.
NCAP and EPCC are already exploring the best ways to use these datasets, although most of our current efforts are focused on enhancing this data and making it as easy to use as possible. We are developing mechanisms to automate the mosaicking of photographs from a sortie, and are exploring machine learning approaches for determining changes in land use over time.
Find out more
Both NCAP and EPCC are keen to grow collaborations that make use of this treasure trove of data. If you have project ideas for how this dataset could be used, please get in touch at Commercial@epcc.ed.ac.uk.
The image above shows a mosaic produced from photographs taken during a single flight. Each mosaic creates a single image of large swathes of land, providing remarkable high-resolution aerial views, dating back to the 1940s, of much of the world. Image: NCAP.