Preparing for an unprecedented astronomical data set

19 May 2022

The Vera C. Rubin Observatory’s Legacy Survey of Space and Time (LSST) is the most ambitious optical sky survey yet planned. Under construction in northern Chile, this US-led project is expected to make a step-change in our understanding of dark energy, the evolution of planetary systems and their capacity to sustain life, and the constituents of matter.

Over the course of a decade, beginning in 2024, the Rubin Observatory will build up a deep, multicolour map of the southern sky, covering 18,000 square degrees. The survey will image each part of the sky more than 800 times over its 10-year course, revealing significant new details of the Universe’s dynamics and identifying an unprecedented number of transients and moving objects, such as supernovae, asteroids in the Kuiper Belt, and gravitational-wave triggers.

Recognising the significant opportunity of LSST, in 2013 UK astronomy groups formed the LSST:UK Consortium to coordinate UK exploitation of LSST and to develop and secure funding for a substantial 20-year programme of research and development activities. The Consortium has a portfolio of work addressing the processing and curation of survey data products, serving those data products to the community, as well as downstream exploitation for key UK interests in astronomy and cosmology.

LSST will deliver a 500-petabyte set of images and data products that will address some of the most pressing questions about the structure and evolution of the Universe and the objects in it. Software is one of the most challenging aspects of Rubin Observatory, as more than 20 terabytes of data must be processed and stored each night.

The size of the survey makes it impractical for astronomers to download survey data. Instead, they will run their analyses at one of the Data Access Centres (DACs). The University of Edinburgh (in a collaboration between EPCC and the Institute for Astronomy) is a core partner in LSST:UK and will operate one of only three full-scale DACs, holding the complete survey (estimated at 200 petabytes) and providing cloud-based and HPC analysis platforms to enable world-class astronomy and cosmology.

Survey data consists of processed and combined images and catalogues (databases of scientific measurements based on detected objects, e.g. galaxies and stars). LSST is expected to identify and classify around 35 billion objects. The size of these catalogues (up to 20 petabytes) has prompted the observatory to develop a bespoke database platform called Qserv. Qserv is a distributed, relational database comprising several hundred SQL databases, tied together using tailored workflows that distribute queries across a partitioned view of the sky and reconstruct a result set from the outputs.
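The partition-and-merge idea behind such a database can be illustrated with a small Python sketch. This is not Qserv code: the toy catalogue, the four right-ascension stripes and all function names are assumptions made for illustration, and a thread pool stands in for the separate SQL databases. The same query is sent to every partition, and the partial results are merged into one result set.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy catalogue rows: (object_id, ra_deg, dec_deg, magnitude)
CATALOGUE = [
    (1, 10.0, -30.0, 21.5),
    (2, 95.0, -45.0, 19.2),
    (3, 185.0, -10.0, 23.1),
    (4, 275.0, -60.0, 18.7),
    (5, 12.5, -31.0, 20.1),
]

N_STRIPES = 4  # assumption: split the sky into four equal stripes in right ascension


def stripe_of(ra_deg):
    """Map a right ascension in degrees to a partition index."""
    return int((ra_deg % 360.0) / (360.0 / N_STRIPES))


def build_partitions(rows):
    """Assign every row to the partition covering its sky position."""
    partitions = {i: [] for i in range(N_STRIPES)}
    for row in rows:
        partitions[stripe_of(row[1])].append(row)
    return partitions


def query_partition(rows, max_mag):
    """Stand-in for a per-partition SQL query: objects brighter than max_mag."""
    return [row for row in rows if row[3] < max_mag]


def distributed_query(partitions, max_mag):
    """Send the same query to every partition, then merge and order the results."""
    with ThreadPoolExecutor() as pool:
        partial = list(
            pool.map(lambda rows: query_partition(rows, max_mag), partitions.values())
        )
    merged = [row for chunk in partial for row in chunk]
    return sorted(merged, key=lambda row: row[0])


PARTITIONS = build_partitions(CATALOGUE)
BRIGHT = distributed_query(PARTITIONS, max_mag=21.0)
print([row[0] for row in BRIGHT])
```

In this toy version every partition is queried; a production system would be expected to skip partitions that cannot contain matching objects, for instance when a query is restricted to a small region of sky.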

The applications for LSST are wide and varied. To accommodate this variety, a science portal called the Rubin Science Platform is being developed by the observatory, with three different interfaces: a browser-based query engine called Firefly to address common, interactive astronomer queries; a notebook interface called Nublado, which allows astronomers to develop and part-automate more complex analysis workflows using a scripting language such as Python; and a batch-processing interface, which will allow large collaborations to employ high-performance computing to conduct large-scale batch-processing of survey data.

Alongside the main survey, the Rubin Observatory will monitor night-to-night changes in the sky, searching for rare and scientifically important events such as supernova explosions, approaching near-Earth objects, and gravitational-wave counterparts. Detected events, expected to number around 10 million per night, will be sent to a small number of institutions around the world for further processing. A joint team in Edinburgh and Queen’s University Belfast is developing a platform called Lasair, which will be one of seven official Rubin brokers worldwide to receive this event stream. Lasair is already running, hosted at EPCC’s Advanced Computing Facility (ACF), and is used by international research groups to process an alert stream from the Zwicky Transient Facility (a precursor to LSST).
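To give a flavour of what processing an event stream involves, the following Python sketch filters a stream of alerts down to those of interest to one science case. It is not Lasair code: the alert fields (alert_id, magnitude, n_previous_detections) and the selection rule are hypothetical, chosen only to show the pattern of consuming alerts one at a time rather than loading a whole night into memory.

```python
from typing import Callable, Dict, Iterable, Iterator

Alert = Dict[str, float]  # hypothetical alert record; real alert packets are far richer


def filter_stream(alerts: Iterable[Alert], keep: Callable[[Alert], bool]) -> Iterator[Alert]:
    """Lazily yield only the alerts accepted by the user-supplied predicate."""
    for alert in alerts:
        if keep(alert):
            yield alert


def is_bright_new_source(alert: Alert) -> bool:
    """Illustrative rule: brighter than magnitude 19 and never detected before."""
    return alert["magnitude"] < 19.0 and alert["n_previous_detections"] == 0


# Made-up sample standing in for the incoming stream.
SAMPLE = [
    {"alert_id": 1, "magnitude": 18.2, "n_previous_detections": 0},
    {"alert_id": 2, "magnitude": 22.4, "n_previous_detections": 0},
    {"alert_id": 3, "magnitude": 17.9, "n_previous_detections": 12},
]

SELECTED = list(filter_stream(SAMPLE, is_bright_new_source))
print([alert["alert_id"] for alert in SELECTED])
```

Because the filter is a generator, the same function could in principle be pointed at a live message queue instead of a list, with each research group supplying its own predicate.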

Alongside the implementation and hosting of a UK Data Access Centre, EPCC is providing overall project management and is also involved in R&D activities focused on UK science priorities. For example, as part of the Dark Energy Science Collaboration, we are porting and optimising an important telescope simulation code called GalSim to work with GPUs, so that it can run on new HPC services that are coming online in the US and UK. Work to date has more than halved the runtime for LSST-scale simulations and enabled the code to run efficiently on many-core systems such as the AMD Epyc processors in ARCHER2.

EPCC staff are also working with a team of astronomers at Exeter University to optimise a program called Macauff that matches LSST objects to equivalent objects from other surveys – for example to allow an astronomer to look up measurements for a galaxy observed by LSST from other surveys. The sheer number of LSST objects (roughly 35 billion) and the density of objects – especially in the galactic plane – make this a computationally challenging task. EPCC staff are creating a parallel implementation of Macauff which could run on a UK Tier-2 HPC service such as Cirrus and be capable of computing a full crossmatch of LSST to another survey in under two weeks.
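The basic shape of a crossmatch can be shown with a deliberately simple Python sketch. It is not Macauff's method: it pairs each object with its nearest neighbour in the other catalogue within a fixed radius, and the catalogue rows, identifiers and 2-arcsecond radius are assumptions for illustration. The nested loop compares every pair of objects, which is exactly why the task becomes hard at the scale of billions of objects.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs and output in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def crossmatch(cat_a, cat_b, radius_arcsec):
    """Map each cat_a id to the nearest cat_b id within the radius, or None."""
    radius_deg = radius_arcsec / 3600.0
    matches = {}
    for id_a, ra_a, dec_a in cat_a:
        best_id, best_sep = None, radius_deg
        for id_b, ra_b, dec_b in cat_b:  # brute force: every pair is compared
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_id, best_sep = id_b, sep
        matches[id_a] = best_id
    return matches


# Made-up rows: (object_id, ra_deg, dec_deg)
LSST_LIKE = [("a1", 150.0000, -30.0000), ("a2", 150.0100, -30.0000), ("a3", 200.0, -45.0)]
OTHER = [("b1", 150.0002, -30.0001), ("b2", 150.0101, -30.0002)]

MATCHES = crossmatch(LSST_LIKE, OTHER, radius_arcsec=2.0)
print(MATCHES)
```

A scalable implementation would avoid the all-pairs comparison, for example by dividing the sky into regions that can be matched independently and in parallel. In crowded fields such as the galactic plane, several candidates can fall within the radius, so choosing the nearest one is not always the right answer.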

The LSST:UK programme is a long-term one, due to continue until at least 2035. Observations from the operational telescope are due to arrive from late 2024. Preparations for this are in full swing, with an OpenStack-based DAC platform called Somerville sited at the ACF and hosting precursor surveys and early observations from LSST commissioning activities. These also have scientific value, meaning UK-based astronomers are already beginning to reap the rewards of the Rubin Observatory as well as learning the data-intensive research skills and techniques they will need to scale up to LSST.

Image: Inside the observatory, progress continues on the telescope mount assembly. Credit: Rubin Obs/NSF/AURA.


Author

Dr George Beckett