The Lasair alert broker

17 July 2023

Preparing for a massive, complex astronomical data stream.

Wide angle image of large telescope under construction.

Exploring the transient optical sky

The Vera C Rubin Observatory, currently under construction in Chile, is designed to conduct a decade-long optical survey, known as the Legacy Survey of Space and Time (LSST). EPCC has been involved with the LSST programme for over ten years.

With its unprecedented combination of depth, field of view and rapid cadence (it will revisit the same area of sky approximately every three nights), the Rubin Observatory is well placed to support the study of transient and variable objects such as supernovae and active galactic nuclei, a key area of science for the survey.

A key output from the Rubin Observatory will be the alert stream, which aims to provide an alert for every transient, variable or moving object within 60 seconds of observation. This will allow astronomers to quickly analyse sources of interest and, where necessary, schedule follow-up observations. Given the capability of the telescope, however, the system is expected to produce of the order of 10 million alerts per night. With such a volume, both distributing the alerts and enabling astronomers to find the specific ones of interest are major challenges. This is where alert brokers come in.
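To put the quoted volume in perspective, the following back-of-envelope calculation converts 10 million alerts per night into an average rate. The 10-hour observing night is an assumption made purely for illustration, and the real stream is expected to be bursty rather than uniform.

```python
# Back-of-envelope average alert rate.
ALERTS_PER_NIGHT = 10_000_000  # order-of-magnitude figure quoted in the article
NIGHT_HOURS = 10               # ASSUMPTION: length of an observing night


def mean_alert_rate(alerts_per_night: int, night_hours: float) -> float:
    """Return the average number of alerts per second over one night."""
    return alerts_per_night / (night_hours * 3600)


rate = mean_alert_rate(ALERTS_PER_NIGHT, NIGHT_HOURS)
print(f"{rate:.0f} alerts per second on average")
```

Under that assumption the average comes to a few hundred alerts per second, every night, for ten years.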

Alert brokers

Rather than attempting to provide alerts directly to the global astronomy community, the Rubin Observatory will distribute the alert stream to seven community brokers which will provide astronomers with the tools to classify, query, filter and, potentially, redistribute alerts as required. 

Lasair (pronounced L-AH-s-uh-r, it means flame or flash in Scots and Irish Gaelic) has been selected as one such broker. Currently being developed by The University of Edinburgh, Queen’s University Belfast, and Oxford University, it has a particular focus on giving users powerful and flexible access to the data stream using a SQL-like query language and the ability to create their own filtered sub-streams that can be easily passed on to downstream projects. It also adds value through intelligent cross-matching of alerts with existing catalogues, provides mechanisms for users and other services to attach further information, and enables users to share filters and watchlists with each other.

To develop the functionality that will be needed, the Lasair team has built a prototype system that is currently serving data from the public stream of the Zwicky Transient Facility (ZTF), which is releasing a transient alert stream in a format similar to that envisaged for LSST, albeit at a much smaller scale. Although the primary purpose of this system is to be a working prototype for the full LSST alert broker, it is already enabling a significant amount of science, with 71 citations and a similar number of science papers making use of Lasair.

Engineering challenges

Building such a system, especially with a small development team and limited budget, presents a number of software engineering challenges. Perhaps the most obvious is the high data rate. Although the anticipated alert stream is not especially large in terms of bytes, it consists of many, potentially quite complex, messages that arrive at a variable rate and need to be processed with minimal latency.

A key architectural decision, made early in development, was to ensure that, as far as possible, each alert message could be processed independently and that messages could be processed out of order or multiple times without compromising the integrity of the data. This allows us to scale out as required by adding processing nodes of whatever type are needed to support the required processing rate.
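One common way to achieve this property is to key every write on a unique identifier and to impose ordering only when data is read. The sketch below illustrates the idea with an in-memory store. The class names, field names and storage model are hypothetical and are not taken from the Lasair code base.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Detection:
    # Hypothetical fields, chosen for illustration only.
    object_id: str
    candidate_id: int
    mjd: float
    magnitude: float


@dataclass
class ObjectStore:
    """In-memory stand-in for a database keyed on (object_id, candidate_id)."""

    rows: dict = field(default_factory=dict)

    def upsert(self, det: Detection) -> None:
        # Re-writing the same key stores identical data, so replays are harmless.
        self.rows[(det.object_id, det.candidate_id)] = det

    def light_curve(self, object_id: str) -> list:
        # Ordering is applied on read, so arrival order does not matter.
        matches = (d for d in self.rows.values() if d.object_id == object_id)
        return sorted(matches, key=lambda d: d.mjd)


detections = [Detection("obj_a", i, 60000.0 + i, 19.0 - 0.1 * i) for i in range(5)]

in_order = ObjectStore()
for det in detections:
    in_order.upsert(det)

# Deliver the same messages reversed, with the first three sent twice.
replayed = ObjectStore()
for det in list(reversed(detections)) + detections[:3]:
    replayed.upsert(det)

print(in_order.light_curve("obj_a") == replayed.light_curve("obj_a"))
```

Because both stores end up in the same state, a worker that crashes part way through a batch can simply reprocess the whole batch.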

The basic architecture of the alert pipeline is shown in Figure 1. Alerts are received from the upstream source via Apache Kafka. They then pass through a series of stages: first the alert is ingested and a full copy of the information archived in a Cassandra database; the alert then passes to the Sherlock classifier, which adds contextual information to the alert by cross-matching against known sources; after this the alert is processed by the filter stage, which runs the user-defined filters against the alert and takes any required actions (such as writing the alert to an output Kafka stream for onward use by other systems or sending the user an alert email). Finally, the contents of the alert are used to update the relational database that maintains the record of all objects processed to date.


Figure 1: High-level overview of the Lasair architecture.
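The stages described above can be sketched as a chain of small functions. This is only an illustration of the data flow: plain Python containers stand in for Kafka, Cassandra and the relational database, a dictionary lookup stands in for Sherlock's cross-matching, and every function and field name is hypothetical.

```python
from typing import Callable

Alert = dict  # an alert is modelled as a plain dictionary in this sketch


def ingest(alert: Alert, archive: list) -> Alert:
    """Stage 1: archive a full copy of the incoming alert."""
    archive.append(dict(alert))
    return alert


def add_context(alert: Alert, catalogue: dict) -> Alert:
    """Stage 2: annotate the alert using a catalogue of known sources."""
    match = catalogue.get(alert["object_id"], "unclassified")
    return {**alert, "context": match}


def run_filters(alert: Alert, filters: dict[str, Callable[[Alert], bool]],
                outputs: dict[str, list]) -> Alert:
    """Stage 3: run each user-defined filter and route matching alerts."""
    for name, predicate in filters.items():
        if predicate(alert):
            outputs.setdefault(name, []).append(alert)
    return alert


def update_objects(alert: Alert, objects: dict) -> None:
    """Stage 4: update the per-object record of everything seen so far."""
    record = objects.setdefault(alert["object_id"], {"n_alerts": 0})
    record["n_alerts"] += 1
    record["context"] = alert["context"]


def process(alert, archive, catalogue, filters, outputs, objects) -> None:
    """Run one alert through all four stages in order."""
    alert = ingest(alert, archive)
    alert = add_context(alert, catalogue)
    alert = run_filters(alert, filters, outputs)
    update_objects(alert, objects)


archive, outputs, objects = [], {}, {}
catalogue = {"obj_a": "known variable star"}
filters = {"bright": lambda a: a["magnitude"] < 18.0}

process({"object_id": "obj_a", "magnitude": 17.2}, archive, catalogue, filters, outputs, objects)
process({"object_id": "obj_b", "magnitude": 19.6}, archive, catalogue, filters, outputs, objects)
print([a["object_id"] for a in outputs["bright"]])
```

Only the last stage touches state shared between alerts, which mirrors the point made in the text about where the pipeline can be scaled out freely.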

Since all processing and filtering is performed using only the information that is present in the alert message itself, all processing within the alert pipeline up until the final stage is trivially parallelisable, permitting easy scale out as required. 
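As a small illustration of why this matters, a function that reads nothing except the alert it is given can be mapped over a pool of workers with no coordination between them. The sketch below uses a thread pool from the Python standard library for brevity; it is not how Lasair distributes work across nodes, and the alert fields are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate(alert: dict) -> dict:
    """Per-alert work that depends only on the contents of the alert."""
    mags = alert["magnitudes"]
    return {
        "object_id": alert["object_id"],
        "brightest": min(mags),  # a smaller magnitude means a brighter source
        "n_detections": len(mags),
    }


# Hypothetical alerts, each carrying its own short detection history.
alerts = [
    {"object_id": f"obj_{i}", "magnitudes": [20.0 - 0.1 * i, 19.5, 21.0]}
    for i in range(8)
]

# No locks or shared state are needed because each call is independent.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(annotate, alerts))

serial = [annotate(a) for a in alerts]
print(parallel == serial)
```

The same reasoning applies whether the workers are threads, processes or separate virtual machines consuming from a shared message queue.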

Making the front end usable is just as important as making the back end performant. Our aim is to provide users with an extremely powerful and flexible toolset; the challenge is doing so in a way that can be used by astronomers who, while they may be somewhat familiar with SQL and, often, Python, are not primarily software developers. It must also have sufficient safeguards to prevent, for example, one badly written query from bringing the entire system to a halt.

Part of the solution here is to start with technologies that we already expect users to have some familiarity with. User filters are therefore written in a restricted version of SQL and to access the API we provide a Python client library. We then provide extensive documentation, including a set of example Jupyter notebooks that demonstrate use of the API.
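To give a flavour of what a restricted SQL filter with safeguards might involve, the sketch below accepts only a WHERE condition from the user, rejects anything that looks like a second statement or a write, and caps the number of rows returned. It is a toy built on SQLite from the Python standard library, not Lasair's implementation: the table, columns, limits and function names are all hypothetical, and a keyword check like this is not a complete security barrier on its own.

```python
import re
import sqlite3

# Keywords a read-only filter condition should never need (illustrative list).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|pragma)\b", re.IGNORECASE
)
MAX_ROWS = 1000  # ASSUMPTION: arbitrary cap on the size of a result set


def build_query(where_clause: str) -> str:
    """Wrap a user-supplied condition in a fixed, read-only SELECT."""
    if ";" in where_clause or "--" in where_clause or "/*" in where_clause:
        raise ValueError("statement separators and comments are not allowed")
    if FORBIDDEN.search(where_clause):
        raise ValueError("only read-only conditions are allowed")
    return (
        f"SELECT object_id FROM objects WHERE {where_clause} "
        f"ORDER BY object_id LIMIT {MAX_ROWS}"
    )


def run_filter(conn: sqlite3.Connection, where_clause: str) -> list:
    """Return the object identifiers that satisfy the user's condition."""
    return [row[0] for row in conn.execute(build_query(where_clause))]


# Hypothetical object table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (object_id TEXT, mag REAL, ndet INTEGER)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [("obj_a", 17.5, 12), ("obj_b", 20.1, 3), ("obj_c", 18.2, 8)],
)

print(run_filter(conn, "mag < 19 AND ndet > 5"))
```

A production service would add further protections, such as query timeouts and a database account with read-only permissions.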

Building for the long term

The Legacy Survey of Space and Time will run for 10 years, and Lasair will need not only to remain operational for the duration but also to cope with a continually expanding archive of alerts. It will need to migrate to new hardware and software platforms and remain secure while maintaining the scientific integrity of the data, all with a small devops team and limited resources. To address these challenges we have implemented a number of measures, both technical and non-technical.

On the software engineering side we have, with assistance from the Software Sustainability Institute (SSI), introduced unit testing and code reviews, and we are moving towards test-driven development as well as introducing a CI system. We have built an automated deployment system using Ansible and Terraform so that we can quickly spin up new instances of Lasair, which is invaluable for development and testing, including profiling and performance evaluation. Finally, we have incorporated a monitoring system based around Prometheus and Grafana so that we can see at a glance how the system is running and quickly address any problems.

In terms of resourcing, Lasair is designed to run on the Scientific OpenStack platform as provided by the IRIS digital research infrastructure. It can therefore be hosted on any of a number of UK academic clouds or, with minor modifications, on most OpenStack cloud platforms.


Figure 2: Screenshot from the Lasair object page for AT2021lwx/ZTF20abrbeie.

Helping to find the “largest cosmic explosion ever witnessed”

Lasair is already providing input to downstream services, one example of which is ePESSTO+. This spectroscopic survey uses input from Lasair (amongst other sources) to select targets for observation. 

In 2021, ePESSTO+ observed the object AT2021lwx after ingesting an alert from Lasair. The resulting spectrum was considered inconclusive at the time and was therefore archived. In 2022, a team led by Philip Wiseman at the University of Southampton noticed the object again in a Lasair query for transients with no known host galaxy. Re-examination of the spectrum showed absorption lines indicating a redshift of z=1, implying an absolute magnitude of around -26 and making it one of the most luminous transients observed. See the article “Multiwavelength observations of the extraordinary accretion event AT2021lwx”.

The precise cause of this extremely luminous transient remains unknown. The above study favours the accretion of a giant gas cloud by a supermassive black hole (SMBH). Meanwhile, an independent study by a US team identified the same transient using ANTARES, another alert broker. Their study favours a scenario involving the tidal disruption of a large star of around 15 solar masses.

When the LSST becomes active, it is expected that many more such unusual transients will be found, and Lasair is the ideal platform to help astronomers search for and study them.

Image at top of page shows the telescope mount inside the dome. Image credit: H. Stockebrand/RubinObs/NSF/AURA.

Links

LSST:UK

You can read about related EPCC collaborations in these articles on our website:

Parallelising Macauff photometric catalogue software for the LSST:UK cross-match service.

Accelerating the simulation of galactic images with OpenMP Target Offload

Author

Gareth Francis