Urgent computing: presentations at SC20

28 October 2020

EPCC leads a work package of the VESTEC (Visual Exploration and Sampling Toolkit for Extreme Computing) project, which seeks to fuse HPC and real-time data to support urgent decision-making in disaster response. Our work has resulted in two papers accepted into SC20 workshops: one on our custom workflow execution system, and another on extending the Common Workflow Language standard to better support parallel application execution.

A workflow management system for urgent computing

A large part of the VESTEC project is to collect real-time data about a disaster and use this to drive forecasts of the disaster's progression to help plan the best response to it. In order to accomplish this, potentially complex workflows need to be executed by the VESTEC system.

After researching existing workflow management systems we realised that none of them met our requirements, particularly in the context of dealing with new data coming in during the workflow progression. To this end we developed our own custom workflow management system for VESTEC, which we describe in our paper (see link below).

Our workflow management system is built around message passing using RabbitMQ, and can be scaled up or down depending on the amount of work that needs to be carried out. Furthermore, we designed it with an easy-to-use API to allow new workflows to be quickly integrated into the VESTEC system. In the paper we also outline one of our workflows in detail, show how the workflow management system ties into the rest of the VESTEC system, and demonstrate a fully integrated workflow in action.
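To give a flavour of what a message-driven workflow looks like, here is a minimal sketch in Python. It is not the VESTEC API: the names handler, send and run, the stage names and the message fields are all hypothetical, and an in-process queue stands in for RabbitMQ so that the example runs on its own. The point it illustrates is that each stage is a small function triggered by a message, so data arriving mid-workflow simply becomes another message.

```python
from collections import deque

# Hypothetical registry mapping a stage name to the function that handles it.
_handlers = {}
# In-process stand-in for the message broker (assumption: the real system
# uses RabbitMQ; a deque keeps this sketch self-contained).
_pending = deque()


def handler(stage):
    """Register a function as the handler for messages sent to `stage`."""
    def register(func):
        _handlers[stage] = func
        return func
    return register


def send(stage, message):
    """Queue a message for a stage; it is handled when the loop reaches it."""
    if stage not in _handlers:
        raise KeyError(f"no handler registered for stage {stage!r}")
    _pending.append((stage, message))


def run():
    """Deliver queued messages in order until none remain."""
    log = []
    while _pending:
        stage, message = _pending.popleft()
        log.append(stage)
        _handlers[stage](message)
    return log


forecasts = []


@handler("new_observation")
def on_new_observation(message):
    # Each new reading triggers a fresh forecast by messaging the next stage.
    send("run_forecast", {"incident": message["incident"], "wind": message["wind"]})


@handler("run_forecast")
def on_run_forecast(message):
    # Placeholder for submitting a simulation to an HPC machine.
    forecasts.append((message["incident"], message["wind"]))


# Two observations arrive for the same (made-up) incident.
send("new_observation", {"incident": "fire-1", "wind": 12})
send("new_observation", {"incident": "fire-1", "wind": 20})
stages_run = run()
```

Because stages only communicate through messages, adding a new workflow in a design like this amounts to registering a few more handlers, and more workers can consume from the queue when the load grows.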

G. P. S. Gibb et al., "A Bespoke Workflow Management System for Data-Driven Urgent HPC".

HPC for urgent decision making workshop (https://www.urgenthpc.com).

SC20 event schedule. Pre-print.

Extending the Common Workflow Language standard

While working on our custom workflow system, we realised that the Common Workflow Language (CWL) standard would be very useful, allowing us to wrap individual steps behind a uniform interface as well as providing a way to describe input and output. However, it could not cleanly execute MPI-parallelised applications such as the local weather or wildfire simulations required by VESTEC.

In this paper (linked below) we describe the experience of using these standards within the system and how, with the help of one of the CWL maintainers (Crusoe), we extended the CWL reference implementation to support MPI applications. This will extend the benefits of CWL to a new community: users of high-performance computing (as opposed to the high-throughput computing common within the existing majority-bioinformatics user community).
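As a rough illustration of why MPI applications need special handling, the Python sketch below shows the basic step any runner has to perform: prefixing a tool's command line with a launcher and a process count. This is a conceptual sketch only and not the code of the CWL reference implementation; the function wrap_with_mpi, its parameters, and the example command wildfire_sim are hypothetical, and the launcher name and flag vary between HPC platforms.

```python
import shlex


def wrap_with_mpi(base_command, processes, launcher="mpirun", nproc_flag="-n"):
    """Return the command line with an MPI launcher prefix.

    The launcher and flag are parameters because they differ between
    machines (for example mpirun, mpiexec or srun).
    """
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return [launcher, nproc_flag, str(processes), *base_command]


# Hypothetical simulation command; the tool name and options are made up.
command = wrap_with_mpi(["wildfire_sim", "--config", "run.json"], processes=64)
command_line = shlex.join(command)
print(command_line)  # mpirun -n 64 wildfire_sim --config run.json
```

Keeping the launcher details outside the tool description is what lets the same workflow step run unchanged on machines with different MPI setups.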

We also discuss some unexpected benefits around performance monitoring and dependency management.