Provenance Tool Suite: Tracking data to its origins

24 October 2016

This article is part of the Software Sustainability Institute's series Breaking Software Barriers, which investigates how its Research Software Group has helped projects improve their research software. 

From concept to software

Provenance is traditionally the record of ownership of a work of art or an antique, used as a guide to authenticity or quality. Although it is mostly used to track the origins of works of art, the term now appears in many fields, from palaeontology to scientific research more broadly. In science, it refers to having knowledge of all the steps involved in producing a result, such as a figure, from experiment design through acquisition of raw data to all the subsequent steps of data selection, analysis and visualisation. Such information is necessary to reproduce a given result, and can serve to establish precedence. The concept also applies to the digital world: data originates from a particular point, and provenance provides evidence of its point of origin or discovery by establishing its ownership, custody and transformations.

Trung Dong Huynh, from the Electronics and Computer Science department at the University of Southampton and part of the Provenance Tool Suite team, commented on one of the most fundamental outcomes of his collaboration with the Software Sustainability Institute:

We needed help to manage the bug reports we get from Provenance Tool Suite more effectively. Mike [Jackson] was able to improve the roundtrip interoperability across different libraries, which had a key impact on the way we manage bug reports. We used to manage them one by one, but, thanks to the Institute and Mike, this is now an automated process.

Luc Moreau and Daniel Michaelides, also from the University of Southampton, and Trung Dong Huynh developed the Provenance Tool Suite—a suite of software, libraries and services to capture, store and visualise provenance. The software is compliant with the World Wide Web Consortium (W3C) PROV standards, which define how provenance information can be represented and how it can be exchanged.

The PROV standards make it possible to link data back to evidence of where and when it originated, so that its trustworthiness can be evaluated through appropriate processes. Several organisations already use them, such as NASA and the UK National Archives. In particular, Provenance Tool Suite lets users around the world check the consistency of their data and show where it comes from, while also making it accessible to the public.

Interoperability and better documentation

The goal of this collaboration was to develop an infrastructure that systematically checks convertibility and round-trip conversions across combinations of Provenance Tool Suite packages and services operating together. Mike Jackson, EPCC Software Architect and Software Sustainability Institute Research Software Engineer, went through all the Provenance Tool Suite libraries and documentation and gave the Southampton team concrete advice on how to make their software open source and improve its documentation. Dong stated that no one on his team is a professional software developer; rather, they “happen to develop software” for research:

Getting help from the Institute has definitely saved us a significant amount of time and efforts by enabling us to identify issues early, allowing us to focus on development work that matters, while still providing us with the confidence in the quality of our products.

Dong successfully applied to the Software Sustainability Institute's Research Software Group for help as part of our Open Call. According to Mike Jackson, the Institute's work included:

Testing of round-trip interoperability between ProvPy and ProvToolbox, and between these packages and the ProvStore, ProvTranslator and ProvValidator services, whether these are deployed locally on a developer's own machine or remotely; of ProvJS-related operations; and of the command-line utilities provided within ProvToolbox.
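To give a rough sense of what a round-trip interoperability check involves, the following is a minimal sketch in Python. It is not the Institute's test infrastructure and does not use the ProvPy or ProvToolbox APIs: the two toy formats, the converter functions and the name round_trip_failures are all hypothetical, chosen only to illustrate the idea of converting a document through every ordered pair of formats and confirming that nothing is lost along the way.

```python
import itertools
import json

# Hypothetical stand-in for a provenance document: a frozenset of
# (subject, relation, object) statements. Real PROV documents are richer.

def to_json(doc):
    """Serialise the statements as a JSON array of arrays."""
    return json.dumps(sorted(doc))

def from_json(text):
    """Parse the JSON form back into a set of statements."""
    return frozenset(tuple(statement) for statement in json.loads(text))

def to_lines(doc):
    """Serialise the statements as one space-separated statement per line."""
    return "\n".join(" ".join(statement) for statement in sorted(doc))

def from_lines(text):
    """Parse the line-based form back into a set of statements."""
    return frozenset(tuple(line.split(" ")) for line in text.splitlines() if line)

# Each (hypothetical) format is a pair: (serialiser, deserialiser).
FORMATS = {
    "json": (to_json, from_json),
    "lines": (to_lines, from_lines),
}

def round_trip_failures(doc, formats=FORMATS):
    """Return the ordered format pairs (a, b) for which writing and reading
    the document in format a, then in format b, changes its content."""
    failures = []
    for (a, (ser_a, de_a)), (b, (ser_b, de_b)) in itertools.permutations(formats.items(), 2):
        via_a = de_a(ser_a(doc))      # write and read back in format a
        via_b = de_b(ser_b(via_a))    # then write and read back in format b
        if via_b != doc:
            failures.append((a, b))
    return failures

document = frozenset({
    ("chart1", "wasGeneratedBy", "plotting"),
    ("plotting", "used", "dataset1"),
})
print(round_trip_failures(document))
```

An empty list means every combination preserved the document. Running such a check automatically across all combinations, rather than investigating each conversion problem by hand, is the kind of systematic testing the collaboration aimed for.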

Dong also reports that more users are engaging with the tool and sending bug reports, which the team now monitors and fixes more efficiently, thus improving communication between Provenance Tool Suite users and developers.

If you would like help with your software, contact the Research Software Group.

For free help to assess or improve your software, submit an application to the Institute's Open Call.