
With some students and collaborators, we have published three papers at IPAW’18 (International Provenance and Annotation Workshop), which are being presented this week during Provenance Week’18, hosted at King’s College London.

1. Provenance capture with UML2PROV

It remains challenging to develop provenance-enabled applications. A key difficulty is to ensure that the provenance generated by an application describes what the application actually does! PROV-Template already offers a declarative way of specifying the shape of the provenance to be generated, while bindings created at runtime by the application instantiate these templates into concrete provenance. UML2PROV, designed with Carlos Sáenz Adán and Beatriz Pérez Valle, goes a step further.

The ambitious vision of UML2PROV is to control the generation of provenance from UML specifications of a program, minimising the amount of manual intervention by the programmer. Three steps are involved:

From UML specifications, UML2PROV creates Prov-Templates and a runtime library to generate bindings.

A programmer’s intervention deploys the runtime library with the application.

As the application runs, the runtime library creates bindings that can be used to derive provenance from the templates.
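The template-and-bindings mechanism in step 3 can be sketched as follows. This is a deliberately minimal illustration, not the UML2PROV implementation: the template structure, the `var:` placeholder convention, and all identifiers (`ex:invoice42`, `ex:createInvoice-run1`) are hypothetical.

```python
# A toy PROV template for a "method call", as might be generated from a
# UML class diagram: 'var:' placeholders are later filled in by bindings.
TEMPLATE = {
    "entity": {"var:output": {"prov:type": "var:output_type"}},
    "activity": {"var:call": {}},
    "wasGeneratedBy": [("var:output", "var:call")],
}

def instantiate(template, bindings):
    """Replace every 'var:x' placeholder with its bound value."""
    def subst(value):
        if isinstance(value, str) and value.startswith("var:"):
            return bindings[value[4:]]
        if isinstance(value, dict):
            return {subst(k): subst(v) for k, v in value.items()}
        if isinstance(value, (list, tuple)):
            return type(value)(subst(v) for v in value)
        return value
    return subst(template)

# Bindings of the kind the generated runtime library would emit at runtime:
bindings = {"output": "ex:invoice42", "output_type": "ex:Invoice",
            "call": "ex:createInvoice-run1"}
doc = instantiate(TEMPLATE, bindings)
```

The template is written once, offline, from the UML specification; only the bindings depend on a particular execution, which is what keeps the provenance declaration separate from the application logic.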

Carlos previously supported UML sequence and activity diagrams. The IPAW’18 paper extends this work to class diagrams. This apparently simple extension required a significant redesign, as it allows the flow of data to be better tracked in applications. The system was also fully redesigned to minimise the programmer’s intervention: the original approach used reflection and proxies to intercept method calls, whereas it now relies on aspects. Carlos is not the first to suggest that provenance aspects can be woven into code in order to generate provenance, but he is the first to do it systematically, in the context of a software engineering methodology such as UML.
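The idea of aspect-based capture is that "advice" wrapped around method calls records bindings without touching the application code. A rough sketch of that idea, using a Python decorator as a stand-in for a woven aspect (UML2PROV targets real aspect weaving, e.g. in Java; the function names here are invented):

```python
import functools

captured_bindings = []  # bindings recorded by the "aspect" as calls happen

def provenance_aspect(func):
    """Around-advice: record inputs and output of each wrapped call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        captured_bindings.append({
            "activity": func.__name__,
            "inputs": args,
            "output": result,
        })
        return result
    return wrapper

@provenance_aspect          # the weaving step: the business logic is untouched
def total_price(unit_price, quantity):
    return unit_price * quantity

total_price(3.0, 4)         # records one binding as a side effect
```

The appeal over reflection and proxies is exactly what this sketch shows: the provenance concern lives entirely outside the decorated function.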

2. Deriving Contamination Risk from Food Provenance

Food has always been a powerful illustration of why provenance matters. How many times have I mentioned “Provenance Wine” and “Provenance Whisky” in talks over the years? However, the adage that “provenance helps users place their trust in data or things” has proven challenging to demonstrate scientifically and rigorously. I believe that the work by Belfrit Batlajery, Mark Weal, Adriane Chapman and myself is a significant step in that direction.

Belfrit’s approach is to use provenance-based descriptions of food supply chains, ideally from Farm to Fork, together with partial knowledge about the contamination levels of bacteria therein. A belief-propagation approach over factor graphs derived from the provenance graph allows him to infer levels of contamination across the network. The point of this analysis is not so much to infer highly precise contamination levels, but instead to obtain an estimate accurate enough to determine where risk is highest.

This approach would allow an organisation to meet its due diligence requirements by logging food provenance, inferring contamination levels, and using these to identify the places where food sampling is most likely to reduce risk across the supply chain, ultimately allowing consumers to have confidence in food products.

3. A Provenance-enabled Image Processing Pipeline for Astronomy

Down to earth, Michael Johnson, in collaboration with Poshak Gandhi, Adriane Chapman and myself, has been developing data science techniques to help with the discovery of LMXBs (low-mass X-ray binaries). The opportunity being exploited arises from the Large Synoptic Survey Telescope (LSST) still being in its design phase. Michael has developed a provenance-enabled image processing pipeline. From a provenance perspective, the challenge was to demonstrate the benefits of provenance to the astronomer. Michael was able to show that, while provenance capture introduced a runtime overhead, this overhead was offset by significant compute-time savings from querying provenance rather than recomputing results. This is an important outcome: it demonstrates that provenance should be routinely captured by data science toolkits, because it brings benefits to their users.
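The capture-overhead versus query-savings trade-off can be sketched in a few lines. This is an illustrative stand-in, not Michael’s pipeline: the store, function names, and image identifiers are all hypothetical.

```python
provenance_store = {}  # records which activity produced which result

def expensive_step(image_id):
    # Stand-in for a costly image-processing stage of the pipeline.
    return sum(ord(c) for c in image_id)

def run_pipeline(image_id):
    """Record each computation; answer repeats by querying provenance."""
    key = ("expensive_step", image_id)
    if key in provenance_store:          # query provenance instead of recomputing
        return provenance_store[key]
    result = expensive_step(image_id)
    provenance_store[key] = result       # the (small) capture-time overhead
    return result

first = run_pipeline("lsst-frame-001")   # computed, with capture overhead
second = run_pipeline("lsst-frame-001")  # served from the provenance record
```

The point of the measurement is that the second path amortises the first: once provenance is captured, repeated requests become lookups, which is where the net saving comes from.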