
I recently wrote a blog post about the relaunch of openprovenance.org. Today, I am pleased to announce the availability of two websites providing a historical perspective on the work that took place in the provenance community.

The Provenance Challenge website is hosted at https://openprovenance.org/provenance-challenge/WebHome.html. It is kept in its original wiki look-and-feel, as it constituted a significant community effort that led to the PROV standardisation. At the time, the community decided that it needed to understand the different representations of provenance, their common aspects, and the reasons for their differences. The Provenance Challenge was born as a community activity aiming to understand and compare the various provenance solutions. Three consecutive provenance challenges took place. A significant artifact that resulted from the Provenance Challenge series is the Open Provenance Model.

The Open Provenance Model (OPM) website is hosted at https://openprovenance.org/opm/. OPM is the first community data model for provenance. OPM was designed as a conceptual data model for exchanging provenance information. It contained key concepts such as Artifact (called Entity in PROV), Process (called Activity in PROV), and Agent. It also introduced notions of usage, generation and derivation of artifacts.

Of course, all this is now superseded by PROV, the W3C set of Recommendations and Notes for provenance. These legacy sites are made available to the community for reference. We aim to keep these pages and URLs available in the long term. Feel free to link to them!

It is my pleasure to announce the relaunch of openprovenance.org, the site for standards-based provenance solutions.

With our move to King’s College London, Dong and I have migrated the provenance services from Southampton to King’s. I am pleased to announce the launch of the following services at openprovenance.org:

ProvStore, the provenance repository that enables users to store, share, browse and manage provenance documents. ProvStore is available from https://openprovenance.org/store/.

A validator service that checks provenance documents against the constraints defined in prov-constraints. Such a service can detect logically inconsistent provenance. An example of such inconsistency is when an activity is said to have started after it ended, or when something is being used before it was even created. The validator is hosted at https://openprovenance.org/services/view/validator.
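To make the two examples of inconsistency concrete, here is a minimal Python sketch of the kind of ordering check involved, loosely modelled on the start-precedes-end and generation-precedes-usage constraints of prov-constraints. It is not the validator's implementation: it assumes a hypothetical in-memory representation (the Activity and Event classes below) in which every event carries a single timestamp, whereas prov-constraints reasons about instantaneous events more generally. The identifiers and timestamps are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Activity:
    ident: str
    start: Optional[datetime] = None
    end: Optional[datetime] = None


@dataclass
class Event:
    """A usage or generation event: an entity, an activity and a timestamp."""
    entity: str
    activity: str
    time: datetime


def check_ordering(activities: List[Activity],
                   generations: List[Event],
                   usages: List[Event]) -> List[str]:
    """Report violations of two simplified ordering constraints."""
    problems = []
    for act in activities:
        # An activity must not start after it has ended.
        if act.start is not None and act.end is not None and act.start > act.end:
            problems.append(f"{act.ident} starts after it ends")
    for use in usages:
        for gen in generations:
            # An entity must not be used before it is generated.
            if gen.entity == use.entity and use.time < gen.time:
                problems.append(f"{use.entity} used by {use.activity} before generation")
    return problems


# Usage with hypothetical identifiers and timestamps.
t = datetime.fromisoformat
acts = [Activity("ex:a1", start=t("2018-01-02T10:00"), end=t("2018-01-01T10:00"))]
gens = [Event("ex:e1", "ex:a0", t("2018-01-05T00:00"))]
uses = [Event("ex:e1", "ex:a1", t("2018-01-03T00:00"))]
print(check_ordering(acts, gens, uses))
# ['ex:a1 starts after it ends', 'ex:e1 used by ex:a1 before generation']
```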

A template expansion service facilitates a declarative approach to provenance generation, in which the shape of provenance can be defined by a provenance document containing variables, acting as placeholders for values. When provided with a set of bindings associating variables with values, the template expansion service generates a provenance document. The template expansion service lives at https://openprovenance.org/services/view/expander.

The Southampton services will be decommissioned shortly. If you have data in the old provenance store, we provide a procedure for you to download your provenance documents from the old store, and to upload them at openprovenance.org. In the age of GDPR, you will have to sign up for the new provenance store and accept its terms and conditions.

While the services may look quite similar, under the bonnet there have been significant changes.

We have adopted a micro-service architecture for our services, allowing them to be composed in interesting ways. Services are deployed in Docker containers, which facilitates their redeployment and makes them configurable. We are also investigating other forms of licensing that would allow the services to be deployed elsewhere, giving the host full control over access, storage and management. (Contact us if this is of interest to you.)

We have adopted Keycloak for identity management and access control for our existing and future micro-services. This offers an off-the-shelf solution for managing identities and obtaining consent. A single registration for all our services will now be possible.

I am regularly asked by students and researchers for a reading list on provenance. The following papers give them a good baseline about the kind of work we undertake in my group. This is not meant to be an exhaustive literature survey, but it should give them enough background to have discussions about projects related to provenance.

Today, I released ProvToolbox 0.7.3. The principal changes in this new version of ProvToolbox concern prov-template, the templating system for provenance. The new release also contains a few minor bug fixes and changes.

1. Template System

A reminder: a PROV-template is a PROV document, in which some variables are placeholders for values. A PROV-template is a declarative specification of the provenance intended to be generated by an application. A set of bindings contains associations between variables and values. The PROV-template expansion algorithm, when provided with a template and a set of bindings, generates a provenance document, in which all variables have been replaced by values.
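The expansion algorithm just described can be pictured with a deliberately simplified Python sketch. It is not ProvToolbox's implementation: statements are modelled as hypothetical (relation, arguments, attributes) tuples, each variable is bound to exactly one value, and the ex: identifiers are made up for illustration.

```python
def is_variable(term):
    """In this sketch, variables are strings in the var: or vargen: namespace."""
    return isinstance(term, str) and term.startswith(("var:", "vargen:"))


def expand(template, bindings):
    """Replace every variable of a simplified template by its bound value.

    template: list of (relation, arguments, attributes) tuples
    bindings: dict mapping each variable to a single value
    """
    def substitute(term):
        if is_variable(term):
            if term not in bindings:
                raise KeyError(f"no binding for {term}")
            return bindings[term]
        return term

    return [
        (relation,
         tuple(substitute(a) for a in args),
         {key: substitute(value) for key, value in attrs.items()})
        for relation, args, attrs in template
    ]


# Hypothetical template and bindings.
template = [
    ("entity", ("var:report",), {"prov:type": "ex:Report"}),
    ("agent", ("var:person",), {"ex:name": "var:name"}),
    ("wasAttributedTo", ("var:report", "var:person"), {}),
]
bindings = {"var:report": "ex:report1", "var:person": "ex:alice", "var:name": "Alice"}
for statement in expand(template, bindings):
    print(statement)
```

The template stays untouched; only the bindings change from one run of the application to the next, which is what makes the approach declarative.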

PROV-template is a new approach to creating a provenance-enabled application. Templates are designed and embedded in the application’s code, the application logs values (in the form of bindings), and provenance is automatically generated by template expansion.

In ProvToolbox 0.7.3, we have adopted a more compact and user-friendly representation for sets of bindings. Instead of representing them as PROV, we can now represent them as JSON. At the same time, we also handle variables in a more uniform manner, allowing variables occurring in mandatory positions also to be used in attribute positions. I won’t go into the technical details, but these two changes make the design of templates and the construction of bindings much simpler!
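To give a feel for why JSON is more compact than PROV for this purpose, here is a small Python sketch that writes and reads a set of bindings. The field names ("var", "context", "@id", "@value") and the example.org namespace are assumptions made for illustration; the precise JSON format is defined by ProvToolbox and may differ.

```python
import json

# Hypothetical JSON shape for a set of bindings, for illustration only: it is
# not claimed to be the exact schema used by ProvToolbox. Each variable maps to
# a list of values, so a variable may be bound to several values.
bindings = {
    "var": {
        "quote": [{"@id": "ex:quote1"}],
        "author": [{"@id": "ex:Paul"}, {"@id": "ex:Luc"}],
        "name": [{"@value": "Paul"}, {"@value": "Luc"}],
    },
    "context": {"ex": "http://example.org/"},
}

text = json.dumps(bindings, indent=2)  # what an application could write out


def identifiers(document, variable):
    """Return the identifiers bound to a variable (entries carrying "@id")."""
    return [v["@id"] for v in document["var"][variable] if "@id" in v]


print(identifiers(json.loads(text), "author"))  # ['ex:Paul', 'ex:Luc']
```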

A further change is that we have implemented a simple “bindings bean” compiler: it takes a template definition and creates a Java class, which allows sets of bindings to be created directly from Java and serialized easily. The aim of this compiler is to simplify the implementation of applications generating provenance.
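The idea of compiling a template into a bindings class can be illustrated with a toy generator, written in Python, that emits Java source with one adder method per template variable. The class shape (a map from variable names to lists of values), the addXxx naming and the QuoteBindingsBean name are hypothetical; this is not the code that the ProvToolbox compiler generates.

```python
def compile_bindings_bean(class_name, variables):
    """Emit Java source for a toy bindings holder: one adder per variable."""
    lines = [
        "import java.util.*;",
        "",
        f"public class {class_name} {{",
        "    private final Map<String, List<Object>> values = new LinkedHashMap<>();",
    ]
    for var in variables:
        method = "add" + var[:1].upper() + var[1:]  # author -> addAuthor
        lines += [
            "",
            f"    public void {method}(Object value) {{",
            f'        values.computeIfAbsent("{var}", k -> new ArrayList<>()).add(value);',
            "    }",
        ]
    lines += [
        "",
        "    public Map<String, List<Object>> getValues() { return values; }",
        "}",
    ]
    return "\n".join(lines)


print(compile_bindings_bean("QuoteBindingsBean", ["author", "name", "quote", "value"]))
```

Calling an adder several times for the same variable simply appends values, which matches the possibility of binding a variable to several values.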

The GitHub source code repository contains code for two further tutorials (Tutorial5 and Tutorial6). I will write up the text for these tutorials in the New Year.

2. Qualified Pattern for All PROV Relations

At the recent PROV: Three Years Later Workshop, I made the case for the Qualified Pattern to be used for all PROV relations. My key motivation for this extension to PROV is my provenance summarisation algorithm, which generates a “summary provenance graph”, in which nodes and edges are annotated with weights indicating how frequently such nodes and edges occur in the original graph. To allow such annotations to be added to specialization, alternate, and membership relations, these relations need to support the Qualified Pattern.
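The essence of a summary graph, with nodes and edges weighted by frequency, can be conveyed by the following Python sketch. It is a much simplified stand-in for my summarisation algorithm, not the algorithm itself: each node is assumed to carry a single type label, and nodes are merged purely on that label. The node identifiers and types are hypothetical.

```python
from collections import Counter


def summarise(node_types, edges):
    """Collapse a provenance graph into a frequency-weighted summary graph.

    node_types: dict mapping node identifier -> type label (one per node here)
    edges: iterable of (source, relation, target) triples
    Returns (node_weights, edge_weights), two Counters recording how often
    each summary node and each summary edge occurs in the original graph.
    """
    node_weights = Counter(node_types.values())
    edge_weights = Counter(
        (node_types[src], relation, node_types[dst]) for src, relation, dst in edges
    )
    return node_weights, edge_weights


node_types = {
    "ex:img1": "Image", "ex:img2": "Image", "ex:hdr1": "Header",
    "ex:run1": "AlignWarp", "ex:run2": "AlignWarp",
}
edges = [
    ("ex:run1", "used", "ex:img1"),
    ("ex:run2", "used", "ex:img2"),
    ("ex:run1", "used", "ex:hdr1"),
]
nodes, links = summarise(node_types, edges)
print(nodes["Image"], links[("AlignWarp", "used", "Image")])  # 2 2
```

The weights live on the summary edges themselves, which is why relations such as specialization and membership need a qualified form to carry them in PROV.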

At this stage, only the data model has been modified. Serialization to XML and PROV-N is work in progress, and the extension is not yet supported in prov-json and prov-sql. Furthermore, there is no parsing support yet. Three new interfaces have been defined in the package org.openprovenance.prov.model.extension.

3. Release Log

4. Conclusion

We keep on using ProvToolbox in various applications to generate provenance with templates and to undertake some analytics using the summarisation algorithm. This new release was critical to support these two use cases of ProvToolbox. Shortly, I will publish two further blog posts with new tutorials for prov-template.

Click on the Installer. Note that you need to allow the installation of programs from any source in your security preferences. Then simply follow the instructions. The installer will install all libraries and executables in /Applications/provconvert (the default location, which can be overridden), as well as a symbolic link making the provconvert executable available in your execution path. An uninstaller is also available as an executable jar file, /Applications/provconvert/Uninstaller/uninstaller.jar.

provconvert Installer

Et voila! The executable can be invoked directly from the command line.

provconvert -version

which should return provconvert version 0.7.2 (2015-09-15 20:16).

2.2. Templates

As we continue to use templates in our applications, two further requirements have been implemented. It is now possible to expand a template and strip the result of any variable that has not been instantiated. For this, simply pass the option -allexpand to provconvert, in conjunction with the -bindings option (see Tutorial 4 (part 1) and Tutorial 4 (part 2) on template processing in ProvToolbox). Furthermore, an error code is returned when not all variables have been expanded.
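One way to picture what stripping uninstantiated variables and reporting an error code involves is the Python sketch below. It models an expanded document as hypothetical (relation, arguments, attributes) tuples. It assumes that a statement with a leftover variable among its arguments is dropped and that an attribute whose value is a leftover variable is removed; the exit code 1 is an arbitrary choice. provconvert's actual behaviour and codes may differ.

```python
def is_variable(term):
    """Variables are strings in the var: or vargen: namespace in this sketch."""
    return isinstance(term, str) and term.startswith(("var:", "vargen:"))


def strip_unexpanded(document):
    """Remove leftover variables from an expanded document.

    A statement with a variable among its arguments is dropped; an attribute
    whose value is still a variable is removed. Returns the stripped document
    and the set of leftover variables.
    """
    leftovers = set()
    stripped = []
    for relation, args, attrs in document:
        leftovers.update(a for a in args if is_variable(a))
        leftovers.update(v for v in attrs.values() if is_variable(v))
        if any(is_variable(a) for a in args):
            continue
        kept = {k: v for k, v in attrs.items() if not is_variable(v)}
        stripped.append((relation, args, kept))
    return stripped, leftovers


def exit_code(leftovers):
    """Non-zero when some variables were not expanded (1 is an arbitrary choice)."""
    return 1 if leftovers else 0


expanded = [
    ("entity", ("ex:quote1",), {"prov:value": "var:value"}),
    ("agent", ("var:author",), {}),
]
stripped, leftovers = strip_unexpanded(expanded)
print(stripped, sorted(leftovers), exit_code(leftovers))
```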

1. Introduction

Yesterday, I released ProvToolbox 0.7.1. It is a minor release, fixing minor bugs in 0.7.0 and including a useful new feature.

2. Novel Features

2.1. Debian Package

To facilitate installation, a new binary release format is now supported: Debian packages, for installation on Ubuntu and other Debian-based Linux distributions. You just need to run the following commands.

2.3 Visualization

Modifications to the visualisation component prov-dot allow edge thickness, node size, and tooltips (in SVG) to be controlled. For this, the nodes and edges of the provenance graph need to be annotated with the reserved attributes dot:size and dot:tooltip. The following figure illustrates the kind of graphs that can now be generated.

A summarisation of the provenance challenge workflow. Nodes are to be understood as provenance types. Thickness of edges and size of nodes reflect their frequency in the summarised document.
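The following Python sketch shows how frequency weights might be turned into Graphviz attributes (node width, edge penwidth, and tooltip, which Graphviz honours in SVG output). It only illustrates the idea: prov-dot takes its cues from the reserved dot:size and dot:tooltip attributes in the provenance graph, whereas this sketch takes weights directly, and the scaling factors are arbitrary assumptions.

```python
def to_dot(node_weights, edge_weights):
    """Render a weighted summary graph as Graphviz DOT text.

    Node size and edge thickness grow with frequency; each tooltip shows the
    frequency. Names are assumed not to contain double quotes.
    """
    lines = ["digraph summary {"]
    for node, weight in node_weights.items():
        width = 0.5 + 0.25 * weight  # minimum node width in inches (arbitrary scale)
        lines.append(f'  "{node}" [width={width:.2f}, tooltip="{node}: {weight}"];')
    for (src, relation, dst), weight in edge_weights.items():
        lines.append(f'  "{src}" -> "{dst}" [label="{relation}", penwidth={weight}];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot({"Image": 2, "AlignWarp": 2},
             {("AlignWarp", "used", "Image"): 2}))
```

Saving the output as summary.dot and running `dot -Tsvg summary.dot -o summary.svg` produces an SVG in which hovering over a node shows its tooltip.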

2.4 Bug fixes

I also fixed some minor bugs in qualified namespaces in the prov-sql package, and updated the reserved namespace for ProvToolbox.

1. Introduction

In several of our applications, we felt the need to separate the logging of information from the construction and storage of provenance. For this, we introduced PROV-Template, a templating system for provenance that describes the shape of the provenance graphs to be generated, and we specified an algorithm capable of instantiating templates with specific values.

The purpose of this tutorial is to introduce PROV-Template and how templates can be instantiated using ProvToolbox. This functionality is directly available from the command line using provconvert.

The tutorial assumes that provconvert has been installed and is available in the execution path. (See http://lucmoreau.github.io/ProvToolbox/ for installation instructions.) The tutorial relies on a Makefile and can simply be run by calling:

make do.all

2. Example of Templates

2.1 A Template for Attribution of a Quote

Building on the blog post “A little provenance goes a long way”, imagine that we need to systematically provide attribution for quotes. As this is a repetitive task, we should consider the PROV-Template approach to generate provenance.

A provenance template is itself a PROV document in which some variables act as placeholders for values to be filled at expansion time. More precisely, a template is a bundle of PROV assertions: a bundle is the PROV mechanism by which provenance of provenance can be expressed.

The figure below contains a graphical illustration of a template for Quote Attribution. It contains the following variables:

var:author the identifier of the author (stated to be a prov:Person)

var:name the author’s name

var:quote the identifier of the quote

var:value the quote itself

vargen:bundleId the identifier of the bundle to be generated

The quote is attributed to the author agent. The variables var:author, var:name, var:quote and var:value are qualified names in a namespace reserved for PROV-Template variables, and are conventionally written with the prefix var. Values are expected to be provided for these variables when instantiating a template. On the other hand, the variable vargen:bundleId, with prefix vargen, can have its value generated automatically at instantiation time.

Quote Attribution Template

Concretely, in the PROV-N notation, the template is expressed as follows.

2.2 Template Instantiation: A Little Provenance Goes a Long Way

Let’s now look into how we can instantiate the template. Let us consider the following bindings for the four variables author, name, quote and value. An association between a variable and a value is referred to as a binding.

If we instantiate the template with these bindings, we obtain the following instantiated document. We note that vargen:bundleId was instantiated with a UUID value.
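Here is a minimal Python sketch of that instantiation step, showing how var: variables take their values from the bindings while vargen:bundleId receives a freshly generated UUID. The tuple representation (with the bundle flattened into a pseudo-statement), the prov:value and ex:name attribute choices, the ex: identifiers and the uuid: prefix are all assumptions for illustration, not a transcription of the template itself.

```python
import uuid


def instantiate(template, bindings):
    """Expand a simplified template.

    var: variables take their value from `bindings` (a KeyError signals a
    missing binding); each vargen: variable gets one fresh UUID-based value
    per expansion, reused wherever the variable occurs.
    """
    generated = {}

    def value_of(term):
        if isinstance(term, str) and term.startswith("vargen:"):
            if term not in generated:
                generated[term] = "uuid:" + str(uuid.uuid4())
            return generated[term]
        if isinstance(term, str) and term.startswith("var:"):
            return bindings[term]
        return term

    return [
        (relation,
         tuple(value_of(a) for a in args),
         {key: value_of(v) for key, v in attrs.items()})
        for relation, args, attrs in template
    ]


# Hypothetical rendering of the quote attribution template and its bindings.
quote_template = [
    ("bundle", ("vargen:bundleId",), {}),
    ("entity", ("var:quote",), {"prov:value": "var:value"}),
    ("agent", ("var:author",), {"prov:type": "prov:Person", "ex:name": "var:name"}),
    ("wasAttributedTo", ("var:quote", "var:author"), {}),
]
quote_bindings = {
    "var:quote": "ex:quote1",
    "var:value": "A little provenance goes a long way",
    "var:author": "ex:Paul",
    "var:name": "Paul",
}
for statement in instantiate(quote_template, quote_bindings):
    print(statement)
```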

Template Instantiation for “A Little Provenance Goes a Long Way”

Expansion of a template with provconvert is straightforward. The template is provided with the -infile parameter, the bindings file with the -bindings parameter, and the file for the resulting instantiated template with -outfile.
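For those driving the expansion from a script, here is a small Python sketch that assembles the provconvert command line from these three parameters and runs it if provconvert is on the execution path. The file names are hypothetical.

```python
import shutil
import subprocess


def expansion_command(template, bindings, outfile):
    """Assemble the provconvert command line for template expansion."""
    return ["provconvert", "-infile", template, "-bindings", bindings, "-outfile", outfile]


cmd = expansion_command("template.provn", "bindings.ttl", "instance.provn")
if shutil.which("provconvert"):
    result = subprocess.run(cmd)
    # A non-zero exit code indicates that the expansion did not succeed.
    print("provconvert exit code:", result.returncode)
else:
    print("provconvert not found; command would be:", " ".join(cmd))
```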

The input template and its instantiation can be expressed in any of the formats supported by ProvToolbox. We still have to express the set of bindings. We did not want to introduce a new specific format (though we may do so in the future), so we decided simply to use PROV. In particular, the Turtle notation is fairly elegant in this case. Two families of properties are introduced in the tmpl namespace, namely value_i and 2dvalue_i_j, for binding variables in identifier and attribute-value positions, respectively.
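The role of the two property families can be illustrated with a Python sketch that writes one Turtle-style triple per binding: tmpl:value_i for the i-th identifier bound to a variable, and tmpl:2dvalue_i_j for the j-th attribute value attached to the i-th instance. Prefix declarations are omitted, the identifiers and literals are hypothetical, and the actual bindings files in the tutorial may group these statements differently.

```python
def binding_triples(id_bindings, attr_bindings):
    """Emit Turtle-style triples for a set of bindings.

    id_bindings:   variable -> list of identifiers (identifier positions),
                   encoded with tmpl:value_i
    attr_bindings: variable -> list, one entry per instance i, of lists of
                   literal values (attribute positions), encoded with
                   tmpl:2dvalue_i_j. Literals must not contain double quotes.
    """
    triples = []
    for var, values in id_bindings.items():
        for i, value in enumerate(values):
            triples.append(f"{var} tmpl:value_{i} {value} .")
    for var, rows in attr_bindings.items():
        for i, row in enumerate(rows):
            for j, literal in enumerate(row):
                triples.append(f'{var} tmpl:2dvalue_{i}_{j} "{literal}" .')
    return triples


triples = binding_triples(
    {"var:quote": ["ex:quote1"], "var:author": ["ex:Paul"]},
    {"var:name": [["Paul"]], "var:value": [["A little provenance goes a long way"]]},
)
print("\n".join(triples))
```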

2.3 Template Instantiation: A Second Author

In some cases, we would like to express that there is a second author to a document. The attribution template does not need to be redefined. We simply need to provide relevant bindings for the second author.

For instance, Paul and Luc are the two authors of that quote. Conceptually, we want to provide the following bindings.

We see that each of var:author and var:name is given two values. This results in the following expanded provenance graph.

Template Instantiation with Two Authors

The contents of the bindings file are shown below. In lines 7-9, var:author is given two values, using the properties tmpl:value_0 and tmpl:value_1. In lines 10-12, var:name is given two values to occur in attribute position, with the properties tmpl:2dvalue_0_0 and tmpl:2dvalue_1_0.
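Why one template statement can yield several attribution statements is captured by the following Python sketch. It takes a simplified reading, assuming that one statement is produced per combination of the values bound to the statement's identifier-position variables; the PROV-Template specification is the reference for the exact semantics, and the identifiers are hypothetical.

```python
from itertools import product


def expand_statement(relation, args, id_bindings):
    """Expand one template statement over every combination of argument values.

    Arguments that are not bound variables are treated as constants.
    """
    choices = [id_bindings.get(arg, [arg]) for arg in args]
    return [(relation, combo) for combo in product(*choices)]


two_authors = {"var:quote": ["ex:quote1"], "var:author": ["ex:Luc", "ex:Paul"]}
for statement in expand_statement("wasAttributedTo", ("var:quote", "var:author"), two_authors):
    print(statement)
# ('wasAttributedTo', ('ex:quote1', 'ex:Luc'))
# ('wasAttributedTo', ('ex:quote1', 'ex:Paul'))
```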

2.4 Template Instantiation: More Attributes

In general, PROV also allows a variable number of values to be provided for a given attribute. For instance, we may want both the name and the nickname to be provided as values for the var:name variable. This would result in the following expanded graph.

Template Instantiation with Variable Number of Attributes

Again, the template remains unchanged, but the bindings are as follows. In lines 12-13, we see two possible names for Paul, respectively expressed with tmpl:2dvalue_1_0 and tmpl:2dvalue_1_1. This shows that template expansion can support a variable number of attributes for different statements instantiated from the same template statement.
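To see how the 2dvalue_i_j indices line up with instantiated statements, here is a Python sketch that groups attribute-position bindings by i (the instance they belong to) and orders them by j. Following the text, index 1 carries Paul's two names; the assignment of Luc to index 0, the ex: identifiers, and the nickname literal are hypothetical.

```python
import re
from collections import defaultdict

TWO_D = re.compile(r"tmpl:2dvalue_(\d+)_(\d+)$")


def group_attribute_values(pairs):
    """Group (property, literal) pairs by the instance index i of 2dvalue_i_j.

    Returns {i: [literals ordered by j]}: the attribute values for the i-th
    statement instantiated from the same template statement.
    """
    grouped = defaultdict(dict)
    for prop, literal in pairs:
        match = TWO_D.match(prop)
        if match is None:
            continue  # not an attribute-position binding
        i, j = int(match.group(1)), int(match.group(2))
        grouped[i][j] = literal
    return {i: [cells[j] for j in sorted(cells)] for i, cells in sorted(grouped.items())}


name_bindings = [
    ("tmpl:2dvalue_0_0", "Luc"),
    ("tmpl:2dvalue_1_0", "Paul"),
    ("tmpl:2dvalue_1_1", "Paul-nickname"),  # hypothetical nickname
]
authors = ["ex:Luc", "ex:Paul"]  # value_0 and value_1 of var:author (assumed order)
names = group_attribute_values(name_bindings)
for i, author in enumerate(authors):
    print(("agent", author, {"ex:name": names.get(i, [])}))
```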