IIT Database Group

Provenance using Temporal Databases

In this project we explore how provenance computation can benefit from temporal database techniques. The project is funded by and executed in collaboration with Oracle Corporation. Starting from porting rewrite-based techniques, such as the ones used in Perm, to the Oracle SQL dialect, we study how to 1) compute the provenance of past queries and 2) compute the provenance of updates and transactions. This requires non-trivial extensions to current provenance techniques because of, e.g., the interaction of concurrent transactions under lower isolation levels. Our solution can retroactively trace transaction provenance as long as an audit log and time travel functionality are available (both are supported by most DBMS). One of the major outcomes so far is the concept of reenactment queries: queries that reenact the effects of a transaction. Reenactment queries are the main enabler of retroactive provenance computation for transactions.

Within this project we have made the following major contributions to provenance management:

Development of MV-relations, a provenance model for queries, updates, and transactional histories that extends the seminal semiring annotation model (defined for queries) with support for updates and transactions.

Development of reenactment, a declarative replay technique with provenance capture that enables tracking the provenance of a past update or transaction retroactively by executing a query.

Implementation of provenance tracking for transactions over a standard relational database as part of the GProM system.

MV-relations - A Provenance Model for Transactional Updates

As part of this project we have developed a provenance model that allows tracking the provenance of tuples through queries and transactional updates. In our model, the complete derivation history of a tuple (which update operations derived the tuple, and on which inputs of these operations it depends) can be encoded in the annotation of the tuple.
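To make the idea of a derivation history stored in an annotation more concrete, here is a toy sketch in Python. It is not the formal definition of MV-relations and does not reproduce the semiring algebra the model builds on. The class names (`Base`, `Derived`), the field layout, and the operation labels are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Base:
    """Annotation of a tuple that existed before the tracked history (hypothetical)."""
    name: str


@dataclass(frozen=True)
class Derived:
    """Annotation produced by one operation of a transaction (hypothetical layout)."""
    op: str                           # e.g. "insert", "update", "commit"
    txn: str                          # transaction that executed the operation
    version: int                      # version (time) at which it was executed
    inputs: Tuple["Annotation", ...]  # annotations of the tuples it read


Annotation = Union[Base, Derived]


def operations(a: Annotation) -> List[Tuple[str, str, int]]:
    """List the operations that derived a tuple, oldest first."""
    if isinstance(a, Base):
        return []
    ops: List[Tuple[str, str, int]] = []
    for child in a.inputs:
        ops.extend(operations(child))
    ops.append((a.op, a.txn, a.version))
    return ops


def base_inputs(a: Annotation) -> List[str]:
    """List the original tuples the annotated tuple ultimately depends on."""
    if isinstance(a, Base):
        return [a.name]
    names: List[str] = []
    for child in a.inputs:
        names.extend(base_inputs(child))
    return names


# A tuple x1 is updated by T1, committed, and later updated again by T2.
x1 = Base("x1")
after_t1 = Derived("update", "T1", 5, (x1,))
committed = Derived("commit", "T1", 7, (after_t1,))
after_t2 = Derived("update", "T2", 9, (committed,))

print(operations(after_t2))
print(base_inputs(after_t2))
```

Reading the nested annotation from the inside out recovers both the sequence of operations that touched the tuple and the original inputs it depends on, which is the information the paragraph above says an annotation can carry.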

Reenactment - Declarative Replay with Provenance Capture

Reenactment is a declarative replay technique that enables a transactional history (or part thereof) to be repeated by executing a so-called reenactment query. We have proven that reenactment queries produce the same result and have the same provenance as the operation(s) they are replaying. Thus, a reenactment query can be used to retroactively compute the provenance of an operation executed some time in the past as long as the database state seen by this operation can be accessed.
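As a minimal sketch of the idea, the Python snippet below replays a single update as a query, using SQLite in place of Oracle. It makes several assumptions that are not part of the project description: the `account` table and its data are invented, the copy `account_before` stands in for time travel, and the `prov_` and `was_updated` columns are just one simple way to expose which input value each result row came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER);
    INSERT INTO account VALUES (1, 'Alice', 100), (2, 'Bob', 40), (3, 'Carol', 70);
    -- Stand-in for time travel: a copy of the state the update saw.
    CREATE TABLE account_before AS SELECT * FROM account;
""")

# The past operation we want to replay.
conn.execute("UPDATE account SET balance = balance + 10 WHERE balance < 80")

# Reenactment: the update becomes a projection with a conditional expression,
# evaluated over the state the update saw.
REENACT = """
    SELECT b.id, b.owner,
           CASE WHEN b.balance < 80 THEN b.balance + 10 ELSE b.balance END AS balance
    FROM account_before AS b
    ORDER BY b.id
"""

# Same query, additionally carrying the input value and whether the
# update condition held (illustrative provenance columns).
REENACT_WITH_PROV = """
    SELECT b.id, b.owner,
           CASE WHEN b.balance < 80 THEN b.balance + 10 ELSE b.balance END AS balance,
           b.balance AS prov_balance,
           (b.balance < 80) AS was_updated
    FROM account_before AS b
    ORDER BY b.id
"""

reenacted = conn.execute(REENACT).fetchall()
actual = conn.execute(
    "SELECT id, owner, balance FROM account ORDER BY id").fetchall()
with_prov = conn.execute(REENACT_WITH_PROV).fetchall()

print(reenacted == actual)
for row in with_prov:
    print(row)
```

The query reads only the earlier state, so it can be run long after the update itself, provided that state is still accessible.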

Implementation in GProM

To retrieve the provenance of a past update (or transaction, or history), we construct its reenactment query based on a log of SQL operations executed in the past (e.g., Oracle's audit log facility). Such a reenactment query needs to be executed over the database state seen by the operation(s) to be replayed. We use time travel to access such past database states. The techniques developed in this project have been integrated into the GProM system, a database-independent middleware application for computing provenance.
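The construction described above can be sketched roughly as follows. This is not GProM's implementation. The function `reenact_updates` and the structured log format are hypothetical: a real audit log stores SQL text that would have to be parsed first. The sketch also covers only a sequence of updates on one table within a single transaction, assuming every statement sees the state as of the transaction's start plus the transaction's own earlier changes. Lower isolation levels and concurrent transactions need more care. The `AS OF SCN` clause is Oracle's flashback query syntax, and the SCN value shown is a placeholder.

```python
from typing import Dict, List, Sequence


def reenact_updates(columns: Sequence[str], updates: List[Dict], source: str) -> str:
    """Build one nested query that replays the logged updates in order.

    `source` is the table expression giving the state the transaction saw,
    e.g. "account AS OF SCN 3652" on Oracle (hypothetical helper).
    """
    query = f"SELECT {', '.join(columns)} FROM {source}"
    for upd in updates:
        exprs = []
        for col in columns:
            if col in upd["set"]:
                # Updated column: new value where the condition holds, old value otherwise.
                exprs.append(
                    f"CASE WHEN {upd['where']} THEN {upd['set'][col]} "
                    f"ELSE {col} END AS {col}")
            else:
                exprs.append(col)
        # Each update reads the result of the previous one.
        query = f"SELECT {', '.join(exprs)} FROM ({query})"
    return query


# Hypothetical, already parsed log entries of one transaction.
log = [
    {"set": {"balance": "balance + 10"}, "where": "balance < 80"},
    {"set": {"owner": "'closed'"}, "where": "balance >= 80"},
]

# Placeholder SCN; in practice it would come from the audit log.
oracle_sql = reenact_updates(["id", "owner", "balance"], log, "account AS OF SCN 3652")
print(oracle_sql)
```

Nesting the projections matters here: the second update's condition is evaluated over the balances produced by the first, just as it was when the transaction originally ran.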