IIT Database Group

Provenance Games and Generalization

Explaining why a certain answer is in the result of a query or why it is missing from the result is important for many applications including auditing, debugging data, and answering hypothetical questions about data. Both types of questions, i.e., why provenance and why-not (missing answer) provenance have been studied extensively independently. Based on a game theoretic notion of provenance developed by our collaborators at UIUC, we investigate how to developed a unified approach towards provenance and missing answers. Our approach is based on the observation that for a provenance model for queries with full negation, why-not questions can be translated into why questions and vice versa. In addition to investigating how provenance games can be computed efficiently using a goal oriented approach that outsources most of the computation to a datalog engine or relational database, we also are interested in the theoretical properties of provenance games and their relationship to existing provenance and missing answer models. Furthermore, provenance for missing answers (and queries with negation in general) can be very large or even infinite. Thus, a pratical approach requires the size of provenance to be reduced through finite encoding of infinite structures, compression, generalization, and summarization.

Implementation

We envision an system which supports computes explanations of a user's why and why-not questions as (summarized) provenance game fragments. The computation of the relevant fragment of a provenance game will be offloaded to a datalog engine or relational database backend. Here we will benefit from the optimized relational algebra to SQL compilation of our GProM system. The figure below gives an overview of the architecture of the proposed approach.