
Summary

The term "provenance" refers to information about the origin,
context, derivation, ownership or history of some artifact. In both
art and science, provenance information is crucial for establishing
the value of a real-world artifact, guaranteeing, for example, that the
artifact is an original work produced by an important artist, or
that a stated scientific conclusion is reproducible. Even in everyday
situations, we unconsciously use provenance to judge the quality of an
artifact or process. For example, we often decide what food to buy
based on freshness, origin and "organic" labels; and we decide
whether or not to believe an online news article based on its source,
author, and timeliness.

Maintaining good and convincing records of provenance is difficult. It seems to require both pervasive monitoring of actions as they are performed, and a clear understanding of system boundaries and trustworthiness of actors. For example, every step in the chain of ownership of an important work of art needs to be recorded in a secure way in order to defend against forgery and deter attempts to sell stolen artwork.

Since it is much easier to copy or alter digital information than to
alter real-world artifacts, there are even more opportunities for
misinformation, forgery and error in the digital world than there are
in the traditional physical world. For this reason, the need for
provenance is now widely appreciated. Simple and unreliable forms of
automatic provenance tracking, such as version numbering, ownership,
creation and modification timestamps in file systems, have long been
supported as basic services on which more sophisticated tools can
rely. In today's increasingly networked and decentralized world,
however, we anticipate the need for richer provenance recording and
management capabilities to be built into a wide variety of systems.

For example, "grid" or "cloud" computing infrastructures are
frequently used for scientific computing, as part of a widespread
trend towards "eScience", "cyberinfrastructure" or, more recently,
the data-intensive "fourth paradigm" of science popularized by Jim
Gray and others. These systems are complex and opaque. The
correctness and repeatability of scientific conclusions (about, for
example, climate change) are increasingly being questioned because of
the lack of transparency of the complex computer systems used to
derive the results. Provenance technology can help to restore
transparency and increase the robustness of eScience, countering
increasing skepticism of scientific results as evidenced by the
so-called "Climategate" controversy in 2009.

This problem is already widely appreciated in scientific settings but
is increasingly recognized as a problem in business, industrial and
Web settings. Until recently, work on provenance has mostly taken
place in relatively isolated parts of existing research communities,
such as databases, scientific workflow-based distributed computing,
file systems, and the Semantic Web. However, we believe that to make
real progress it will be necessary to form a broader research
community focusing on provenance.

In this respect, the aims of Dagstuhl Seminar 12091 "Principles of
Provenance" were to:

bring together researchers from databases, security, scientific workflows, software engineering, programming languages, and other areas to identify the commonalities and differences of provenance in these areas;

improve the mutual understanding of these communities;

identify the main areas for further foundational provenance
research.

The seminar hosted 41 participants in total from the above
communities, and included representatives from the W3C Provenance
Working Group, which is in the process of standardizing a common data
model for representing and exchanging provenance information.

To improve the mutual understanding of the various communities, the
first day of the seminar was devoted to tutorial talks from
well-respected members of each community.

The rest of the seminar consisted of presentations of recent and ongoing
provenance research in the various communities, as well as break-out
sessions aimed at deepening discussions and identifying open
problems.

