Introduction

14 Feb 2013
Note: this page was last edited 20 April 2012 and was only used by the
WG while the PROV-DM and PROV-O were in flux.
Héctor Pérez-Urbina submitted to public-prov-comments@w3.org some notes similar to this page's material.

This document gives a draft translation from PROV-DM to PROV-O, and sketches how to go in the reverse direction (i.e. how to extract PROV-DM from an RDF graph that includes PROV-O data as well as possibly other RDF).

Guideline: We include all RDF assertions associated with a DM assertion, even if some of them wind up being redundant/inferrable.

Guideline: Optional arguments (including attribute lists) are in square brackets; if the argument is missing, we generally omit the corresponding RDF edges. (In some cases, not currently documented, an optional argument to a record corresponds to an unknown value that should be generated as a blank node; this remains under discussion.)

From PROV-DM to PROV-O

We define a translation from PROV-DM formulas to RDF conforming to PROV-O as follows.

ProvRDF

LHS

The undersigned have reviewed DM WD3 and agree that all ASN signatures in WD3 appear as left hand sides of the rules shown on this page. Further, the rules here are in the same order as DM WD3 and no rules appear here without appearing in DM WD3.

Daniel Garijo (10-Feb-2012)

Tim Lebo (collections still missing) (13-Feb-2012)

James Cheney (basic stuff is here, collections/accounts is not) (16-Feb-2012)

The constructs are listed in an order that corresponds to the order given in PROV-DM WD3.

ProvenanceOntology.owl

Components

The following values are used for prov:component annotations in the OWL file:

(1) entities-activities

(2) agents-responsibility

(3) derivations

(4) alternate

(5) collections

(6) annotations

Mapping goals

1: Maintain Entities and Activities as principal subjects

Subjects of triples are more principal than objects of triples. Because Entities and Activities are the two principal topics of PROV-DM, the RDF mapping should prefer that Entities and Activities be subjects of as many triples as possible. In the case when the object instance is ALSO an Entity or Activity, the directionality of the triple should point to the Element that "existed earlier". For example, :activity prov:used :entity is preferred over :entity prov:usedBy :activity because the entity existed before it was used by the activity.

2: Avoid proliferating owl:inverseOf

Although every property could have an inverse, we choose one preferred direction to keep the model small and understandable. Providing all inverses could be done in a supplemental profile. One exception to this rule is prov:wasGeneratedBy's inverse: prov:generated, which is included because of goal 1. When an asserter is describing an Activity (a principal Element), they should be able to describe it as a subject. prov:generated is needed to do this.
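As a rough illustration of the one exception above, the sketch below adds a prov:generated triple for every prov:wasGeneratedBy triple, so that an Activity can be described as a subject. It assumes triples are modelled as plain tuples of strings; the function name add_generated_inverse is hypothetical and not part of PROV-O.

```python
def add_generated_inverse(triples):
    """Return the input triples plus one prov:generated triple
    for every prov:wasGeneratedBy triple (subject and object swapped)."""
    result = set(triples)
    for s, p, o in triples:
        if p == "prov:wasGeneratedBy":
            result.add((o, "prov:generated", s))
    return result


# Illustrative data only.
graph = {(":e", "prov:wasGeneratedBy", ":a")}
expanded = add_generated_inverse(graph)
```

Only this single inverse is materialised, which matches the goal of choosing one preferred direction for every other property.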

3: Include _all_ resulting triples, including those resulting from inferences

4: Naming style for prov:entity prov:activity prov:agent is RESERVED

The naming style of prov:entity prov:activity and prov:agent is adopted from that of rdf:object. NO OTHER prov predicate may adopt this same style, so that the style clearly indicates _which_ predicate is referencing the object of the unqualified relation that the Involvement is qualifying.

So, predicates named quoter, quoted, generation, usage are not permitted. If we _are_ going to reference these things from an Involvement, the hadXXX pattern should be followed.

restate: Properties on Involvements which are a noun and match the desired range (e.g. entity, activity, agent) are reserved for the reification properties of an involvement?

Visual style

Tokens in a gray background have a scope local to the assertion (e.g. ":id").

Tokens in a light brown background exist in the provenance namespace (e.g. "prov:wasDerivedFrom").

Triples in gray text can be inferred with RDFS reasoning (e.g. from superclasses or superproperties); however, types that follow from rdfs:domain / rdfs:range are shown in black.

PROV-N                                            sd:name   Subject   Predicate   Object
asnExpression(id,e,a,t,[attr_1=val_1, ...]) ==>             :a        a           prov:
                                                            :e        a           prov:
                                                            :e        a           prov:
                                                            :e        prov:       :id
                                                            :id       attr_1      val_1
                                                            ...
                                                            :id       attr_n      val_n
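The attribute rows of this template can be sketched as follows: each attr_i=val_i pair becomes one triple on :id, and a missing attribute list yields no triples, in line with the guideline on optional arguments. This is a minimal sketch assuming triples are tuples of strings; attribute_triples is a hypothetical helper name.

```python
def attribute_triples(record_id, attrs=None):
    """Return one (subject, predicate, object) triple per attribute.

    attrs is an optional list of (name, value) pairs; when it is
    omitted or empty, no triples are produced.
    """
    return [(f":{record_id}", name, value) for name, value in (attrs or [])]


with_attrs = attribute_triples("id", [("attr_1", "val_1"), ("attr_2", "val_2")])
without_attrs = attribute_triples("id")
```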

Partial mappings

The right-hand sides of the rules shown on this page are intentionally verbose, so that the full ramifications of a DM ASN expression can be seen.

Note that if an argument is not provided in the ASN, the corresponding triples that require that value are NOT produced (unless they are required to link to other produced triples). This means that simple ASN assertions produce simple PROV-O assertions.

For instance, if in PROV-N we have simply wasGeneratedBy(e,a) rather than the full wasGeneratedBy(e,a,t,[attr_1=val_1, ..., attr_n=val_n]), then:

PROV-N                    sd:name   Subject   Predicate             Object
wasGeneratedBy(e,a) ==>             :a        a                     prov:Activity
                                    :e        a                     prov:Entity
                                    :e        prov:wasGeneratedBy   :a
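The simple case can be sketched in a few lines. The function below covers only wasGeneratedBy(e,a) and deliberately refuses the qualified form (time or attributes), since that expansion needs an involvement node. Triples are assumed to be tuples of strings, and was_generated_by is a hypothetical name chosen for illustration.

```python
RDF_TYPE = "rdf:type"  # written as "a" in the tables on this page


def was_generated_by(e, a, t=None, attrs=None):
    """Translate the simple form wasGeneratedBy(e,a) into three triples."""
    if t is not None or attrs:
        # The qualified form expands into an involvement; not sketched here.
        raise NotImplementedError("only wasGeneratedBy(e,a) is handled")
    return [
        (f":{a}", RDF_TYPE, "prov:Activity"),
        (f":{e}", RDF_TYPE, "prov:Entity"),
        (f":{e}", "prov:wasGeneratedBy", f":{a}"),
    ]


simple = was_generated_by("e", "a")
```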

However if time was also given, we will need to expand into an involvement:

A "hadActivity" from the Involvement to the Activity is intentionally omitted. Its purpose is served by its inverse "hadQualifiedUsage", which points from an Activity to an Involvement. This maintains the design goal that "Entities and Activities are principal instances" and that the subjects of triples are more principal than objects of triples. -Tim

This RDF expansion is very verbose because of the inferred
usage/generation links with the activity. The actual derivation is fully
asserted using :e2 prov:qualifiedDerivation :id and :id with its
direct properties.

TODO: Use a different colour/font for inferred statements? For instance,
italics? Or can we keep such inference rules separately to avoid
repeating them, including the subclass hierarchy? In many ways I prefer to
show all superproperties and superclasses, because it would highlight
cases where they might not make sense or are difficult. For instance
above - is prov:Derivation always a prov:ActivityInvolvement?
--Stian

Issue:

Show what a non-activity-specific wasDerivedFrom(id, e2, e1, [attr1=val1]) will look like as well? Introducing usage or generation will infer a single activity, but it's still possible to do derivation across multiple activities.

Further terms in records

It's not clear to me that we need to spell these out as rules in the mapping. But, it is good to explain how attributes, literals, identifiers, times, etc. in PROV-DM map to PROV-O-compliant RDF. ---[James]

Identifier

As per PROV-DM, an identifier is a qualified name in the same sense as in RDF/SPARQL.
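To make the qualified-name reading concrete, here is a minimal sketch that expands a prefix:local identifier into a full IRI, as RDF/SPARQL tooling does. The prov: namespace IRI is the standard one; the ex: prefix, its IRI, and the helper name expand_qname are assumptions made for the example.

```python
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "ex": "http://example.org/",  # assumed example namespace
}


def expand_qname(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full IRI using the prefix map."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        raise ValueError(f"not a qualified name: {qname!r}")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local


entity_iri = expand_qname("prov:Entity")
```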

Literal

PROV-DM literals include values that can be typed by XML Schema basic types, and thus can include URIs (unlike RDF's Literals). Thus, some care may need to be taken here when mapping PROV-DM literals that are URIs.

Characteristics of Object Properties

The table below summarizes the characteristics of the object properties that are defined in the OWL ontology. The question mark symbol is used to denote that the characteristic in question is under discussion by the WG (this is the case for alternateOf where an issue was raised to determine if it is transitive), or because I am not sure whether the property in question is supported by the object property. I am also using (Yes) and (No) to denote properties that I am not sure of, but for which I am inclined to say yes or no. These also need to be discussed with the rest of the prov-o team. -- khalid

Questions/problems

The activity record is the only one that mentions additional things besides attributes. This seems odd.

Proposal: wasStartedBy, wasEndedBy and activity records / times are all under review in PROV-DM, so hopefully this will be addressed.

It isn't obvious whether we should emit a triple saying that the plan element of an activity is a prov:Plan. I guess this can be inferred if we omit it?

[Resolved: we give it explicitly]

In the rule for note, there is no class we can assign to the id. (The obvious idea of using rdfs:comment doesn't work because there's no separate class for the comments, and the range of rdfs:comment is Literal.) Is this a problem?

Proposed solution: add class prov:Note.

Resolved: PROV-O now has class Note

wasGeneratedBy has a time which can be linked to the generated entity by prov:wasGeneratedAt, but I think the time should be linked directly to the id.

Proposed solution: introduce prov:happenedAt, define prov:wasGeneratedAt as the composition of prov:happenedAt and prov:hadQualifiedEntity.

Resolved: handling using atTime for now

used has a time and it's not obvious what this should be linked to in RDF and how. There is no relation for linking the used id to the time.

Proposed solution: introduce prov:happenedAt.

Resolved: using atTime for now

wasStartedBy and wasEndedBy are treated as events (and they have id's and attributes), but there is no class for them.

Proposed solution: introduce prov:ActivityStart and prov:ActivityEnd as subclasses of QualifiedInvolvement.

Resolved: using prov:Start and prov:End

In hasAnnotation, should the attributes be connected to r or to n? Given that the note n can have arbitrary attributes, why does hasAnnotation have additional attributes?

Proposal: Suggest that DM consider dropping attributes on hasAnnotation and instead recommend subclassing Note to express different kinds of notes.

From PROV-O to PROV-DM

Given an instance of PROV-O, we want to compute an instance of PROV-DM that has the "same meaning".

The basic idea is:

For each node in the RDF graph, check whether the node is an instance of one of the PROV-O classes Entity, Agent, or Activity.

For each such node, look for the appropriate edges in the prov: namespace needed to fill in the fields of the corresponding PROV-DM record.

Any additional fields in other namespaces are added as attributes.

For each of the edges / graph patterns corresponding to PROV-DM relations, look for the corresponding data and generate the appropriate relation.
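The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a complete extraction: triples are tuples of strings, only the two relations named earlier on this page (prov:wasGeneratedBy and prov:used) are recognised, and the step that fills record fields from prov: edges is left out because it depends on the property names finally chosen. The function name and output tuple shapes are hypothetical.

```python
RDF_TYPE = "rdf:type"

CLASS_TO_RECORD = {
    "prov:Entity": "entity",
    "prov:Agent": "agent",
    "prov:Activity": "activity",
}

# Assumed subset of relation predicates; argument order follows the triple.
RELATIONS = {
    "prov:wasGeneratedBy": "wasGeneratedBy",
    "prov:used": "used",
}


def extract_prov_dm(triples):
    """Return PROV-DM-like records as tuples from a list of triples."""
    records = []
    # Typed nodes become element records; edges outside the prov:
    # namespace on the same node are collected as attributes.
    for s, p, o in triples:
        if p == RDF_TYPE and o in CLASS_TO_RECORD:
            attrs = [
                (p2, o2)
                for s2, p2, o2 in triples
                if s2 == s and p2 != RDF_TYPE and not p2.startswith("prov:")
            ]
            records.append((CLASS_TO_RECORD[o], s, attrs))
    # Known relation edges become relation records.
    for s, p, o in triples:
        if p in RELATIONS:
            records.append((RELATIONS[p], s, o))
    return records


# Illustrative data only.
graph = [
    (":a", RDF_TYPE, "prov:Activity"),
    (":e", RDF_TYPE, "prov:Entity"),
    (":e", "prov:wasGeneratedBy", ":a"),
    (":e", "ex:colour", "red"),
]
records = extract_prov_dm(graph)
```

A node typed with more than one of the three classes yields one record per class in this sketch; whether that is the desired behaviour is left open here.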