** Added comments to each class and property to use more specific forms when appropriate: http://dvcs.w3.org/hg/prov/rev/cde4a376c32e

** http://aquarius.tw.rpi.edu/prov-wg/prov-o#wasInfluencedBy states "Because prov:wasInfluencedBy is a broad relation, its more specific subproperties (prov:wasInformedBy, prov:actedOnBehalfOf, prov:wasEndedBy, etc.) should be used when applicable."

** http://aquarius.tw.rpi.edu/prov-wg/prov-o#Influence states "Because prov:Influence is a broad relation, its most specific subclasses (e.g. prov:Communication, prov:Delegation, prov:End, prov:Revision, etc.) should be used when applicable."

** http://aquarius.tw.rpi.edu/prov-wg/prov-o#ActivityInfluence states "It is not recommended that the type ActivityInfluence be asserted without also asserting one of its more specific subclasses." (and similarly for AgentInfluence and EntityInfluence).

** http://aquarius.tw.rpi.edu/prov-wg/prov-o#Derivation states "The more specific forms of prov:Derivation (i.e., prov:Revision, prov:Quotation, prov:PrimarySource) should be asserted if they apply."

** http://aquarius.tw.rpi.edu/prov-wg/prov-o#wasDerivedFrom states "The more specific subproperties of prov:wasDerivedFrom (i.e., prov:wasQuotedFrom, prov:wasRevisionOf, prov:hadPrimarySource) should be used when applicable."

PROV-DM

ISSUE-532 (Role)

The Working Group has given careful consideration to roles in PROV. We considered allowing roles in all relations. A problem with such a permissive approach is that it is not clear who is playing the role. For instance, if a role is added to delegation, which agent is assuming that role: the delegate or the responsible?

For the example of delegation, the group suggests using the attribute "prov:type" if a subtype of the relation needs to be specified, e.g. contractual delegation.

In an earlier design, alternateOf and specializationOf had identifiers and attributes. Concerned by the proliferation of qualified relations, and to keep things simple, the Working Group made the following resolution: alternateOf and specializationOf should be binary relations, for which the qualified pattern does not apply. So, alternateOf and specializationOf are not subrelations of wasInfluencedBy, and therefore do not have an ID and a list of attributes.

An extension of PROV-O could define a qualified pattern for these relations, or an extension of PROV-N could use a term ext.specializationOf(id,e2,e1,attrs), whose interpretation would be application specific.

Following the reviewer's further suggestion, the following was added to the document:

'Membership/Specialization/Alternate are not defined as Influence, and therefore do not have an id and attributes.'

It is correct that: A bundle is a named set of provenance descriptions (2.2.2). It is also correct that section 2.2.3 indicates that many types of collections exist, including sets. However, section 2.2.3 states: A collection is an entity that provides a structure to some constituents, which are themselves entities.

In PROV, provenance descriptions are not given identifiers and are not regarded as entities. Identifiers occurring in provenance descriptions denote "things in the world" (resources).

To be able to talk about the provenance of PROV descriptions, the bundle construct allows a set of descriptions to be named, and become a "thing in the world". Such a bundle is an entity whose provenance can then be described.

In conclusion, a PROV bundle is not a PROV collection.

In response to the follow-on message from the reviewer, there is no support in the Working Group for adding identifiers to individual PROV statements, since this would result in a proliferation of identifiers. In practice, a given asserter would typically assert multiple statements: subject, relation, object. It feels more appropriate to give all these statements an id (by means of a bundle), and express their provenance.

It is not entirely clear what the semantics of the suggested wasAdoptedBy would be:

If it is a form of influence by which an agent was influenced by a plan, this can be expressed by a subtype of derivation: wasDerivedFrom(ag,pl)

Alternatively, if it is an influence of the plan by the agent, this can be expressed by a subtype of attribution: wasAttributedTo(pl,ag)

If it is not an influence, a given application could define, in OWL terminology, a property chain wasAdoptedBy=agent o inverse(hadPlan)
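As an illustration of the property chain above, the following Python sketch derives wasAdoptedBy pairs by following hadPlan backwards from a plan to its association, and then agent forwards from that association. The Association record and the sample identifiers are hypothetical, and the argument order (plan first, then agent) is an assumption that mirrors wasAttributedTo(pl,ag).

```python
from typing import NamedTuple, Optional


class Association(NamedTuple):
    """A qualified association with an agent and, optionally, a plan (hypothetical record)."""
    ident: str
    agent: str
    plan: Optional[str] = None


def was_adopted_by(associations):
    """Compose inverse(hadPlan) with agent: plan -> association -> agent."""
    return {(a.plan, a.agent) for a in associations if a.plan is not None}


associations = [
    Association("assoc1", agent="ex:alice", plan="ex:protocol"),
    Association("assoc2", agent="ex:bob"),  # no plan, so no adoption is derived
]
adopted = was_adopted_by(associations)
```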

The above discussion shows that PROV provides core building blocks that allow a relation such as wasAdoptedBy to be defined.

Hence, there is no need for a separate wasAdoptedBy relation.

Following the reviewer's follow-on message, the group has been very careful about introducing relationships that are not influence. Specialization/Alternate/Membership are special cases given their prevalence in provenance.

Other relations along the lines of "adopting a plan" could potentially be considered, such as "rejecting a plan" or "abandoning a plan". The group feels that such a relation is not primitive but could be expressed in terms of an activity (to adopt, to reject, to abandon) and a used plan.

Suggested change: Replace "To illustrate expanded relations, we consider the concept of association, described in Section 2.1.3." by "To illustrate expanded relations, we revisit the concept of association, introduced in Section 2.1.3 (full definition of the expanded association can be found in section 5.3.3)."

ISSUE-447 (subactivity)

The Working Group charter identified an initial set of concepts, and made it clear that the Working Group should not delve into the details of plans and workflows (then called recipes). The charter did not list a notion of subactivity either.

The Working Group considered a notion of subactivity, but does not understand the implications of introducing such a relation into the model. In fact, there is little prior art about this in the provenance community. There is also concern that specifying such a relation would overlap with some workflow specification initiatives.

For this reason, the Working Group decided not to provide a normative definition of such a relation. Instead, the Working Group suggests that a relation such as dcterms:hasPart could be used by applications, which would be responsible for ensuring its use is consistent with the model.

The Working Group intends to produce an FAQ page illustrating how such a construct could be used.

ISSUE-508 (Table 5)

The text indeed required clarification: "core structures have their names and parameters highlighted in bold in the second column (prov-n representation); expanded structures are not represented with a bold font."

Indentation of subconcepts had been considered by the editors. While it appears beneficial to see Revision, Quotation, and Primary Source indented below Derivation, this would lead to confusion elsewhere in the table:

Plans (in component 3) are a subtype of Entity, but entities belong to component 1. Indenting Plan under another concept would therefore be misleading.

Person/Organization/SoftwareAgent could be indented below agents. However, our preference is to list core structures first, before expanded structures.

Finally, Influence could be seen as a super-relation of many relations, but, again, they are spread across components, and Influence is regarded as an expanded structure.

Overall, there are multiple, conflicting ways of organizing table 5. We feel that this order of structures allows components to be exposed and core structures to be presented first, without attempting to expose a hierarchy of types, which would require an entirely different layout.

PROV-DM follows the syntax specified by PROV-N. Regarding the style of encoding of attributes, this issue is already raised against the PROV-N document (issue-533).

Proposed changes: add the following to the location section: "While several objects are allowed to have a Location, it may not make sense to use it in some cases. For example, an activity that describes the relocation of an entity will have start and end locations, as well as every place in between those points."

Inference 12 of prov-constraints states that the two entities linked by a revision are also alternates.

Given this, can we have wasDerivedFrom(e2,e1,[prov:type='prov:Revision']) and wasDerivedFrom(e2,e1,[prov:type='prov:Quotation'])?

From the first statement, one can infer alternateOf(e2,e1).

From the second statement, e2 contains a copy of something (text/image) contained in e1.

We acknowledge the reviewer's follow-on comment. Ultimately, the Working Group provides a vocabulary, and users of that vocabulary will have to make a judgement as to which constructs they have to use. This issue is not specific to revision and quotation (e.g. What is an entity? What is an activity? Is this specialization or alternate? etc.).

ISSUE-501 (DrivingACarToBoston)

As the author suggests, "driving a car to boston" is an example, and therefore, needed to be put in a box.

It now appears as example 5, following examples 3 and 4, which contain examples of usage and generation of digital entities.

The author's comment confirms that it is important to include this example in this document. Indeed, the author states that "most people would consider a single entity (not two or more)", whereas PROV modelling requires the "car at various locations" to be seen as different entities.

In general, we have been careful to limit the number of examples involving physical entities; however, including a few is important to demonstrate the generality of the model.

ISSUE-516 (DerivationAsBundle)

A derivation is not an activity; a derivation is a transformation of an entity into another. A derivation may be realized by one or more activities.

If a derivation (between e2 and e1) is realized by one known activity, then that activity generated e2 and used e1.

All this is formalized in the constraints document (see references).

The reason why derivation can refer to a usage and a generation is that we wanted to be able to express the derivation path in full. This is particularly important in a number of use cases, including result reproducibility.

So, derivation is a construct that refers to two entities, an activity (similarly to other relations in the model) and in addition to a usage and a generation, by means of their identifiers. (Reminder: these identifiers identify entity/activity/usage/generation and not statements).

A bundle is a set of provenance statements. (Reminder: statements do not have identifiers.)

Hence, a derivation is not a bundle, it does not contain statements.

Following the response of the reviewer, a change was implemented: there must be some underpinning activity or activities performing the necessary action(s) resulting in such a derivation.

"Is it possible for entities to become temporarily unavailable (e.g., for usage)? "

prov-constraints states that usage precedes invalidation and follows generation. Hence, for a given entity, one cannot express that usage is not permitted over a period of time.

Alternatively, one can introduce multiple specializations for the various intervals. In the following example, one defines an entity e, and two specializations. e1 is not available after 10am, and e2 before 4pm on 2011-11-16. Both e1 and e2 have an attribute ex:available indicating their availability. On the other hand, e does not have such attribute, because this aspect is not fixed during the lifetime of e.
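The example itself is not reproduced on this page. As a rough stand-in, the Python sketch below models the entity e and its two specializations, and checks the rule that usage follows generation and precedes invalidation. The Entity record, the usage_allowed helper, the reading of "not available after 10am" as an invalidation time and of "before 4pm" as a generation time, and the value given to ex:available are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Entity:
    ident: str
    generated: Optional[datetime] = None    # None: generation time not asserted
    invalidated: Optional[datetime] = None  # None: invalidation time not asserted
    attributes: dict = field(default_factory=dict)
    specialization_of: Optional[str] = None


def usage_allowed(entity: Entity, when: datetime) -> bool:
    """Usage must follow generation and precede invalidation, where asserted."""
    if entity.generated is not None and when < entity.generated:
        return False
    if entity.invalidated is not None and when > entity.invalidated:
        return False
    return True


# e carries no ex:available attribute: availability is not fixed over its lifetime.
e = Entity("e")
e1 = Entity("e1", invalidated=datetime(2011, 11, 16, 10, 0),
            attributes={"ex:available": True}, specialization_of="e")
e2 = Entity("e2", generated=datetime(2011, 11, 16, 16, 0),
            attributes={"ex:available": True}, specialization_of="e")

noon = datetime(2011, 11, 16, 12, 0)
```

At noon, neither e1 nor e2 can be used in this sketch, while nothing prevents usage of e, for which no invalidation has been asserted.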

The above example shows that e has some aspects that remain constant during its lifetime (e.g. its identity), but is also allowed to have other aspects that change over time. These changing aspects cannot be expressed as attributes.

There is no requirement for asserters to assert invalidation of entities.

Given this, the Working Group feels that the concern raised by the author is not applicable. Entities may have a long lifespan, provided that they have some aspects, represented as attributes, that do not change over that lifespan. Other aspects are allowed to change. As a minimum, an entity must have a fixed identity during its lifetime.

As far as a new section on state is concerned, the Working Group has made a decision to leave this kind of material outside the prov-dm document. Some of this is actually covered in prov-constraints.

In the follow-on message, the reviewer discusses the traffic light example. As the light changes from red to green, the red traffic light is invalidated and the green traffic light is generated. Both are specializations of the traffic light, which continues its existence across this change of state, since color is not one of its attributes.

The group has given careful consideration to attributes in prov-dm, specifically time, location and role.

The group could not reach consensus to allow these attributes to apply to more concepts of the data model. The challenge is not to add the attribute to a concept, but to find an interpretation of that attribute, which fits the rest of the model.

Role:

We have already elaborated on roles in our response to ISSUE-532.

Location:

While a notion of location is fairly intuitive for an activity or entity, it is less intuitive for associations for instance. In an association, the activity may have a location, and the agent may have a location. It is however unclear what the location of the association itself may be.

Time:

The same comments apply for time. However, in this case, the constraints document explains what kind of ordering constraints exist, between an agent and activity, for instance.

Furthermore, as explained in detail in prov-constraints, time information is connected to a unique event. The Working Group has not defined, for instance, an event for the start of an association, and an event for its end. It is not clear why such event types would be required, when activity start and end could be used to that end, and the association could be represented by an activity holding for some time interval.

So overall, the group could not find consensus to broaden these attributes to other relations in a meaningful manner. Particular implementations, using the PROV extension mechanism, are however able to add similar attributes for their specific needs.

In response to the follow-on message, the Working Group, as it wraps up its activities, will consider follow-on activities, and mechanisms for the community to share information. The Semantic Web wiki is a starting point.

ISSUE-520 (Person/Organization/SoftwareAgent)

The reason why the WG introduced agents in the PROV model is to be able to assign responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.

For interoperability reasons, the WG also believed it is useful to define commonly encountered types of agents: Person, SoftwareAgent, and Organization. Agents of type prov:Person are people responsible for something; agents of type prov:SoftwareAgent are running software responsible for something; etc.

The reason why an instance of prov:Agent is allowed to be also a prov:Entity is because we may want to talk about its provenance, how it was generated or derived, etc.

Given this:

It is not appropriate to make Person/SoftwareAgent/Organization subtypes of Entity in PROV, since entities by default do not bear responsibility in the PROV model. In PROV, it is the notion of prov:Agent that carries responsibility.

It is possible to define an instance as both a prov:Person and a prov:Entity, when we want to express that it is responsible for something, and we want to express its provenance.

If one wishes to introduce a type of person, as an entity, without associating any responsibility, then there are ontologies, outside PROV, which allow for that. FOAF concepts such as foaf:Person, foaf:Organization may be relevant. With these, one can write entity(e, [prov:type='foaf:Person']).

PROV delegations are not temporal relations. Instead, prov-constraints defines ordering constraints that are implied by delegations: the responsible agent has to precede or have some overlap with the subordinate agent.

If, in an application, it is necessary to express that a delegation takes place over an interval (evt1-evt2) and is followed by another delegation during interval (evt2-evt3), a possible way to model this in PROV is as follows:

One may model this scenario with two activities, one for the first interval and one for the second interval, and two relations actedOnBehalfOf, one for each activity.

It is true that, in a delegation, activity is optional. The reviewer suggests "Therefore, it is possible to state that one agent is the delegate of another, irrespective of any activity. This delegation likely is not indefinite, however, and is bounded by some context (e.g., time, role within an organization, etc). It should be possible to describe the bounds of the delegation.". However, this is not the intended semantics:

PROV constraints defines the semantics of optional arguments, and specifically, in Table 3, explains that activity in delegation is expandable.

It means that an absent activity can be replaced by an existential variable. Hence,

actedOnBehalfOf(ag2,ag1) really means that agent ag2 acted on behalf of agent ag1 in the context of some unspecified activity. Some activity, not all activity.

This (unspecified) activity defines the bounds of the delegation. If these bounds need to be made explicit, then an activity also needs to be made explicit.
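The expansion of an absent activity into an existential variable can be sketched in Python as follows. The tuple layout, the expand_delegation helper and the "_:a" naming of fresh variables are assumptions made for illustration; they are not PROV-N syntax.

```python
import itertools

_fresh = itertools.count(1)


def expand_delegation(delegate, responsible, activity=None):
    """Return the delegation with an absent activity replaced by a fresh variable."""
    if activity is None:
        # Existential variable: some unspecified activity, not all activities.
        activity = f"_:a{next(_fresh)}"
    return ("actedOnBehalfOf", delegate, responsible, activity)


implicit = expand_delegation("ag2", "ag1")
explicit = expand_delegation("ag2", "ag1", activity="a1")
another = expand_delegation("ag3", "ag1")
```

Each expansion receives its own variable, so two delegations with absent activities are not forced to share the same activity.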

ISSUE-509 (AttributesInUML)

First, let us note the non-normative nature of the UML diagrams. They are here to inform readers and convey the intuition of the data model.

The UML diagrams actually represent all the information present in relations such as WasStartedBy.

PROV Id and PROV attributes are explicitly listed as UML attributes in the association class

The started activity and the trigger entity are source and destination of the association edge

The starter activity is present with the starter edge

Time is also present through the time edge

With UML diagrams, we can take a full object-oriented view or a more relational view of the data model. The former lists all attributes, whereas the latter highlights the relations. We opted for the latter approach.

Hence, what the UML diagram does not explicitly represent is the actual names of all attributes of a relation. That is covered by the normative text.

It is correct that Time is a primitive datatype, and marked as such. Given the importance of time and events in the model, it is considered pedagogically useful to keep it in Figure 5. We note that Figure 1, the much simplified version, doesn't show it.

Finally, it's correct that we use names such as Start, but the UML diagram contains relation label WasStartedBy. This has now been fixed for all introductory paragraphs.

The group has already addressed issue-525 related to Specialization/alternate.

Note that alternateOf is a necessarily very general relationship that, in reasoning, only tells you that the two alternate entities fix different aspects of some common thing (possibly evolving over time), and so there is some relevant connection between the provenance of the alternates. In a specific application context, alternateOf, or a subtype of it, could allow you to infer more.

The prov-constraints document provides further information about alternate.

In section 2: A different entity (perhaps representing a different user or system perspective) may fix other aspects of the same thing, and its provenance may be different. Different entities that fix aspects of the same thing are called alternates, and the PROV relations of specializationOf and alternateOf can be used to link such entities.

Furthermore, alternate is defined as an equivalence relation (section 4.5).

Following the reviewer's request the sentence above was added to the document.

The focus of derivation is on connecting a generated entity to a used entity. Hence, transformation of an entity into another, or updating of an entity from another, are appropriate focuses for this definition.

One should note that the focus is not on the creation of the entity, since we already have the notion of generation for that.

Given an entity that was generated, the concept of derivation allows us to express dependencies on entities that have influenced that entity. As the author suggests, it could be argued that most entities can be said to be derived from other entities.

In PROV, the creation of an entity, referred to as generation, is the point after which it becomes available for usage. Before generation, the entity cannot be used.

The document gives the example of a car, moved from Boston to Cambridge (see example 5, in editor's draft). For this car, we identify multiple entities exposing various facets of the thing: Joe's car, Joe's car in Boston, and Joe's car in Cambridge.

Joe-car-cambridge begins to exist when the car arrives in Cambridge, and Joe-car-boston ceases to exist (invalidation) once it leaves Boston. So joe-car-cambridge's generation time is defined as the time at which it arrives in Cambridge.

Following the reviewer's request, the clarifying sentence was added to the document.

Given this, prov-dm should define the minimalist characteristics for wasInfluencedBy in a technology agnostic way.

Inheritance is a way of implementing Inference 15 of prov-constraints (and this approach was successfully followed by prov-o), but it does not have to be implemented that way. For instance, a rule-based system could simply implement Inference 15 without requiring inheritance. The current prov-xml schema does not define WasGeneratedBy as an extension of Influence. A record-based system may not rely on inheritance.

As the author suggests, inheritance would imply that attributes are inherited by the child relations. It is not the case that wasGeneratedBy has influencer/influencee attributes, but instead, we want to show that they correspond to activity/entity in that case.

Given this, the document should be changed as follows:

The UML diagram in Figure 8 should not show a Generalization association between WasGeneratedBy (and others) and WasInfluencedBy.

A table should be introduced showing which attributes in Generation/Usage/etc are influencer or influencee.

With these changes, the issue raised by the author is no longer applicable: it is no longer the case that wasGeneratedBy etc can be used anywhere between agent/activity/entity.

For the comment "The notion of influence is useful for the PROV model, but it is unclear whether this is intended to represent an extension point for adopters of the spec. How should it be implemented?", we have shown with prov-o, prov-n, and prov-xml various ways of implementing Influence. According to Section 6, Influence is not seen as an extensibility point of the model, instead, it is seen as a means to express influence in PROV without being specific about its nature. We note the following, quoted from the specification:

It is recommended to adopt these more specific relations when writing provenance descriptions. It is anticipated that the Influence relation may be useful to express queries over provenance information.

Given this, it is legal to write the following, in which a2 acted on behalf of a1, where a2 and a1 are activities, but the type of a2 and a1 can also be inferred to be agent. Hence, the response to the author's question "Can activities be responsible for other activities" is yes, as illustrated by the example.

activity(a1)
activity(a2)
actedOnBehalfOf(a2,a1)

The group has provided its answer to ISSUE-503, indicating that there is not a notion of "adopting a plan" (beyond a plan able to be identified in wasAssociatedWith) and there is no need for a separate wasAdoptedBy relation. Hence, the question "can activities adopt a plan" is not applicable.

PROV associations are not temporal relations. Instead, prov-constraints defines ordering constraints that are implied by associations. The agent in an association is expected to have some overlap with the activity. Likewise, for attribution, the agent exists before the entity was generated.

If, in an application, it is necessary to express that an activity is associated with agent ag1 during interval (evt1-evt2) and then with agent ag2 during interval (evt2-evt3), the approach is to model this with two activities, one for the first interval and one for the second interval.

ISSUE-482 (Bundles and IDs)

PROV specifications define a notion of bundle, but do not define operations on bundles such as merge. The definition of such operations is left to implementations.

The prov-constraints document defines a notion of validity in the presence of bundles. Validity is determined by checking the validity of bundles individually, irrespective of other existing bundles. For instance, the following document, containing two bundles, is valid.
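The document referred to above is not reproduced on this page. The Python sketch below illustrates the per-bundle principle with a hypothetical document of two bundles. It checks a single, simplified constraint (an identifier cannot denote both an entity and an activity) as a stand-in for the full prov-constraints validity check; the bundle names, contents and helper functions are assumptions.

```python
def bundle_is_valid(statements):
    """Simplified check: no identifier is typed as both an entity and an activity."""
    entities = {ident for kind, ident in statements if kind == "entity"}
    activities = {ident for kind, ident in statements if kind == "activity"}
    return entities.isdisjoint(activities)


def document_is_valid(bundles):
    """Check each bundle on its own, irrespective of the other bundles."""
    return all(bundle_is_valid(stmts) for stmts in bundles.values())


document = {
    "ex:b1": [("entity", "ex:x")],
    "ex:b2": [("activity", "ex:x")],
}

# A naive merge of the two bundles, which PROV itself does not define.
merged = [stmt for stmts in document.values() for stmt in stmts]
```

In this sketch the document is valid because each bundle passes on its own, whereas the naively merged statements would fail the same check.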

Other specifications may provide some guidance regarding this issue. For instance, the Architecture of the World Wide Web, Volume One, provides principles, constraints, and good practice notes about the use of IRIs.

Given the above, PROV by itself does not require IDs to be unique in a bundle, but one may have to ensure this in order to perform certain operations on the PROV data or to meet other best practice.

ISSUE-518 (PrimarySource)

Following the author's suggestion, the Working Group proposes to revise the definition of Primary Source as follows:

A reference to a primary source indicates a derivation from an entity that was produced by some agent with direct experience and knowledge about the entity's conceptual topic, at the time of the topic's study, without benefit of hindsight.

We also propose to add the following comment, inspired by this issue:

It is also important to note that a given entity might be a primary source for one entity but not another. This is the reason why Primary Source is defined as a relation as opposed to a subtype of Entity.

ISSUE-499 (Generation vs Activity)

The author states: "It is not clear why it is necessary to define terms for discrete points in time within the PROV model. If activities already have start and end times, isn't that sufficient?"

As indicated in prov-constraints, PROV is implicitly based on a notion of instantaneous events. Five of them are identified: start/end/generation/usage/invalidation.

These events are of interest because they mark a "change of state" in the world: an activity is started/ended, an entity is generated/used/invalidated. These types of events matter because they enable or disable the occurrence of further events. For instance, before generation, an entity cannot be used, but it can after its generation, ... until its invalidation.

Those events always involve an activity and an entity:

start and end of an activity with respect to a trigger

generation/usage/invalidation of an entity by an activity.

Each type of event enables or disables the occurrence of specific types of events:

Start of a:

No event with a can precede start of a; events with a can follow start of a.

End of a:

Events with a can precede end of a; no event with a can follow end of a.

Generation of e:

No event with e can precede generation of e; events with e can follow generation of e.

Invalidation of e:

Events with e can precede invalidation of e; no event with e can follow invalidation of e.

Usage of e by a:

"influence" of e can "show" after usage by a, but cannot "show" before usage
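The ordering rules listed above can be sketched as two small Python checks. Instants are represented here as plain comparable values (numbers or datetimes), "precedes" is read as non-strict, and the function names are hypothetical. This is a simplification, not the prov-constraints algorithm.

```python
def entity_events_ordered(generation, usages, invalidation):
    """True if every usage of the entity follows its generation and precedes its invalidation."""
    return all(generation <= usage <= invalidation for usage in usages)


def activity_events_ordered(start, events, end):
    """True if every event involving the activity lies between its start and its end."""
    return all(start <= event <= end for event in events)


# Entity generated at instant 1, used at 2 and 3, invalidated at 4.
entity_ok = entity_events_ordered(1, [2, 3], 4)

# A usage at instant 1 cannot precede a generation at instant 2.
entity_bad = entity_events_ordered(2, [1], 4)
```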

Given the different types of events, it is not sufficient to have just start and end events, as suggested by the author.

In PROV, activities "occur". They do "stuff". They act upon and with entities. The activities are involved in the generation and usage of entities: as indicated above, an event always occurs in the context of an activity.

If, for some application, it is useful to see the creation of entities as having a duration, this indeed can be modelled by an activity with a duration. But what we care about, from a provenance viewpoint, is when the entity is actually created, which we then refer to as generation. This cannot be modelled by an activity. In the model, the generation (event) is the relation between an activity and an entity.

To avoid potential confusion between activity and start/end/generation/usage/invalidation, we now make explicit that start/end/generation/usage/invalidation are instantaneous.

ISSUE-529 (Empty Collection)

In an open world context, absence of the relation hadMember(c,e) does not imply that a collection c is empty. Hence, the group introduced a class EmptyCollection to indicate when a collection is empty.
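The open-world reading can be sketched in Python with a three-valued answer: emptiness is only known when prov:EmptyCollection is asserted, or when at least one member is known. The is_empty helper and its arguments are hypothetical.

```python
from typing import Optional


def is_empty(collection_types, known_members) -> Optional[bool]:
    """Return True or False when emptiness is known, None when it is unknown."""
    if known_members:
        return False
    if "prov:EmptyCollection" in collection_types:
        return True
    # Open world: having no hadMember statements does not mean having no members.
    return None
```

For a collection typed only as prov:Collection with no hadMember statements, the sketch returns None rather than True.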

Figure 11, like all UML diagrams, is informative. It shows that Collection and EmptyCollection are linked with Entity, by means of a Generalization association. Therefore, a Collection and EmptyCollection are also entities with an id and attributes.

Concretely, prov-dm (prov-n) sees all the sub-types (e.g. prov:type='prov:Collection' ) as type information that is expressed by the prov:type attribute.

The handling of these subtypes is consistent with other subtypes in the model, e.g. revision, softwareAgent, etc.

Prov-dm, as a conceptual model, leaves the implementation of these inherited types to concrete serializations.

As to the question of why PROV-DM doesn't have a list of members as an attribute of Collections, the design of prov-dm makes all associations between PROV entities relations. In effect, this allows us to understand the structure of a provenance graph just by looking at the relations, without having to process attributes of entities. A given implementation may also decide to represent collection members as attributes if it finds it convenient.

Implemented changes:

Changed the text to indicate that PROV defines no collection specific attributes.

ISSUE-462 (Definition of Entity)

The term 'entity' is intentionally defined in a liberal manner to avoid restricting users' expressivity. Obviously, it shouldn't be too liberal, otherwise it would be all-encompassing, without clear semantics.

The term 'entity' (and associated notions such as 'alternate', 'specialization') has been the subject of intense debate by the Working Group, and the definition reflects the compromise reached by the Working Group.

The term 'aspect' is not used here with a technical meaning and should be understood with its dictionary meaning 'A particular part or feature of something'.

PROV-Constraints, in its rationale section, expands on the notion of entity.

While an object/thing may change over time, an entity fixes some aspects of that thing for a period of time (in between its generation and invalidation). As opposed to other models of provenance (such as OPM), an entity is not an absolute state snapshot. Instead, it is a kind of partial state, just fixing some aspects. The rationale for this design decision is that it is quite challenging to find absolute state snapshots that do not change: the location of a file on a cloud changes, the footer of this Web page changes (as more people access it), etc. Hence, by allowing some aspects (as opposed to all) to be fixed, the PROV concept of 'entity' is easy to use.

We distinguish an 'aspect' from an 'attribute'. An attribute-value pair represents additional information about an entity (or activity, agent, usage, etc). In the case of an entity, attribute-value pairs provide descriptions of fixed aspects. So, the term 'aspect' refers to properties of the thing, whereas the term 'attribute' refers to its description in PROV.

PROV does *NOT* assume that all fixed aspects are described by attribute-value pairs. So, there may be some fixed aspects that have not been described. Obviously, without description, it's difficult to query or search over them.

According to PROV Constraint key-object (constraint 23), an entity has a set of attributes given by taking the union of all the attributes found in all descriptions of that entity. In other words, PROV does not allow for different attribute-value pairs to hold in different intervals for a given entity.
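The union of attributes across descriptions can be sketched in Python as follows. The merge_descriptions helper, the pair-based representation and the sample identifiers are assumptions made for illustration.

```python
from collections import defaultdict


def merge_descriptions(descriptions):
    """Union the attribute-value pairs of all descriptions sharing an identifier."""
    merged = defaultdict(set)
    for ident, attributes in descriptions:
        merged[ident] |= set(attributes)
    return dict(merged)


descriptions = [
    ("ex:report", [("ex:title", "Draft"), ("prov:type", "ex:Document")]),
    ("ex:report", [("ex:version", "2")]),
]
merged = merge_descriptions(descriptions)
```

All three pairs end up attached to ex:report for its whole lifetime; the sketch has no way to say that one pair holds only during part of it, which is the point made above.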

The attribute-value pairs of an entity provide information for some of the fixed aspects of an entity.

This point may not have been clear, and requires text modification. (see below)

A specific attribute of an entity is its identity. It is also assumed that it holds for the duration of the entity lifetime.

This point may not have been clear, and requires text modification. (see below)

ISSUE-498 (Relation terminology)

PROV-DM is a conceptual model, whose core contains the following concepts: Entity, Activity, Agent, Generation, Usage, Communication, Derivation, Attribution, Association, Delegation. Each concept is named with a noun. They are all listed in the overview table: http://www.w3.org/TR/prov-dm/#overview-types-and-relations

Overall, this issue has already been addressed, through ISSUE-499 and ISSUE-502:

In the response to issue 502, the Working Group has confirmed that it sees a derivation as a transformation.

Likewise, in the response to issue 499, the Working Group has confirmed that generation is not an activity.

ISSUE-569 (Mutable resources)

PROV supports the case you describe using the prov:specializationOf relation to connect a mutable resource URI to entities representing each revision over time. The latter don't have to exist already in Callimachus, but may be created with unique IDs specifically to model the provenance.

If a change in a resource's state is something to be documented in the provenance, then that requires multiple entities. PROV entities are allowed to be mutable, but the purpose of this is to hide information that is unimportant, i.e. that you do not want to model in the provenance. As soon as the timeline of the resource is divided into relevantly different periods (e.g. before and after each contributor edited), then the mechanism to document this in PROV is to use multiple entities. If you have a single identifier (entity) for the mutable resource as it exists over time, through multiple revisions, this can be connected to the set of revision entities using the prov:specializationOf relation.

The flour and baking example is similar. If a change is to be documented in PROV, then multiple entities are used, e.g. the flour before and after baking. If it is not documented, then only one entity is required. There is no notion of a change which is "documented but not significant", because it is unclear what significance would be in general except for the decision to model/document it. As before, a general, mutable "flour" entity can exist that is connected to the flour before and after baking using prov:specializationOf. For example:
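A minimal sketch of the flour case, written as Python that emits PROV-N-style statements. The identifiers ex:flour, ex:flour-before-baking and ex:flour-after-baking, and the helper name specialization_statements, are hypothetical and chosen only for illustration.

```python
# Hypothetical identifiers: one general, mutable entity and two fixed stages.
GENERAL = "ex:flour"
STAGES = ["ex:flour-before-baking", "ex:flour-after-baking"]


def specialization_statements(general, specifics):
    """Link each specific entity to the general one (specific first, general second)."""
    return [f"specializationOf({specific}, {general})" for specific in specifics]


statements = [f"entity({e})" for e in [GENERAL, *STAGES]]
statements += specialization_statements(GENERAL, STAGES)

for statement in statements:
    print(statement)
```

If the change brought about by baking is not to be documented, the two stage entities and the specializationOf statements are simply omitted, leaving only entity(ex:flour).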

The feedback was broken down into individual issues that were addressed separately on this page. The group thanks the reviewer for the extensive comments!

The group made changes based on the reviewer's feedback; please see each issue for the relevant change.

The UML diagrams in PROV-DM are informative. They are intended to illustrate concepts as best as possible. The normative material is found in the text. There may be alternative UML modelling of the same normative definitions.

Alternative UML diagrams were proposed by the author of this feedback. Individual issues have addressed these points, but below we provide specific feedback on some UML diagrams.

Some comments on the UML diagrams provided by the reviewer:

Entities.png: Organization, Person, Software are not entities (ISSUE-520), Bundles are not Collections (ISSUE-524), and membership is expressed as a relation and not an attribute (ISSUE-529)

Interactions.png: we did not find it suitable to introduce a role (generatedEntity/UsedEntity), since we would then have to introduce a different identifier for the entity in that role. This would result in very convoluted graphs, with lots of 'acts as' relations. There is no startTime/endTime for Invalidation, Usage, Generation, but simply a time. A strong desire has been to facilitate the assertion of provenance: ex:a2 prov:used <uri> and <uri> prov:wasGeneratedBy ex:a1

Relations.png: Alternate/Specialization/Membership do not have id and attributes. Adoption is not a PROV relation. PROV does not define activity composition.

PROV-DM (Under Review)

ISSUE-475 (Mention)

The reviewer suggests that the work to describe contextualized provenance should be deferred so that it can be aligned with ongoing W3C work on RDF datasets and their semantics. Since ISSUE-475 was submitted, the RDF working group has decided that it will not provide a formal semantics for RDF Datasets. This RDF resolution ensures that any semantics for bundle and/or mention is guaranteed not to be in conflict with the RDF semantics.

As PROV-Constraints section 6.2 clearly indicates, the validity of PROV bundles is determined by examining bundles in isolation from each other. Our response to ISSUE-482 also indicates that PROV itself does not set any constraints on how a given ID is being used across multiple bundles. Given this, mentionOf is a general relation which allows an entity to be linked to another entity described in another bundle.

The reviewer suggests that

mentionOf(infra, supra, b)

could simply be expressed as

specializationOf(infra, supra)
entity(infra, [mentionedIn=b])

This design was considered and rejected by the Working Group:

By design, relations between PROV objects are expressed by PROV relations (usage, generation, mention, etc.), and are not expressed as PROV attributes. The suggested additional attribute mentionedIn would relate the entity infra with bundle b, and would go against this PROV-DM design.

The interpretation of the attribute-value pair mentionedIn=b is somewhat difficult, because infra is not itself described in bundle b: supra is the entity described in bundle b. So, syntactically, mentionedIn=b may look like an attribute-value pair, but in reality, it can only be understood in the presence of specializationOf(infra, supra). Hence, the reason for introducing the ternary relation mentionOf.

The Working Group left it unspecified which new attributes could be inferred for infra, and in general what constraints apply to mentionOf. The reviewer is critical of this decision, arguing that nothing new can be inferred from mentionOf, and therefore mentionOf can be replaced by specializationOf. 'Under-specification' is a feature of PROV: what can be inferred from relations such as usage, derivation, alternate? The group recently acknowledged this for alternateOf and added a clarifying note in the text. This observation is applicable to further PROV concepts, such as Quotation, PrimarySource, SoftwareAgent, etc., which do not allow us to infer more than their parent concept would (Derivation, Agent). We are in the same situation with mentionOf. Further inferences are left to be specified by applications.

The reviewer's suggestion to address the use of Example 45 is to copy part of the referred bundle. By copying statements from the original context to the new context, we have lost the original context in which they occur (... their provenance!), and we have no way of expressing that wasAssociatedWith(ex:a1, ...) in the new context is a "kind of specialization" of wasAssociatedWith(ex:a1,...) in the original context, ... which is why mentionOf was introduced in the first place.

The reviewer also comments on the lack of information about 'Fixed aspects'. We refer to our response to ISSUE-462, and recent associated changes to the document.

The Working Group identified 'mention' as a feature at risk, because it was seeking experience from implementers. The Working Group will keep this feature marked at risk as it enters the CR phase, and will reassess its suitability based on implementers' feedback.

ISSUE-543 (Key-Value)

The terminology "key-entity pair" was actually intended, instead of key-value pair. Example 43 illustrates an extension of collections, but the text was not clear. It is now changed as follows: "The following example shows how one can express membership for dictionaries, an illustrative extension of Collections consisting of key-entity pairs, where a key is a literal."

This example of extension of collection is inspired by a notion of dictionary, defined in a separate document.

Keys are defined as literals, and therefore follow the literal production. Some literals, such as ints, don't need to be quoted. Strings are double-quoted; qualified names are single-quoted.

ISSUE-537 (Syntax of identifiers)

There seems to be a misunderstanding of the notation (or the author refers to a previous version of the document).

According to the derivation production, there is one interpretation for wasDerivedFrom(d,e):

d: is an identifier denoting the generated entity

e: is an identifier denoting the used entity

If one wishes to identify the derivation, then the optionalIdentifier production should be used, requiring the optional identifier to be separated with a ; (a distinct separator as suggested in the comment)
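The role of the ; separator can be sketched with a deliberately naive splitter. This is an illustration only, not a PROV-N parser: it assumes a single statement with no attribute list and no ; or , inside literals, and the function name split_optional_identifier and the identifier ex:d1 are hypothetical.

```python
def split_optional_identifier(statement):
    """Return (relation name, optional identifier or None, positional arguments)."""
    name, _, rest = statement.partition("(")
    body = rest.strip()
    if body.endswith(")"):
        body = body[:-1]
    identifier = None
    # Anything before the first ';' is the optional identifier of the relation.
    if ";" in body:
        identifier, body = (part.strip() for part in body.split(";", 1))
    arguments = [arg.strip() for arg in body.split(",")]
    return name.strip(), identifier, arguments


print(split_optional_identifier("wasDerivedFrom(d, e)"))
print(split_optional_identifier("wasDerivedFrom(ex:d1; d, e)"))
```

In both calls the positional arguments are d (generated entity) and e (used entity); only the second call yields an identifier for the derivation itself.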

ISSUE-533 (Named Attributes)

There is no right or wrong approach; there are essentially two different philosophies. Either we adopt a named attribute approach as suggested in the feedback, or we go for a positional attribute solution.

ISSUE-552 (Influence subclasses)

The WG agrees with the suggestion that the phrase "a particular case of derivation" should be expressed using rdfs:subClassOf.

Since prov-dm's definitions for revision, quotation, and primary source state that they are "particular case[s] of derivation", it follows that each should be a subclass in the PROV-O encoding. We changed PROV-O to include these three classes as subclasses of Derivation.

The WG agrees with the reviewer that "a kind of" is a more natural phrasing than "a particular case", and so we have adopted it as suggested.

On the phrasing of definitions:

It was pointed out that the definitions for "{Entity,Agent,Activity}Influence" are inconsistent with that of their parent class "Influence".

The source of this inconsistency is that {Entity,Agent,Activity}Influence are not defined by prov-dm, but by prov-o as artifacts of encoding prov-dm's model into the paradigm of OWL (i.e., the use of the qualification pattern to describe binary relations).

The inconsistent definitions were "demoted" to rdfs:comments because they focus too heavily on the RDF and OWL paradigm and not enough on how they are expressing the abstract model of prov-dm.

New definitions were created to align with their parent class, with a focus on how the classes are expressing the abstract model of prov-dm.

On the inconsistency of subclasses according to "general understanding of the english terms":

The reviewer points out that the definitions of Influence, EntityInfluence, and Start illustrate an inconsistency: "influence is a capacity, an entity influence is a provider (of descriptions) and a start is a "when" (a time)".

The WG acknowledges that the definitions as shown support this concern.

The inconsistency between Influence and its immediate subclasses {Entity,Agent,Activity}Influence is addressed by the response to the earlier comment ("phrasing of definitions").

To address the inconsistency between {Influence, {Entity,Agent,Activity}Influence} and {Start,End}, PROV-DM updated the definitions for Start and End:

Start is when an activity is deemed to have been started by an entity, known as trigger. The activity did not exist before its start. Any usage, generation, or invalidation involving an activity follows the activity's start. A start may refer to a trigger entity that set off the activity, or to an activity, known as starter, that generated the trigger.

End is when an activity is deemed to have been ended by an entity, known as trigger. The activity no longer exists after its end. Any usage, generation, or invalidation involving an activity precedes the activity's end. An end may refer to a trigger entity that terminated the activity, or to an activity, known as ender, that generated the trigger.

prov-o "demoted" the original definitions of {Entity,Agent,Activity}Influence to rdfs:comments.

prov-o created new definitions for {Entity,Agent,Activity}Influence to align with their parent class definition.

prov-o removed existing comments on {Entity,Agent,Activity}Influence that were very similar to the new "prov-dm centric" definitions. The removed comments had more of an OWL flavor to them instead of an abstract flavor. For example, the following comment was removed:

"ActivityInfluence is intended to be a general subclass of Influence of an Activity. It is a superclass for more specific kinds of Influences (e.g. Generation, Communication, and Invalidation)." in favor of the definition "ActivityInfluence is the capacity of an activity to have an effect on the character, development, or behavior of another by means of generation, invalidation, communication, or other."

The latest draft of the PROV-O html document reflects the definitions changed in the PROV-O OWL file:

ISSUE-491 (prov:agent)

The redundant and confusing rdfs:comments were removed from prov:{entity,activity,agent}

Editor's definitions were added to prov:{entity,activity,agent} in place of reusing the definition from prov:{Agent,Entity,Activity}Influence, e.g.:

The prov:agent property references an prov:Agent which influenced a resource. This property applies to an prov:AgentInfluence, which is given by a subproperty of prov:qualifiedInfluence from the influenced prov:Entity, prov:Activity or prov:Agent.

Most examples shown in this cross reference are encoded using the Turtle RDF serialization. When an example requires a prov:Bundle, it may use the [TRIG] syntax. Although this document does not specify how to encode Bundles in RDF, TriG's named graph construct is used only to illustrate the concept.

PROV-CONSTRAINTS

ISSUE-556 (time-qualification)

Summary: Are there missing constraints to relate qualified and unqualified start/end times?

Group response:

PROV-CONSTRAINTS defines constraints in terms of the abstract syntax of PROV-DM.

The group has decided not to explicitly specify the mapping from PROV-O representations to PROV-DM and back (although there is a partial, but not up-to-date, alignment at http://www.w3.org/2011/prov/wiki/ProvRDF), but this might be done in the future or as a Note.

The group has also decided not to specify the constraints using OWL explicitly, but this might also be done in the future or as a Note.

It appears natural that constraints such as the author proposes will be needed to apply constraints to PROV-O documents directly, but this is outside the scope of the specifications.

ISSUE-586 (toplevel-bundle-description)

Summary: The description of 'toplevel instance' as 'set of statements not appearing in a bundle' is unclear

Group response:

This is not a formal constraint; this description is potentially misleading, since it is allowed for multiple copies of the same statement to appear in toplevel instance and bundles.

References:

Changes to the document:

Clarify description of "toplevel instance" to just say that there is a toplevel instance and possibly some named instances, called bundles, and they are all treated independently for the purpose of validity checking (so presence or absence of statements in one instance never affects the validity of another).

For terms, "merging" is exactly unification in the usual first-order logic / logic programming sense, as we state in a remark. For predicates that carry attribute lists, things are more complicated since key constraints require the attribute lists be combined, not unified in the usual sense.

References:

Changes to the document:

Use "unification" for "merging" at the level of terms

Declaratively describe unification as producing "either failure or a substitution that makes both sides equal", as well as giving the (standard) algorithm

Retain "merging" for the nonstandard operation on predicates that unifies the term arguments and concatenates the lists of attributes.
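The distinction between unification of terms and merging of predicates can be sketched as follows. This is a simplified illustration, not the algorithm from the specification: it assumes flat terms only, treats any string starting with "_" as a variable (a convention made up for this sketch), and represents a statement as a hypothetical (name, arguments, attributes) tuple.

```python
def is_var(term):
    """Sketch convention (an assumption): variables are strings starting with '_'."""
    return isinstance(term, str) and term.startswith("_")


def resolve(term, subst):
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(left, right):
    """Return a substitution making both argument tuples equal, or None on failure."""
    if len(left) != len(right):
        return None
    subst = {}
    for a, b in zip(left, right):
        a, b = resolve(a, subst), resolve(b, subst)
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None  # two distinct constants cannot be made equal
    return subst


def merge(first, second):
    """Unify the term arguments and concatenate the attribute lists."""
    name1, args1, attrs1 = first
    name2, args2, attrs2 = second
    if name1 != name2:
        return None
    subst = unify(args1, args2)
    if subst is None:
        return None
    merged_args = tuple(resolve(arg, subst) for arg in args1)
    return (name1, merged_args, attrs1 + attrs2)


a = ("activity", ("ex:a1", "_t1", "_t2"), [("ex:x", "1")])
b = ("activity", ("ex:a1", "2011", "2012"), [("ex:y", "2")])
print(merge(a, b))
```

The term arguments are handled by ordinary unification, which either fails or yields a substitution, whereas the attribute lists are never unified: they are simply concatenated.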

ISSUE-579 (declarative-fol-specification)

Summary: Suggestion to replace procedural specification with (equivalent, but shorter and less prescriptive) declarative theory in First-Order Logic

Group response:

PROV-CONSTRAINTS intentionally reuses standard techniques from logic, and particularly database theory, as much as possible. However, our audience (as reflected by the composition of the WG) is not expected to be familiar already with first-order logic, so we felt it was important to elaborate upon these concepts sufficiently that someone without a background in these areas can implement the specification.

Moreover, writing an arbitrary FOL axiomatization has its own problems: since there is currently no standard way to do this we would have to restate a lot of the standard definitions in order to make the specification self-contained (as we have already done). In addition, an arbitrary FOL theory is not guaranteed to be decidable, even over finite models. We resolved that the constraints document had to demonstrate decidability/computability, as a basic prerequisite for implementability. Simply giving a set of FOL axioms on its own would not be enough to do this, and would leave (the vast majority of) implementors not familiar with FOL theorem proving/databases/constraint solving at sea with respect to implementation.

Thus, this issue is deferred to the planned PROV-SEM note.

References:

Changes to the document:

PROV-CONSTRAINTS updated to clarify that a declarative alternative is deferred to PROV-SEM

Add non-normative material to PROV-SEM giving a FOL axiomatization, a proof of soundness/completeness with respect to the algorithm in the spec, and soundness with respect to the draft model theory in the current draft of PROV-SEM.

ISSUE-585 (applying-satisfying-constraints)

Summary: Suggestion to avoid discussing how to 'apply' definitions, inferences and constraints; the term 'satisfies' is not adequately defined in the context of PROV-CONSTRAINTS

Group response:

As noted in the response to ISSUE-579, we disagree that rewriting everything in terms of pure first-order logic would lead to a satisfactory specification (as opposed to a satisfactory research paper, say). The goal of the non-normative section here is essentially to link the (implicit) declarative semantics of the first-order theory, which we described informally earlier, with the procedural way in which normalization handles this behavior. This is exactly analogous with an operational, or proof-theoretic approach to the semantics of logic programming, which is equally correct compared with a declarative, denotational semantics; we simply chose to present the approach that lends itself more immediately to efficient implementation.

We inadvertently used "satisfies" as shorthand for "passes all constraint checks without generating INVALID". This will be clarified.

ISSUE-583 (equivalent-instances-in-bundles)

Summary: Questions concerning what it means for applications to treat equivalent instances 'in the same way', particularly in bundles.

Group response:

Since validity and equivalence are optional, this is not a formal requirement, but a guideline; what it means for an application to treat equivalent instances/documents "in the same way" is application specific, and there are natural settings where it makes sense for an application (even one that cares about validity) to have different behavior for equivalent documents. We give one example of formatting/pretty printing. You give some additional examples; digital signing is a third. Because we have no way of circumscribing what applications might do or what it means for an application to treat documents "in the same way", we just leave this as a guideline.

References:

Changes to the document:

Clarify that the suggestion that applications SHOULD treat equivalent instances 'in the same way' is a guideline, and depends on what 'in the same way' means for a given application.

ISSUE-580 (drop-syntactic-sugar-definitions)

Summary: Suggestion to drop definitions in section 4.1 since they are not needed if the semantics is defined more abstractly

Group response:

This is actually an orthogonal issue to the style of semantics; PROV-DM and PROV-N nowhere specify how missing arguments are to be expanded to the "PROV-DM abstract syntax" (which itself is not explicitly specified in PROV-DM). You're correct that Definition 1 (which expands short forms) is in a sense implicit in PROV-DM, which only discusses the long forms and their optional arguments, but it isn't said explicitly in either PROV-DM or PROV-N how the PROV-N short forms are to be expanded to PROV-DM. Furthermore, Def. 2-4 deal with special cases concerning optional/implicit parameters which are not explained anywhere else. We recognize that there is a certain amount of PROV-N centrism in these definitions, but since PROV-N is formally specified and the abstract syntax is not, we feel it's important to make fully clear how arbitrary PROV-N can be translated to the subset of PROV-N that corresponds to the abstract syntax of PROV-DM. This is to ensure that there is no room for misinterpretation among multiple readers, who may expect different conventions for expansion/implicit parameters (even if the rules we specified seem "obvious").

References:

Changes to the document:

Add a note clarifying the relationship between PROV-DM "abstract syntax" and PROV-N, and why the definitions are needed for this mapping.

ISSUE-577 (valid-vs-consistent)

Summary: 'Valid' is used differently from its usual meaning in logic; 'consistent' would be a better term

Group response:

We would like to clarify that we are not attempting to define a semantics (in the sense of model theory or programming language semantics) for PROV in PROV-CONSTRAINTS. We may do this in a future version of PROV-SEM, by giving a first-order axiomatization that is sound with respect to the model theory that is in the current draft of PROV-SEM.

PROV-CONSTRAINTS defines a subset of PROV documents, currently called "valid", by analogy with the notion of validity in other Web standards such as XML, CSS, and so on. While concepts from logic are used, it is not intended as a logic or semantics.

We agree that it would be preferable to avoid redefining standard terminology from logic in nonstandard ways, and you are correct that "valid" means something different in logic than the sense in which it is usually used in Web standards. However, since we expect our audience to consist of implementors and not logicians, on reflection we prefer the terminology "valid"/"validation" over "consistent"/"consistency checking".

References:

Changes to the document:

Clarify (sec. 1.2) that our notion of "valid" is named by analogy to other W3C standards, such as CSS and XML, and that in logical terms it is "consistency"

ISSUE-578 (equivalence)

Summary: Use of "equivalent"; incompatibility with common uses of the term in logic/mathematics

Group response:

This issue was discussed within the group already, and we could not come to an agreement on how equivalence should behave on invalid instances. Therefore, we decided not to define equivalence on invalid instances.

From a mathematical point of view, we only define equivalence as a relation over valid documents/instances, not all instances. This avoids the problem of deciding what to do with equivalence for invalid instances.

By analogy consider a typed programming language. An expression 2 + "foo" is not well-typed; technically one could consider a notion of equivalence on such expressions, so that for example, 2 + "foo" would be equivalent to (1 + 1) + "foo". But these ill-typed expressions are (by the definition of the language) not allowed. Similarly, for applications that care about validity, invalid PROV documents can be ignored, so (to us) there seems to be no negative consequence to defining equivalence to hold only on this subset of documents, or to defining all invalid documents to be equivalent (as would follow from the logical definition of equivalence).

However, for other applications, such as information retrieval, it is not safe to assume that an invalid instance is equivalent to "false"; we can imagine scenarios where an application wants to search for documents similar to an existing (possibly invalid) document. If the definition of equivalence considers all invalid documents equivalent, then there will be a huge number of matches that have no (intuitive) similarity to the query document.

We also plan to augment PROV-SEM with a logical formalization that will be related to both the model theory proposed there and the procedural specification in PROV-CONSTRAINTS. For this formalization, logical equivalence will be the same as PROV-equivalence on valid instances. (For invalid instances, logical equivalence requires making all invalid instances equivalent, which we prefer not to require.)

References:

Changes to the document:

explicitly defined isomorphism in normative section 6.1

specify that equivalence is an equivalence relation on *all* documents

specify that no invalid document is equivalent to a valid one

specify equivalence between valid documents as already done

leave it up to implementations how (if at all) to test equivalence on different invalid documents.

relating PROV-equivalence with logical equivalence is deferred to PROV-SEM

ISSUE-581 (avoid-specifying-algorithm)

Summary: Suggestion to avoid wording that 'almost requires' using normalization to implement constraints

Group response:

Just saying that we *define* validity and equivalence in terms of a normalization procedure that *can* be used to check these properties does not require that all implementations explicitly perform normalization. We discussed this issue extensively, and one consequence of this is that the implementation criteria for the constraints document will only test the extensional behavior of validity/equivalence checks; implementations only need to classify documents as valid/invalid/equivalent etc. in the same way as the reference implementation, they do not have to "be" the reference implementation.

However, this issue arose relatively late in the process and we did miss some places where the document gives a misleading impression that normalization is required to implement the spec.

Nevertheless, as written, it is difficult to see how else one could implement the specification. In fact, you are correct that there is a simple, declarative specification via a FOL theory of what the normalization algorithm does, which could be used as a starting point for people with a formal background or those who wish to implement the specification in some other way. However, we disagree that it would improve the specification to adopt the declarative view as normative.

Making the document smaller and simpler in this way would detract from its usefulness to implementors who are not already experts in computational logic. In other words, we recognize that some implementors may want to check the constraints in other ways, but we believe that the algorithm we used to specify validity and equivalence is a good default, because it sits within a well-understood formalism known from database constraints and finite first-order model theory.

The normal forms are essentially "universal instances" in the sense of Fagin et al. 2005, and the algorithm we outline is easily seen to be in polynomial time; in contrast, simply giving a FOL theory on its own gives no guarantee of efficiency or even decidability.

We intend to incorporate this theory and formalize the link between the procedural and declarative specifications in PROV-SEM. Although PROV-SEM will not be normative, any implementation that correctly implements the declarative specification given there will be correct.

We will also take greater care to explain that the procedural approach to specification is just one of many possible ways to implement constraint checking (though the group as a whole feels that it is a good default approach for implementors seeking a shortest path to compliance).

References:

Revise all parts of the document that may currently convey the impression that the normalization algorithm is a REQUIRED implementation strategy, to ensure that it is clear that this is one approach (among possibly many) that implementations MAY employ. PROV-SEM will present a declarative specification that may serve as a better starting point for alternative implementations.

Added a paragraph to the beginning of section 2 that specifically addresses this

PROV Primer

ISSUE-561 (Primer Section 2 figure)

Since (and partly prompted by) the reviewer comment, the Working Group has discussed the best form for the primer overview diagram.

It was decided that the overview image used by the primer should no longer be a copy of the one from PROV-DM. This is because the intention is different: the primer aims to present only a few concepts and relations, giving an intuition ahead of the rest of the introduction.

The figure has been changed to be a reduced version of the one used in the PROV-O specification, and no link between the diagrams in the specifications is claimed any longer.

In ISSUE-562 and ISSUE-563, the reviewer comments that the primer text implies particular things that the reviewer believes to be untrue; these are in fact correct implications.

First, it is correct that specialization implies that the child entity inherits all of the attributes of the parent entity. It is the reviewer's counter-example that is an incorrect use of PROV: the "parent" entity of one version of a document is not the prior version of the document, but the document in general, i.e. independent of version. All versions of a document share the attributes of the document in general.

Second, the fact that two specializations of a single general entity are alternates of each other is a common case that fits the PROV definition of "alternate", and the implication is again correct.
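The two implications can be made concrete with a small sketch. It assumes a hypothetical in-memory representation (a dictionary from each specific entity to its more general entities, and a dictionary of attribute-value pairs per entity); the function names and ex: identifiers are made up for illustration, and the specialization links are assumed to be acyclic.

```python
from itertools import combinations


def inherited_attributes(entity, attributes, specialization_of):
    """Own attributes plus those of every more general entity (assumes no cycles)."""
    result = set(attributes.get(entity, set()))
    for general in specialization_of.get(entity, []):
        result |= inherited_attributes(general, attributes, specialization_of)
    return result


def sibling_alternates(specialization_of):
    """Pairs of entities that specialize the same general entity."""
    by_general = {}
    for specific, generals in specialization_of.items():
        for general in generals:
            by_general.setdefault(general, set()).add(specific)
    pairs = set()
    for siblings in by_general.values():
        pairs.update(combinations(sorted(siblings), 2))
    return pairs


# The document in general, and two versions that specialize it.
specialization_of = {"ex:doc-v1": ["ex:doc"], "ex:doc-v2": ["ex:doc"]}
attributes = {
    "ex:doc": {("ex:title", "Report")},
    "ex:doc-v1": {("ex:version", "1")},
}

print(inherited_attributes("ex:doc-v1", attributes, specialization_of))
print(sibling_alternates(specialization_of))
```

Note that ex:doc-v2 is linked to ex:doc, the document in general, and not to ex:doc-v1; this is why it shares the title but not the version attribute of the earlier version.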

The fact that the reviewer believed the implications to be incorrect suggests that the primer did not adequately explain the concepts.

ISSUE-564 relates to the reviewer finding the listed possible uses of the alternate relation confusingly distinct. Again, this is probably due to an inadequate explanation of the alternate and specialization relations.

The conclusion of the group is that the previous explanation of the concepts was not adequately clear.

The intuitive introduction to specialization and alternate relations, Section 2.9, has been completely rewritten around a few use cases, each with more detail than was present before. Specialization is introduced before alternate, as it more clearly gives the overall motivation for the relations. We believe this gives a clearer indication of what the relations mean, and in what cases they should be used.