A summary of the threads 'A triple is not unique' and
'Statements/Reified statements' from the RDF Interest mailing list,
November 2000 [1]

Abstract

This paper sets out to summarise two lengthy threads from the RDF Interest
group mailing
list in November 2000. The argument centred around
some different approaches to modelling RDF statements in RDF (and the
corresponding confusion as to what exactly an RDF statement amounts to). It was
recognised that when statements are reified,
a distinction can be made between the statement itself (represented as a
subject, predicate and object) and the stating of the statement, for example
who stated it and when. Difficulties arise because of a perception that the
identity conditions for a statement (triple) preclude the representation of
statings in the RDF model. This is also a topic on the RDF Interest group
issues list [2]. Note also that Stefan Decker's summary of a
previous discussion on the RDF Interest group list [3]
covers similar ground.

Below I provide a brief expression of the problem and three solutions and
their variants gathered from these threads in the RDF Interest group mailing
list. I have not examined the entire interest group archives, so this is
not intended to be a complete expression of the problem, but it is hoped
that it will be useful as a summary of the recent discussions.

The discussion also ranged into implementation issues, problems of inference
and complex querying, and a discussion of contexts in the sense of Graham
Klyne [4]. I briefly set out some issues relating to these
below.

Status of this document

This is a draft document, and a summary of the statements made on
the RDF Interest list. It should not be taken as a definitive account of the
views of anyone on that list. It has not been agreed on or read over by the
members of that list.

'statement'

A member of the set Statements: a 3-tuple (triple) consisting of a subject,
a predicate and an object ([5] M&S section 5).

'reified statement'

An object of type rdf:Statement and also a resource representing
the reification of a triple ([5] M&S section 5). A resource
of type rdf:Statement must have exactly one each of rdf:subject, rdf:predicate
and rdf:object properties.

'stating'

An instance of someone having stated a statement. It is useful to distinguish
between a statement (a triple) and a stating of it.
See the main discussion below.

'model'

'the RDF model' is the data model behind RDF: it is a way of representing
RDF in a syntax-neutral way and is used for evaluating equivalence between
RDF expressions.
'a model' is usually a way of grouping statements together in
RDF, sometimes referring to the XML document the statements came from.
In addition, people may speak of a reified statement as 'modelling' a
statement. Modelling is used in this way to mean that the reified statement
represents a statement - see
Graham Klyne's post for example.

'context' ('space')

A term currently used in the sense of Klyne's paper [4]. It is a way of
grouping statements together, and is modelled as a bag of reified
statements. The term is also used in logic, e.g. in R.V. Guha's PhD thesis
[6].
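
The cardinality requirement in the definition of 'reified statement' above can
be checked mechanically. The following is a minimal sketch in Python, assuming
that triples are held as plain tuples of strings. The function name and the
representation are illustrative assumptions, not part of M&S or of any
implementation discussed on the list.

```python
from collections import Counter

# The RDF namespace; the three properties every reified statement must carry.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REQUIRED = (RDF + "subject", RDF + "predicate", RDF + "object")


def is_well_formed_reification(node, triples):
    """Return True if `node` is typed rdf:Statement and has exactly one
    rdf:subject, one rdf:predicate and one rdf:object property.

    `triples` is any collection of (subject, predicate, object) tuples.
    """
    typed = (node, RDF + "type", RDF + "Statement") in triples
    # Count how often each property is used with `node` as its subject.
    counts = Counter(p for s, p, o in triples if s == node)
    return typed and all(counts[p] == 1 for p in REQUIRED)
```

Note that this check says nothing about identity: two different nodes can both
pass it while carrying the same subject, predicate and object values.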

Now suppose that the identity conditions for a reified statement are that the
properties and values rdf:subject, rdf:predicate, rdf:object are identical in
each case. This would imply that the expression that

So this means that we are losing the information that there was one stating by
Ralph Swick on 1999-02-22 and a different stating by Jonas Liljegren on
2000-11-20. We can think of two separate parts to the statement - the abstract
statement

and the statings of it by Ralph Swick and Jonas Liljegren. In the example
above, the abstract statement and the actual statings are getting mixed up.

Why would we assume that the identity conditions for a reified statement were
that the properties and values rdf:subject, rdf:predicate, rdf:object were
identical?

One answer is the following:

All statements are members of the set Statements

In mathematics, sets cannot contain duplicates

Statements are defined by their subject, predicate and object only

Therefore, there cannot be duplicate statements in the set of statements with the
same subject, predicate and object.

Therefore, since a reified statement represents a statement, there must be a
one-to-one correspondence between reified statements and statements.
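
The set-based argument above can be illustrated with a short sketch. It assumes
a statement is modelled as an immutable 3-tuple and the set Statements as a
Python set. The class name, helper function and example URIs are hypothetical
and chosen only for illustration.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """A statement is defined by its subject, predicate and object only."""
    subject: str
    predicate: str
    object: str


def state(statements: set, s: str, p: str, o: str) -> Statement:
    """Add the statement (s, p, o) to the set and return it."""
    st = Statement(s, p, o)
    statements.add(st)  # a set silently ignores an element it already holds
    return st


statements = set()
# Hypothetical data: the same triple is stated on two separate occasions.
first = state(statements, "http://example.org/doc",
              "http://example.org/terms#creator", "Alice")
second = state(statements, "http://example.org/doc",
               "http://example.org/terms#creator", "Alice")

print(len(statements))   # 1: the set holds a single statement
print(first == second)   # True: equality depends only on the three values
```

Under this reading there is nothing left in the set to tell the two occasions
apart, which is why a one-to-one reification cannot carry stating information.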

Because there is currently no way to mandate the conditions under which anonymous
nodes are identical (although schemas could be written which do this - see
Shyam Sarkar), the anonymous node (the reified statement) must be given a URI.
This raises the problem of who defines the URI of a unique statement
(see for example the post by Dan Brickley).
A possible solution was suggested by Sergey Melnik.
In turn, this difficulty could be resolved if it were possible to state that two
URIs were equivalent, which would mean taking a view on whether a URI represents a
resource or an entity (see for example Brian McBride's post).

Another answer is about implementation:

If we are attempting to optimise storage of RDF, and we are ignoring the model
that the data came from, we could store the first reified
statement as a quadruple: subject, predicate, object, generated identifier. Then
when we get to the second reified statement, we see that we already have this
statement triple in the database. If we do not generate another identifier for it
but instead reuse the existing identifier and triple to hang other properties off,
then we will lose some information
(see, for example, Jonas Liljegren's post).
Note that if we also store a model identifier
for each statement (for example where the data came from), this problem could
still occur, because within a given model it would be possible for two different
people to state the same thing.
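
The loss of information described above can be reproduced with a small sketch.
It assumes a store that keeps one generated identifier per distinct
(subject, predicate, object) and hangs all further properties off that
identifier. The class, the property names 'statedBy' and 'date', and the
example triple are hypothetical; only the two statings (Ralph Swick on
1999-02-22, Jonas Liljegren on 2000-11-20) are taken from the discussion.

```python
import itertools
from collections import defaultdict


class ReusingStore:
    """Hypothetical quad store that reuses the identifier of a known triple."""

    def __init__(self):
        self._ids = {}                   # (s, p, o) -> generated identifier
        self._props = defaultdict(set)   # identifier -> {(property, value)}
        self._next = itertools.count(1)

    def add_reified(self, s, p, o, props):
        """Store a reified statement and the extra properties attached to it."""
        key = (s, p, o)
        if key not in self._ids:
            self._ids[key] = "stmt%d" % next(self._next)
        sid = self._ids[key]
        # Properties of every stating are merged onto the one shared node.
        self._props[sid].update(props.items())
        return sid

    def properties(self, sid):
        """Return a copy of the (property, value) pairs stored for `sid`."""
        return set(self._props[sid])


store = ReusingStore()
triple = ("http://example.org/doc", "http://example.org/terms#creator", "Alice")
a = store.add_reified(*triple, {"statedBy": "Ralph Swick", "date": "1999-02-22"})
b = store.add_reified(*triple, {"statedBy": "Jonas Liljegren", "date": "2000-11-20"})

print(a == b)                       # True: both statings share one identifier
print(sorted(store.properties(a)))  # two names and two dates, no longer paired
```

After the second call the store still knows both names and both dates, but not
which person made the statement on which date.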

Again, this is a question about the underlying model, brought into focus by the
inefficiency of storing reified statements as triples. The distinction, as before,
is between regarding the subject, predicate and object of the reified statement as
referring to some abstract statement, which is unique within the set of
statements, and regarding the reified statement as referring to a stating, unique
of itself.

On this view, although a reified statement represents a statement, it is only
one possible representation of it. There is therefore not necessarily a one-to-one
correspondence between a statement and its reification (Graham Klyne).
Statements may or may not be unique (there seems to be a preference for
uniqueness, in some sense, maybe within a context (space) or model).
For the introduction of spaces, see Jonathan Borden.

This means that reified statements should be regarded as unique of themselves and
as statings. Each stating is unique. If someone else makes a reified statement
with the same subject, predicate and object properties, then we cannot regard
that stating as being the same as the first. The loss of information that occurs
in Liljegren's example would not occur.

However, when a reified statement is given a URI via the ID attribute then this
implies that any reified statement with that URI is referring to the same stating.

Problems

Loss of some information: because the distinction between the statement and the
stating is lost, we remove the possibility of aggregating this part of the
data.

Implementors often seem to use generated unique ids for triples to preserve
contextual information outside the model, which means that in implementations,
statements with identical subject, predicate and object are stored separately,
whether they are reified or not (Jonas Liljegren).

Sergey Melnik

Make statements (triples) resources, and make statements unique within the set
of statements, defined by their subject, predicate and object values.
Generate a unique ID using a Skolem function for each triple, and hang
contextual and reification information off this triple, external to the model
(Sergey Melnik).

A model implementation of this might be to allow arcs to terminate on arcs
(Seth Russell).

The reification mechanism is syntactic.
The information is preserved.
The identifier for the triple is the same whoever calculates it, so that
aggregation can occur.
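
The property that the identifier 'is the same whoever calculates it' can be
sketched with a deterministic function of the three values. The version below
is only one possible choice, assuming a SHA-1 digest over a JSON encoding of the
triple. The 'urn:x-triple:' prefix and the function name are hypothetical, and
the threads do not prescribe this particular function.

```python
import hashlib
import json


def triple_id(subject: str, predicate: str, obj: str) -> str:
    """Derive an identifier for a triple from its three values alone.

    The JSON list encoding keeps the boundaries between the three values
    unambiguous, so ("ab", "c", "d") and ("a", "bc", "d") get different ids.
    """
    canonical = json.dumps([subject, predicate, obj],
                           ensure_ascii=False, separators=(",", ":"))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return "urn:x-triple:" + digest  # hypothetical URI scheme


# Two parties computing the id independently arrive at the same value,
# so information they each attach to it can later be aggregated.
mine = triple_id("http://example.org/doc",
                 "http://example.org/terms#creator", "Alice")
yours = triple_id("http://example.org/doc",
                  "http://example.org/terms#creator", "Alice")
print(mine == yours)  # True
```

Because no party has to name the statement first, this sidesteps the question of
who defines its URI, at the cost of fixing one agreed function for everybody.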

Brian McBride

The model is incorrect because it does not distinguish between statements and
statings. The statement is abstract (Brian McBride)
and statements are uniquely defined within the set of statements by their subject,
predicate and object values (Brian McBride).

Giving the reified statement a URI using ID causes a great deal of
controversy, because of the question of who first names the statement. But
maybe one could mandate in a schema that subject, predicate and object
properties uniquely define an anonymous node of type rdf:Statement, and then
hang stating events off it (this would require a schema with cardinality
constraints):

There was a discussion about how to implement reification and
contextualization. Jonas Liljegren uses a model identifier to represent
whether (or by whom) the statement was reified.

He has a practical solution to the problem of handling the various
cases of: different URIs for the same statement, same URI for different
statements and the same URI for the same statement: creating a unique key
from the model and the URI of the statement. He argued that the resulting
statement does have a representation in the model, as a reified statement in a
model container.
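
A key built from the model and the URI of the statement can be sketched as
follows. This is an illustration of the idea only, assuming an in-memory
dictionary; the class and method names are hypothetical and it is not
Liljegren's implementation.

```python
class ModelKeyedStore:
    """Hypothetical store keyed by (model identifier, statement URI)."""

    def __init__(self):
        self._rows = {}  # (model, uri) -> (subject, predicate, object)

    def add(self, model, uri, s, p, o):
        """Store the triple under the key formed from model and URI."""
        self._rows[(model, uri)] = (s, p, o)

    def keys_for(self, s, p, o):
        """Return every (model, uri) key under which this triple is stored."""
        return sorted(k for k, t in self._rows.items() if t == (s, p, o))

    def __len__(self):
        return len(self._rows)


store = ModelKeyedStore()
# Different URIs for the same statement: two separate rows.
store.add("model1", "http://example.org/st#a", "s", "p", "o")
store.add("model1", "http://example.org/st#b", "s", "p", "o")
# Same URI for a different statement in another model: kept apart by the model.
store.add("model2", "http://example.org/st#a", "s", "p", "x")
# Same URI for the same statement in the same model: no new row.
store.add("model1", "http://example.org/st#a", "s", "p", "o")

print(len(store))  # 3
```

The model identifier does the disambiguating work here, so the same statement
URI can be used in two models without the entries colliding.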

At least sometimes we really want to be able to query as if statements were
not reified, while retaining contextual information. This is where Sergey's
proposal would be very useful.
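
One way to picture this is a store in which queries see only plain triples
while contextual information is kept alongside them. The following is a minimal
sketch under that assumption. The class, its methods and the context fields are
hypothetical and are not taken from Sergey's proposal.

```python
from collections import defaultdict


class ContextualStore:
    """Hypothetical store: triples are queried directly, context is kept aside."""

    def __init__(self):
        self._context = defaultdict(list)  # (s, p, o) -> list of context dicts

    def add(self, s, p, o, **context):
        """Record one stating of the triple together with its context."""
        self._context[(s, p, o)].append(context)

    def match(self, s=None, p=None, o=None):
        """Return the distinct triples matching the pattern; None is a wildcard."""
        return [t for t in self._context
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

    def contexts(self, s, p, o):
        """Return the context recorded for each stating of the triple."""
        return list(self._context.get((s, p, o), []))


store = ContextualStore()
store.add("http://example.org/doc", "http://example.org/terms#creator", "Alice",
          stated_by="Ralph Swick", date="1999-02-22")
store.add("http://example.org/doc", "http://example.org/terms#creator", "Alice",
          stated_by="Jonas Liljegren", date="2000-11-20")

# The query sees one statement; both statings remain available on request.
print(len(store.match(p="http://example.org/terms#creator")))  # 1
print(len(store.contexts("http://example.org/doc",
                         "http://example.org/terms#creator", "Alice")))  # 2
```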

Another difficulty is with inferencing: we would like to be able to reason
within a context or model, so that we can regard the statements from within
the model as 'facts'. This is possible, although cumbersome, if we regard
contexts as bags and model them within the RDF model. Again, if we could hang
contextual information off statements in a way that is hidden from the RDF
model, this would be useful here.

Conclusions are for the RDF Interest group to decide; however I have a couple of
points to make as someone who has attempted to implement RDF storage.

The first is that we need to remember that we can distinguish between the model,
a serialization of a model, the triples produced by a parser from a serialization,
and the storage of triples.
Personally I have tended to store triples in a very naive way, simply as the
triples that I get from the parser. I have come to the conclusion that for my
purposes, which include storing data from diverse sources and querying it, I
need to think about storing meta information about reification, contexts and
models in a non-naive fashion, so that I can query the triples as if they were
statements while still retaining the contextual information about reification and
so on, perhaps using something like Sergey's suggestion.

However, this means that I need to have examples of good practice (or preferably a
decision made) about the uniqueness conditions for statements and reifications,
and about how to serialize contextual information.