Open Annotation Data Model Module: Publishing

Community Draft, 08 February 2013

Module: Publishing

Although the Open Annotation data model does not specify how Annotations should be transferred between systems at the network protocol level,
it must address several general publication issues for the sake of interoperability. These include how to
embed resources within an Annotation rather than referencing them by external URIs, how to express equivalence between resources to assist with deduplication across multiple systems, and how to express an Annotation using a Named Graph structure.

Serialization

The serialization of the Annotation MAY be in any format capable of
expressing the RDF graph. It MAY be embedded within other resources, such as using RDFa to embed the Annotation within a web page.

If the Annotation has an HTTP URI, then when that URI is dereferenced, a representation of the Annotation MUST be returned in an appropriate graph serialization format. When
the serialization is embedded within other resources, such as when expressed in RDFa, this HTTP URI MUST
continue to be expressed in the serialization. If the Annotation is not available from any dereferenceable URI, but
only embedded within a containing resource, then it MUST have a globally unique URN identifier such as a UUID or tag URI.
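
As a minimal sketch of the last requirement, assuming a client that cannot mint a dereferenceable URI, a UUID URN can be generated with Python's standard library. The function name `mint_annotation_urn` and the dictionary shape are illustrative assumptions, not part of the model.

```python
import uuid


def mint_annotation_urn() -> str:
    """Return a globally unique, non-resolvable URN of the form urn:uuid:..."""
    return uuid.uuid4().urn


# Hypothetical in-memory Annotation that exists only inside a containing resource.
annotation = {
    "@id": mint_annotation_urn(),
    "@type": "oa:Annotation",
}
```

A tag URI would satisfy the same requirement; the UUID form is used here only because it needs no naming authority.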

The RECOMMENDED serialization format is JSON-LD. This is to enable web-browser
based implementations to easily consume Annotations using tools and methods familiar to developers.
The Context presented below is RECOMMENDED to ensure consistency between implementations, and can be
referenced as http://www.w3.org/ns/oa-context-20130208.json.
It is RECOMMENDED to support content negotiation for other serialization formats, in particular RDF/XML and Turtle.
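
The content negotiation recommended above could be exercised by a client roughly as follows. This is a sketch only: the Annotation URI is hypothetical, the preference weights are arbitrary, and the request is built but not sent.

```python
import urllib.request

# Hypothetical Annotation URI, used only for illustration.
ANNOTATION_URI = "http://example.org/annotations/1"


def build_request(uri: str) -> urllib.request.Request:
    """Prefer JSON-LD, but accept Turtle or RDF/XML if that is all the server has."""
    accept = "application/ld+json, text/turtle;q=0.8, application/rdf+xml;q=0.5"
    return urllib.request.Request(uri, headers={"Accept": accept})


request = build_request(ANNOTATION_URI)
```

Passing `request` to `urllib.request.urlopen` would perform the actual dereference; the server's `Content-Type` response header then tells the client which serialization was chosen.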

Model

The following is a Context description that is RECOMMENDED for use in systems that implement the Open Annotation data model.

Embedding Resources

The Open Annotation Core describes how to embed textual bodies within an Annotation;
however, it is frequently useful to also embed Selectors, Styles and potentially even Targets to ensure that the representation is available.

The web architecture assumes that every resource has a URI, and
furthermore the expectation is that they are available for retrieval from HTTP URIs.
Some clients, however, may not be able to generate dereferenceable URIs
on their own for all of the resources that are created as part of the
annotation process. This includes the Body, any Specifiers, Styles or other
user generated information, but also potentially the Target if it is
not available online.

It is important to have a model that deals gracefully and consistently
with both online, dereferenceable resources, and embedded resources.
In both cases, the content is expressed as a resource, rather than
using only a string literal, as motivated in the section on embedded textual bodies. Both cases must also deal
with any content type, including binary data, and deal with any class of resource
within the Open Annotation model: Body, Target, Style, Selector, or State.

The Open Annotation model uses the Representing Content in RDF
specification to include the representation of such resources directly within the
Annotation graph. The resource SHOULD be assigned a non-resolvable URN, and
an appropriate class from the Content in RDF ontology, such as
cnt:ContentAsText or cnt:ContentAsBase64. If identity of the resource is not considered
to be important, then an RDF blank node MAY be used instead of the URN.
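
The approach described above might be sketched as follows, assuming a JSON-LD-like dictionary that uses prefixed property names directly rather than the short keys of any particular Context. The helper names `embed_text` and `embed_binary` are hypothetical.

```python
import base64
import uuid


def embed_text(text: str, media_type: str = "text/plain") -> dict:
    """Embed textual content as a cnt:ContentAsText resource with a URN identity."""
    return {
        "@id": uuid.uuid4().urn,
        "@type": "cnt:ContentAsText",
        "cnt:chars": text,
        "cnt:characterEncoding": "utf-8",
        "dc:format": media_type,
    }


def embed_binary(data: bytes, media_type: str) -> dict:
    """Embed arbitrary bytes as a cnt:ContentAsBase64 resource."""
    return {
        "@id": uuid.uuid4().urn,
        "@type": "cnt:ContentAsBase64",
        "cnt:bytes": base64.b64encode(data).decode("ascii"),
        "dc:format": media_type,
    }


body = embed_text("This passage is unclear.")
icon = embed_binary(b"\x89PNG\r\n", "image/png")
```

If the identity of the resource is unimportant, the `@id` key could simply be omitted, which JSON-LD processors treat as a blank node.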

For information about embedding serializations of RDF graphs within the Annotation, please see
Embedding RDF Graphs.

Model

| Vocabulary Item | Type | Description |
| --- | --- | --- |
| cnt:ContentAsText | Class | The representation of a resource, expressed as plain text. |
| cnt:ContentAsBase64 | Class | The representation of a resource, expressed as Base 64 encoded text. |
| cnt:chars | Property | The property of a ContentAsText that contains the representation. There MUST be exactly 1 cnt:chars property for a ContentAsText resource. |
| cnt:bytes | Property | The property of a ContentAsBase64 that contains the representation. There MUST be exactly 1 cnt:bytes property for a ContentAsBase64 resource. |
| cnt:characterEncoding | Property | The character encoding of the content string in either cnt:chars or cnt:bytes. There SHOULD be exactly 1 cnt:characterEncoding for a ContentAsText or ContentAsBase64 resource. |
| dc:format | Property | The media type of the representation. There SHOULD be exactly 1 dc:format per embedded resource. |

Usage
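
The cardinality constraints in the table above can be checked mechanically. The following is a hedged sketch rather than a normative validator: it assumes resources are held as dictionaries keyed by prefixed property names, and the function name `check_embedded_resource` is hypothetical.

```python
def check_embedded_resource(resource: dict) -> list:
    """Return (level, message) pairs for each cardinality constraint not met."""

    def count(key: str) -> int:
        value = resource.get(key)
        if value is None:
            return 0
        return len(value) if isinstance(value, list) else 1

    types = resource.get("@type", [])
    if isinstance(types, str):
        types = [types]
    is_text = "cnt:ContentAsText" in types
    is_base64 = "cnt:ContentAsBase64" in types

    problems = []
    if is_text and count("cnt:chars") != 1:
        problems.append(("MUST", "ContentAsText needs exactly 1 cnt:chars"))
    if is_base64 and count("cnt:bytes") != 1:
        problems.append(("MUST", "ContentAsBase64 needs exactly 1 cnt:bytes"))
    if (is_text or is_base64) and count("cnt:characterEncoding") != 1:
        problems.append(("SHOULD", "expected exactly 1 cnt:characterEncoding"))
    if count("dc:format") != 1:
        problems.append(("SHOULD", "expected exactly 1 dc:format"))
    return problems
```

A consumer might reject resources with any "MUST" problem and merely log the "SHOULD" ones.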

Embedding RDF Graphs

One particular case of embedding resources within an Annotation is embedding statements expressed as RDF graphs.
The triples that make up these resources MUST NOT be simply put into the Annotation graph, as the triples must
remain distinguishable as to authorship and provenance. If it were done otherwise, the metadata and identifier of the
Body or Target graph would be lost and subsumed in the Annotation's graph.

The simplest method is to publish the graph as any other resource with a dereferenceable HTTP URI,
and refer to this URI in the Annotation. This method is RECOMMENDED.

If a single document is required and thus the graph must be embedded within the Annotation, then there are two possibilities:

A serialization of the graph MAY be embedded using the Content in RDF specification, as described in the previous section. The resource MUST have appropriate metadata to allow consuming applications to determine that the representation is a graph, such as the media type given in dc:format and the class trig:Graph.

Alternatively, if the Annotation is requested in a way that makes clear that the client can process named graphs, then the Annotation MAY be expressed using a named graph serialization format, such as TriG or TriX. The request might use HTTP content negotiation, or the server may provide a different URI for the TriG/TriX serialization. This restriction ensures interoperability with clients that cannot parse such a serialization and hence would be unable to process the Annotation at all. This method is presented below in Figure 5.3.
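
The first of the two options above might be sketched as follows, assuming the graph is already serialized as Turtle and that resources are held as dictionaries with prefixed property names. The sample triple and the helper name `embed_graph` are illustrative assumptions.

```python
import uuid

# A one-triple Turtle document standing in for the graph to embed (illustrative).
TURTLE = '<http://example.org/book1> <http://purl.org/dc/terms/title> "Example" .'


def embed_graph(serialization: str, media_type: str = "text/turtle") -> dict:
    """Wrap a serialized graph so that its triples stay out of the Annotation graph."""
    return {
        "@id": uuid.uuid4().urn,
        # Both classes: how the content is carried, and what it represents.
        "@type": ["cnt:ContentAsText", "trig:Graph"],
        "cnt:chars": serialization,
        "cnt:characterEncoding": "utf-8",
        "dc:format": media_type,
    }


body = embed_graph(TURTLE)
```

Because the graph travels as an opaque string, a consumer must parse `cnt:chars` according to `dc:format` before it can query the embedded triples.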

Equivalence of Resources

Although the challenge is not unique to Annotations, the difficulty of deduplicating resources that have been syndicated
between systems is greatly reduced by expressing the equivalence between multiple copies, or very close derivatives.
A system that could not discover duplicate Annotations would naïvely present all of them, resulting
in a very poor user experience. Systems that generate statistics, reputation models, spam filters for annotations
and similar would also produce very poor results without this capability. Given these requirements, the Open Annotation model
includes a relationship to assert that, while two resources are not absolutely identical, they are equivalent and hence
should not both be maintained and processed separately.

If a system retrieves an Annotation and republishes it at a different HTTP
URI, then it SHOULD express the oa:equivalentTo relationship between the
original Annotation and the republished one. The system then SHOULD
update the oa:serializedAt and oa:serializedBy properties,
as the graph has changed by adding the oa:equivalentTo relationship.

Embedded resources SHOULD be treated in the same way when republished with their own HTTP URIs. If a system publishes an embedded resource at a new HTTP URI, then it SHOULD express the oa:equivalentTo relationship between the resource's URN and the new URI from which it is available. If the embedded resource is conveyed as a blank node, then the Skolemization technique described in RDF Concepts 1.1 SHOULD be used. The system MAY also remove the embedded resource from the graph and reference only the dereferenceable URI, at its discretion. When this occurs it is possible for the Annotation's graph to change significantly from the initial version to its republished state, while still remaining equivalent.
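
The republishing behaviour described above might look roughly like this. It is a sketch under stated assumptions: Annotations are plain dictionaries with prefixed property names, and the function name `republish` and all URIs are hypothetical.

```python
import copy
from datetime import datetime, timezone


def republish(annotation: dict, new_uri: str, agent_uri: str) -> dict:
    """Return a copy published at new_uri that links back to the original."""
    republished = copy.deepcopy(annotation)
    original_id = republished["@id"]
    republished["@id"] = new_uri

    # Keep any equivalence links already present and add the original's identifier.
    links = republished.get("oa:equivalentTo", [])
    if isinstance(links, dict):
        links = [links]
    links.append({"@id": original_id})
    republished["oa:equivalentTo"] = links

    # The graph has changed, so the serialization metadata is refreshed.
    republished["oa:serializedAt"] = datetime.now(timezone.utc).isoformat()
    republished["oa:serializedBy"] = {"@id": agent_uri}
    return republished


original = {"@id": "http://a.example/anno/1", "@type": "oa:Annotation"}
mirror = republish(original, "http://b.example/anno/9", "http://b.example/agent")
```

The same function would serve for an embedded resource identified by a URN, since only the `@id` value differs.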

Model

| Vocabulary Item | Type | Description |
| --- | --- | --- |
| oa:equivalentTo | Relationship | [subProperty of prov:alternateOf] The subject and object resources of the oa:equivalentTo relationship represent the same resource, but potentially have different metadata such as oa:serializedBy, oa:serializedAt and serialization format. oa:equivalentTo is a symmetric and transitive relationship: if A oa:equivalentTo B, then it is also true that B oa:equivalentTo A; and if, in addition, B oa:equivalentTo C, then it is also true that A oa:equivalentTo C. The Annotation MAY include 0 or more instances of the oa:equivalentTo relationship between copies of the Annotation or other resources, and SHOULD include as many as are available. |
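
Because oa:equivalentTo is symmetric and transitive, a consumer can group resources into equivalence classes and process each class once. The following sketch uses a union-find structure, which is one common way to do this and is not prescribed by the model; the function name `equivalence_classes` is hypothetical.

```python
def equivalence_classes(pairs):
    """Group identifiers linked by (subject, object) oa:equivalentTo pairs."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Treat every asserted link as undirected (symmetry); merging the two
    # sets gives transitivity for free.
    for subject, obj in pairs:
        root_s, root_o = find(subject), find(obj)
        if root_s != root_o:
            parent[root_s] = root_o

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


links = [("A", "B"), ("B", "C"), ("D", "E")]
classes = equivalence_classes(links)
```

With the links shown, A, B and C fall into one class and D and E into another, so a display layer would present two Annotations rather than five.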