This document specifies a vocabulary for representing Content in RDF. The vocabulary is
intended to provide a flexible framework for semantically representing any type of
content, whether on the Web or in local storage media, across different usage scenarios.
The document also contains introductory information on its usage and some examples.

This section describes the status
of this document at the time of its publication. Other documents may supersede this
document. A list of current W3C
publications and the latest revision of this technical report can be found in the W3C
technical reports index at http://www.w3.org/TR/.

This is a W3C Working Draft of the Representing Content
in RDF vocabulary. Publication as a Working Draft does not imply endorsement by the
W3C Membership. This is a draft
document and may be updated, replaced or obsoleted by other documents at any time. It is
inappropriate to cite this document as other than work in progress.


This document is the specification for a vocabulary to represent
Content in RDF. There is a
wide variety of scenarios (see section below) where a representation of any type of
content, either on the Web or in any local storage media, is necessary. This
specification provides an RDF
application that allows such content to be represented semantically. The vocabulary is
built in a flexible manner, and no limitations were known at the time of writing this
specification. It also provides opportunities for extensions to match the particular needs
of its users.

Although the concepts of the Semantic Web are simple, their abstraction with
RDF is known to cause difficulties for beginners. It is recommended to
read the aforementioned references and other tutorials found on the Web carefully. It
must also be borne in mind that RDF is primarily intended to be
machine-processable, and therefore some of its expressions are not very intuitive for
developers used to working only with XML. The examples are
serialized using the abbreviated RDF/XML notation.

The keywords
must, required, recommended,
should, may, and optional are used in
accordance with [RFC2119].

Table 1
presents the namespaces typically used by this vocabulary (rdfs and
owl namespaces are normally only used in the schema). The core namespace
has the URI http://www.w3.org/2007/content# and the prefix
cnt. The prefix notation presents the typical conventions used in the Web
and in this document to denote a given namespace, and can be freely modified.
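
As a minimal sketch of how the prefix convention works, the following Python snippet expands a prefixed name to its full URI. Only the cnt URI is taken from this document; the rdf, rdfs and owl URIs are the standard ones for those vocabularies, and the helper name expand is a hypothetical name chosen for illustration.

```python
# Prefix-to-URI bindings. Only the cnt URI is stated in this document;
# the other three are the standard RDF, RDFS and OWL namespace URIs.
NAMESPACES = {
    "cnt": "http://www.w3.org/2007/content#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand(qname, namespaces=NAMESPACES):
    """Expand a prefixed name such as 'cnt:bytes' to a full URI (hypothetical helper)."""
    prefix, local_name = qname.split(":", 1)
    return namespaces[prefix] + local_name


# The prefix is only a local abbreviation: binding another prefix to the
# same namespace URI yields the same expanded name.
print(expand("cnt:bytes"))
print(expand("c:bytes", {"c": NAMESPACES["cnt"]}))
```

Both calls print the same URI, which is why the prefix can be changed freely as long as the namespace URI stays the same.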

As stated
earlier, this framework is designed in an open way to facilitate different
implementation scenarios. The application originates from vocabularies
describing testing scenarios, such as EARL [EARL]. Typical applications could be:

Applications dealing with retrieval, editing and storage of content. For
example, an archiving application could store in a database annotated media content
that includes a serialization of the media files with this vocabulary.

Applications dealing with the exchange of text documents and other types of
media, like Web Services. For example, an AJAX application could
exchange document fragments and images with a Web server to react to different user
actions.

Applications dealing with the testing and/or repair of content. For example, an
accessibility testing tool could store the tested Web resources together with the
results of a compliance test, to ensure that the correct version of the tested
subject is available to the developers.

This list is not exhaustive, and several other scenarios could be developed.

This section presents a description of the
classes and properties of this RDF vocabulary. We present every class
together with its properties and subclasses. Where relevant, we also include short
snippets and examples.

The Content class is an overarching class
with no properties. It is recommended to always use one of its subclasses. A resource of
type Content represents any content that could be found, for example, on the
Web, on an intranet or in local storage media. There is no restriction
within the vocabulary scope on what can be represented with this class: textual content,
binary files (e.g., images or movies), XML files, etc.

The Base64Content class is a subclass of the Content class. A resource of type
Base64Content represents Base64 encoded binary content (as
defined by [RFC2045]) and can be used for any type of
content, although its most typical use is for binary files.

The following
property may appear in resources of type Base64Content (with a maximum cardinality of 1):

characterEncoding

If the byte sequence was created from a given character sequence, this property
can be used to store the character encoding that was applied to create the byte
sequence.

The following property must appear in resources of type
Base64Content:

bytes

Character string representing the Base64 encoded byte sequence of the given
content.
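
The two properties above can be illustrated with a minimal sketch that serializes a byte sequence as a Base64Content resource in RDF/XML. This is not one of the specification's own examples: the function name base64_content is hypothetical, the resource is emitted as a blank node, and cnt:bytes is written as a plain literal, which is an assumption of this sketch.

```python
import base64
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CNT = "http://www.w3.org/2007/content#"


def base64_content(data, character_encoding=None):
    """Serialize bytes as a cnt:Base64Content resource (hypothetical helper)."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("cnt", CNT)
    root = ET.Element(f"{{{RDF}}}RDF")
    node = ET.SubElement(root, f"{{{CNT}}}Base64Content")
    # Optional property, at most once: the encoding used to produce the bytes.
    if character_encoding is not None:
        ET.SubElement(node, f"{{{CNT}}}characterEncoding").text = character_encoding
    # Required property: the Base64 encoded byte sequence (no line wrapping here).
    ET.SubElement(node, f"{{{CNT}}}bytes").text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


# The first six bytes of a GIF file stand in for real binary content.
print(base64_content(b"GIF89a"))
```

Passing a second argument, for instance base64_content("text".encode("utf-8"), "UTF-8"), adds the optional characterEncoding property.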

Example 2.1: This example displays the
representation of the W3C logo as a Base64Content resource. (Note: due to its length, the encoded string
has been truncated at {...}.)

The XMLContent class is a
subclass of the Content class. A
resource of type XMLContent represents XML
content.

The following properties may appear in resources of
type XMLContent (with a maximum cardinality of 1):

characterEncoding

If the parser's input character stream was created from a given byte stream, this
property can be used to store the character encoding that was applied to create the
character stream. Note: This is the character encoding actually used,
not the one declared in an XML declaration.

xmlDecl

Property pointing to an XMLDecl resource
representing the XML declaration.

xmlLeadingMisc

Property representing as an XML Literal the part of the
XML (comments and processing instructions) following the
XML declaration and preceding the document type declaration if
there is one.

A resource of type XMLDecl represents an
XML declaration. This class is normally used in conjunction with the
XMLContent class, when the
corresponding XML file has an XML declaration. The
resources are linked via the xmlDecl property.

The following
properties may appear in resources of type XMLDecl:

xmlEncoding

Property representing the character encoding specified in the
XML declaration.

xmlStandalone

Property representing the standalone document declaration.

The following property must appear in resources of type
XMLDecl:

xmlVersion

Property representing the XML version specified in the
XML declaration.
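
As a rough sketch, the three XMLDecl properties could be extracted from the start of an XML source with a regular expression that follows the XML declaration grammar. The function name xml_decl_properties is hypothetical, and a production tool would more likely take these values from its XML parser than from a regular expression.

```python
import re

# Pattern following the XML declaration grammar:
# '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
XML_DECL = re.compile(r"""
    ^<\?xml
    \s+version\s*=\s*(?P<q1>["'])(?P<version>1\.[0-9]+)(?P=q1)
    (?:\s+encoding\s*=\s*(?P<q2>["'])(?P<encoding>[A-Za-z][A-Za-z0-9._-]*)(?P=q2))?
    (?:\s+standalone\s*=\s*(?P<q3>["'])(?P<standalone>yes|no)(?P=q3))?
    \s*\?>
""", re.VERBOSE)

# Mapping from regex group names to the property names used in this section.
PROPERTY_NAMES = {
    "version": "xmlVersion",
    "encoding": "xmlEncoding",
    "standalone": "xmlStandalone",
}


def xml_decl_properties(source):
    """Return the XMLDecl properties of an XML source, or None if it has no declaration."""
    match = XML_DECL.match(source)
    if match is None:
        return None
    return {
        PROPERTY_NAMES[group]: value
        for group, value in match.groupdict().items()
        if group in PROPERTY_NAMES and value is not None
    }


print(xml_decl_properties('<?xml version="1.0" encoding="UTF-8"?><doc/>'))
```

The required xmlVersion is always present in the result when a declaration is found, while xmlEncoding and xmlStandalone appear only if they were declared.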

A resource of type DoctypeDecl
represents a document type declaration. Like XMLDecl, this class is normally used in conjunction with the XMLContent class, when the
corresponding XML file has a document type declaration. The resources
are linked via the doctypeDecl property.

or likewise as XMLContent. The information from the XML declaration is modelled
as an XMLDecl resource and referred to from the XMLContent resource by the xmlDecl property. As the
comment <!-- this is a comment --> precedes the document type
declaration, an xmlLeadingMisc property is created with its object literal
containing the comment. The document type declaration is modelled as a DoctypeDecl resource and referred to from the XMLContent resource by the doctypeDecl property.

We have identified some situations that
clarify when to create which type of content resource. The following are only
recommendations and are non-normative:

Situation A: Given the byte sequence of non-text content (byteSeq)
read from a file system. A cnt:Base64Content resource may
be created with a cnt:bytes property with an object literal created from
byteSeq. However, a cnt:TextContent resource must not
be created, even though in some cases it is technically possible to create a character
sequence from the byte sequence byteSeq using some character
encoding.

Situation B: Given the byte sequence of text content (byteSeq)
received from a Web server and an appropriate character encoding (ce). A
cnt:Base64Content resource may be created with a
cnt:bytes property with an object literal created from
byteSeq. After transforming byteSeq to character sequence
charSeq using character encoding ce, a cnt:TextContent resource may be created with
cnt:chars property with an object literal charSeq and
cnt:characterEncoding property with an object literal ce.

Situation C: Given the byte sequence of text content (byteSeq)
received from a Web server and an inappropriate character encoding (ce).
A cnt:Base64Content resource may be created with a
cnt:bytes property with an object literal created from
byteSeq. Because transforming byteSeq to a character sequence
charSeq using character encoding ce fails, no cnt:TextContent resource can be created.
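
Situations B and C can be sketched together in Python: the Base64Content resource is always produced, and the TextContent resource only if decoding succeeds. The function name resources_from_bytes and the use of plain dictionaries to stand in for RDF resources are assumptions made for illustration.

```python
import base64


def resources_from_bytes(byte_seq, ce):
    """Return the resources that may be created from byteSeq and encoding ce."""
    resources = [{
        "type": "cnt:Base64Content",
        "cnt:bytes": base64.b64encode(byte_seq).decode("ascii"),
    }]
    try:
        char_seq = byte_seq.decode(ce)
    except (UnicodeDecodeError, LookupError):
        # Situation C: decoding fails (or the encoding name is unknown),
        # so no cnt:TextContent resource can be created.
        return resources
    # Situation B: decoding succeeded, so a cnt:TextContent resource may be added.
    resources.append({
        "type": "cnt:TextContent",
        "cnt:chars": char_seq,
        "cnt:characterEncoding": ce,
    })
    return resources


utf8_bytes = "caf\u00e9".encode("utf-8")
print(len(resources_from_bytes(utf8_bytes, "utf-8")))  # Situation B
print(len(resources_from_bytes(utf8_bytes, "ascii")))  # Situation C
```

Note that a failed decode only detects some inappropriate encodings. Single-byte encodings such as ISO-8859-1 accept every byte sequence, so a wrong choice there would silently produce incorrect characters rather than an error.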

Situation D: Given the character sequence of text content (charSeq)
created in memory and an appropriate character encoding (ce). A cnt:TextContent resource may be created with a
cnt:chars property with an object literal created from
charSeq. After transforming charSeq to byte sequence
byteSeq using character encoding ce, a cnt:Base64Content resource may be created with
cnt:bytes property with an object literal byteSeq and
cnt:characterEncoding property with an object literal ce.
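
Situation D runs in the opposite direction, starting from characters. The following sketch uses the hypothetical function name resources_from_chars and again represents resources as plain dictionaries; it assumes that the cnt:bytes literal holds the Base64 encoding of byteSeq.

```python
import base64


def resources_from_chars(char_seq, ce):
    """Return the resources that may be created from charSeq and encoding ce."""
    text_content = {
        "type": "cnt:TextContent",
        "cnt:chars": char_seq,
    }
    # Transform charSeq to byteSeq with the given character encoding.
    byte_seq = char_seq.encode(ce)
    base64_content = {
        "type": "cnt:Base64Content",
        "cnt:bytes": base64.b64encode(byte_seq).decode("ascii"),
        # Records which encoding produced the bytes from the characters.
        "cnt:characterEncoding": ce,
    }
    return [text_content, base64_content]


for resource in resources_from_chars("caf\u00e9", "utf-8"):
    print(resource)
```

Here the characterEncoding property is attached to the Base64Content resource, because the byte sequence is the derived form.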

Situation E: Given the byte sequence of well-formed XML
content (byteSeq) received from a Web server and an appropriate character
encoding (ce). cnt:Base64Content and cnt:TextContent resources may be created as in situation B.
Additionally, a cnt:XMLContent resource may be
created.

Situation F: Given a DOM
Document in memory, originally created by parsing some XML
source, but afterwards changed by DOM operations. A cnt:XMLDecl resource may be created from the information in the
Document node itself, and a cnt:DoctypeDecl resource from
the information in the DocumentType node. A cnt:XMLContent
resource may be created after serializing the relevant child nodes of the Document
node (1. Comment and ProcessingInstruction nodes preceding a DocumentType node, and
2. nodes following a DocumentType node) to create object literals for
cnt:xmlLeadingMisc and cnt:xmlRest. See the Mapping between DOM and the Content-in-RDF vocabulary.
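
Situation F can be sketched with Python's xml.dom.minidom standing in for a DOM implementation. The function name describe_document is hypothetical, and so are the keys inside the cnt:doctypeDecl dictionary, which simply mirror the DOM DocumentType fields. The sketch also assumes that, when there is no DocumentType node, all child nodes go to cnt:xmlRest.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def describe_document(doc):
    """Collect the values for an XMLContent description from a DOM Document."""
    leading, rest = [], []
    # Without a DocumentType node, everything is treated as "rest" (assumption).
    target = leading if doc.doctype is not None else rest
    for child in doc.childNodes:
        if child.nodeType == Node.DOCUMENT_TYPE_NODE:
            target = rest  # nodes after the doctype belong to cnt:xmlRest
            continue
        target.append(child.toxml())
    result = {
        "cnt:xmlLeadingMisc": "".join(leading),
        "cnt:xmlRest": "".join(rest),
    }
    # minidom keeps the XML declaration values on the Document node itself.
    if doc.version is not None:
        result["cnt:xmlDecl"] = {
            "xmlVersion": doc.version,
            "xmlEncoding": doc.encoding,
            "xmlStandalone": doc.standalone,
        }
    if doc.doctype is not None:
        # Placeholder keys mirroring the DocumentType node's fields.
        result["cnt:doctypeDecl"] = {
            "name": doc.doctype.name,
            "publicId": doc.doctype.publicId,
            "systemId": doc.doctype.systemId,
        }
    return result


source = ('<?xml version="1.0" encoding="UTF-8"?><!-- this is a comment -->'
          '<!DOCTYPE note SYSTEM "note.dtd"><note>hi</note>')
doc = parseString(source)
doc.documentElement.setAttribute("lang", "en")  # change made by a DOM operation
print(describe_document(doc))
```

Because the description is built from the in-memory tree, the attribute added after parsing appears in cnt:xmlRest even though it was never part of the original source.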

The vocabulary provides a framework that allows the representation
of any type of content. Of course, there are many possibilities for extensions that
allow the inclusion of additional metadata, such as that included in some multimedia
formats. Typical scenarios for extensions could be:

Classes to specify the metadata of multimedia content like audio or image
files.

Properties to ensure the integrity of data by providing some kind of checksum
algorithm.

However, at the time of writing this specification, the Working Group has
decided to provide only the basic framework that supports the immediate needs of
vocabularies using this specification, such as EARL [EARL], leaving room for further extensions as new use
cases are presented to us.