Abstract

The current Web is primarily made up of an enormous number of documents
that have been created using HTML. These documents contain significant
amounts of structured data, which is largely unavailable to tools and
applications. When publishers can express this data
more completely, and when tools can read it, a new world of user
functionality becomes available, letting users transfer structured
data between applications and web sites, and allowing browsing applications
to improve the user experience: an event on a web page can
be directly imported into a user's desktop calendar; a license on a
document can be detected so that users can be informed of their rights
automatically; a photo's creator, camera setting information,
resolution, location and topic can be published as easily as the original
photo itself, enabling structured search and sharing.

RDFa Core is a specification for attributes to express structured data
in any markup language.
The embedded
data already available in the markup language (e.g., XHTML)
is reused by the RDFa markup, so that
publishers don't need to repeat significant data in the document
content.
The underlying abstract
representation is RDF [RDF-PRIMER], which lets publishers build their own
vocabulary, extend others, and evolve their vocabulary with maximal
interoperability over time. The expressed structure is closely tied
to the data, so that rendered data can be copied and pasted along
with its relevant structure.

The rules for interpreting the data are generic, so that there is no
need for different rules for different formats; this allows authors
and publishers of data to define their own formats without having to
update software, register formats via a central authority, or worry
that two formats may interfere with each other.

RDFa shares some of the same goals with microformats [MICROFORMATS].
Whereas microformats
specify both a syntax for embedding structured data into HTML
documents and a vocabulary of specific terms for each microformat,
RDFa specifies only a syntax and relies on independent specification
of terms (often called vocabularies or taxonomies) by others. RDFa allows terms
from multiple independently-developed vocabularies to be freely
intermixed and is designed such that the language can be parsed
without knowledge of the specific vocabulary being used.

This document is a detailed syntax specification for RDFa, aimed
at:

those looking to create an RDFa Processor, and who therefore need a
detailed description of the parsing rules;

those looking to recommend the use of RDFa within their
organization, and who would like to create some guidelines for their
users;

anyone familiar with RDF, and who wants to understand more about what
is happening 'under the hood', when an RDFa Processor runs.

For those looking for an introduction to the use of RDFa and some real-world
examples, please consult the RDFa Primer.

How to Read this Document

First, if you are not familiar with either RDFa or RDF, and simply
want to add RDFa to your documents, then you may find the RDFa Primer
[RDFA-PRIMER] to be a better introduction.

If you are already familiar with RDFa, and you want to examine the processing
rules — perhaps to create an RDFa Processor — then you'll find the Processing Model
section of most interest.
It contains an overview of each of the processing steps, followed by more detailed
sections, one for each rule.

If you are not familiar with RDFa, but you are familiar with RDF,
then you might find it useful to read the Syntax Overview
before looking at the Processing Model,
since it gives a range of examples
of markup that use RDFa. Seeing some examples first should make reading the
processing rules easier.

If you are not familiar with RDF, then you might want to take a look at
the section on RDF Terminology
before trying to do too much with RDFa. Although RDFa is designed to
be easy to author — and authors don't need to understand RDF to use it —
anyone writing
applications that consume RDFa will need to understand RDF. There is a lot
of material about RDF on the web, and a growing range of tools that support RDFa;
this document contains only
enough background on RDF to make the goals of RDFa clearer.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is an early Editor's Draft for discussion purposes.
Once development is complete, if accepted by the W3C
membership, this document will supersede the previous
Recommendation. There are a number of substantive differences between
this version and its predecessor, including:

The removal of the specific rules for XHTML; these are now defined in XHTML+RDFa
[XHTML-RDFA].

An expansion of the datatypes of some RDFa attributes so that they
can contain Terms, CURIEs, or URIs.

The ability to change the default vocabulary when no 'prefix' is specified on a
CURIE.

The ability to reference external RDFa Profile documents; these are used to ease authoring by creating vocabulary term collections.

A sample test harness is available. This set of tests is
not intended to be exhaustive. Users may find the tests to
be useful examples of RDFa usage.

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

1. Motivation

This section is non-normative.

RDF/XML [RDF-SYNTAX] provides
sufficient flexibility to represent all of the abstract concepts in
RDF [RDF-CONCEPTS]. However, it
presents a number of challenges. First, it is difficult or impossible to
validate documents that contain RDF/XML using XML Schemas or DTDs,
which therefore makes it difficult to import RDF/XML into other markup
languages. Whilst newer schema languages such as RELAX NG [RELAXNG-SCHEMA]
do provide a way to validate documents
that contain arbitrary RDF/XML, it will be a while before they gain
wide support.

Second, even if one could add RDF/XML directly into an XML
dialect like XHTML, there would be significant data duplication
between the rendered data and the RDF/XML structured data. It would
be far better to add RDF to a document without repeating the
document's existing data. For example, an XHTML document that
explicitly renders its author's name in the text—perhaps as a
byline on a news site—should not need
to repeat this name for the RDF expression of the same concept: it
should be possible to supplement the existing markup in such a way
that it can also be interpreted as RDF.

Another reason for aligning the rendered data with the structured data
is that it is highly beneficial to express the web data's
structure 'in context'. Users often want to transfer structured data from one
application to another, sometimes to or from a non-web-based
application, and doing so in context can enhance the user experience.
For example, information about specific
rendered data could be presented to the user via 'right-clicks' on an
item of interest.

In the past, many attributes were 'hard-wired' directly into
the markup language to represent specific concepts. For example, in
XHTML 1.1 [XHTML11] and HTML [HTML401]
there is @cite; the
attribute allows an author to add information to a document which
is used to indicate the origin of a quote.

However, these 'hard-wired' attributes make it difficult to
define a generic process for extracting metadata from any document
since an RDFa Processor would need to know about each of the special attributes.
One motivation for RDFa has been to devise a means by which documents
can be augmented with metadata in a general, rather than hard-wired, manner.
This has been achieved by creating a fixed set of attributes and parsing rules,
but allowing those attributes to contain properties from any of the
growing range of available RDF vocabularies. In most cases the values of those properties
are the information that is already in an author's XHTML document.

RDFa alleviates the
pressure on markup language designers to anticipate all the structural
requirements users of their language might have, by
outlining a new syntax for RDF that
relies only on attributes. By adhering to the concepts and rules in this specification,
language designers can import RDFa into their environment with a minimum of hassle and
be confident that semantic data will be extractable from their documents
by conforming processors.

2. Syntax Overview

This section is non-normative.

The following examples are intended to help readers who are not familiar with RDFa to
quickly get a sense of how it works.
For a more thorough introduction, please read the RDFa Primer [RDFA-PRIMER].

For brevity, in the following examples and throughout this document, assume that the following
vocabulary prefixes have been defined:

biblio: http://example.org/biblio/0.1
cc: http://creativecommons.org/ns#
dbp: http://dbpedia.org/property/
dbr: http://dbpedia.org/resource/
dc: http://purl.org/dc/terms/
ex: http://example.org/
foaf: http://xmlns.com/foaf/0.1/
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfa: http://www.w3.org/ns/rdfa#
rdfs: http://www.w3.org/2000/01/rdf-schema#
taxo: http://purl.org/rss/1.0/modules/taxonomy/
xhv: http://www.w3.org/1999/xhtml/vocab#
xsd: http://www.w3.org/2001/XMLSchema#

2.1 The RDFa Attributes

RDFa makes use of a number of commonly found attributes, as well as providing a few new ones. Attributes
that already exist in widely deployed languages (e.g., HTML) have the same meaning they
always did, although their syntax has been slightly modified
in some cases.
For example, in (X)HTML, @rel already defines the relationship between one document and another. However,
in (X)HTML there is no clear way to add new values; RDFa sets out to explicitly solve this problem, and does so by
allowing URIs as
values. It also introduces the idea of 'compact URIs' — referred to as CURIEs in this
document — which allow a full URI value to be
expressed succinctly. For a complete list of RDFa attribute
names and syntax, see Attributes and
Syntax.

2.2 Examples

As an (X)HTML author you will already be familiar with using meta and
link to add additional information to your documents:

RDFa also permits external definition of collections of prefixes.
The following RDFa Profile document, residing at http://www.example.org/vocab-rdf-dc.html defines
the standard RDF prefixes as well as the Dublin Core vocabulary prefix in RDFa.

<html xmlns="http://www.w3.org/1999/xhtml"
prefix="rdfa: http://www.w3.org/ns/rdfa#">
<head>
...
</head>
<body>
<p>This is an example of defining the standard RDF and
Dublin Core prefixes
</p>
<p typeof="">
The "<span property="rdfa:prefix">rdf</span>" prefix can
be used for the URI:
"<span property="rdfa:uri">http://www.w3.org/1999/02/22-rdf-syntax-ns#</span>".</p>
<p typeof="">
The "<span property="rdfa:prefix">rdfs</span>" prefix can
be used for the URI:
"<span property="rdfa:uri">http://www.w3.org/2000/01/rdf-schema#</span>".</p>
<p typeof="">
The "<span property="rdfa:prefix">dc</span>" prefix can
be used for the URI:
"<span property="rdfa:uri">http://purl.org/dc/terms/</span>".</p>
</body>
</html>

<p about="http://www.example.org/doc"
profile="http://www.example.org/vocab-rdf-dc.html">
<span property="dc:title">title of the document</span>
<span property="rdfs:comment">and this is a longer comment
on the same document</span>
</p>

It is also possible to define terms. Given the following RDFa Profile document at http://www.example.org/vocab-foaf-terms.html:

<html xmlns="http://www.w3.org/1999/xhtml"
prefix="rdfa: http://www.w3.org/ns/rdfa#">
<head>
<title>Example RDFa Vocabulary</title>
</head>
<body>
<p>
This is an example RDFa vocabulary that makes it easier to
use the foaf:name and foaf:homepage terms.
</p>
<p typeof="">
The "<span property="rdfa:term">name</span>" term can
be used for the URI:
"<span property="rdfa:uri">http://xmlns.com/foaf/0.1/name</span>".</p>
<p typeof="">
The "<span property="rdfa:term">homepage</span>" term can
be used for the URI:
"<span property="rdfa:uri">http://xmlns.com/foaf/0.1/homepage</span>".</p>
</body>
</html>

3. RDF Terminology

This section is non-normative.

The previous section gave examples of typical markup in order to illustrate
the structure of RDFa markup.
However, what RDFa represents is RDF. In order to author RDFa you do not need to understand
RDF, although it would certainly help. However, if you are building a system that consumes the RDF
output of a language that supports RDFa you will almost certainly need to understand RDF. This section
introduces the basic concepts and terminology of RDF. For a more thorough explanation of RDF,
please refer to the RDF Concepts document [RDF-CONCEPTS]
and the RDF Syntax Document [RDF-SYNTAX].

3.1 Statements

The structured data that RDFa provides access to is a collection of statements.
A statement is a basic unit of information that has been constructed in a specific format to make it easier to
process. In turn, by breaking large sets of information down into a collection of statements, even very complex metadata
can be processed using simple rules.

To illustrate, suppose we have the following set of facts:

Albert was born on March 14, 1879, in Germany. There is a picture of him at
the web address, http://en.wikipedia.org/wiki/Image:Albert_Einstein_Head.jpg.

This would be quite difficult for a machine to interpret, and it is certainly not in a format
that could be passed from one data application to another. However, if we convert the information to a set of statements it begins
to be more manageable. The same information could therefore be
represented by the following shorter 'statements':

Albert was born on March 14, 1879.
Albert was born in Germany.
Albert has a picture at
http://en.wikipedia.org/wiki/Image:Albert_Einstein_Head.jpg.

3.2 Triples

To make this information machine-processable, RDF defines a structure for these statements. A statement
is formally called a triple, meaning that it is made up of three components. The first is the subject of the
triple, and is what we are making our statements about. In all of these examples the subject is 'Albert'.

The second part of a triple is the property of the subject that we want to define. In the examples here, the
properties would be 'was born on', 'was born in', and 'has a picture at'. These are more usually called predicates
in RDF.

The final part of a triple is called the object. In the examples here the three objects have
the values 'March 14, 1879', 'Germany', and 'http://en.wikipedia.org/wiki/Image:Albert_Einstein_Head.jpg'.
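As a minimal sketch of this three-part structure, the statements about Albert can be held as simple records. The sketch is written in Python, which is not part of this specification, and the name Triple and its field names are assumptions chosen for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate (property), and an object."""
    subject: str
    predicate: str
    object: str


# The three statements about Albert, still using the informal strings.
statements = [
    Triple("Albert", "was born on", "March 14, 1879"),
    Triple("Albert", "was born in", "Germany"),
    Triple("Albert", "has a picture at",
           "http://en.wikipedia.org/wiki/Image:Albert_Einstein_Head.jpg"),
]

# Every statement in this set shares the same subject.
subjects = {t.subject for t in statements}
```

At this stage all three parts are still ambiguous strings; the following sections replace them with URI references and literals.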

3.3 URI references

Breaking complex information into manageable units helps us be specific about our data, but there is still some ambiguity. For example,
which 'Albert' are we talking about? If another system has more facts about 'Albert', how could we know
whether they are about the same person, and so add them to the list of things we know about that person?
If we wanted to find people born in Germany, how could we know that the predicate 'was born in' has the same
purpose as the predicate 'birthplace' that might exist in some other system? RDF solves this problem by replacing our
vague terms with URI references.

URIs are most commonly used to identify web pages, but RDF makes use of them as a way to provide unique identifiers
for concepts. For example, we could identify the subject of all of our statements (the first part of each triple)
by using the DBpedia [http://dbpedia.org] URI for Albert Einstein, instead of
the ambiguous string 'Albert':

<http://dbpedia.org/resource/Albert_Einstein>
has the name
Albert Einstein.
<http://dbpedia.org/resource/Albert_Einstein>
was born on
March 14, 1879.
<http://dbpedia.org/resource/Albert_Einstein>
was born in
Germany.
<http://dbpedia.org/resource/Albert_Einstein>
has a picture at
http://en.wikipedia.org/wiki/Image:Albert_Einstein_Head.jpg.

URI references are also used to uniquely identify the objects in metadata statements (the third part of each triple). The
picture of Einstein is already a URI, but we could also use a URI to uniquely identify the country Germany. At the same time
we'll indicate that the name and date of birth really are
literals (and not URIs), by putting quotes around them:

<http://dbpedia.org/resource/Albert_Einstein>
has the name
"Albert Einstein".
<http://dbpedia.org/resource/Albert_Einstein>
was born on
"March 14, 1879".
<http://dbpedia.org/resource/Albert_Einstein>
was born in
<http://dbpedia.org/resource/Germany>.
<http://dbpedia.org/resource/Albert_Einstein>
has a picture at
<http://en.wikipedia.org/wiki/Image:Albert_Einstein_Head.jpg>.

URI references are also used to ensure that predicates are unambiguous; now we can be sure that 'birthplace',
'place of birth', 'Lieu de naissance' and so on, all mean the
same thing:
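One way to picture this is a lookup in which several human-readable labels all resolve to a single predicate URI. The Python sketch below is illustrative only: the lookup table, the helper same_predicate, and the property name birthPlace under the dbp: namespace are assumptions, not definitions made by this specification.

```python
# Hypothetical predicate URI, built on the dbp: namespace used in this document.
BIRTH_PLACE = "http://dbpedia.org/property/birthPlace"

# Several labels from different systems, all identifying one predicate.
LABEL_TO_PREDICATE = {
    "birthplace": BIRTH_PLACE,
    "place of birth": BIRTH_PLACE,
    "Lieu de naissance": BIRTH_PLACE,
}


def same_predicate(label_a: str, label_b: str) -> bool:
    """Two labels denote the same predicate when they map to the same URI."""
    return LABEL_TO_PREDICATE[label_a] == LABEL_TO_PREDICATE[label_b]
```

Once statements carry the URI rather than the label, two systems can compare predicates by simple string equality on the URI.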

3.4 Plain literals

Although URI resources are always used for subjects and predicates, the object part of a triple can be either a URI or a
literal. In the example
triples, Einstein's name is represented by a plain literal, which means
that it is a basic string with no type or language
information:

3.5 Typed literals

Some literals, such as dates and numbers, have very specific meanings, so RDF provides a mechanism for indicating
the type of a literal. A typed literal is indicated
by attaching a URI to the end of a plain literal, and this URI indicates the literal's datatype. This URI is usually based on
datatypes defined in the
XML Schema Datatypes specification [XMLSCHEMA-2]. The following
syntax would be used to
unambiguously express Einstein's date of birth as a literal of
type http://www.w3.org/2001/XMLSchema#date:
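A typed literal is conventionally written as the quoted lexical form, followed by ^^ and the datatype URI in angle brackets. The Python sketch below builds such a string; the helper name typed_literal is an assumption for illustration, and the sketch escapes only backslashes and double quotes rather than implementing full literal escaping.

```python
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def typed_literal(lexical_form: str, datatype_uri: str) -> str:
    """Return a typed literal written as "value"^^<datatype>."""
    escaped = lexical_form.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype_uri}>'


# Einstein's date of birth in the xsd:date lexical form (YYYY-MM-DD).
birth_date = typed_literal("1879-03-14", XSD_DATE)
print(birth_date)
```

Note that the value uses the lexical form required by the datatype (1879-03-14), not the free-text form "March 14, 1879" used in the earlier statements.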

3.6 Turtle

RDF itself does not have one set way to express triples, since the key ideas of RDF are the triple and the use of URIs,
and not any particular syntax.
However, there are a number of mechanisms for expressing triples, such as RDF/XML [RDF-SYNTAX-GRAMMAR], Turtle [TURTLE],
and of course RDFa. Many discussions
of RDF make use of the Turtle syntax to explain their ideas, since it is quite compact. The examples
we have just seen are already using this syntax, and we'll continue to use it throughout this document when
we need to talk about the RDF that could be generated from some RDFa.
Turtle allows long URIs to be abbreviated by using a URI mapping, which can be used to express a compact URI
as follows:
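The abbreviation step can be sketched in Python as a lookup against a table of URI mappings. The function name abbreviate and the mapping table are assumptions for illustration; the sketch also ignores Turtle's restrictions on which characters may appear in the abbreviated part.

```python
from typing import Dict

# Two of the URI mappings assumed throughout this document.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def abbreviate(uri: str, prefixes: Dict[str, str]) -> str:
    """Return prefix:local if a mapping matches, else the URI in angle brackets."""
    # Prefer the longest matching namespace when several mappings match.
    matches = [(ns, p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return f"<{uri}>"
    namespace, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(namespace):]}"


print(abbreviate("http://dbpedia.org/resource/Albert_Einstein", PREFIXES))
```

With these mappings, the subject URI used in the earlier examples abbreviates to dbr:Albert_Einstein, while a URI with no matching mapping is left in its full form.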

When writing examples, you will often see the following URI in the Turtle representation:

<>

This indicates the 'current document', i.e., the document being processed. In reality there would
always be a full URI based on the document's location, but this abbreviation serves to make examples
more compact. Note in particular that the whole technique of abbreviation is merely a way to make
examples more compact, and the actual triples generated would always use the full URIs.

3.7 Graphs

A collection of triples is called a graph. All of the triples
that are defined by this specification are contained in the
default graph by an RDFa Processor.
For more information on graphs and other RDF concepts, see [RDF-CONCEPTS].

3.8 Compact URIs

In order to allow for the compact expression of RDF statements,
RDFa allows the contraction of most URI references into
a form called a 'compact URI', or CURIE. A detailed discussion of this
mechanism is in the section CURIE and URI Processing.

Note that CURIEs are only used in the markup and Turtle examples, and will never
appear in the generated triples, which are defined by RDF to use URI references.

Full details on how CURIEs are processed are in the section titled CURIE and URI Processing.

3.9 Markup Fragments and RDFa

A growing use of embedded metadata is to take fragments of markup and move
them from one document to another. This may happen through the use
of tools, such as
drag-and-drop in a browser, or through snippets of code provided to
authors for inclusion
in their documents. (A good example of the latter is the licensing
fragment provided by Creative Commons.)

However, those involved in creating fragments (either by building
tools, or authoring
snippets), should be aware that this specification does not say how
fragments are processed. Specifically, the processing of a fragment
'outside' of a complete
document is undefined because RDFa processing is largely about context.
Future versions of this or related
specifications may do more to define this behavior.

Developers of tools that process fragments, or authors of fragments
for manual inclusion,
should also bear in mind what will happen to their fragment once it
is included in a complete
document. They should carefully consider the
amount of 'context'
information that will be needed in order to ensure a correct
interpretation of their fragment.

3.10 A description of RDFa in RDF terms

The following is a brief description of RDFa in terms of the RDF terminology introduced
here. It may be
useful to readers with an RDF background:

4. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].

4.1 RDFa Processor Conformance

A conforming RDFa Processor must make available to a consuming application
a single RDF graph containing all possible triples generated by
using the rules in the Processing Model section.
This specification uses the term
default graph to mean all of the triples asserted by a document according to the Processing Model section.

A conforming RDFa Processor may make available additional triples that have
been generated using rules not described here, but these triples must not be made
available in the default graph. (Whether these additional triples are made
available in one or more additional RDF graphs is implementation-specific, and
therefore not defined here.)

A conforming RDFa Processor
must preserve whitespace in both
plain literals
and XML literals. However, it may be the case that
the architecture in which a processor operates does not make all whitespace available. It
is therefore advisable for authors who would like to make their documents consumable across
different processors to remove any unnecessary whitespace in their markup.

4.2 RDFa Host Language Conformance

Host Languages that incorporate RDFa must adhere to the
following:

All of the
facilities required in this specification must be included in the Host
Language.

The attributes defined in this
specification must be included in the content model of the Host
Language.

If the Host Language uses XML Namespaces [XML-NAMES],
the attributes
in this specification should be incorporated in the namespace
of the Host Language.

If the Host Language has its own definition for any
attribute defined in this specification, that definition
must be such that the processing required by this specification
remains possible when the attribute is used in a way consistent
with the requirements herein.

5. Attributes and Syntax

This specification defines a number of attributes and the
way in which the values of those attributes are to be
interpreted when generating RDF triples. This section
defines the attributes and the syntax of their values.

@xmlns is a method of declaring prefix mappings as defined in [XML-NAMES].
Prefix mappings declared via this attribute are equivalent to those declared using
@prefix. If this attribute and @prefix declare
a mapping for the same prefix on the same element, the
mapping from @prefix must take precedence. Document authors should
use @prefix, and should not mix @prefix and this attribute
on the same element.
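The precedence rule can be sketched in Python as a dictionary merge in which the mappings from @prefix are applied last. The function names are assumptions for illustration, and the sketch assumes that a @prefix value is a whitespace-separated sequence of "name: URI" pairs, as in the prefix="rdfa: http://www.w3.org/ns/rdfa#" example earlier in this document.

```python
from typing import Dict


def parse_prefix_attribute(value: str) -> Dict[str, str]:
    """Parse a @prefix value such as 'dc: http://purl.org/dc/terms/'."""
    tokens = value.split()
    mappings = {}
    # Tokens come in pairs: 'name:' followed by the URI it maps to.
    for name, uri in zip(tokens[0::2], tokens[1::2]):
        if name.endswith(":"):
            mappings[name[:-1]] = uri
    return mappings


def merge_mappings(xmlns_mappings: Dict[str, str],
                   prefix_value: str) -> Dict[str, str]:
    """Combine both sources for one element; @prefix wins on conflict."""
    merged = dict(xmlns_mappings)
    merged.update(parse_prefix_attribute(prefix_value))
    return merged


merged = merge_mappings(
    {"dc": "http://example.org/old-dc/", "foaf": "http://xmlns.com/foaf/0.1/"},
    "dc: http://purl.org/dc/terms/ rdfa: http://www.w3.org/ns/rdfa#",
)
```

In this usage example the element declares dc through both mechanisms, and the value from @prefix is the one that survives in the merged mappings.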

6. CURIE Syntax Definition

The key component of RDF is the URI, but these are usually long and unwieldy. RDFa therefore supports a mechanism
by which URIs can be abbreviated, called 'compact URIs' or simply CURIEs.

A CURIE is comprised of two components, a prefix and a
reference. The prefix is separated from the reference by a colon
(:). In general use it is possible to omit the prefix, and so create a CURIE that makes use of the
'default prefix' mapping; in RDFa the 'default prefix' mapping is http://www.w3.org/1999/xhtml/vocab#.
It's also possible to omit both the prefix and the colon, and so create a CURIE that contains
just a reference which makes use of the 'no prefix' mapping. This specification
does not define a default 'no prefix' mapping. However, Host Languages may
define a default. This mapping may be changed via @vocab.

The working group has not reached consensus on whether there should be a default prefix
mapping defined in RDFa Core, or whether it should be defined in Host Languages.

The production safe_curie is not required, even in situations
where an attribute value is permitted to be a CURIE or a URI:
A URI that uses a scheme that is not an in-scope
mapping cannot be confused with a CURIE. The concept of a
safe_curie is retained for backward compatibility.

In normal evaluation of CURIEs the following context information would need to be provided:

a set of mappings from prefixes to URIs;

a mapping to use with the default prefix (for example, :p);

a mapping to use when there is no prefix (for example, p);

a mapping to use with the '_' prefix, which is used to generate unique identifiers (for example, _:p).

In RDFa these values are defined as follows:

the set of mappings from prefixes to URIs is provided by the current in-scope prefix declarations of the
current element during parsing;

the mapping to use with the default prefix is the current default prefix mapping;

the mapping to use when there is no prefix is not defined, which effectively prohibits the use of CURIEs that do not contain a colon (however, see General Use of Terms in Attributes);

the mapping to use with the '_' prefix is not explicitly stated, but since it is used to generate bnodes,
its implementation needs to be compatible with the RDF definition and rules in Referencing Blank Nodes.
A document should not define a mapping for the '_' prefix. A conforming RDFa Processor must ignore any definition of a mapping for the '_' prefix.

A CURIE is a representation of a full URI. The rules for determining that URI are:

If a CURIE consists of an empty prefix and a reference,
the URI is obtained by taking the current default prefix mapping and concatenating
it with the reference. If there is no current default prefix
mapping, then this is not a valid CURIE and must be ignored.

Otherwise, if a CURIE consists of a non-empty prefix and reference,
and if there is an in-scope mapping for prefix, then the URI is created
by using that mapping, and concatenating it with the
reference.

Finally, if there is no in-scope mapping for
prefix, then the value is not a CURIE.

Note that the resulting URI must be a syntactically valid IRI [RFC3987]. For a more detailed explanation see
CURIE and URI Processing.
Note that while the lexical space
of a CURIE is as defined in curie above,
the value space is the set of IRIs.
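The three rules for determining the URI can be sketched in Python as follows. This is a non-normative illustration under stated assumptions: the function name resolve_curie and its parameters are hypothetical, the '_' prefix is simply declined rather than turned into a bnode, and the sketch does not check that the result is a syntactically valid IRI.

```python
from typing import Dict, Optional


def resolve_curie(value: str,
                  prefix_mappings: Dict[str, str],
                  default_prefix_mapping: Optional[str] = None) -> Optional[str]:
    """Return the full URI for a CURIE, or None if the value is not a usable CURIE."""
    if ":" not in value:
        # No colon: RDFa defines no 'no prefix' mapping for CURIEs.
        return None
    prefix, reference = value.split(":", 1)
    if prefix == "":
        # Empty prefix: use the current default prefix mapping, if there is one.
        if default_prefix_mapping is None:
            return None
        return default_prefix_mapping + reference
    if prefix == "_":
        # Blank node identifiers are handled separately; out of scope here.
        return None
    if prefix in prefix_mappings:
        # Non-empty prefix with an in-scope mapping: concatenate.
        return prefix_mappings[prefix] + reference
    # No in-scope mapping: the value is not a CURIE (it may be a plain URI).
    return None


mappings = {"foaf": "http://xmlns.com/foaf/0.1/"}
print(resolve_curie("foaf:name", mappings))
```

Because "http" is not an in-scope prefix in this example, a value such as http://example.org/ is not treated as a CURIE, which is why a URI whose scheme is not a declared prefix cannot be confused with one.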

7. Processing Model

This section looks at a generic set of processing rules for creating a set of triples that represent the
structured data present in an RDFa document. Processing need not follow the DOM traversal technique outlined
here, although the effect of following some other manner of processing must be the same as if the processing
outlined here were followed. The processing model is explained using the idea of DOM traversal
which makes it easier to describe (particularly in relation to the evaluation context).

Note that in this section, explanations about
the processing model or guidance to implementors are enclosed
in sections like this.

7.1 Overview

Evaluating a document for RDFa triples is carried out by starting at the document object, and then
visiting each of its child elements in turn, in document order, applying processing rules. Processing
is recursive in that for each child element the processor also visits each of its child elements,
and applies the same processing rules.

In some environments there will be little difference between starting at the root element of
the document, and starting at the document object itself. It is defined this way because in some
environments important information is present at the document object level which is not present on the
root element.

As processing continues, rules are applied which may generate triples, and may also change the
evaluation context information that will then be used when processing descendant elements.

This specification does not say anything about what should happen to the triples generated, or whether more
triples might be generated during processing than are outlined here. However, to be conforming, an
RDFa processor must act as if at a minimum the rules in this section are applied, and a single
RDF graph produced. As described in the RDFa Processor Conformance section,
any additional triples generated must not appear in the default graph.
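The recursive, document-order traversal described in this overview can be sketched in Python with the standard xml.etree.ElementTree module. The function name visit is an assumption, the sketch starts at the root element rather than at the document object, and it only records the order of visits where a real processor would apply the processing rules.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def visit(element: ET.Element, depth: int,
          visited: List[Tuple[int, str]]) -> None:
    """Record the element, then recurse into its children in document order."""
    # A real processor would apply the RDFa processing rules here and
    # possibly update the evaluation context before descending.
    visited.append((depth, element.tag))
    for child in element:  # ElementTree yields child elements in document order
        visit(child, depth + 1, visited)


markup = (
    '<div about="http://www.example.org/doc">'
    '<p><span property="dc:title">title of the document</span></p>'
    '<p>unrelated text</p>'
    '</div>'
)
visited: List[Tuple[int, str]] = []
visit(ET.fromstring(markup), 0, visited)
print(visited)
```

The recorded order is depth-first: each element is processed before its children, and all of its descendants are processed before its next sibling.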

7.2 Evaluation Context

During processing, each rule is applied using information provided by an evaluation context.
An initial context is created when processing begins. That context has the following members:

The base. This will usually be the URL of the document being processed, but it could be some other URL,
set by some other mechanism, such as the (X)HTML base element. The important thing is that it
establishes a URL against which relative paths can be resolved.

The parent subject. The initial value will be the same as the initial value of base, but
it will usually change during the course of processing.

The parent object. In some situations the object of a statement becomes the subject of any
nested statements, and this property is used to convey this value. Note that this value may be a bnode, since in some
situations a number of nested statements are grouped
together on one bnode.
This means that the bnode must be set in the containing statement and passed down, and this property is
used to convey this value.

A list of current, in-scope URI mappings.

A list of incomplete triples. A triple can be incomplete when no object resource is provided alongside
a predicate that requires a resource (i.e., @rel or @rev). The triples can be completed
when a resource becomes available, which will be when the next subject is specified (part of the process called
chaining).

The language. Note that there is no default language.

The term mappings, a list of terms and their associated
URIs. This specification does not define an initial list. Host Languages may define
an initial list. If a Host Language provides an initial list, it should do so via
an RDFa Profile document.

The default vocabulary, a value to
use as the prefix URI when a term is used. This
specification does not define an initial setting for the default
vocabulary. Host Languages may
define an initial setting.

During the course of processing, new evaluation contexts are created
which are passed to each child element.
The rules described below will determine the values of the items in the context. Additionally, some rules will cause new
triples to be created by combining information provided by an element
with information from the evaluation context.

During the course of processing a number of locally scoped values are needed, as follows:

An initially empty list of URI mappings, called the local list of URI mappings.

An initially empty list of incomplete triples, called the local list of incomplete triples.

A recurse flag. Processing generally continues recursively through the entire tree of elements available. However,
if an author indicates that some branch of the tree should be treated as an XML literal, no further processing should
take place on that branch, and setting this flag to false would have that effect.

A skip element flag, which indicates whether the current element can safely be ignored since it has
no relevant RDFa attributes. Note that descendant elements will still be processed.

A value for the current object literal, the literal to use when creating triples that have a literal object.

A value for the current object resource, the resource to use when creating triples that have a resource object.

The local term mappings, a list of terms and their associated
URIs.

A local default vocabulary, a URI to use as a prefix mapping when a term is used.
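
The evaluation context and the locally scoped values can be pictured as two simple records. The following Python sketch is illustrative only and is not part of this specification: the class and field names are hypothetical, and the initial values shown for the two flags are assumptions.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class EvaluationContext:
    """Values handed from an element to each of its children (names are hypothetical)."""
    base: str
    parent_subject: str
    parent_object: Optional[str] = None
    uri_mappings: dict = field(default_factory=dict)
    incomplete_triples: list = field(default_factory=list)
    language: Optional[str] = None          # there is no default language
    term_mappings: dict = field(default_factory=dict)
    default_vocabulary: Optional[str] = None

    def child(self, **overrides):
        """Return a new context for a child element.

        The containers are copied so that changes made while processing
        the child do not leak back into the parent's context.
        """
        values = {
            "uri_mappings": dict(self.uri_mappings),
            "incomplete_triples": list(self.incomplete_triples),
            "term_mappings": dict(self.term_mappings),
        }
        values.update(overrides)
        return replace(self, **values)


@dataclass
class LocalValues:
    """Values scoped to the processing of a single element."""
    uri_mappings: dict = field(default_factory=dict)
    incomplete_triples: list = field(default_factory=list)
    recurse: bool = True            # assumed initial value
    skip_element: bool = False      # assumed initial value
    current_object_literal: Optional[str] = None
    current_object_resource: Optional[str] = None
    term_mappings: dict = field(default_factory=dict)
    default_vocabulary: Optional[str] = None


root = EvaluationContext(base="http://example.org/", parent_subject="http://example.org/")
kid = root.child(language="en")
kid.uri_mappings["db"] = "http://dbpedia.org/"   # visible to kid only
```

Copying the containers in child() is one way of ensuring that each element sees only the mappings that are in scope for it.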

7.3 Chaining

Statement chaining is an RDFa feature that allows the author to link RDF statements together while avoiding unnecessary repetitive markup.
For example, if an author were to add statements as children of an object that was a resource, these statements
should be interpreted as being about that resource:

In this example we can see that an object resource ('Germany'), has become the subject for nested statements. This markup
also illustrates the basic chaining pattern of 'A has a B has a C' (i.e., Einstein has a birth place of Germany, which has
a long name of "Federal Republic of Germany").

It's also possible
for the subject of nested statements to provide the object for containing statements — essentially the reverse
of the example we have just seen. To illustrate, we'll take an example of the type of chaining just described,
and show how it could be marked up more efficiently. To start, we mark up
the fact that Albert Einstein had both German and American citizenship:

7.4 CURIE and URI Processing

Since RDFa is ultimately a means for transporting RDF, a key concept is the resource and its manifestation as a
URI. Since RDF deals with complete URIs (not relative paths), any relative URIs encountered when converting RDFa to triples need
to be resolved relative to the base URI, using the algorithm defined in section 5 of RFC 3986 [URI],
Reference Resolution.
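
As an informal illustration, reference resolution can be sketched with the urljoin function from Python's standard library, which broadly follows the RFC 3986 algorithm. The helper name resolve and the example.org URI are assumptions made for this sketch.

```python
from urllib.parse import urljoin


def resolve(base, reference):
    """Resolve a possibly relative URI reference against the base URI."""
    return urljoin(base, reference)


base = "http://dbpedia.org/resource/"
relative = resolve(base, "Albert_Einstein")            # relative path, joined to base
absolute = resolve(base, "http://example.org/other")   # already complete, left unchanged
```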

Many of the attributes that hold URIs are also able to carry 'compact URIs' or CURIEs. A CURIE is a convenient way to represent
a long URI, by replacing a leading section of the URI with a substitution token. It's possible for authors to
define as many substitution tokens as they see fit; the full URI is obtained by locating the mapping defined by a token from a
list of in-scope tokens, and then simply concatenating the second part of the CURIE onto the mapped value.

For example, the full URI for Albert Einstein on DBPedia is:

http://dbpedia.org/resource/Albert_Einstein

This can be shortened by authors to make the information easier to manage, using a CURIE. The first step is for the
author to create a prefix mapping that links a prefix to some leading segment of the URI. In RDFa these mappings are
expressed using @prefix:

<div prefix="db: http://dbpedia.org/">
...
</div>

Once the prefix has been established, an author can then use it to shorten a URI as follows:

The author is free to split the URI at any point, as long as it begins at the left end. However, since a common
use of CURIEs is to make available libraries of terms and values, the prefix will usually be mapped to some
common segment that provides the most re-use, often provided by those who manage the library of terms. For example,
since DBPedia contains an enormous list of resources, it is more efficient to create a prefix mapping that uses the
base location of the resources:

Note that it is generally considered a bad idea to use relative paths in prefix declarations. Since it is possible
that an author may ignore this guidance, it is further possible that the URI obtained from a CURIE is relative. However, since
all URIs must be resolved relative to base before being used to create triples, the use of relative paths should
not have any effect on processing.
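
The expansion of a CURIE into a full URI can be sketched as follows. This is a minimal Python illustration, not a definitive implementation: the function name expand_curie is hypothetical, the prefix is lower-cased on lookup on the assumption that mappings are stored in lower case, and special forms such as bnode references are not handled.

```python
def expand_curie(curie, mappings):
    """Expand 'prefix:reference' using the in-scope prefix mappings.

    Returns None when the value has no prefix separator or the prefix
    has no mapping, so that the caller can decide how to proceed.
    """
    prefix, separator, reference = curie.partition(":")
    if not separator:
        return None
    mapped = mappings.get(prefix.lower())
    if mapped is None:
        return None
    # The full URI is the mapped value with the rest of the CURIE appended.
    return mapped + reference


# The same URI obtained by splitting at two different points.
short_prefix = expand_curie("db:resource/Albert_Einstein", {"db": "http://dbpedia.org/"})
long_prefix = expand_curie("dbr:Albert_Einstein", {"dbr": "http://dbpedia.org/resource/"})
```

Both calls yield http://dbpedia.org/resource/Albert_Einstein, which shows that the split point is the author's choice.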

7.4.1 Scoping of Prefix Mappings

CURIE prefix mappings are defined on the current element and
its descendants.
For example, the URIs expressed by the following two CURIEs are
different, despite the common prefix, because the prefix mappings are locally scoped:
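
The scoping rule can be modelled informally with layered dictionaries. In this Python sketch a ChainMap stands in for the list of in-scope mappings; the second mapped URI (example.org) is a hypothetical value chosen only to show the effect.

```python
from collections import ChainMap

# Mappings declared on an outer element.
outer = ChainMap({"dbr": "http://dbpedia.org/resource/"})

# A descendant element redeclares the same prefix; the new layer
# shadows the outer mapping for that element and its descendants only.
inner = outer.new_child({"dbr": "http://example.org/other/"})

outer_uri = outer["dbr"] + "Albert_Einstein"
inner_uri = inner["dbr"] + "Albert_Einstein"
```

The same CURIE, dbr:Albert_Einstein, therefore expands to two different URIs depending on where it appears.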

7.4.2 General Use of CURIEs in Attributes

There are a number of ways that attributes make use of CURIEs, and they need to be dealt with
differently. These are:

An attribute may allow one or more values that are a mixture of CURIEs and URIs. In this case any value that is not
a CURIE, as outlined in section CURIE Syntax Definition,
will be processed as a URI.

If the value is surrounded by square
brackets, then the content within the brackets is always evaluated according to the rules in CURIE Syntax Definition - and if that content is not a CURIE, then the content must be ignored.

An empty attribute value (e.g., typeof='')
is still a CURIE, and is processed as such. The rules for
this processing are defined in Sequence.
Specifically, however, an empty attribute value is never treated
as a relative URI by this specification.

An example of an attribute that can contain a CURIEorURI is
@about. To express a URI directly, an author might do
this:

<div about="http://dbpedia.org/resource/Albert_Einstein">
...
</div>

whilst to express the URI above as a CURIE they would do this:

<div about="dbr:Albert_Einstein">
...
</div>

The author could also use a safe CURIE, as follows:

<div about="[dbr:Albert_Einstein]">
...
</div>

Since non-CURIE values must be ignored, the following value in @about would not
set a new subject, since @about does not permit the
use of TERMs, and the CURIE has no prefix separator.

<div about="[Albert_Einstein]">
...
</div>

However, this markup would set a subject, since it is not a CURIE, but a valid relative URI:

<div about="Albert_Einstein">
...
</div>

Note that several RDFa attributes are also able to take TERMs as their value.
This is discussed in the next section.
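
The handling of a CURIEorURI value such as @about can be sketched as below. This is an illustrative Python outline under stated assumptions: the function names are hypothetical, an empty value is simply left to the rules in Sequence, and the base URI used in the usage lines (example.org) is invented for the example.

```python
from urllib.parse import urljoin


def expand_curie(curie, mappings):
    """Return the expanded URI, or None if the value is not a usable CURIE."""
    prefix, separator, reference = curie.partition(":")
    if not separator or prefix.lower() not in mappings:
        return None
    return mappings[prefix.lower()] + reference


def resolve_curie_or_uri(value, mappings, base):
    """Resolve a CURIEorURI attribute value to a URI, or None if it is ignored."""
    value = value.strip()
    if value == "":
        # An empty value is never treated as a relative URI here;
        # its handling is left to the rules in Sequence.
        return None
    if value.startswith("[") and value.endswith("]"):
        # Safe CURIE: the content must be a CURIE, otherwise it is ignored.
        expanded = expand_curie(value[1:-1], mappings)
        return None if expanded is None else urljoin(base, expanded)
    expanded = expand_curie(value, mappings)
    if expanded is not None:
        return urljoin(base, expanded)
    # Not a CURIE, so the value is processed as a (possibly relative) URI.
    return urljoin(base, value)


mappings = {"dbr": "http://dbpedia.org/resource/"}
base = "http://example.org/people/"
as_curie = resolve_curie_or_uri("dbr:Albert_Einstein", mappings, base)
as_safe_curie = resolve_curie_or_uri("[dbr:Albert_Einstein]", mappings, base)
ignored = resolve_curie_or_uri("[Albert_Einstein]", mappings, base)
as_relative_uri = resolve_curie_or_uri("Albert_Einstein", mappings, base)
```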

7.4.3 General Use of Terms in Attributes

Some RDFa attributes have a datatype that permits a term to be referenced.
RDFa defines the syntax of a term as:

One ramification of these rules is that, if an attribute
has the datatype TERMorCURIEorURI, and the value matches
the production for term but there is no local default vocabulary,
then the term is ignored.

7.4.4 Use of CURIEs in Specific Attributes

The general rules discussed in the previous sections apply to the RDFa attributes in the following ways:

7.4.5 Referencing Blank Nodes

In RDFa, it is possible to establish
relationships using various types of
resource references, including bnodes.
If a subject or object is defined using a CURIE, and that CURIE explicitly
names a bnode, then
a Conforming Processor must create the bnode when it is encountered
during parsing.
The RDFa Processor must also ensure
that no bnode created automatically (as a result of chaining)
has a name that
collides with a bnode that is defined by explicit reference in a CURIE.

In the above fragment, two bnodes are explicitly created as the subject
of triples. Those bnodes are then referenced to demonstrate the
relationship between the parties. After processing, the following
triples will be generated:

7.5 Sequence

Processing would normally begin after the document to be parsed has been completely loaded. However, there is no
requirement for this to be the case, and it is certainly possible to use a stream-based approach, such as
SAX [SAX] to extract
the RDFa information. However, if some approach other than the DOM traversal technique defined here is used, it
is important to ensure that Host Language-specific processing rules are applied
(e.g., XHTML+RDFa [XHTML-RDFA] indicates the base element can be used, and
base will affect the interpretation of URIs in meta or
link elements even if those elements are before the
base element in the stream).

At the beginning of processing, an initial evaluation context is created, as follows:

the base is set to the URL of the document (or another value specified in a language specific manner such as the HTML
base element);

Processing begins by applying the processing rules below to the document object, in the context of this initial
evaluation context. All elements in the tree are also processed according to the rules described
below, depth-first, although the evaluation context used for each set of rules will be based on
previous rules that may have been applied.
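
The overall shape of this traversal can be sketched in Python using the standard ElementTree parser. The sketch is deliberately incomplete: it is an assumption-laden outline that handles only @about, the function name walk is hypothetical, and the document and base URI are invented for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def walk(element, parent_subject, base, visited):
    """Visit elements depth-first, passing the subject down to children."""
    subject = parent_subject
    about = element.get("about")
    if about is not None:
        # @about sets a new subject for this element and its descendants.
        subject = urljoin(base, about)
    visited.append((element.tag, subject))
    for child in element:
        walk(child, subject, base, visited)


document = ET.fromstring(
    "<html><head><title>t</title></head>"
    "<body><div about='/e1'><span/></div><p/></body></html>"
)
visited = []
base = "http://example.org/doc"
# Initially the subject is the document itself.
walk(document, base, base, visited)
```

Each child receives the subject established by its nearest ancestor, which is the essence of the evaluation context being passed down the tree.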

This specification assumes that certain
elements are present in the Host Language (e.g.,
head). If these elements are not supported in
the Host Language, then the corresponding processing rules
are not relevant for that language.

The working group has not reached consensus as to whether to include
the optional attributes in this specification, or whether to have them defined in the
relevant Host Language specifications.

Note that some of the local variables are temporary containers for values that will be passed to descendant elements via an
evaluation context. In some cases the containers will have the same name, so to make it clear which is being acted upon
in the following steps, the local version of an item will generally be referred to as such.

Mappings are defined via @prefix.
For backward compatibility, some Host Languages may also permit the
definition of mappings via @xmlns. In this case, the value to be mapped is
set by the XML namespace
prefix, and the value to map is the value of the attribute — a URI.
Regardless of how the mapping is declared,
the value to be mapped must be converted to lower case,
and the URI is not processed in any way; in
particular if it is a relative path it is not resolved
against the current base. Authors should not
use relative paths as the URI.
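
Reading mappings from @prefix can be sketched as follows. This Python outline assumes that the attribute value is a whitespace-separated sequence of 'prefix: URI' pairs, as in the example given earlier; the function name is hypothetical.

```python
def parse_prefix_attribute(value):
    """Turn 'p1: uri1 p2: uri2' into a dictionary of prefix mappings."""
    tokens = value.split()
    mappings = {}
    # Walk the tokens two at a time: a prefix ending in ':' and then its URI.
    for name, uri in zip(tokens[0::2], tokens[1::2]):
        if name.endswith(":"):
            # The prefix is lower-cased; the URI is stored exactly as written
            # and is not resolved against the base at this point.
            mappings[name[:-1].lower()] = uri
    return mappings


mappings = parse_prefix_attribute(
    "db: http://dbpedia.org/ FOAF: http://xmlns.com/foaf/0.1/"
)
```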

If in any of the previous steps a new subject was set to a non-null value, it is now used to provide a subject
for type values;

One or more 'types' for the new subject can be set by using @typeof. If present,
the attribute must contain one or more URIs, obtained according to the section on
URI and CURIE Processing, each of which is used to generate a triple as follows:

@datatype is present, and does not have an empty value, and is not set
to rdf:XMLLiteral.

The actual literal is either the value of @content (if present) or a string created by concatenating,
in turn, the values of all descendant text nodes of the current element. The final string includes
the datatype URI, as described in [RDF-CONCEPTS], which will have been obtained according to the section on
CURIE and URI Processing.

Additionally, if there is a value for current language then the value of the plain literal should
include this language information, as described in [RDF-CONCEPTS]. The actual literal is either the value of
@content (if present)
or a string created by concatenating the text content of each of the descendant elements of the current element
in document order.

the current element has any child nodes that are not simply text nodes, and @datatype is
not present, or is present, but is set to rdf:XMLLiteral.

The value of the XML literal is a string created by serializing to text, all nodes that are
descendants of the current element, i.e., not including the element itself, and giving it a datatype of
rdf:XMLLiteral. The format of the
resulting serialized content is as defined in
Exclusive XML Canonicalization Version
[XML-EXC-C14N].
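
The choice between the three kinds of literal can be summarised in a small decision function. This Python sketch is a simplification: the function name and return labels are hypothetical, the datatype is assumed to have been resolved to a full URI already, and the conditions used for the plain literal case are an assumption drawn from the surrounding prose.

```python
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def classify_literal(datatype, content, has_element_children):
    """Return 'typed', 'plain' or 'xml' for the current object literal.

    datatype is None when @datatype is absent and '' when it is empty;
    content is None when @content is absent.
    """
    if datatype not in (None, "", XML_LITERAL):
        return "typed"
    if content is not None or not has_element_children or datatype == "":
        # Assumed plain literal cases: @content present, only text inside
        # the element, or an empty @datatype overriding the XML literal.
        return "plain"
    return "xml"


XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
kind_typed = classify_literal(XSD_DATE, "1879-03-14", False)
kind_xml = classify_literal(None, None, True)
kind_overridden = classify_literal("", None, True)
```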

8. RDFa Processing in detail

This section provides an in-depth examination of the processing steps described in the previous section. It also includes
examples which may help clarify some of the steps involved.

The key to processing is that a triple is generated whenever a predicate/object combination is detected. The actual triple
generated
will include a subject that may have been set previously, so this is
tracked in the current evaluation context and is called
the parent subject. Since the subject will default to the current document if it hasn't been set explicitly,
a predicate/object combination is always enough to generate one or more triples.

The attributes for setting a predicate are @rel, @rev and @property, whilst the attributes
for setting an object are @resource, @href, @content, and @src.
@typeof is unique in that it sets both a predicate and an object at the same time (and also a subject when it appears in the absence of other attributes that would set a subject).
Inline content might also set an object, if @content is not present, but @property is present.

8.1 Changing the evaluation context

8.1.1 Setting the current subject

When triples are created they will always be in relation to a subject resource which is provided either by new subject
(if there are rules on the current element that have set a subject) or parent subject, as passed in via the
evaluation context. This section looks at the specific ways in which these values are set. Note that it doesn't matter
how the subject is set, so in this section we use the idea of the current subject which may be either new subject or parent subject.

8.1.1.1 The current document

When parsing begins, the current subject will be the URI of the document being parsed,
or a value as set by a Host Language-provided mechanism such
as the base element in (X)HTML. This
means that any metadata found in the head of the document will concern the
document itself:

As processing progresses, any
@about attributes will change the current subject. The value of
@about is a URI or a CURIE. If it is a relative URI then it needs to be resolved
against the current base value. To illustrate how this affects the statements, note in this markup
how the properties inside the (X)HTML body element become part of a new calendar event object, rather
than referring to the document as they do in the head of the document:

Whilst @about explicitly creates a new context for statements, @typeof does so implicitly.
@typeof works differently to other ways of setting a predicate since the predicate is always
rdf:type, which means that the processor only requires one attribute, the value of the type.

Since @typeof is setting the type of an item, this means that if no
item exists one should automatically be created. This
involves generating a new bnode, and is examined in more detail below; it is
mentioned here because the bnode used by the new item will become the subject for further statements.

For example, an author may wish to create markup for a person using the FOAF vocabulary, but without having a clear
identifier for the item:

A bnode is simply a unique identifier that is only available to the processor, not to any external software. By generating
values internally, the processor is able to keep track of properties for _:a as being distinct from
_:b. But by not exposing these values to any external software, it is possible to have complete control over
the identifier, as well as preventing further statements being made about the item.

As described in the previous two sections, @about will always take precedence and mark a new subject, but if no
@about value is available then @typeof will do the same job, although using an implied identifier,
i.e., a bnode.

But if neither @about nor @typeof is present, there are a number of ways that the subject could
be arrived at. One of these is to 'inherit' the subject from the containing statement, with the value to be inherited set
either explicitly, or implicitly.

The most usual way that an inherited subject might get set would be when the parent statement has an object that is a
resource. Returning to the earlier example, in which the long name for Germany was added, the following markup was used:

In this situation, all statements that are 'contained' by the object resource representing Germany (the value in
@resource) will have the same subject, making it easy for authors to add additional statements:

Note also that the same principle described here applies to @src and @href.

Inheriting an anonymous subject

There will be occasions when the author wants to elide the subject and object as shown above, but is not concerned
to name the resource that is common to the two statements (i.e., the object of the first statement, which is the subject
of the second). For example, to indicate that Einstein was influenced by Spinoza the following markup could well be used:

In RDF terms, the item that 'represents' Einstein is anonymous, since it has no URI to identify it. However,
the item is given an
automatically generated bnode, and it is onto this identifier that all
child statements are attached:

From the point of view of the markup, this latter layout is to be preferred, since it draws attention to the 'hanging
rel'. But from the point of view of an RDFa Processor, all of these permutations need to be supported.

8.2 Completing 'incomplete triples'

When a new subject is calculated, it is also used to complete any incomplete
triples that are pending. This situation arises when the author wants to 'chain' a number of statements together. For
example, an author could have a statement that Albert Einstein was born in Germany:

When this happens the @rel for 'birth place' is regarded as a 'hanging rel' because it has not yet generated
any triples, but these 'incomplete triples' are completed by the @about that appears on the next line. The first
step is therefore to store the two parts of the triple that the RDFa Processor does have, but without an object:

<http://dbpedia.org/resource/Albert_Einstein> dbp:birthPlace ? .

Then as processing continues, the RDFa Processor encounters the subject of the statement about the long name for Germany, and this is
used in two ways. First it is used to complete the 'incomplete triple':

Note that each occurrence of @about will complete any incomplete triples. For example, to mark up the fact that
Albert Einstein had both German and American citizenship, an author need only specify one @rel value that is then
used with multiple @about values:

These examples show how @about completes triples, but there are other situations that can have the same effect.
For example, when @typeof creates a new
bnode (as described above), that will be used to complete any 'incomplete
triples'. To illustrate,
to indicate that Spinoza influenced both Einstein and Schopenhauer, the following markup could be used:

This example has two 'hanging rels', and so two situations when 'incomplete triples' will be created. Processing would proceed
as follows; first an incomplete triple is stored:

<http://dbpedia.org/resource/Baruch_Spinoza> dbp:influenced ? .

Next, the RDFa Processor processes the predicate values for foaf:name, dbp:dateOfBirth and
dbp:citizenship, but note that only the first needs to 'complete' the 'hanging rel'. So processing
foaf:name generates two triples:

8.3 Object resolution

Although objects have been discussed in the previous sections, as part of the explanation of subject resolution, chaining,
evaluation contexts, and so on, this section will look at objects in more detail.

A literal object can be set by using @property
to express a predicate, and then using either @content,
or the inline text of the element that @property is on.
Note that the use of @content prohibits the inclusion of rich markup in your literal.
If the inline content of an element accurately represents the object,
then documents should rely upon that rather than duplicating that data using the @content.

A URI resource object can be set
using one of @rel or @rev to express a predicate,
and then either using one of @href, @resource
or @src to provide an object resource explicitly, or using
the chaining techniques described above to obtain an object from a nested
subject, or from a bnode.

8.3.1 Literal object resolution

An object literal will be generated when @property is present.
@property provides the predicate, and the following sections describe
how the actual literal to be generated is determined.

Language Tags

In RDFa the Host Language may provide a mechanism for
setting the language tag. In XHTML+RDFa [XHTML-RDFA], for example,
the XML language attribute @xml:lang or
the attribute @lang
is used to add this information, whether the plain literal is
designated by @content, or by the inline
text of the element:

Note that this requires that a URI mapping for the prefix rdf has been defined.
To make authoring easier, if there are child elements and no @datatype attribute,
then the effect is the same as if @datatype had been explicitly set to
rdf:XMLLiteral:

<h2 property="dc:title">
E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
</h2>

In the examples given here the sup element is actually part of the meaning
of the literal, but there will be situations where the extra markup means nothing, and
can therefore be ignored. In this situation an empty @datatype value
can be used to override the XML literal behaviour:

Note that the value of this XML Literal is the exclusive
canonicalization [XML-EXC-C14N] of the RDFa element's value.

Although the RDFa processing model requires visiting each
element in the tree, if the processor meets an XML literal then it
must not process any further down the tree. This is to prevent triples being generated from markup that is not actually
in the hierarchy. For example, we might want to set the
title of something to some markup that itself includes RDFa:

In this example the nested RDFa should not be parsed. This effectively means that the presence of @property
without @content will inhibit
any further processing, so authors
should watch out for stray attributes, especially if they find that they are getting fewer triples than they had
expected.

8.3.2 URI object resolution

Most of the rules governing the processing of objects that are resources are to be found in the processing descriptions
given above, since they are important for establishing the subject. This section aims to highlight general concepts, and
anything that might have been missed.

One or more URI objects are needed when @rel or @rev is present. Each attribute
will cause triples to be generated when used with @href, @resource or @src, or
with the subject value of any nested statement if none of these attributes are present.

It's also possible to use both @rel and @rev at the same time on
an element. This is particularly useful when two things stand in two different relationships
with each other, for example when a picture is taken by Mark, but that picture also depicts
him:

8.3.2.3 Incomplete triples

When a triple predicate has been expressed using @rel or @rev,
but no @href, @src,
or @resource exists on
the same element, there is a 'hanging rel'. This causes the current subject and
all possible predicates (with an indicator of whether they are 'forwards', i.e., @rel values,
or not, i.e., @rev values), to be stored as 'incomplete triples' pending discovery of a subject
that could be used to 'complete' those triples.
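
Storing and completing incomplete triples can be sketched in Python as below. The function names and the 'forward'/'reverse' labels are hypothetical, predicates are left as CURIEs for brevity, and the URI used for Germany is an illustrative assumption.

```python
def hang(subject, rels, revs):
    """Store the subject and each predicate, noting the direction of each."""
    pending = [(subject, predicate, "forward") for predicate in rels]
    pending += [(subject, predicate, "reverse") for predicate in revs]
    return pending


def complete(pending, new_subject):
    """Complete every pending triple once a new subject becomes available."""
    triples = []
    for subject, predicate, direction in pending:
        if direction == "forward":
            # @rel: the new subject becomes the object.
            triples.append((subject, predicate, new_subject))
        else:
            # @rev: the roles of subject and object are swapped.
            triples.append((new_subject, predicate, subject))
    return triples


einstein = "http://dbpedia.org/resource/Albert_Einstein"
germany = "http://dbpedia.org/resource/Germany"
pending = hang(einstein, rels=["dbp:birthPlace"], revs=[])
triples = complete(pending, germany)
```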

9. RDFa Profiles

RDFa Profiles are optional external documents that define collections of terms
and/or prefix mappings. These documents must be defined in an
approved RDFa Host Language (currently XHTML+RDFa [XHTML-RDFA]).
They may also be defined in other RDF serializations
(e.g., RDF/XML [RDF-SYNTAX-GRAMMAR] or Turtle [TURTLE]).
RDFa Profiles are referenced via @profile, and can
be used by document authors to simplify the task of adding semantic markup. When an
RDFa document includes @profile, the value of the attribute is evaluated in
order. For each URI in the value, do the following:

Attempt to retrieve the content of the
URI. If the retrieval fails, continue with the next URI in the value.

Otherwise, parse the retrieved content as an RDFa document (according to the
processing rules in that document's Host Language specification) and extract the triples
into a collection associated with that URI. Note: These triples must not be co-mingled
with the triples being extracted from any other URI.

For every subject in the extracted triples that
has both an rdfa:prefix and an rdfa:uri predicate,
create a mapping from the object literal of the rdfa:prefix predicate to the
object literal of the rdfa:uri predicate. Add or update this mapping in the
local list of URI mappings after transforming the 'prefix' component to
lower-case.

For every subject in the extracted triples that
has both an rdfa:term and an rdfa:uri predicate,
create a mapping from the object literal of the rdfa:term predicate to the
object literal of the rdfa:uri predicate. Add or update this mapping in the
local term mappings.

Once all the URIs in the @profile value have been processed, continue
with the normal processing of the current element.

It is possible that a referenced RDFa document will in turn reference
other documents via @profile. Regardless of the depth to which such
references might go, only the triples in the top level document affect current
processing.

RDFa Processor developers are permitted and encouraged to cache the
relevant triples retrieved via this mechanism, including embedding definitions for well known vocabularies in the implementation if appropriate.

If one of the objects is not a Literal or if there are additional
rdfa:uri or rdfa:term
predicates sharing the same subject, no mapping is created.
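
The extraction of mappings from a profile's triples can be sketched as follows. This Python outline is illustrative only: triples are modelled as plain tuples, the Literal class and the function name are hypothetical helpers, the namespace is assumed to be the RDFa vocabulary namespace, and treating a repeated rdfa:prefix like a repeated rdfa:term is an assumption.

```python
from collections import defaultdict

RDFA = "http://www.w3.org/ns/rdfa#"  # assumed namespace for rdfa:prefix, rdfa:term, rdfa:uri


class Literal(str):
    """Marks an object as an RDF literal (hypothetical helper)."""


def profile_mappings(triples):
    """Return (prefix mappings, term mappings) found in one profile's triples."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        by_subject[subject][predicate].append(obj)

    prefixes, terms = {}, {}
    for properties in by_subject.values():
        uris = properties.get(RDFA + "uri", [])
        # Exactly one rdfa:uri is required, and it must be a literal.
        if len(uris) != 1 or not isinstance(uris[0], Literal):
            continue
        for key, target, lower in (
            (RDFA + "prefix", prefixes, True),
            (RDFA + "term", terms, False),
        ):
            names = properties.get(key, [])
            if len(names) == 1 and isinstance(names[0], Literal):
                # Prefixes are lower-cased; terms are kept as written.
                name = names[0].lower() if lower else str(names[0])
                target[name] = str(uris[0])
    return prefixes, terms


profile = [
    ("_:a", RDFA + "prefix", Literal("FOAF")),
    ("_:a", RDFA + "uri", Literal("http://xmlns.com/foaf/0.1/")),
    ("_:b", RDFA + "term", Literal("name")),
    ("_:b", RDFA + "uri", Literal("http://xmlns.com/foaf/0.1/name")),
    ("_:c", RDFA + "term", Literal("bad")),
    ("_:c", RDFA + "uri", "http://example.org/not-a-literal"),  # not a literal: skipped
]
prefixes, terms = profile_mappings(profile)
```

Because each call works on the triples of a single profile, the results for different URIs are kept apart, and a processor may cache them per URI.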

A. CURIE Datatypes

In order to facilitate the use of CURIEs in markup languages, this
specification defines some additional datatypes in the XHTML datatype
space (http://www.w3.org/1999/xhtml/datatypes/).
Markup languages that want to import these
definitions can find them in the
"datatypes" file for their schema grammar:

The datatypes TERMorCURIEorURI and TERMorCURIEorURIs are defined such that
an RDFa Processor must first evaluate the attribute value
to determine if it is a TERM. If it does not match the production
rules for TERM, then an RDFa Processor must evaluate the attribute
value to determine if it matches the production for CURIEorURI.

A.1 XML Schema Definition

This section is non-normative.

The following informative XML Schema definition for these datatypes is included as an example:

2008-05-01: Changed datatype name from URIorCURIE to URIorSafeCURIE. Added datatype implementation in Appendix B. Added text about preferring inline content to @content so you do not lose ability to have rich markup. [ShaneMcCarron]

2008-04-29: Changed processing rules so as to allow the generation of triples that have objects
which are bnodes, even if those bnodes never appear in a triple as a subject. [MarkBirbeck]

2008-04-28: The processing rules have been updated so that elements that do not contain any RDFa attributes
have no effect. At one point this step omitted to check for @property, meaning that elements
that contained only @property were being ignored. [MarkBirbeck]

2008-01-23: Updated to reflect latest task-force thinking re- the
processing of legacy values in @rel and @rev.
As part of this work, made the whole processing of CURIEs and URIs
much clearer. [MarkBirbeck]

2007-09-04: Migrated to XHTML 2 Working Group Publication System.
Converted to a format that is consistent with REC-Track documents.
Updated to reflect current processing model. Added normative definition
of CURIEs. Started updating prose to be consistent with current task
force agreements. [ShaneMcCarron], [StevenPemberton], [MarkBirbeck]

2007-04-06: fixed some of the language to talk about
"structure" rather than metadata. Added note regarding
space-separated values in predicate-denoting attributes.
[BenAdida]

2006-01-16: made the use of CURIE type for @rel,
@rev, @property consistent across
document (particularly section 2.4 was erroneous). [BenAdida]

D. Acknowledgments

This section is non-normative.

At the time of publication, the members of the
Semantic Web Deployment Working Group were: