Abstract

This specification defines rules and guidelines for adapting the RDFa Core
1.1 and RDFa Lite 1.1 specifications for use in HTML5 and XHTML5. The rules
defined in this specification not only apply to HTML5 documents in non-XML
and XML mode, but also to HTML4 and XHTML documents interpreted through the
HTML5 parsing rules.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This specification has been jointly developed by the RDFa Working Group and the
HTML Working Group. The document was previously published via the HTML
Working Group, but has since been transitioned to the newly rechartered RDFa
Working Group, which currently publishes it. The re-publication as a First
Public Working Draft should not be taken to imply that the specification is
being heavily reworked; it is a result of W3C process requirements. The
expectation is to have the document released as an official W3C
Recommendation 8-9 months from the publication date of this document.

This specification is an extension to the HTML5 language. All normative
content in the HTML5 specification, unless specifically overridden by this
specification, is intended to be the basis for this specification.

A sample test harness is
available for software developers. This set of tests is not intended to be
exhaustive.
A community-maintained website contains more
information on further reading, developer tools, and software libraries
that can be used to extract RDFa data from Web documents.

This document was published by the RDFa Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-rdfa-wg@w3.org (subscribe, archives). All feedback is welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

1. Introduction

This section is non-normative.

Today's web is built predominantly for human readers. Even as
machine-readable data begins to permeate the web, it is typically
distributed in a separate file, with a separate format, and very limited
correspondence between the human and machine versions. As a result, web
browsers can provide only minimal assistance to humans in parsing and
processing web pages: browsers only see presentation information. RDFa is
intended to solve the problem of marking up machine-readable data in HTML
documents. RDFa provides a set of HTML attributes to augment visual data with
machine-readable hints. Using RDFa, authors may turn their existing
human-visible text and links into machine-readable data without repeating
content.

2. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].

2.1 Document Conformance

There are two types of document conformance criteria for HTML
documents containing RDFa semantics: HTML+RDFa and
HTML+RDFa Lite.

The following conformance criteria apply to any HTML document
including RDFa markup:

All document conformance requirements stated as mandatory in the
HTML5 specification must be met.

The appropriate
Extensions to the HTML5 Syntax,
as described in this document, must be considered valid and conforming.
Note that there are fewer supported attributes if the RDFa Lite
syntax [RDFA-LITE] is desired over the more advanced set of RDFa
attributes outlined in RDFa Core.

All HTML5 elements and attributes should be used in a way that is
conformant with [HTML5]. All RDFa attributes should be used in a way that
is conformant with [RDFA-CORE] and this document.

Non-XML mode HTML+RDFa 1.1 documents should be labeled with the Internet Media Type
text/html as defined in section 12.1 of the HTML5 specification [HTML5].

2.2 RDFa Processor Conformance

The RDFa Processor conformance criteria are listed below, all of
which are mandatory:

An RDFa Processor must implement all of the mandatory features
specified in the RDFa Core 1.1 specification [RDFA-CORE].

An RDFa Processor must support any mandatory features described in this
specification.

2.3 User Agent Conformance

A User Agent is considered to be a type of RDFa Processor when the
User Agent stores or processes RDFa attributes and their values. There are
separate RDFa Processor Conformance and User Agent Conformance sections
because one can be a valid HTML5 RDFa Processor without being a valid
HTML5 User Agent (for example, by only providing a very small subset of
rendering functionality).

The User Agent conformance criteria are listed below, all of which are
mandatory:

A User Agent must conform to all requirements listed in the
Conformance requirements section of the HTML5 specification.

A User Agent must implement all of the features required by this
specification.

A User Agent must implement all of the features required in the RDFa
Core 1.1 specification, excluding those features which are specifically
overridden by this specification as detailed in the Extensions to RDFa Core 1.1.

3. Extensions to RDFa Core 1.1

The RDFa Core 1.1 [RDFA-CORE] specification is the base document on
which this specification builds.
RDFa Core 1.1 specifies the attributes and syntax, in Section 5: Attributes and
Syntax, and processing model, in Section 7: Processing
Model, for extracting RDF from a Web document. This section
specifies changes to the attributes and processing model defined in
RDFa Core 1.1 in order to support extracting RDF from HTML documents.

The requirements and rules, as specified in RDFa Core and further
extended in this document, apply to all HTML5 documents. An RDFa Processor
operating on both HTML and XHTML documents, specifically on their
resulting DOMs or Infosets, must apply these processing rules
for HTML4, HTML5 and XHTML5
serializations, DOMs and/or Infosets.

3.1 Additional RDFa Processing Rules

Documents conforming to the rules in this specification are processed
according to [RDFA-CORE] with the following extensions:

HTML+RDFa uses an additional initial context by default,
http://www.w3.org/2011/rdfa-context/html-rdfa-1.1,
which must be applied after the initial context for [RDFA-CORE]
(http://www.w3.org/2011/rdfa-context/rdfa-1.1).
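The ordering requirement can be pictured as layering two mapping tables. The sketch below is a minimal illustration, assuming that "applied after" means the HTML+RDFa context is loaded second so that, for any entry defined in both, the later value wins. Prefixes and terms are collapsed into one dict for brevity, and every entry shown is a hypothetical placeholder, not the real contents of either published context document.

```python
from typing import Dict

# HYPOTHETICAL placeholder entries; NOT the actual contents of
# http://www.w3.org/2011/rdfa-context/rdfa-1.1 or .../html-rdfa-1.1.
RDFA_CORE_CONTEXT: Dict[str, str] = {
    "ex": "http://example.org/core-vocab#",
    "shared": "http://example.org/core#shared",
}
HTML_RDFA_CONTEXT: Dict[str, str] = {
    "shared": "http://example.org/html#shared",  # also defined in core
    "htmlterm": "http://example.org/html#term",
}


def build_initial_mappings(*contexts: Dict[str, str]) -> Dict[str, str]:
    """Merge contexts left to right; a later context overrides earlier entries.

    ASSUMPTION: this override reading of "applied after" is illustrative.
    """
    merged: Dict[str, str] = {}
    for context in contexts:
        merged.update(context)
    return merged


# RDFa Core initial context first, then the HTML+RDFa initial context.
mappings = build_initial_mappings(RDFA_CORE_CONTEXT, HTML_RDFA_CONTEXT)
print(mappings["shared"])  # http://example.org/html#shared
```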

In Section 7.5: Sequence, processing step 11, the HTML5 datetime
attribute must be utilized when generating output, overriding any value
expressed using the content attribute. If datetime is detected and the
value of the attribute is a valid xsd:dateTime, xsd:date, or xsd:time,
then a triple must be generated whose current property value is a typed
literal with the matching datatype and with the value contained in the
datetime attribute. If datatype is specified, it must override the
automatic datatype. If no datatype is specified and the value does not
match an xsd:dateTime, xsd:date, or xsd:time pattern, a plain literal
must be generated with the associated language of the node, if available.

In
Section 7.5: Sequence,
processing step 11, the HTML5 value attribute must be
utilized when generating output.
If value is detected, it must override content
and must be processed according to the rules for content.
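The precedence described in the two rules above (datetime, then value, then content) can be sketched as a single selection function. This is a minimal sketch, not a conforming implementation: the function name literal_for is hypothetical, datatypes are returned as CURIE strings such as "xsd:date" for readability, and the regular expressions are simplified approximations of the XSD lexical forms (month and day ranges, for instance, are not checked). Falling back to element text content is not modelled.

```python
import re
from typing import Dict, Optional, Tuple

# ASSUMPTION: simplified lexical patterns, not a complete XSD validator.
_TZ = r"(Z|[+-]\d{2}:\d{2})?"
_DATE = r"-?\d{4,}-\d{2}-\d{2}"
_TIME = r"\d{2}:\d{2}:\d{2}(\.\d+)?"
_PATTERNS = [
    ("xsd:dateTime", re.compile(_DATE + "T" + _TIME + _TZ)),
    ("xsd:date", re.compile(_DATE + _TZ)),
    ("xsd:time", re.compile(_TIME + _TZ)),
]

Literal = Tuple[str, Optional[str], Optional[str]]  # (value, datatype, language)


def _content_literal(value: str, attrs: Dict[str, str], lang: Optional[str]) -> Literal:
    """Rules for content: a non-empty datatype yields a typed literal,
    otherwise a plain literal carrying the node's language (if any)."""
    datatype = attrs.get("datatype")
    if datatype:
        return (value, datatype, None)
    return (value, None, lang)


def literal_for(attrs: Dict[str, str], lang: Optional[str] = None) -> Optional[Literal]:
    """Choose the literal for an element: datetime > value > content."""
    if "datetime" in attrs:
        value = attrs["datetime"]
        if "datatype" in attrs:
            # An explicit datatype overrides the automatic one.
            return _content_literal(value, attrs, lang)
        for datatype, pattern in _PATTERNS:
            if pattern.fullmatch(value):
                return (value, datatype, None)
        return (value, None, lang)  # no pattern matched: plain literal
    if "value" in attrs:
        # value overrides content and follows the rules for content.
        return _content_literal(attrs["value"], attrs, lang)
    if "content" in attrs:
        return _content_literal(attrs["content"], attrs, lang)
    return None  # element text fallback is not modelled here
```

For example, `literal_for({"datetime": "2012-09-10", "content": "x"})` picks the datetime value and types it as xsd:date, ignoring content.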

The version attribute is not supported in HTML5 and is
non-conforming. However, if an HTML+RDFa document contains the
version attribute on the html element, a conforming
RDFa Processor must examine the value of this attribute. If the value matches
that of a defined version of RDFa, then the processing rules for that version
must be used. If the value does not match a defined version, or there is no
version attribute, then the processing rules for the most recent
version of RDFa 1.1 must be used.
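The version check reduces to a lookup with a default. In the sketch below, the version strings and the rule-set labels ("rdfa-1.0", "rdfa-1.1") are illustrative assumptions rather than an authoritative registry of defined RDFa versions, and trimming surrounding whitespace is likewise an assumption.

```python
from typing import Optional

# ASSUMPTION: illustrative version strings and rule-set labels only.
KNOWN_VERSIONS = {
    "XHTML+RDFa 1.0": "rdfa-1.0",
    "XHTML+RDFa 1.1": "rdfa-1.1",
    "HTML+RDFa 1.1": "rdfa-1.1",
}
DEFAULT_RULES = "rdfa-1.1"  # the most recent RDFa 1.1 processing rules


def select_processing_rules(version_attr: Optional[str]) -> str:
    """Pick processing rules from the html element's version attribute."""
    if version_attr is None:  # no version attribute present
        return DEFAULT_RULES
    # Unknown values also fall back to the most recent RDFa 1.1 rules.
    return KNOWN_VERSIONS.get(version_attr.strip(), DEFAULT_RULES)
```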

3.2 Modifying the Input Document

RDFa's tree-based processing rules, outlined in Section 7.5: Sequence of
the RDFa Core 1.1 specification [RDFA-CORE], allow an input document to be
automatically corrected, cleaned-up, re-arranged, or modified in any way that
is approved by the host language prior to processing. Element nesting issues
in HTML documents should be corrected before the input document is
translated into the DOM, a valid tree-based model, on which the RDFa
processing rules will operate.

Any mechanism that generates a data structure equivalent to the HTML5 or
XHTML5 DOM, such as the html5lib library, may be used as the mechanism to
construct the tree-based model provided as input to the RDFa processing
rules.

3.3 Specifying the language for a literal

RDFa Core 1.1 allows for the
current language
to be specified by the Host Language. In order for RDFa Processors to conform
to this specification, they must use the mechanism described in
The lang and xml:lang attributes section of the [HTML5]
specification to determine the
language
of a node.

If an author is editing an HTML fragment and is unsure of the final
encapsulating MIME type for their markup, it is suggested that the author
specify both lang and xml:lang with exactly the same value in
both attributes.
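The HTML5 mechanism amounts to walking up from a node to the nearest element that carries a language attribute. The sketch below uses a hypothetical Node class with a parent link. It assumes that the key "xml:lang" stands for the lang attribute in the XML namespace, as an XML parser would produce; in text/html, an attribute literally named xml:lang has no effect on language processing. On a single element, the XML-namespace attribute takes precedence over the no-namespace lang attribute. An empty value means "unknown" and is returned as None; document-level defaults such as a Content-Language header are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical minimal element model: attributes plus a parent link."""
    attrs: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Node"] = None


def node_language(node: Optional[Node]) -> Optional[str]:
    """Return the language of a node by walking up its ancestors."""
    while node is not None:
        # ASSUMPTION: "xml:lang" models the lang attribute in the XML namespace,
        # which wins over the no-namespace lang attribute on the same element.
        for name in ("xml:lang", "lang"):
            if name in node.attrs:
                return node.attrs[name] or None  # "" means unknown language
        node = node.parent
    return None
```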

3.4 Invalid XMLLiteral values

When generating literals of type XMLLiteral, the processor must ensure
that the output XMLLiteral is a namespace well-formed XML fragment. A
namespace well-formed XML fragment has the following properties:

The XML fragment, when placed inside of a single root element, must
validate as well-formed XML. The normative language that describes a
well-formed XML document is specified in Section 2.1 "Well-Formed
XML Documents" of the XML specification.

The XML fragment, when placed inside of a single root element, must
retain all active namespace information. The currently active attributes
declared using xmlns and xmlns: stored in the
RDFa Processor's current evaluation context in the list of IRI mappings
must be preserved in the generated XMLLiteral. The PREFIX value for
xmlns:PREFIX must be transformed to all lower-case characters
when preserving the value in the XMLLiteral. All active namespaces declared
via xmlns and xmlns: must be placed in each
top-level element in the generated XMLLiteral, taking care not to overwrite
pre-existing namespace values.

An RDFa Processor that transforms the XML fragment must use the
Coercing an HTML DOM into an Infoset algorithm, as specified in the HTML5
specification, followed by the algorithm defined in the Serializing
XHTML Fragments section of the HTML5 specification. If an error or
exception occurs at any point during the transformation, the triple containing
the XMLLiteral must not be generated.

Transformation to a namespace well-formed XML fragment is required
because an application that consumes XMLLiteral data expects that data to
be a namespace well-formed XML fragment.

The transformation requirement does not apply to input data that are
text-only, such as literals that contain a datatype attribute
with an empty value (""), or input data that contain only
text nodes.
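The namespace-preservation rules can be sketched with Python's xml.dom.minidom. This is a minimal sketch under stated assumptions: the function name to_xml_literal is hypothetical; the fragment is assumed to have already been produced as a string by the coercion and serialization steps; only prefixed (xmlns:PREFIX) mappings are handled; and the wrapper parse serves only as the well-formedness check. Any parse failure (for example an unbound prefix or mismatched tags) returns None, meaning the triple is not generated.

```python
from typing import Dict, Optional
from xml.dom import XMLNS_NAMESPACE, minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import quoteattr


def to_xml_literal(fragment: str, active_ns: Dict[str, str]) -> Optional[str]:
    """Return a namespace well-formed XMLLiteral, or None to drop the triple.

    fragment  -- serialized child content (ASSUMED already coerced/serialized)
    active_ns -- prefix -> IRI mappings from the current evaluation context
    """
    ns = {prefix.lower(): iri for prefix, iri in active_ns.items()}  # lower-case PREFIX
    decls = "".join(f" xmlns:{p}={quoteattr(iri)}" for p, iri in ns.items())
    try:
        # Well-formedness check: the fragment must parse inside a single root.
        wrapper = minidom.parseString(f"<wrapper{decls}>{fragment}</wrapper>").documentElement
    except ExpatError:
        return None  # error during transformation: do not generate the triple
    parts = []
    for child in wrapper.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            for p, iri in ns.items():
                # Copy active namespaces onto each top-level element without
                # overwriting a declaration the element already carries.
                if not child.hasAttribute("xmlns:" + p):
                    child.setAttributeNS(XMLNS_NAMESPACE, "xmlns:" + p, iri)
        parts.append(child.toxml())  # text nodes are serialized as-is
    return "".join(parts)
```

With `{"FOAF": "http://xmlns.com/foaf/0.1/"}` active, the fragment `<foaf:name>Ann</foaf:name> says hi` becomes `<foaf:name xmlns:foaf="http://xmlns.com/foaf/0.1/">Ann</foaf:name> says hi`.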

An example transformation demonstrating the preservation of namespace
values is provided below. The → symbol is used to denote that the line
is a continuation of the previous line and is included purely for the
purposes of readability:

4. Extensions to the HTML5 Syntax

There are a few attributes that are added as extensions to the HTML5
syntax in order to fully support RDFa:

If HTML+RDFa Lite document conformance is desired, all RDFa attributes and
valid values (including CURIEs), as listed in
RDFa Lite 1.1, Section 2: The Attributes,
must be allowed and validate as conforming when used in an HTML4, HTML5
or XHTML5 document. For the avoidance of doubt, the following RDFa attributes
are allowed on all elements in the HTML5 content model:
vocab, typeof, property,
resource, and prefix. All other attributes that
RDFa may process, like href and src, are only
allowed on the elements defined in the HTML5 specification.

If HTML+RDFa document conformance is desired, all RDFa attributes and
valid values (including CURIEs), as listed in
RDFa Core 1.1, Section 2.1: The RDFa Attributes, must be allowed and
validate as conforming when used in an HTML4, HTML5 or XHTML5 document.
For the avoidance of doubt, the following RDFa attributes
are allowed on all elements in the HTML5 content model:
vocab, typeof, property,
resource, prefix, content,
about, rel, rev, datatype,
and inlist. All other attributes that
RDFa may process, like href and src, are only
allowed on the elements defined in the HTML5 specification.
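The two conformance profiles differ only in which global RDFa attributes are allowed, which a conformance checker might encode as two sets. The sketch below copies the attribute lists from the paragraphs above. The per-element HTML5_NATIVE table is a partial, illustrative assumption covering a few elements on which HTML5 already defines rel or content; other element-specific attributes such as href and src are not checked.

```python
from typing import Dict, FrozenSet, Iterable, List

# Global RDFa attributes, copied from the two lists above.
RDFA_LITE_ATTRS = frozenset({"vocab", "typeof", "property", "resource", "prefix"})
RDFA_CORE_ATTRS = RDFA_LITE_ATTRS | {
    "content", "about", "rel", "rev", "datatype", "inlist",
}

# ASSUMPTION: partial, illustrative table of elements where HTML5 itself
# already defines some of these attribute names.
HTML5_NATIVE: Dict[str, FrozenSet[str]] = {
    "meta": frozenset({"content"}),
    "a": frozenset({"rel"}),
    "area": frozenset({"rel"}),
    "link": frozenset({"rel"}),
}


def disallowed_rdfa_attrs(tag: str, attr_names: Iterable[str], lite: bool) -> List[str]:
    """List RDFa attribute names the chosen profile does not allow on `tag`."""
    profile = RDFA_LITE_ATTRS if lite else RDFA_CORE_ATTRS
    allowed = profile | HTML5_NATIVE.get(tag.lower(), frozenset())
    found = {name.lower() for name in attr_names}  # HTML names are case-insensitive
    return sorted(n for n in found if n in RDFA_CORE_ATTRS and n not in allowed)
```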

If any RDFa attribute is present on a link or
meta element, that element must be considered flow and
phrasing content when used outside of the head of the
document. If the RDFa property attribute is present on the
link element, the rel attribute is not
required.
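A checker for the two link/meta rules above might look like the following sketch. It assumes that attributes HTML5 already defines on these elements (rel on link, content on meta) do not count as "RDFa attributes" for the flow and phrasing content rule; the function name and messages are hypothetical, and other HTML5 requirements (such as href on link) are not modelled.

```python
from typing import Dict, List

RDFA_CORE_ATTRS = frozenset({
    "vocab", "typeof", "property", "resource", "prefix",
    "content", "about", "rel", "rev", "datatype", "inlist",
})
# ASSUMPTION: attributes HTML5 already defines on these elements are not
# counted as RDFa attributes for the flow/phrasing content rule.
NATIVE = {"link": frozenset({"rel"}), "meta": frozenset({"content"})}


def check_link_meta(tag: str, attrs: Dict[str, str], in_head: bool) -> List[str]:
    """Return problems for a link or meta element; empty means none found."""
    tag = tag.lower()
    names = {name.lower() for name in attrs}
    rdfa_specific = (names & RDFA_CORE_ATTRS) - NATIVE.get(tag, frozenset())
    problems = []
    if not in_head and not rdfa_specific:
        problems.append(f"<{tag}> outside <head> needs an RDFa attribute")
    if tag == "link" and "rel" not in names and "property" not in names:
        problems.append("<link> needs rel unless property is present")
    return problems
```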

5. Backwards Compatibility

RDFa Core 1.1 deprecates the usage of xmlns: in RDFa 1.1
documents. Web page authors should not use xmlns: to express
prefix mappings in RDFa 1.1 documents. Web page authors should use
the prefix attribute to specify prefix mappings.

However, there are times when XHTML+RDFa 1.0 documents are served by web
servers using the text/html MIME type. In these instances, the
HTML5 specification asserts that the document is processed according to the
non-XML mode HTML5 processing rules. In these particular cases, it is
important that the prefixes declared via xmlns: are preserved
for RDFa Processors to ensure backwards-compatibility with RDFa 1.0
documents. The following sections describe the backwards-compatibility
requirements for RDFa Processor implementations.

5.1 xmlns:-Prefixed Attributes

The RDFa Core 1.1 [RDFA-CORE] specification effectively deprecates the
use of the xmlns: mechanism to declare CURIE prefix mappings in
favor of the prefix attribute. While utilizing
xmlns: is now frowned upon, there are instances where it is
unavoidable, such as publishing legacy documents as HTML5 or supporting
older XHTML+RDFa 1.0 documents that rely on the xmlns:
attribute.

Since CURIE prefix mappings have been specified using
xmlns:, and since HTML attribute names are case-insensitive,
CURIE prefix names declared using the xmlns: attribute-name
pattern xmlns:<PREFIX>="<URI>" should be specified
using only lower-case characters. That is, both the text
"xmlns:" and the text in "<PREFIX>" should
be lower-case only. This ensures that prefix mappings are interpreted
in the same way in HTML (case-insensitive attribute names) and XHTML
(case-sensitive attribute names) document types.
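The case-folding problem is easy to reproduce. Python's html.parser lower-cases attribute names as it tokenizes, much as an HTML5 parser does, so a declaration written as xmlns:FOAF is seen as the prefix foaf, while an XML parser would keep FOAF. The collector below (its class and function names are illustrative) shows the HTML side of that difference.

```python
from html.parser import HTMLParser
from typing import Dict


class XmlnsCollector(HTMLParser):
    """Collect xmlns:PREFIX declarations as a case-insensitive HTML parser sees them."""

    def __init__(self) -> None:
        super().__init__()
        self.mappings: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # html.parser has already lower-cased every attribute name here.
        for name, value in attrs:
            if name.startswith("xmlns:") and value is not None:
                self.mappings[name[len("xmlns:"):]] = value


def html_prefixes(markup: str) -> Dict[str, str]:
    parser = XmlnsCollector()
    parser.feed(markup)
    parser.close()
    return parser.mappings
```

Fed `<div xmlns:FOAF="http://xmlns.com/foaf/0.1/">`, this yields the prefix foaf, so a CURIE written as FOAF:name would resolve in XHTML but not in HTML; using lower-case prefixes avoids the mismatch.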

5.2 Conformance Criteria for xmlns:-Prefixed Attributes

Since RDFa 1.0 documents may contain attributes starting with
xmlns: to specify CURIE prefixes, any attribute starting with
a case-insensitive match on the text string "xmlns:" must be
preserved in the DOM or other tree-like model that is passed to the RDFa
Processor.
For documents conforming to this specification, attributes with
names that have a case insensitive prefix matching "xmlns:"
must be considered conforming. Conformance checkers must
accept attribute names that have a case insensitive prefix matching
"xmlns:" as conforming. Conformance checkers should generate
warnings noting that the use of xmlns: is deprecated.

All attributes starting with a case insensitive prefix matching
"xmlns:" must conform to the production rules outlined in
Namespaces in XML [XML-NAMES11],
Section 3: Declaring Namespaces.
Documents that contain xmlns: attributes that do not conform to
Namespaces in XML must not be accepted as conforming.
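A conformance checker following these rules might accept any case-insensitive xmlns: attribute, attach a deprecation warning, and then validate the prefix. The sketch below lower-cases the prefix first, as an HTML parser would. Its NCName pattern is an ASCII-only approximation (the real production also admits many non-ASCII characters), and only the two reserved-prefix constraints of Namespaces in XML are checked.

```python
import re
from typing import List, Tuple

# ASSUMPTION: ASCII-only approximation of the NCName production.
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._\-]*")
XML_NS = "http://www.w3.org/XML/1998/namespace"


def check_xmlns_attr(name: str, value: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one attribute; both empty if not xmlns:."""
    errors: List[str] = []
    warnings: List[str] = []
    if not name.lower().startswith("xmlns:"):
        return errors, warnings
    warnings.append("xmlns: is deprecated; use the prefix attribute instead")
    prefix = name[len("xmlns:"):].lower()  # HTML attribute names fold to lower case
    if not _NCNAME.fullmatch(prefix):
        errors.append(f"{prefix!r} is not an NCName")
    elif prefix == "xmlns":
        errors.append("the xmlns prefix must not be declared")
    elif prefix == "xml" and value != XML_NS:
        errors.append("the xml prefix may only be bound to " + XML_NS)
    return errors, warnings
```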

5.3 Preserving Namespaces via Coercion to Infoset

This section needs feedback from the user agent vendors to
ensure that this feature does not conflict with user agent architecture and
that there is no technical reason it cannot be implemented.

Because RDFa 1.0 documents may contain the xmlns: pattern to
declare prefix mappings, it is important that namespace information
declared in non-XML mode HTML5 documents is mapped to an Infoset
correctly. In order to ensure this mapping is performed correctly, the
"Coercing an HTML DOM into an infoset" rules defined in [HTML5]
must be extended to include the following rule:

If the XML API is namespace-aware, the tool must ensure that
([namespace
name], [local name],
[normalized
value]) namespace tuples are created when converting the non-XML mode
DOM into an Infoset. Given a standard xmlns: definition,
xmlns:foo="http://example.org/bar#", the [namespace name]
is http://www.w3.org/2000/xmlns/,
the [local name] is foo, and the
[normalized value] is http://example.org/bar#, thus the
namespace tuple would be (http://www.w3.org/2000/xmlns/,
foo, http://example.org/bar#).

For example, given the following input text:

<div xmlns:com="http://purl.org/commerce#">

The div element above, when coerced from an HTML DOM into
an Infoset, should contain an attribute in the [namespace
attributes] list with a [namespace name] set to
"http://www.w3.org/2000/xmlns/", a [local name] set to
com, and a [normalized value] of
"http://purl.org/commerce#".

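The extra coercion rule maps one attribute to one tuple. The sketch below (the function name is hypothetical) builds the ([namespace name], [local name], [normalized value]) tuple for an xmlns:PREFIX attribute taken from a non-XML mode DOM. It assumes the attribute value can be used unchanged as the [normalized value], and it returns None for attributes that are not namespace declarations.

```python
from typing import Optional, Tuple
from xml.dom import XMLNS_NAMESPACE  # "http://www.w3.org/2000/xmlns/"

NamespaceTuple = Tuple[str, str, str]  # ([namespace name], [local name], [normalized value])


def namespace_tuple(attr_name: str, attr_value: str) -> Optional[NamespaceTuple]:
    """Map a non-XML mode xmlns:PREFIX attribute to an Infoset namespace tuple."""
    if not attr_name.lower().startswith("xmlns:"):
        return None  # not a namespace declaration; stays an ordinary attribute
    local_name = attr_name[len("xmlns:"):].lower()
    # ASSUMPTION: the DOM value is used as-is for [normalized value].
    return (XMLNS_NAMESPACE, local_name, attr_value)
```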
5.4 Infoset-based Processors

While the intent of the RDFa processing instructions is to provide a
set of rules that are as language- and toolchain-agnostic as possible, for
the sake of clarity, detailed methods of extracting RDFa content for
processors operating on an XML Information Set are provided below.

5.4.1 Extracting IRI mappings from Infosets

Extracting IRI mappings declared via xmlns:
while operating from within an Infoset-based RDFa processor can be achieved
using the following algorithm:

For each attribute in the [attributes] list
that has no value for [prefix] and a
[local name] that starts with xmlns:, create an [IRI mapping] by
storing the [local name] part with the xmlns: characters
removed as the value to be mapped, and the [normalized
value] as the value to map.

This step is unnecessary if the Infoset coercion
rules preserve namespaces specified in non-XML mode.

For example, assume that the following markup is processed by an
Infoset-based RDFa processor:

<div xmlns:audio="http://purl.org/media/audio#" ...

After the markup is processed, there should exist an [IRI mapping] in
the [local list of IRI mappings] that contains a mapping from
audio to http://purl.org/media/audio#.
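The extraction step above can be sketched over a hypothetical AttributeInfo record that stands in for an Infoset Attribute Information Item, with None modelling "no value" for [prefix]. The record type and function name are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class AttributeInfo:
    """Hypothetical stand-in for an Infoset Attribute Information Item."""
    local_name: str
    normalized_value: str
    prefix: Optional[str] = None  # None models "no value" for [prefix]


def iri_mappings(attributes: Iterable[AttributeInfo]) -> Dict[str, str]:
    """Build IRI mappings from un-prefixed attributes named xmlns:PREFIX."""
    local_list: Dict[str, str] = {}
    for attr in attributes:
        if attr.prefix is None and attr.local_name.startswith("xmlns:"):
            # value to be mapped: [local name] minus "xmlns:";
            # value to map: [normalized value]
            local_list[attr.local_name[len("xmlns:"):]] = attr.normalized_value
    return local_list
```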

5.4.2 Processing RDFa Attributes

There are a number of non-prefixed attributes that are associated with
RDFa Processing in HTML5. If an XML Information Set based RDFa processor is
used to process these attributes, the following algorithm should be used to
detect and extract the values of the attributes.

For each Attribute Information Item specific to RDFa in the Infoset
[attributes]
list that has a [prefix] with
no value, extract and use the [normalized
value].
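Continuing with the same hypothetical AttributeInfo record, the attribute-extraction rule is a filter on an empty [prefix] and an RDFa attribute name. The sketch assumes that "attributes specific to RDFa" means the RDFa Core 1.1 attribute list given in Section 4.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# ASSUMPTION: "attributes specific to RDFa" taken to be the RDFa Core 1.1 set.
RDFA_ATTRS = frozenset({
    "vocab", "typeof", "property", "resource", "prefix",
    "content", "about", "rel", "rev", "datatype", "inlist",
})


@dataclass(frozen=True)
class AttributeInfo:
    """Hypothetical stand-in for an Infoset Attribute Information Item."""
    local_name: str
    normalized_value: str
    prefix: Optional[str] = None  # None models "no value" for [prefix]


def rdfa_attribute_values(attributes: Iterable[AttributeInfo]) -> Dict[str, str]:
    """Return {local name: normalized value} for un-prefixed RDFa attributes."""
    return {
        a.local_name: a.normalized_value
        for a in attributes
        if a.prefix is None and a.local_name in RDFA_ATTRS
    }
```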

5.5 DOM Level 1 and Level 2-based Processors

This mechanism should be double-checked against all of the
RDFa Javascript implementations to ensure correctness.

Most DOM-aware RDFa Processors are capable of accessing DOM Level 1
[DOM-LEVEL-1]
methods to process attributes on elements. To discover all
xmlns:-specified CURIE prefix mappings, the
Node.attributes
NamedNodeMap can be iterated over. Each
Attr.name that
starts with the text string xmlns: specifies a CURIE prefix
mapping. The value to be mapped is the string after the xmlns:
substring in Attr.name, and the value to map is the value of
Attr.value.
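As a sketch of this DOM Level 1 walk, the function below uses Python's xml.dom.minidom, whose NamedNodeMap exposes length and item(). minidom is an XML DOM standing in here for an HTML DOM, and the function name is hypothetical.

```python
from typing import Dict
from xml.dom import minidom


def dom1_prefix_mappings(element) -> Dict[str, str]:
    """Scan element.attributes (a DOM NamedNodeMap) for xmlns:PREFIX declarations."""
    mappings: Dict[str, str] = {}
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.name.startswith("xmlns:"):
            # value to be mapped: text after "xmlns:" in Attr.name;
            # value to map: Attr.value
            mappings[attr.name[len("xmlns:"):]] = attr.value
    return mappings


doc = minidom.parseString('<div xmlns:com="http://purl.org/commerce#" class="x"/>')
print(dom1_prefix_mappings(doc.documentElement))  # {'com': 'http://purl.org/commerce#'}
```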

The intent of the RDFa processing instructions is to provide a
set of rules that are as language- and toolchain-agnostic as possible. If
a developer chooses not to use the DOM Level 1 mechanism outlined in
the previous paragraph, they may use the following DOM Level 2
[DOM-LEVEL-2-CORE] mechanism.

5.5.1 Extracting IRI mappings via DOM Level 2

Extracting IRI mappings declared via xmlns: while operating
from within a DOM Level 2 based RDFa processor can be achieved using the
following algorithm:

This step is unnecessary if the XML and non-XML
mode DOMs are namespace consistent.

For example, assume that the following markup is processed by a
DOM2-based RDFa processor:

<div xmlns:com="http://purl.org/commerce#" ...

After the markup is processed, there should exist an [IRI mapping] in
the [local list of IRI mappings] that contains a mapping from
com to http://purl.org/commerce#.
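One plausible DOM Level 2 approach uses the namespace-aware Attr properties namespaceURI, prefix and localName rather than string matching on Attr.name. The sketch below again uses xml.dom.minidom as a stand-in DOM. It assumes that a namespace-aware DOM places xmlns:PREFIX in the http://www.w3.org/2000/xmlns/ namespace with prefix xmlns and localName PREFIX; the function name is hypothetical.

```python
from typing import Dict
from xml.dom import XMLNS_NAMESPACE, minidom


def dom2_prefix_mappings(element) -> Dict[str, str]:
    """Use DOM Level 2 namespace properties to find xmlns:PREFIX declarations."""
    mappings: Dict[str, str] = {}
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        # ASSUMPTION: namespace-aware DOM puts xmlns:PREFIX in the XMLNS
        # namespace with prefix "xmlns" and localName PREFIX.
        if attr.namespaceURI == XMLNS_NAMESPACE and attr.prefix == "xmlns":
            mappings[attr.localName] = attr.value
    return mappings


doc = minidom.parseString('<div xmlns:com="http://purl.org/commerce#" class="x"/>')
```

A default-namespace declaration (plain xmlns) has no prefix and is therefore skipped by this sketch.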

5.5.2 Processing RDFa Attributes

There are a number of non-prefixed attributes that are associated with
RDFa processing in HTML5. If a DOM2-based RDFa processor is used to
process these attributes, the following algorithm should be used to detect
and extract the values of the attributes.

When extracting values from href,
src and data, Web authors and developers should
note that certain values may be transformed if accessed via the DOM versus
a non-DOM processor. The rules for modification of URL values can be
found in the main HTML5 specification under
Section 2.6.2: Parsing URLs.
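A DOM Level 2 reader for the non-prefixed attributes can use hasAttributeNS and getAttributeNS with a null namespace. The sketch below does this with xml.dom.minidom; the attribute list (RDFa Core attributes plus href, src and data) and the function name are assumptions. It reads raw attribute strings. A browser property such as element.href would instead return a resolved absolute URL, and urljoin is used here only as a rough stand-in for that resolution; it is not identical to the HTML5 URL parsing rules.

```python
from typing import Dict
from urllib.parse import urljoin
from xml.dom import minidom

# ASSUMPTION: RDFa Core attributes plus the URL-bearing href, src and data.
RDFA_ATTRS = ("vocab", "typeof", "property", "resource", "prefix", "content",
              "about", "rel", "rev", "datatype", "inlist", "href", "src", "data")


def rdfa_attrs_dom2(element) -> Dict[str, str]:
    """Read RDFa-related attributes in no namespace via the DOM2 *NS methods."""
    return {
        name: element.getAttributeNS(None, name)
        for name in RDFA_ATTRS
        if element.hasAttributeNS(None, name)
    }


doc = minidom.parseString('<a property="dc:source" href="../spec">x</a>')
found = rdfa_attrs_dom2(doc.documentElement)
# found["href"] is the raw string "../spec"; resolving it against a base
# approximates what a DOM URL property would return.
absolute = urljoin("http://example.org/docs/page.html", found["href"])
```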

A. About this Document

A.1 History

This section is non-normative.

In early 2004, Mark Birbeck published a document named "RDF in XHTML"
via the XHTML2 Working Group wherein he laid
the groundwork for what would eventually become RDFa (The Resource
Description Framework in Attributes).

In 2006, the work was co-sponsored by the Semantic Web Deployment Working
Group, which began to formalize a technology to express semantic data in
XHTML. This technology was successfully developed and reached consensus at
the W3C, later published as an official W3C Recommendation. While HTML
provides a mechanism to express the structure of a document (title,
paragraphs, links), RDFa provides a mechanism to express the meaning in a
document (people, places, events).

The document, titled "RDF in XHTML: Syntax and Processing" [XHTML-RDFA],
defined a set of attributes and rules for
processing those attributes that resulted in the output of machine-readable
semantic data. While the document applied to XHTML, the attributes and
rules were always intended to operate across any tree-based structure
containing attributes on tree nodes (such as HTML4, SVG and ODF).

While RDFa was initially specified for use in XHTML, adoption by a
number of large organizations on the Web spurred RDFa's use in non-XHTML
languages. Its use in HTML4, before an official specification was developed
for those languages, caused concern regarding document conformance.

Over the years, the members of the RDFa Task Force had discussed the
possibility of applying the same attributes and processing rules outlined
in the XHTML+RDFa specification to all HTML family documents. By design, a
unified semantic data expression mechanism across all HTML and XHTML
family documents was squarely within the realm of possibility.

An RDFa Working Group was created in 2010 to address the issues concerning
multiple language implementations of RDFa. The XHTML+RDFa document was split
into a base specification, called RDFa Core 1.1 [RDFA-CORE], and thin
specifications that layer above RDFa Core 1.1. The XHTML+RDFa 1.1 specification
[XHTML-RDFA] is an example of such a thin specification. This
document, also a thin specification, is targeted at HTML4, HTML5 and
XHTML5.

This document describes the extensions to the RDFa Core 1.1
specification that permit the use of RDFa in all HTML family documents. By
using the attributes and processing rules described in the RDFa Core 1.1
specification and heeding the minor changes in this document, authors can
generate markup that produces the same semantic data output in
HTML4, HTML5 and XHTML5.

A.2 Change History

2011-12-30: Addition of normative dependency for RDFa Lite 1.1.
Addition of rules to allow meta and
link elements in flow and phrasing content as long as they
contain at least one RDFa-specific attribute. Added support for
datetime and value processing.

2012-03-10: Clarification of where each RDFa attribute is allowed to be
used. Feature at risk warning for HTML4+RDFa DTD-based validation.

2012-09-10: Publishing control of the HTML+RDFa document is handed over
from the HTML WG to the newly re-chartered RDFa WG. DTD-based validation is
removed from the specification.

A.3 Acknowledgments

This section is non-normative.

At the time of publication, the members of the
RDF Web Applications Working Group were: