Abstract

This specification defines rules and guidelines for adapting the RDF in
XHTML: Syntax and Processing (RDFa) specification for use in the HTML5 and
XHTML5 members of the HTML family. The rules defined in this document apply
not only to HTML5 documents in non-XML and XML mode, but also to HTML4
documents interpreted through the HTML5 parsing rules.

Status of this document

This section describes the status of this document at the time of
its publication. Other documents may supersede this document. A list of
current W3C publications and the latest revision of this technical report
can be found in the W3C technical reports
index at http://www.w3.org/TR/.

This is the First Public Working Draft of the "HTML+RDFa: A mechanism
for embedding RDF in HTML" specification for review by W3C members and
other interested parties.

Implementors should be aware that this specification is not stable.
Implementors who are not taking part in the discussions are likely
to find the specification changing out from under them in incompatible
ways. Vendors interested in implementing this specification before
it eventually reaches the Candidate Recommendation stage should join the
relevant mailing lists and take part in the discussions.

Publication as a Working Draft does not imply endorsement by the W3C
Membership. This is a draft document and may be updated, replaced or
obsoleted by other documents at any time. It is inappropriate to cite this
document as other than work in progress.

The publication of this document by the W3C as a W3C Working Draft does
not imply that all of the participants in the W3C HTML working group
endorse the contents of the specification. Indeed, for any section of the
specification, one can usually find many members of the working group or of
the W3C as a whole who object strongly to the current text, the existence
of the section at all, or the idea that the working group should even spend
time discussing the concept of that section.

The latest stable version of the editor's draft of this specification is
always available on the W3C CVS
server. The latest editor's working
copy (which may contain unfinished text in the process of being
prepared) is also available.

The W3C HTML Working Group is
the W3C working group responsible for this specification's progress along
the W3C Recommendation track.

This specification is an extension to the HTML5 language. All normative
content in the HTML5 specification, unless specifically overridden by this
specification, is intended to be the basis for this specification.

1 Introduction

Status: Working draft

This section is informative.

Today's web is built predominantly for human consumption. Even as
machine-readable data begins to permeate the web, it is typically
distributed in a separate file, with a separate format, and very limited
correspondence between the human and machine versions. As a result, web
browsers can provide only minimal assistance to humans in parsing and
processing web data: browsers only see presentation information. RDFa is
intended to solve the problem of machine-readable data in HTML documents.
RDFa provides a set of HTML attributes to augment visual data with
machine-readable hints. Using RDFa, authors may turn their existing
human-visible text and links into machine-readable data without repeating
content.

1.1 History

Status: Working draft

In early 2004, Mark Birbeck published a document named [XHTMLRDF] via the XHTML2 Working Group wherein he laid
the groundwork for what would eventually become RDFa (The Resource
Description Framework in Attributes).

In 2006, the work was co-sponsored by the Semantic Web Deployment Working
Group, which began to formalize a technology to express semantic data in
XHTML. This technology was successfully developed, reached consensus at
the W3C, and was later published as an official W3C Recommendation. While
HTML provides a mechanism to express the structure of a document (title,
paragraphs, links), RDFa provides a mechanism to express the meaning in a
document (people, places, events).

The document, titled "RDF in XHTML: Syntax and Processing" [XHTML+RDFa], defined a set of attributes and rules for
processing those attributes that resulted in the output of machine-readable
semantic data. While the document applied to XHTML, the attributes and
rules were always intended to operate across any tree-based structure
containing attributes on tree nodes (such as HTML4, SVG and ODF).

While RDFa was initially specified for use in XHTML, adoption by a
number of large organizations on the Web spurred RDFa's use in non-XHTML
languages. Its use in HTML4, before an official specification was developed
for those languages, caused concern regarding document conformance.

Over the years, the members of the RDFa Task Force [RDFaTF] had discussed the possibility of applying
the same attributes and processing rules outlined in the XHTML+RDFa
specification to all HTML family documents. By design, a unified semantic
data expression mechanism across all HTML and XHTML family documents was
squarely within reach.

This specification describes the modifications to the original XHTML+RDFa
specification that permit the use of RDFa in all HTML family documents. By
using the attributes and processing rules described in the XHTML+RDFa
specification and heeding the minor changes described here, authors can
expect to generate markup that produces the same semantic data output in
HTML4, HTML5 and XHTML5.

2 Parsing Model

This section is normative.

Section 5.5: Sequence, of the [XHTML+RDFa] specification defines a
generic processing model for extracting RDF from a tree-based model. The
method of transforming an input document into a model suited for the RDFa
processing rules is intentionally not defined in the XHTML+RDFa
specification. That method was intended to be defined by each host
language; for HTML, it is defined in this section of the HTML+RDFa
specification.

The HTML5 and XHTML5 DOMs are each a superset of the tree-based model
on which the RDFa processing rules operate. Therefore, a mapping mechanism
to translate from a DOM to a tree-based model is not necessary. The HTML5
and XHTML5 DOMs, or an equivalent data structure, should be used as input to
the RDFa processing rules. The normative language for construction of the
HTML5 DOM and XHTML5 DOM is contained in the HTML5 specification.
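As an illustration of using the DOM itself as the tree-based model, the Python sketch below walks a DOM built by the standard library's xml.dom.minidom from an XHTML5 document and reports, in document order, each element that carries RDFa attributes. It is a sketch under stated assumptions, not an RDFa Processor: the function name rdfa_elements is hypothetical, the attribute list is assumed to be the RDFa 1.0 set from Section 2.1 of XHTML+RDFa, and minidom only handles XML-mode input (non-XML HTML5 input would need an HTML5 parser that produces an equivalent DOM).

```python
from xml.dom import minidom

# RDFa 1.0 attribute names (assumed from XHTML+RDFa, Section 2.1).
RDFA_ATTRIBUTES = ("about", "content", "datatype", "href", "property",
                   "rel", "resource", "rev", "src", "typeof")


def rdfa_elements(node):
    """Yield (tag name, {attribute: value}) for every element below `node`
    that carries at least one RDFa attribute, in document order.

    The DOM is used directly as the tree-based model: no intermediate
    mapping structure is built.
    """
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            found = {name: child.getAttribute(name)
                     for name in RDFA_ATTRIBUTES if child.hasAttribute(name)}
            if found:
                yield child.tagName, found
            # Recurse so descendants are visited before following siblings.
            yield from rdfa_elements(child)


xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p about="#me" property="dc:creator">Alice</p>
    <a rel="foaf:knows" href="#bob">Bob</a>
  </body>
</html>"""

for tag, attrs in rdfa_elements(minidom.parseString(xhtml)):
    print(tag, attrs)
```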

2.1 Modifying the Input Document

This section is informative.

RDFa's tree-based processing rules, outlined in Section 5.5: Sequence of
the XHTML+RDFa specification, allow an input document to be automatically
corrected, cleaned-up, re-arranged, or modified in any way that is approved
by the host language prior to processing. For example, element nesting
issues in HTML documents may be corrected before the input document is
translated into the DOM, a valid tree-based model, on which the RDFa
processing rules will operate.

Any mechanism that generates a data structure equivalent to the HTML5 or
XHTML5 DOM, such as the html5lib library, may be used as the mechanism to
construct the tree-based model provided as input to the RDFa processing
rules.

3 Conformance Requirements

Status: Working draft

This section is normative.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [RFC2119].

3.1 Document Conformance

In order for a document to claim that it is a conforming HTML+RDFa
document, it must provide the facilities described as mandatory in this
section. The document conformance criteria are listed below, of which only
a subset are mandatory:

All document conformance requirements stated as mandatory in the
HTML5 specification must be met.

There should be a version attribute on the
html element. The value of the version
attribute should be "HTML+RDFa 1.0" if the document is a non-XML mode
document, or "XHTML+RDFa 1.0" if the document is an XML mode
document.

There may be a link element contained in the
head element whose rel attribute has the value
profile and whose href attribute has the value
http://www.w3.org/1999/xhtml/vocab.
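The two optional criteria above lend themselves to a mechanical check. The following Python sketch uses the standard library's html.parser, which lower-cases tag and attribute names but does not implement the HTML5 parsing algorithm, so it assumes the document has an explicit head element. HeadInspector and check_document are hypothetical names, and the caller is assumed to know whether the document is in XML or non-XML mode (for example, from its MIME type).

```python
from html.parser import HTMLParser

PROFILE_HREF = "http://www.w3.org/1999/xhtml/vocab"


class HeadInspector(HTMLParser):
    """Record the html element's version attribute and whether the head
    contains <link rel="profile" href="http://www.w3.org/1999/xhtml/vocab">."""

    def __init__(self):
        super().__init__()
        self.version = None
        self.has_profile_link = False
        self._in_head = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser has already lower-cased the names
        if tag == "html":
            self.version = attrs.get("version")
        elif tag == "head":
            self._in_head = True
        elif tag == "link" and self._in_head:
            # rel holds space-separated tokens; match "profile" among them.
            rel_tokens = (attrs.get("rel") or "").lower().split()
            if "profile" in rel_tokens and attrs.get("href") == PROFILE_HREF:
                self.has_profile_link = True

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False


def check_document(markup, xml_mode):
    """Report the SHOULD-level version check and the MAY-level profile link."""
    inspector = HeadInspector()
    inspector.feed(markup)
    inspector.close()
    expected = "XHTML+RDFa 1.0" if xml_mode else "HTML+RDFa 1.0"
    return {
        "version_ok": inspector.version == expected,
        "profile_link": inspector.has_profile_link,
    }


page = ('<!DOCTYPE html><html version="HTML+RDFa 1.0"><head>'
        '<link rel="profile" href="http://www.w3.org/1999/xhtml/vocab">'
        '<title>Example</title></head><body></body></html>')
print(check_document(page, xml_mode=False))
```

A missing profile link is reported but is not an error, since that criterion is only a "may".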

3.2 User Agent Conformance

A conforming RDFa user agent must:

Conform to all requirements listed in the Conformance
requirements section of the HTML5 specification.

Implement all of the features required by this specification.

Implement all of the features specified in the XHTML+RDFa
specification, excluding those features which are specifically overridden
by this specification as detailed in Section 4, Modifications to XHTML+RDFa.

3.3 RDFa Processor Conformance

A conforming RDFa Processor must implement all of the mandatory features
specified in the XHTML+RDFa specification. It must also support any
mandatory features specified in this specification.

4 Modifications to XHTML+RDFa

This section is normative.

The [XHTML+RDFa]
Recommendation is the base document on which this specification builds.
XHTML+RDFa specifies the attributes, in Section 2.1: The RDFa
Attributes, and processing model, in Section 5: Processing
Model, for extracting RDF from an XHTML document. This section
specifies changes to the attributes and processing model defined in
XHTML+RDFa in order to support extracting RDF from HTML documents.

The requirements and rules, as specified in XHTML+RDFa and further
modified in this document, apply to all HTML5 documents. An RDFa Processor
operating on HTML and XHTML documents (specifically, on the DOMs that result
from parsing them) must apply the same processing rules to both
serializations and their DOMs.

4.1 Specifying the language for a literal

The lang attribute must be processed in the same manner as
the xml:lang attribute is in the XHTML+RDFa specification,
Section 5.5:
Sequence, step #3.

If an author is editing an HTML fragment and is unsure of the final
encapsulating MIME type for their markup, the author is advised to
specify both lang and xml:lang, with exactly the same value in
both attributes.
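The effect of treating lang like xml:lang can be sketched as language inheritance down the tree: each element takes its language from its own attribute, or else from its parent, and a literal picks up the language of the element that carries @property. This Python sketch uses xml.etree.ElementTree on XML-mode input, and its function names are hypothetical. Two behaviours are assumptions of the sketch rather than text from this specification: when both attributes are present, xml:lang is consulted first (mirroring HTML5's rule for determining an element's language), and an empty value clears the language.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def element_language(elem, inherited):
    """Language in effect on `elem`.

    Assumptions of this sketch: xml:lang is consulted before lang when both
    are present, and an empty value clears the language.
    """
    for key in (XML_LANG, "lang"):
        if key in elem.attrib:
            return elem.attrib[key] or None
    return inherited


def literal_languages(elem, inherited=None):
    """Return (property, text, language) for each element with @property."""
    lang = element_language(elem, inherited)
    results = []
    if "property" in elem.attrib:
        results.append((elem.attrib["property"], "".join(elem.itertext()), lang))
    for child in elem:
        results.extend(literal_languages(child, lang))
    return results


page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
    '<body><p property="dc:title">Hello</p>'
    '<p lang="fr" xml:lang="fr" property="dc:title">Bonjour</p></body></html>'
)
for parts in literal_languages(page):
    print(parts)
```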

4.2 Invalid XMLLiteral values

When generating literals of type XMLLiteral, the processor must ensure
that the output XMLLiteral is a namespace well-formed XML fragment. A
namespace well-formed XML fragment has the following properties:

The XML fragment, when placed inside a single root element, must
be well-formed XML. The normative language that describes a
well-formed XML document is specified in Section 2.1, "Well-Formed
XML Documents", of the XML specification.

A case-insensitive match for the currently active xmlns
attribute, as well as for all currently active attributes starting with
xmlns:, must be preserved in the generated XMLLiteral. This
preservation must be accomplished by placing all active namespaces in
each top-level element in the generated XMLLiteral, taking care not to
overwrite pre-existing namespace values.
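The namespace-preservation rule above can be sketched in Python with xml.dom.minidom: every active declaration is copied onto each top-level element of the fragment unless that element already carries a declaration with the same name. The sketch assumes the fragment is already well-formed (it does not perform the HTML-to-Infoset coercion this section requires, and simply returns None when parsing fails) and that the caller supplies the active declarations as a dictionary keyed by lower-cased attribute name. The function name xml_literal is hypothetical, and because it copies every active declaration it makes no attempt to prune unused ones.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import quoteattr


def xml_literal(fragment, active_ns):
    """Serialize `fragment` (the children of the element carrying @property)
    as a namespace well-formed XMLLiteral.

    active_ns maps lower-cased declaration names ("xmlns", "xmlns:ex", ...)
    to URIs. Returns None if the fragment cannot be parsed; this sketch does
    not attempt the HTML-to-Infoset coercion a real processor would apply.
    """
    # Wrap the fragment in a single root that declares the active namespaces.
    decls = " ".join(f"{name}={quoteattr(uri)}" for name, uri in active_ns.items())
    try:
        wrapper = minidom.parseString(f"<wrapper {decls}>{fragment}</wrapper>")
    except ExpatError:
        return None  # not namespace well-formed: no triple is generated
    parts = []
    for node in wrapper.documentElement.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            for name, uri in active_ns.items():
                # Inject each active declaration, but never overwrite one
                # the element already carries.
                if not node.hasAttribute(name):
                    node.setAttribute(name, uri)
        parts.append(node.toxml())
    return "".join(parts)


SVG = "http://www.w3.org/2000/svg"
print(xml_literal('<rect width="10"/><rect width="3"/>', {"xmlns": SVG}))
```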

If the input is not a namespace well-formed XML fragment, the
processor must transform the input text so that it satisfies the
well-formedness rules described in this section. If a sequence of
characters cannot be transformed into a namespace well-formed XML fragment,
the triple containing the XMLLiteral must not be generated.

An RDFa Processor that transforms the XML fragment must use the
Coercing an HTML DOM into an Infoset rules, as specified in the HTML5
specification, prior to generating the triple containing the XMLLiteral.
The serialization algorithm that must be used for generating the XMLLiteral
is normatively defined in the Serializing
XHTML Fragments section of the HTML5 specification.

Transformation to a namespace well-formed XML fragment is required
because an application that consumes XMLLiteral data expects that data to
be a namespace well-formed XML fragment.

The transformation requirement does not apply to input data that are
text-only, such as literals that contain a datatype attribute
with an empty value (""), or input data that contain only
text nodes.
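The text-only exception can be expressed as a small predicate. In this Python sketch, whose function name transformation_required is hypothetical, an element with datatype="" or with only text children is exempt, while any element, comment, or other non-text child makes its content subject to the XMLLiteral transformation. Treating CDATA sections as text is an assumption of the sketch.

```python
from xml.dom import Node, minidom

# Treating CDATA sections as text is an assumption of this sketch.
TEXT_NODE_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def transformation_required(elem):
    """True if elem's content must be made a namespace well-formed XMLLiteral."""
    if elem.hasAttribute("datatype") and elem.getAttribute("datatype") == "":
        return False  # datatype="" marks text-only content
    # Any non-text child (element, comment, ...) means the content is markup.
    return any(child.nodeType not in TEXT_NODE_TYPES for child in elem.childNodes)


doc = minidom.parseString(
    '<div>'
    '<span property="a">plain text</span>'
    '<span property="b">mixed <em>markup</em></span>'
    '<span property="c" datatype="">mixed <em>markup</em></span>'
    '</div>'
)
for span in doc.getElementsByTagName("span"):
    print(span.getAttribute("property"), transformation_required(span))
```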

An example transformation demonstrating the preservation of namespace
values is provided below. The → symbol is used to denote that the line
is a continuation of the previous line and is included purely for the
purposes of readability:

Note the preservation of the SVG namespace by injecting a new
xmlns attribute. Since the ex and rdf
namespaces are not used in either rect element, they are not
preserved in the XMLLiteral.

4.3 xmlns:-Prefixed Attributes

While this section outlines xmlns: processing in RDFa, the
support for distributed extensibility in non-XML mode HTML5 (using xmlns
and xmlns:) is still an open issue. This section may be further modified
before Last Call based on progress made on the distributed extensibility
issue.

CURIE prefix mappings specified using attributes prepended with
xmlns: must be processed using the rules specified in Section 5.4, CURIE and URI Processing, contained in the XHTML+RDFa
specification.
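The core of this prefix processing can be sketched as collecting xmlns: declarations into a mapping that child elements inherit, then concatenating the mapped URI with the CURIE's reference part. This Python sketch covers only that core: it handles plain and bracketed (safe) CURIEs but not blank-node CURIEs (_:) or any default-prefix behaviour, its function names are hypothetical, and attribute names are assumed to be already lower-cased, as an HTML parser reports them.

```python
def prefix_mappings(attributes, inherited=None):
    """Return the CURIE prefix mappings in scope on an element.

    attributes: {name: value} for the element, names already lower-cased.
    inherited: the mappings in scope on the parent element.
    """
    mappings = dict(inherited or {})  # copy, so the parent's scope is untouched
    for name, uri in attributes.items():
        if name.startswith("xmlns:"):
            mappings[name[len("xmlns:"):]] = uri
    return mappings


def expand_curie(curie, mappings):
    """Expand prefix:reference, or a safe CURIE [prefix:reference].

    Returns None when the prefix is not mapped. Blank-node CURIEs (_:)
    are deliberately not handled in this sketch.
    """
    if curie.startswith("[") and curie.endswith("]"):
        curie = curie[1:-1]
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in mappings:
        return None
    return mappings[prefix] + reference


outer = prefix_mappings({"xmlns:foaf": "http://xmlns.com/foaf/0.1/"})
inner = prefix_mappings({"xmlns:dc": "http://purl.org/dc/terms/"}, inherited=outer)
print(expand_curie("foaf:name", inner))
print(expand_curie("[dc:title]", inner))
```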

Since CURIE prefix mappings are specified using
xmlns:, and since HTML attribute names are case-insensitive,
CURIE prefix names declared using the xmlns: attribute-name
pattern xmlns:<PREFIX>="<URI>" should be specified
using only lower-case characters. For example, both the text
"xmlns:" and the text in "<PREFIX>" should
be lower-case only. This ensures that prefix mappings are interpreted
in the same way in HTML (case-insensitive attribute names) and XHTML
(case-sensitive attribute names) documents.
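The case-folding hazard can be demonstrated with Python's standard library: html.parser lower-cases attribute names as HTML does (although it is not a full HTML5 parser), while xml.dom.minidom keeps them exactly as written, as XML does. The same mixed-case declaration therefore yields different prefix names. The helper names below are hypothetical.

```python
from html.parser import HTMLParser
from xml.dom import minidom

MARKUP = '<div xmlns:FOAF="http://xmlns.com/foaf/0.1/" property="FOAF:name">Alice</div>'


class PrefixCollector(HTMLParser):
    """Collect xmlns: declarations as an HTML tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}

    def handle_starttag(self, tag, attrs):
        # html.parser has already lower-cased each attribute name.
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value


def html_prefixes(markup):
    collector = PrefixCollector()
    collector.feed(markup)
    collector.close()
    return collector.prefixes


def xml_prefixes(markup):
    # An XML parser keeps attribute names exactly as written.
    root = minidom.parseString(markup).documentElement
    return {name[len("xmlns:"):]: value
            for name, value in root.attributes.items()
            if name.startswith("xmlns:")}


print(html_prefixes(MARKUP))
print(xml_prefixes(MARKUP))
```

With the declaration xmlns:FOAF, the XML parser reports the prefix FOAF while the HTML tokenizer reports foaf, so a CURIE such as FOAF:name may resolve in one serialization and not the other. Writing xmlns:foaf and foaf:name avoids the discrepancy.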

5 Extensions to the HTML5 Syntax

This section is normative.

There are a few changes that are required to the HTML5 specification in
order to fully support RDFa. The following sub-sections outline the
necessary modifications to the base HTML5 specification.

5.1 The RDFa Attributes and Valid Values

All RDFa attributes and valid values (including CURIEs), as listed in
Section 2.1:
The RDFa Attributes, are conforming when used in an HTML5 or XHTML5
document.

5.2 Conformance Criteria for xmlns:-Prefixed Attributes

While this section outlines xmlns: conformance criteria for
HTML+RDFa, the support for distributed extensibility in non-XML mode HTML5
(using xmlns and xmlns:) is still an open issue. This section may be
further modified before Last Call based on progress made on the distributed
extensibility issue.

Since RDFa uses attributes starting with xmlns: to specify
CURIE prefixes, it is important that any attribute starting with a
case-insensitive match on the text string "xmlns:" be
preserved in the DOM or other tree-like model that is passed to the RDFa
Processor. HTML5 must therefore not only preserve these attributes in the
DOM, but also accept them as conforming in non-XML HTML5. For documents
conforming to this specification, attributes with names that have the
case-insensitive prefix "xmlns:" are conforming in both HTML5 and
XHTML5.
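A conformance checker applying Sections 5.1 and 5.2 might accept the RDFa attributes and any attribute whose name begins, case-insensitively, with xmlns:. In this Python sketch, HTML5_ALLOWED is a hypothetical, deliberately incomplete stand-in for a validator's own list of HTML5 attributes, the RDFa attribute list is assumed to be the RDFa 1.0 set, and nonconforming_attributes is a hypothetical name. A bare xmlns attribute falls outside Section 5.2 and is not special-cased.

```python
# Hypothetical, deliberately incomplete stand-in for a validator's own
# list of attributes that HTML5 already allows (a few global attributes).
HTML5_ALLOWED = {"id", "class", "lang", "title", "dir", "style"}

# RDFa 1.0 attributes (assumed from XHTML+RDFa, Section 2.1), conforming per Section 5.1.
RDFA_ATTRIBUTES = {"about", "content", "datatype", "href", "property",
                   "rel", "resource", "rev", "src", "typeof"}


def nonconforming_attributes(names):
    """Return the attribute names a checker would still flag."""
    flagged = []
    for name in names:
        folded = name.lower()  # HTML attribute names are case-insensitive
        if folded in HTML5_ALLOWED or folded in RDFA_ATTRIBUTES:
            continue
        if folded.startswith("xmlns:"):
            continue  # Section 5.2: conforming in both HTML5 and XHTML5
        flagged.append(name)
    return flagged


print(nonconforming_attributes(["XMLNS:foaf", "property", "typeof", "bogus"]))
```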