Abstract

A document that uses polyglot markup is an HTML5 document which is
at the same time an XML document and an HTML document, and which meets a
well defined set of constraints.
Polyglot markup that meets these constraints is interpreted as
compatible, regardless of whether it is processed as HTML or as
XHTML, per the HTML5 specification.
Polyglot markup uses a specific doctype, namespace declarations,
and a specific case—normally lower case but occasionally camel case—for
element and attribute names.
Polyglot markup uses lower case for certain attribute values.
Further constraints include those on empty elements, named entity
references, and the use of scripts and style.

Status of This Document

This section describes the status of this document
at the time of its publication. Other documents may supersede this
document. A list of current W3C
publications and the latest revision of this technical report can be
found in the W3C technical reports index at
http://www.w3.org/TR/.

This document summarizes design guidelines for authors who wish
their XHTML or HTML documents to validate on either HTML or XML
parsers, assuming the parsers to be HTML5-compliant.
This specification is intended to be used by web authors. It is
not a specification for user agents and creates no obligations on user
agents.
Note that this recommendation does not define how HTML5-conforming
user agents should process HTML documents.
Nor does it define the meaning of the Internet Media Type
text/html. For user agent guidance and for these definitions, see [HTML5] and [RFC2854].

This document was published by the W3C HTML Working Group as a First Public Working Draft. This
document is intended to become a W3C Recommendation. If you wish to make comments
regarding this document, please send them to public-html@w3.org (subscribe,
archives).
All feedback is welcome.

Publication as a Working Draft does not
imply endorsement by the W3C
Membership. This is a draft document and may be updated, replaced or
obsoleted by other documents at any time. It is inappropriate to cite
this document as other than work in progress.

1. Introduction

This section is non-normative.

It is often valuable to be able to serve HTML5 documents that
are also valid XML documents.
An author may, for example, use XML tools to generate a
document, and they and others may process the document using XML tools.
These documents are served as text/html.
The language used to create documents that can be parsed by both HTML
and XML parsers is called polyglot markup.
Polyglot markup is the overlap language of documents which are both
HTML5 documents and XML documents.

2. Processing Instructions and the XML Declaration

Polyglot markup does not use processing instructions.
Note that the XML declaration is not a processing instruction; its
parsing rules are defined separately in Prolog and Document Type
Declaration.

3. Character Encoding

Polyglot markup uses either UTF-8 or UTF-16, although generally UTF-8
is preferred.
When polyglot markup uses UTF-16, it should include the BOM indicating UTF-16LE or
UTF-16BE.
In that case, polyglot markup need not include the meta charset
declaration, because a parser must already be reading the document as
UTF-16 in order to reach and parse that declaration.

In short, for correct character encoding, polyglot markup must either:

Use UTF-8 or UTF-16 with the appropriate BOM.

OR

Use both the XML Declaration and meta tag to
specify the appropriate character encoding.

If polyglot markup uses an encoding other than UTF-8 or UTF-16, it must include the XML declaration;
in this case the document must
also include the HTML meta tag specifying the character
set.
When polyglot markup uses both the XML declaration and the HTML meta
tag, these must specify the same
character encoding.
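
The BOM rule in this section can be illustrated with a short Python sketch. It is not part of the specification; the function name bom_encoding is hypothetical, and the sketch only recognizes the UTF-8 and UTF-16 byte order marks discussed here.

```python
import codecs
from typing import Optional

# Byte order marks for the encodings this section recommends.
# Assumption: UTF-32 is out of scope, so its BOMs are not distinguished.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def bom_encoding(data: bytes) -> Optional[str]:
    """Return the encoding signalled by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A document for which this returns None, and which is not UTF-8, would fall under the second option and need both the XML declaration and the meta tag.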

4. The DOCTYPE

Polyglot markup uses the <!DOCTYPE html> doctype.
Note that for polyglot markup the string, html, must be lower case.
For a pure HTML document, the string is defined as case-insensitive. [HTML5]

Polyglot markup declares the xlink prefix as xmlns:xlink="http://www.w3.org/1999/xlink"
before using xlink:href. The prefix can be declared in either of two places:

Once on the root <html> element.

Once on each <svg> element that contains
one or more elements with xlink:href attributes.

No other elements should have namespace declarations.

6. Elements

6.1 Required Elements

Each document using polyglot markup must have a root html element.
The root html element must contain both a head and a body
element.
The head element must
contain a title element.
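
As an illustration only, the required-element rules above could be checked with an XML parser along these lines. The sketch assumes the document declares the XHTML namespace http://www.w3.org/1999/xhtml on its root element (an assumption not stated in this section), and the function name has_required_elements is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: elements are in the XHTML namespace, so ElementTree
# reports tags in the form "{namespace}localname".
XHTML = "{http://www.w3.org/1999/xhtml}"


def has_required_elements(source: str) -> bool:
    """Check for an html root with head and body children and a title in head."""
    root = ET.fromstring(source)
    if root.tag != XHTML + "html":
        return False
    head = root.find(XHTML + "head")
    body = root.find(XHTML + "body")
    if head is None or body is None:
        return False
    return head.find(XHTML + "title") is not None
```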

6.1.1 Tables

Polyglot markup must
explicitly have a tbody element surrounding groups of tr
elements within a table element.
HTML parsers insert the tbody element, but XML
parsers do not, thus creating different DOMs.

Correct:

<table>
<tbody>
<tr>...

Incorrect:

<table>
<tr>...
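
The XML side of this difference can be observed with a small Python sketch. It assumes a well-formed fragment with no namespace declaration, and the function name rows_outside_tbody is hypothetical. An XML parser reports the tr elements exactly where the author wrote them, so any row it finds directly under table is a row that an HTML parser would have moved into an implied tbody.

```python
import xml.etree.ElementTree as ET


def rows_outside_tbody(source: str) -> int:
    """Count tr elements that are direct children of a table element."""
    root = ET.fromstring(source)
    # iter() also visits the root element itself, so a bare <table> works.
    return sum(len(table.findall("tr")) for table in root.iter("table"))
```

A result of zero suggests the table markup yields the same row structure under both parsers.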

6.2 Case-Sensitivity

The following guidelines apply to any usage of element names,
attribute names, or attribute values in markup, script, or CSS.
When required, polyglot markup uses lower case letters for all ASCII
letters; however, case requirements do not apply to non-ASCII letters
such as Greek, Cyrillic, or non-ASCII Latin letters.

6.2.1 Element Names

Polyglot markup uses the correct case for element names.

Polyglot markup uses lowercase letters for all HTML element names.

Polyglot markup uses lowercase letters for all MathML element
names.

Polyglot markup uses lowercase letters for all SVG element names
except the following, which must
be in mixed case:

The lowercase definitionurl must be changed to the mixed case definitionURL.

Polyglot markup uses lowercase letters in attribute names
for all SVG elements except the following, which must be in mixed case:

attributeName

attributeType

baseFrequency

baseProfile

calcMode

clipPathUnits

contentScriptType

contentStyleType

diffuseConstant

edgeMode

externalResourcesRequired

filterRes

filterUnits

glyphRef

gradientTransform

gradientUnits

kernelMatrix

kernelUnitLength

keyPoints

keySplines

keyTimes

lengthAdjust

limitingConeAngle

markerHeight

markerUnits

markerWidth

maskContentUnits

maskUnits

numOctaves

pathLength

patternContentUnits

patternTransform

patternUnits

pointsAtX

pointsAtY

pointsAtZ

preserveAlpha

preserveAspectRatio

primitiveUnits

refX

refY

repeatCount

repeatDur

requiredExtensions

requiredFeatures

specularConstant

specularExponent

spreadMethod

startOffset

stdDeviation

stitchTiles

surfaceScale

systemLanguage

tableValues

targetX

targetY

textLength

viewBox

viewTarget

xChannelSelector

yChannelSelector

zoomAndPan
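
A tool that emits polyglot markup might apply the list above as a lookup from the lowercase spelling to the required mixed-case spelling. The following Python sketch shows one possible shape; it copies only four names from the list for brevity, and the names MIXED_CASE_SVG_ATTRIBUTES and polyglot_svg_attribute_name are hypothetical.

```python
# A few names copied from the list above; a complete table would hold all of them.
MIXED_CASE_SVG_ATTRIBUTES = (
    "attributeName",
    "gradientUnits",
    "preserveAspectRatio",
    "viewBox",
)

# Map each lowercase spelling to the mixed-case form.
_BY_LOWERCASE = {name.lower(): name for name in MIXED_CASE_SVG_ATTRIBUTES}


def polyglot_svg_attribute_name(name: str) -> str:
    """Return the spelling to use for an SVG attribute name (assumed ASCII)."""
    lowered = name.lower()
    return _BY_LOWERCASE.get(lowered, lowered)
```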

6.2.3 Attribute Values

Polyglot markup uses lowercase letters for the values of the
attributes in the following list when they exist on HTML elements.
More specifically, where required, polyglot markup must use lower case letters for all ASCII letters
in these attribute values; however, case requirements do not apply to
non-ASCII letters such as Greek, Cyrillic, or non-ASCII Latin letters.
Attributes for HTML elements other than those in the following list may have values made of mixed case
letters.
All attributes on non-HTML elements may have values made of mixed
case letters.

accept

accept-charset

align

alink

axis

bgcolor

charset

checked

clear

codetype

color

compact

declare

defer

dir

direction

disabled

enctype

face

frame

hreflang

http-equiv

lang

language

link

media

method

multiple

nohref

noresize

noshade

nowrap

readonly

rel

rev

rules

scope

scrolling

selected

shape

target

text

type

valign

valuetype

vlink

6.3 Empty Elements

Polyglot markup uses only the elements in the following list as
empty elements.

Given an empty instance of an element whose content model is not
EMPTY (for example, an empty title or paragraph), polyglot markup does
not use the minimized form (e.g. the document uses <p></p>
and not <p />).

Note that MathML and SVG elements may be either self-closing or
contain content.

7. Attributes

Polyglot markup does not contain line breaks or runs of multiple white
space characters within attribute values, because these are handled
inconsistently by user agents.

Polyglot markup surrounds all attribute values with quotation
marks. Attribute values may be
surrounded either by single quotation marks or by double quotation
marks.
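
The white space rule can be expressed as a simple check on the raw attribute value, before any parser has normalized it. This is a sketch under that assumption; the function name attribute_value_is_portable is hypothetical.

```python
import re

# A line break anywhere, or two or more consecutive white space characters.
_PROBLEM = re.compile(r"[\r\n]|\s{2,}")


def attribute_value_is_portable(value: str) -> bool:
    """Return True if the raw value has no line breaks or white space runs."""
    return _PROBLEM.search(value) is None
```

The check has to run on the source text, since an XML parser replaces line breaks in attribute values with spaces before an application sees them.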

9. Script and Style

Script and style commands should
be included by linking to external files rather than including them
in-line.
However, polyglot markup must
not link to an external stylesheet by using the xml-stylesheet
processing instruction.
See also Processing Instructions and the
XML Declaration.

The following examples show the proper way to include external
script and style, respectively:

<script src="external.js"></script>

<link rel="stylesheet" href="external.css"/>

Although document.write() and document.writeln()
are valid in an HTML document, neither function may be used in XHTML.
Therefore, neither is used in polyglot markup.
Instead, use the innerHTML property for both HTML
and XHTML.
Note that the innerHTML property takes a string.
XML parsers parse the string as XML in XHTML.
HTML parsers parse the string as HTML in HTML.
Because of the difference in parsing, if you send the parser
content that does not follow the rules for polyglot markup, the results
will differ between a DOM created with an XML parser and one created with an
HTML parser.
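
One way to catch such strings before they reach innerHTML is to test them with an XML parser first. The Python sketch below assumes the string is a fragment with a single root element, and the function name is_well_formed_fragment is hypothetical. It covers only XML well-formedness, not the other polyglot constraints.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(fragment: str) -> bool:
    """Return True if the fragment parses as XML with a single root element."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True
```

For example, a fragment containing an unclosed <br> is accepted by an HTML parser but rejected here, which is the kind of divergence the paragraph above describes.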

9.1 External Script and Style

Polyglot markup uses external scripts or style sheets if the document's
script or style sheet contains <, &, ]]>,
or --.
Note that XML parsers are permitted to silently remove the
contents of comments; therefore, the historical practice of hiding
scripts and style sheets within comments to make the documents backward
compatible is unlikely to work as expected in XML-based user agents.
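
The rule for choosing an external file can be sketched as a substring test. The names UNSAFE_SEQUENCES and needs_external_file are hypothetical, and the sketch deliberately does not try to understand the script or style language, so it also flags these sequences inside string literals.

```python
# The four sequences named in this section.
UNSAFE_SEQUENCES = ("<", "&", "]]>", "--")


def needs_external_file(code: str) -> bool:
    """Return True if the script or style text contains any unsafe sequence."""
    return any(sequence in code for sequence in UNSAFE_SEQUENCES)
```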

9.2 In-line Script and Style

If polyglot markup must use script or style commands within its
source code, either use safe content or wrap the command in a CDATA
section.
However, polyglot markup does not use a CDATA
section unless it is being used within foreign content.

9.2.1 Safe Content

Safe content is content that does not contain a <
or & character.
The following example is safe because it does not contain
problematic characters within the <script> tag.

9.2.2 Wrapping a Command in a CDATA Section

Note that you cannot achieve the same DOM in both XHTML and HTML by
using in-line commands in a CDATA section.
However, this is not usually a problem unless the code has a
dependency on the exact number of text nodes under a <script>
or <style> element.
The following examples show in-line script and style commands
wrapped in a CDATA section.

<script>
//<![CDATA[
(script goes here)
//]]>
</script>

<style>
/*<![CDATA[*/
(styles go here)
/*]]>*/
</style>
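
The effect of the wrapper on an XML parser can be seen in the following Python sketch. The sample script body and the names WRAPPED, UNWRAPPED, and script_text are hypothetical. With the CDATA section, the < and & characters reach the script as literal text; without it, the XML parser rejects the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-line script using the wrapper pattern shown above.
WRAPPED = """<script>
//<![CDATA[
if (a < b && c) { run(); }
//]]>
</script>"""

# The same script body without the CDATA wrapper.
UNWRAPPED = "<script>if (a < b && c) { run(); }</script>"


def script_text(source: str) -> str:
    """Parse the source as XML and return the text content of its root element."""
    return ET.fromstring(source).text or ""
```

Calling script_text(UNWRAPPED) raises xml.etree.ElementTree.ParseError, while script_text(WRAPPED) returns text that still includes the // comment markers around the script body.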

When using MathML or SVG, the parser follows the XML parsing
rules.
Polyglot markup does not rely on getting a CDATA instance from
the DOM when using MathML or SVG, because the HTML parser does not
create a CDATA instance in the DOM.