This specification defines APIs for the parsing and serializing of HTML and XML-based DOM nodes
for web applications.

Candidate Recommendation Exit Criteria

This specification will not advance to Proposed Recommendation before the spec's
test suite is completed and two or more independent
implementations pass each test, although no single implementation must pass each test. We expect
to meet these criteria no sooner than 24 October 2014. The group will also create an
Implementation Report.

The IDL fragments in this specification must be interpreted as required for conforming IDL
fragments, as described in the Web IDL specification. [[!WEBIDL]]

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space
characters" or "return false and terminate these steps") are to be interpreted with the meaning of
the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps may be implemented in any
manner, so long as the end result is equivalent. (In particular, the algorithms defined in this
specification are intended to be easy to follow, and not intended to be performant.)

User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g.
to prevent denial of service attacks, to guard against running out of memory, or to work around
platform-specific limitations.

When a method or an attribute is said to call another method or attribute, the user agent must
invoke its internal API for that attribute or method so that e.g. the author can't change the
behavior by overriding attributes or methods with custom properties or functions in ECMAScript.
[[ECMA-262]]

If an algorithm calls into another algorithm, any exception that is thrown by the latter
(unless it is explicitly caught), must cause the former to terminate, and the exception to be
propagated up to its caller.

Extensibility

Vendor-specific proprietary extensions to this specification are strongly discouraged. Authors
must not use such extensions, as doing so reduces interoperability and fragments the user base,
allowing only users of specific user agents to access the content in question.

If vendor-specific extensions are needed, the members should be prefixed by vendor-specific
strings to prevent clashes with future versions of this specification. Extensions must be defined
so that the use of extensions neither contradicts nor causes the non-conformance of functionality
defined in the specification.

When vendor-neutral extensions to this specification are needed, either this specification can
be updated accordingly, or an extension specification can be written that overrides the
requirements in this specification. Such an extension specification becomes an
applicable specification for the purposes of conformance requirements in this
specification.

Introduction

A document object model (DOM) is an in-memory representation of various types of Nodes
where each Node is connected in a tree. The [[HTML5]] and [[DOM4]] specifications describe
the DOM and its Nodes in greater detail.

Parsing is the term used for converting a string representation of a DOM into an
actual DOM, and Serializing is the term used to transform a DOM back into a string.
This specification concerns itself with defining various APIs for both parsing and serializing a
DOM.

For example, the innerHTML API is a common way to both
parse and serialize a DOM. If a particular Node has the following in-memory
DOM:

To parse new children for myDiv from a string (replacing its existing
children), simply set the innerHTML property (this triggers
parsing of the assigned string):

myDiv.innerHTML = "<span>new</span><em>children!</em>";

This specification describes two flavors of parsing and serializing: HTML and
XML (with XHTML being a type of XML). Each follows the rules of its respective markup language.
The above example shows HTML parsing and serialization. The specific algorithms for HTML parsing
and serializing are defined in the [[HTML5]] specification. This specification contains the
algorithm for XML serializing. The grammar for XML parsing is described in the [[XML10]]
specification.

Round-tripping a DOM means to serialize and then immediately parse the serialized
string back into a DOM. Ideally, this process does not result in any data loss with respect to the
identity and attributes of the Node in the DOM.
Round-tripping is especially tricky for an XML serialization, which must be concerned with
preserving the Node's namespace identity in the serialization (whereas namespaces are
ignored in HTML).

An XML serialization must include the HTMLScriptElement Node's
namespace in order to preserve the identity of the
script element, and to allow the serialized string to
round-trip through an XML parser. Assuming that root
is in a variable named root:

XML Serialization

Elements and attributes will always be serialized such
that their namespaceURI is preserved. In some cases this means that an existing
prefix, prefix declaration attribute or default namespace declaration attribute
might be dropped, substituted or changed. An HTML serialization does not attempt to
preserve the namespaceURI.

Let prefix index be a generated namespace prefix index with value
1. The generated namespace prefix index is used to generate a new unique
prefix value when no suitable existing namespace prefix is available to serialize a
node's namespaceURI (or the namespaceURI of one of
node's attributes). See the generate a prefix algorithm.
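
The role of the generated namespace prefix index can be illustrated with a minimal sketch. It assumes that a generated prefix is the string "ns" followed by the current index value; that format, the function name generatePrefix, and the use of a JavaScript Map holding arrays of prefixes are assumptions made for illustration and are not defined in this section.

```javascript
// Sketch of "generate a prefix". The namespace prefix map is modeled as a
// Map from namespace (string or null) to an ordered array of prefixes.
// ASSUMPTION: generated prefixes are "ns" followed by the current index.
function generatePrefix(map, newNamespace, prefixIndex) {
  const generatedPrefix = "ns" + prefixIndex.value;
  prefixIndex.value += 1; // the next generated prefix gets a fresh number
  // Record the new prefix for this namespace so later lookups can reuse it.
  map.set(newNamespace, [...(map.get(newNamespace) ?? []), generatedPrefix]);
  return generatedPrefix;
}

const prefixMap = new Map();
const prefixIndex = { value: 1 }; // starts at 1, as described above
const first = generatePrefix(prefixMap, "http://example.com/a", prefixIndex);
const second = generatePrefix(prefixMap, "http://example.com/b", prefixIndex);
console.log(first, second); // prints "ns1 ns2"
```

The index is wrapped in an object so that the increment made inside the function is visible to the caller, mirroring how a single index is shared across one serialization run.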

The XML serialization algorithm produces an XML serialization of an arbitrary DOM node node based on the node's interface type. Each referenced algorithm is
to be passed the arguments as they were received by the caller and return their result to the
caller. Re-throw any exceptions. If node's interface is:

XML serializing an Element node

If the require well-formed flag is set (its value is true),
and this node's localName attribute contains the character
":" (U+003A COLON) or does not match the XML Name production, then
throw an exception; the serialization of this node would not be a well-formed
element.

Let local prefixes map be an empty map. The map has unique Node prefix strings as its keys, with corresponding namespaceURI Node values as the map's key values (in this map, the null namespace is
represented by the empty string).

This map is local to each element. It is used to ensure there are no conflicting
prefixes should a new namespace prefix attribute need to be
generated. It is also used to enable skipping of duplicate
prefix definitions when
writing an element's attributes: the map
allows the algorithm to distinguish between a prefix in the
namespace prefix map that might be locally-defined (to the current Element) and
one that is not.

The above step will update map with any found namespace prefix
definitions, add the found prefix definitions to the local prefixes map and
return a local default namespace value defined by a default namespace attribute if one
exists. Otherwise it returns null.

Append to qualified name the concatenation of
candidate prefix, ":" (U+003A COLON), and node's localName. There exists on this node or the
node's ancestry a namespace prefix definition that defines the node's
namespace.

By this step, there is no namespace or prefix mapping declaration in this
node (or any parent node visited by this algorithm) that defines
prefix; otherwise the step labelled Found a suitable namespace prefix would
have been followed. The sub-steps that follow will create a new namespace prefix declaration
for prefix and ensure that prefix does not conflict with an existing
namespace prefix declaration of the same localName in node's
attribute list.

At this point, the namespace for this node still needs to be serialized, but
there's no prefix (or candidate prefix) available; the following uses
the default namespace declaration to define the namespace, optionally replacing an existing
default declaration if present.

If ns is the HTML namespace, and the node's list of
children is empty, and the node's localName matches any
one of the following void elements:
"area",
"base",
"basefont",
"bgsound",
"br",
"col",
"embed",
"frame",
"hr",
"img",
"input",
"keygen",
"link",
"menuitem",
"meta",
"param",
"source",
"track",
"wbr";
then append the following to markup, in the order listed:
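
As a minimal sketch of the condition only (not of the items that are appended), the three-part test can be written as a single predicate. The function name serializesAsVoid and the representation of the node as three plain arguments are assumptions made for illustration.

```javascript
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// The void element names listed in the step above.
const VOID_ELEMENTS = new Set([
  "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame",
  "hr", "img", "input", "keygen", "link", "menuitem", "meta", "param",
  "source", "track", "wbr",
]);

// Hypothetical helper: true only when all three parts of the condition hold.
function serializesAsVoid(ns, childCount, localName) {
  return ns === HTML_NAMESPACE && childCount === 0 && VOID_ELEMENTS.has(localName);
}

console.log(serializesAsVoid(HTML_NAMESPACE, 0, "br"));  // true
console.log(serializesAsVoid(HTML_NAMESPACE, 0, "div")); // false
```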

The following conditional steps find namespace prefixes. Only attributes
in the XMLNS namespace are considered (e.g., attributes made to look like namespace
declarations via setAttribute("xmlns:pretend-prefix",
"pretend-namespace") are not included).

If attribute prefix is null, then attr is a default
namespace declaration. Set the default namespace attr value to attr's
value and stop running these steps, returning to Main to visit
the next attribute.

Otherwise, the attribute prefix is not null and attr
is a namespace prefix definition. Run the following steps:

If namespace definition is the XML namespace, then stop running
these steps, and return to Main to visit the next attribute.

XML namespace definitions in prefixes are completely ignored (in
order to avoid unnecessary work when there might be prefix conflicts).
XML namespaced elements are always handled uniformly by prefixing (and overriding
if necessary) the element's localname with the reserved "xml" prefix.

If namespace definition is the empty string (the declarative form of
having no namespace), then let namespace definition be null
instead.

If prefix definition is found in map given the namespace
namespace definition, then stop running these steps, and return to Main
to visit the next attribute.

This step avoids adding duplicate prefix definitions for the same namespace
in the map. This has the side-effect of avoiding later serialization of
duplicate namespace prefix declarations in any descendant nodes.

Add the value of prefix definition as a new key to the
local prefixes map, with the namespace definition as the
key's value, replacing the value of null with the empty string if applicable.

Return the value of default namespace attr value.

The empty string is a legitimate return value and is not converted to
null.
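
The steps above can be sketched as a single loop over an element's attributes. This is an illustrative sketch, not a normative implementation: attributes are modeled as plain objects with namespaceURI, prefix, localName and value properties, both maps are JavaScript Map objects, and the function name recordNamespaceInformation is an assumption.

```javascript
const XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/";
const XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace";

// map: namespace (string or null) -> ordered array of prefixes.
// localPrefixesMap: prefix -> namespace ("" stands for the null namespace).
function recordNamespaceInformation(attributes, map, localPrefixesMap) {
  let defaultNamespaceAttrValue = null;
  for (const attr of attributes) {
    // Only attributes in the XMLNS namespace are real declarations.
    if (attr.namespaceURI !== XMLNS_NAMESPACE) continue;
    if (attr.prefix === null) {
      // Default namespace declaration, e.g. xmlns="...".
      defaultNamespaceAttrValue = attr.value;
      continue;
    }
    // Namespace prefix definition, e.g. xmlns:p="...".
    const prefixDefinition = attr.localName;
    let namespaceDefinition = attr.value;
    if (namespaceDefinition === XML_NAMESPACE) continue; // always ignored
    if (namespaceDefinition === "") namespaceDefinition = null;
    const known = map.get(namespaceDefinition);
    // Skip definitions already recorded for this namespace.
    if (known !== undefined && known.includes(prefixDefinition)) continue;
    if (known === undefined) map.set(namespaceDefinition, [prefixDefinition]);
    else known.push(prefixDefinition);
    localPrefixesMap.set(
      prefixDefinition,
      namespaceDefinition === null ? "" : namespaceDefinition
    );
  }
  // May be the empty string; it is deliberately not converted to null.
  return defaultNamespaceAttrValue;
}
```

An attribute created with setAttribute("xmlns:pretend-prefix", ...) has a null namespaceURI, so the first check in the loop skips it.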

The Namespace Prefix Map

A namespace prefix map is a map that associates namespaceURI and
namespace prefix lists, where namespaceURI values are the
map's unique keys (which can include the null value representing no namespace), and
ordered lists of associated prefix values are the map's key values. The
namespace prefix map will be populated by previously seen namespaceURIs and all their
previously encountered prefix associations for a given node and its ancestors.

Note: the last seen prefix for a given
namespaceURI is at the end of its respective list. The list is searched to
find potentially matching prefixes, and if no matches are found for the given
namespaceURI, then the last prefix in the list is used. See
copy a namespace prefix map and retrieve a preferred prefix string for additional
details.

To copy a namespace prefix map map means to copy the map's
keys into a new empty namespace prefix map, and to copy each of the values in the
namespace prefix list associated with each key's value into a new
list which should be associated with the respective key in the new map.
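
A minimal sketch of this copy, assuming the namespace prefix map is modeled as a JavaScript Map whose values are arrays of prefix strings (the function name copyNamespacePrefixMap is illustrative):

```javascript
// Copies the keys and gives every key a fresh list, so that changes made
// while serializing a child never leak back into the parent's map.
function copyNamespacePrefixMap(map) {
  const copy = new Map();
  for (const [ns, prefixes] of map) {
    copy.set(ns, [...prefixes]); // new list with the same prefix values
  }
  return copy;
}

const original = new Map([["urn:example", ["a"]], [null, ["n"]]]);
const copied = copyNamespacePrefixMap(original);
copied.get("urn:example").push("b");
console.log(original.get("urn:example").length); // 1
```

Copying the lists, and not only the keys, is what keeps the two maps independent.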

To retrieve a preferred prefix string preferred prefix from the namespace prefix map map given a namespace
ns, the user agent should:

Let candidates list be the result of retrieving a list from map
where there exists a key in map that matches the value of ns or if there
is no such key, then stop running these steps, and return the null value.

Otherwise, for each prefix value prefix in candidates list, iterating
from beginning to end:

There will always be at least one prefix value in the list.

If prefix matches preferred prefix, then stop running these steps
and return prefix.

If prefix is the last item in the candidates list, then stop running
these steps and return prefix.
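
The lookup above can be sketched as follows, again assuming a JavaScript Map of arrays; the function name retrievePreferredPrefix is illustrative.

```javascript
// Returns preferredPrefix when it is listed for ns; otherwise falls back to
// the last (most recently added) prefix, or null when ns is unknown.
function retrievePreferredPrefix(preferredPrefix, map, ns) {
  const candidatesList = map.get(ns);
  if (candidatesList === undefined) return null;
  for (let i = 0; i < candidatesList.length; i++) {
    const prefix = candidatesList[i];
    if (prefix === preferredPrefix) return prefix;
    if (i === candidatesList.length - 1) return prefix;
  }
  return null; // not reached when every list holds at least one prefix
}

const prefixMap = new Map([["urn:example", ["a", "b", "c"]]]);
console.log(retrievePreferredPrefix("b", prefixMap, "urn:example")); // b
console.log(retrievePreferredPrefix("z", prefixMap, "urn:example")); // c
console.log(retrievePreferredPrefix("a", prefixMap, "urn:other"));   // null
```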

To check if a prefix string prefix is found in a
namespace prefix map map given a namespace ns, the user agent should:

Let candidates list be the result of retrieving a list from map
where there exists a key in map that matches the value of ns or if there
is no such key, then stop running these steps, and return false.

If the value of prefix occurs at least once in candidates list, return
true, otherwise return false.
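
As a minimal sketch under the same Map-of-arrays assumption (the function name isPrefixFound is illustrative):

```javascript
// True when prefix occurs at least once in the list stored for ns.
function isPrefixFound(prefix, map, ns) {
  const candidatesList = map.get(ns);
  if (candidatesList === undefined) return false;
  return candidatesList.includes(prefix);
}

const prefixMap = new Map([["urn:example", ["a", "b"]]]);
console.log(isPrefixFound("a", prefixMap, "urn:example")); // true
console.log(isPrefixFound("a", prefixMap, "urn:other"));   // false
```

The check is per namespace: the same prefix listed under a different namespace does not count as found.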

To add a prefix string prefix to the namespace prefix map map given a namespace ns, the user agent should:

Let candidates list be the result of retrieving a list from map
where there exists a key in map that matches the value of ns or if there
is no such key, then let candidates list be null.

If candidates list is null, then create a new list with
prefix as the only item in the list, and associate that list with a new
key ns in map.

Otherwise, append prefix to the end of candidates list.

The steps in retrieve a preferred prefix string use the list to
track the most recently used (MRU) prefix associated with a given namespace, which
will be the prefix at the end of the list. This list may contain duplicates of the
same prefix value seen earlier (and that's OK).
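
The append-only behavior described above, including the tolerated duplicates, can be sketched like this. The Map-of-arrays model and the function name addPrefix are assumptions made for illustration.

```javascript
// Appends prefix to the list for ns, creating the list on first use.
// Nothing is de-duplicated: the end of the list is the most recent prefix.
function addPrefix(map, prefix, ns) {
  const candidatesList = map.has(ns) ? map.get(ns) : null;
  if (candidatesList === null) {
    map.set(ns, [prefix]);
  } else {
    candidatesList.push(prefix);
  }
}

const prefixMap = new Map();
addPrefix(prefixMap, "a", "urn:example");
addPrefix(prefixMap, "b", "urn:example");
addPrefix(prefixMap, "a", "urn:example"); // seen again, so appended again
addPrefix(prefixMap, "n", null);          // null (no namespace) is a valid key
console.log(prefixMap.get("urn:example").join(",")); // a,b,a
```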

Let localname set be a new empty namespace localname set. This
localname set will contain tuples of unique attribute
namespaceURI and localName pairs, and is populated as
each attr is processed. This set is used to [optionally] enforce
the well-formed constraint that an element cannot have two attributes with the same
namespaceURI and localName. This can occur when two
otherwise identical attributes on the same element differ only by their prefix values.

If the require well-formed flag is set (its value is true), and the
localname set contains a tuple whose values match those of a new tuple consisting
of attr's namespaceURI attribute and
localName attribute, then throw an exception; the serialization of
this attr would fail to produce a well-formed element serialization.

Create a new tuple consisting of attr's namespaceURI
attribute and localName attribute, and add it to the
localname set.
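
A minimal sketch of how the localname set detects such duplicates. Attributes are modeled as plain objects, each tuple is stored as a JSON string because a JavaScript Set compares arrays by identity, and the function name collectLocalnameSet is an assumption.

```javascript
// Builds the localname set for one element's attributes and, when the
// require well-formed flag is set, throws on a repeated
// (namespaceURI, localName) pair.
function collectLocalnameSet(attributes, requireWellFormed) {
  const localnameSet = new Set();
  for (const attr of attributes) {
    const tuple = JSON.stringify([attr.namespaceURI, attr.localName]);
    if (requireWellFormed && localnameSet.has(tuple)) {
      throw new Error("duplicate attribute: serialization would not be well-formed");
    }
    localnameSet.add(tuple);
  }
  return localnameSet;
}

// Two attributes that differ only by prefix.
const attrs = [
  { namespaceURI: "urn:example", prefix: "a", localName: "id" },
  { namespaceURI: "urn:example", prefix: "b", localName: "id" },
];
console.log(collectLocalnameSet(attrs, false).size); // 1
```

With the flag set, the same input throws instead of returning.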

The XML namespace cannot be redeclared and survive
round-tripping (unless it defines the prefix "xml"). To avoid this
problem, this algorithm always prefixes elements in the XML namespace with
"xml" and drops any related definitions as seen in the above condition.

and furthermore that the attr's
localName (as the prefix to
find) is found in the namespace prefix map given the namespace consisting
of the attr's value (the current namespace prefix
definition was exactly defined previously, on an ancestor element and not the current
element whose attributes are being processed).

If the require well-formed flag is set (its value is true), and
the value of attr's value attribute matches the
XMLNS namespace, then throw an exception; the serialization of this attribute would
produce invalid XML because the XMLNS namespace is reserved and cannot be applied
as an element's namespace via XML parsing.

DOM APIs do allow creation of elements in the XMLNS namespace but
with strict qualifications.

If the require well-formed flag is set (its value is true), and
the value of attr's value attribute is the empty string,
then throw an exception; namespace prefix declarations cannot be used to undeclare a
namespace (use a default namespace declaration instead).

the attr's prefix matches the string
"xmlns", then let candidate prefix be the string
"xmlns".

Otherwise, the attribute namespace is not the XMLNS namespace. Run
these steps:

Let candidate prefix be the result of generating a prefix providing
map, attribute namespace, and prefix index as input.

If candidate prefix is not null, then append to result
the concatenation of candidate prefix with ":" (U+003A COLON).

If the require well-formed flag is set (its value is true), and this
attr's localName attribute contains the character
":" (U+003A COLON) or does not match the XML Name production or equals
"xmlns" and attribute namespace is null, then
throw an exception; the serialization of this attr would not be a
well-formed attribute.

When serializing an attribute value given an attribute value and
require well-formed flag, the user agent must run the following steps:

If the require well-formed flag is set (its value is true), and
attribute value contains characters that are not matched by the XML Char
production, then throw an exception; the serialization of this attribute value
would fail to produce a well-formed element serialization.

If attribute value is null, then return the empty string.

Otherwise, attribute value is a string. Return the value of
attribute value, first replacing any occurrences of the following:

"&" with "&amp;"

""" with "&quot;"

"<" with "&lt;"

">" with "&gt;"

This matches behavior present in browsers, and goes above and beyond the grammar
requirement in the XML specification's AttValue production by also replacing
">" characters.

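The attribute value steps above can be sketched as a small function. The regular expression is a transcription of the XML Char production from the XML specification; the names XML_CHAR_STRING and serializeAttributeValue, and the use of a plain Error for the exception, are assumptions made for illustration.

```javascript
// Matches strings made only of characters allowed by the XML Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
const XML_CHAR_STRING =
  /^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]*$/u;

function serializeAttributeValue(attributeValue, requireWellFormed) {
  if (requireWellFormed && attributeValue !== null &&
      !XML_CHAR_STRING.test(attributeValue)) {
    throw new Error("attribute value contains characters outside XML Char");
  }
  if (attributeValue === null) return "";
  // "&" is replaced first so the entities produced below are not re-escaped.
  return attributeValue
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(serializeAttributeValue('a & "b" <c>', true));
// a &amp; &quot;b&quot; &lt;c&gt;
```
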
XML serializing a Comment node

If the require well-formed flag is set (its value is true), and
node's data contains characters that are not matched by the XML
Char production or contains "--" (two adjacent U+002D HYPHEN-MINUS
characters) or that ends with a "-" (U+002D HYPHEN-MINUS) character, then
throw an exception; the serialization of this node's data
would not be well-formed.

Otherwise, return the concatenation of "<!--", node's
data, and "-->".
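
A minimal sketch of the two Comment steps above. The regular expression transcribes the XML Char production; the names XML_CHAR_STRING and serializeComment, and the use of a plain Error, are assumptions made for illustration.

```javascript
// Strings made only of characters allowed by the XML Char production.
const XML_CHAR_STRING =
  /^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]*$/u;

function serializeComment(data, requireWellFormed) {
  if (requireWellFormed &&
      (!XML_CHAR_STRING.test(data) || data.includes("--") || data.endsWith("-"))) {
    throw new Error("comment data would not be well-formed");
  }
  return "<!--" + data + "-->";
}

console.log(serializeComment(" note ", true)); // <!-- note -->
console.log(serializeComment("a--b", false));  // <!--a--b-->
```

Without the require well-formed flag the data is emitted unchanged, so the second call produces output that an XML parser would reject.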

If the require well-formed flag is true and the node's
systemId attribute contains characters that are not matched by the XML
Char production or that contains both a """ (U+0022 QUOTATION MARK) and a
"'" (U+0027 APOSTROPHE), then throw an exception; the serialization of this
node would not be a well-formed document type declaration.

Let markup be an empty string.

Append the string "<!DOCTYPE" to markup.

Append " " (U+0020 SPACE) to markup.

Append the value of the node's name
attribute to markup. For a node belonging to an HTML document, the
value will be all lowercase.

If the node's publicId is not the empty string then append
the following, in the order listed, to markup:

If the require well-formed flag is set (its value is true), and
node's data contains characters that are not matched by the XML
Char production or contains the string "?>" (U+003F QUESTION MARK,
U+003E GREATER-THAN SIGN), then throw an exception; the serialization of this
node's data would not be well-formed.

Let markup be the concatenation of the following, in the order listed:

Revision History

The following is an informative summary of the changes since the last publication of this
specification. A complete revision history of the Editor's Drafts of this specification can be
found at the W3C GitHub Repository and older revisions at the W3C Mercurial server.

Acknowledgements

We acknowledge with gratitude the original work of Ms2ger and others at the WHATWG, who created
and maintained the original DOM Parsing and Serialization Living Standard upon which this
specification is based.