9 The XHTML syntax

This section only describes the rules for XML resources. Rules for
text/html resources are discussed in the section above entitled "The HTML
syntax".

9.1 Writing XHTML documents

The syntax for using HTML with XML, whether in XHTML documents or embedded in other XML
documents, is defined in the XML and Namespaces in XML specifications. [XML][XMLNS]

This specification does not define any syntax-level requirements beyond those defined for XML
proper.

XML documents may contain a DOCTYPE if desired, but this is not required to
conform to this specification. This specification does not define a public or system identifier,
nor provide a formal DTD.

According to the XML specification, XML processors are not guaranteed to process
the external DTD subset referenced in the DOCTYPE. This means, for example, that using entity references for characters in XHTML documents
is unsafe if they are defined in an external file (except for &lt;,
&gt;, &amp;, &quot;
and &apos;).
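As an illustration of that risk, Python's ElementTree (built on expat) never fetches an external DTD subset, so it behaves like the XML processors described above. This is a minimal sketch: the helper name `root_text` is hypothetical, and the DOCTYPE is the familiar XHTML 1.0 Strict one, used only as an example of an external subset that goes unread.

```python
import xml.etree.ElementTree as ET

XHTML_DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                 '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')

def root_text(markup: str) -> str:
    """Parse markup with ElementTree (which never loads external DTDs); return the root's text."""
    return ET.fromstring(markup).text or ""

# The five predefined entities and numeric character references are always safe.
print(repr(root_text('<p>&lt;&gt;&amp;&quot;&apos;&#xA0;</p>')))

# &nbsp; is declared only in the external XHTML DTD. That file is never read
# here, so the reference cannot be resolved and parsing fails.
try:
    root_text(XHTML_DOCTYPE + '<html>a&nbsp;b</html>')
except ET.ParseError as err:
    print("rejected:", err)
```

Writing `&#xA0;` instead of `&nbsp;` gives the same character without depending on the external subset.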

9.2 Parsing XHTML documents

This section describes the relationship between XML and the DOM, with a particular emphasis on
how this interacts with HTML.

An XML parser, for the purposes of this specification, is a construct that follows
the rules given in the XML specification to map a string of bytes or characters into a
Document object.

At the time of writing, no such rules actually exist.

An XML parser is either associated with a Document object when it is
created, or creates one implicitly.

This Document must then be populated with DOM nodes that represent the tree
structure of the input passed to the parser, as defined by the XML specification, the Namespaces
in XML specification, and the DOM specification. DOM mutation events must not fire for the
operations that the XML parser performs on the Document's tree. However, the user agent must act
as if elements and attributes were individually appended and set, respectively, so as to trigger
the rules in this specification regarding what happens when an element is inserted into a document
or has its attributes set. The DOM specification's requirements regarding mutation observers also
mean that, unlike mutation events, mutation observers are fired. [XML][XMLNS][DOM][DOMEVENTS]

Between the time an element's start tag is parsed and the time either the element's end tag is
parsed or the parser detects a well-formedness error, the user agent must act as if the element
were in a stack of open elements.

This is used by the object element to avoid instantiating plugins
before the param element children have been parsed.
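The following Python sketch uses expat's start and end handlers to keep such a stack. It treats an object element as complete only when its end tag pops it off the stack, at which point all of its param children are known. The function name and the idea of "instantiating" at pop time are hypothetical stand-ins for plugin startup. This sketch also simply discards anything still open when a well-formedness error occurs.

```python
import xml.parsers.expat as expat

XHTML_NS = "http://www.w3.org/1999/xhtml"
OBJECT = XHTML_NS + " object"  # expat joins namespace URI and local name with the separator
PARAM = XHTML_NS + " param"

def objects_ready_to_instantiate(markup: str) -> list:
    """Return the params of each object element whose end tag was reached."""
    parser = expat.ParserCreate(namespace_separator=" ")
    open_elements = []  # stack of [name, params] for elements whose end tag is pending
    ready = []

    def start(name, attrs):
        # A param whose parent is an open object element is recorded on that object.
        if name == PARAM and open_elements and open_elements[-1][0] == OBJECT:
            open_elements[-1][1][attrs.get("name")] = attrs.get("value")
        open_elements.append([name, {}])

    def end(name):
        popped_name, params = open_elements.pop()
        if popped_name == OBJECT:
            # Every param child has been parsed by the time the object is popped.
            ready.append(params)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    try:
        parser.Parse(markup, True)
    except expat.ExpatError:
        # A well-formedness error ends the "open" period; this sketch
        # never treats the elements still on the stack as complete.
        open_elements.clear()
    return ready
```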

Furthermore, user agents should attempt to retrieve the above external entity's content when
one of the above public identifiers is used, and should not attempt to retrieve any other external
entity's content.

This is not strictly a violation of the XML
specification, but it does contradict the spirit of the XML specification's requirements. This is
motivated by a desire for user agents to all handle entities in an interoperable fashion without
requiring any network access for handling external subsets. [XML]

Certain algorithms in this specification spoon-feed the
parser characters one string at a time. In such cases, the XML parser must act
as it would have if faced with a single string consisting of the concatenation of all those
characters.
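An incremental parser already has this property. In the following Python sketch, ElementTree's `XMLParser.feed()` receives a document in three-character pieces that split tags, attribute values, and a character reference. It produces the same tree as feeding the whole string at once. The helper name `parse_chunks` is hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_chunks(chunks) -> bytes:
    """Feed the parser one string at a time, then serialize the resulting tree."""
    parser = ET.XMLParser()
    for chunk in chunks:
        parser.feed(chunk)
    return ET.tostring(parser.close())

DOCUMENT = '<p xmlns="http://www.w3.org/1999/xhtml">a &amp; <b title="x">b</b></p>'
# Three-character pieces deliberately split tags, attribute values, and "&amp;".
pieces = [DOCUMENT[i:i + 3] for i in range(0, len(DOCUMENT), 3)]
print(parse_chunks(pieces) == parse_chunks([DOCUMENT]))
```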

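9.3 Serializing XHTML fragments

The XML fragment serialization algorithm for a Document or Element node either returns a fragment
of XML that represents that node or throws an exception. In both cases, the string returned must be
XML namespace-well-formed and must be an isomorphic serialization of all of that node's relevant
child nodes, in tree order.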
User agents may adjust prefixes and namespace declarations in the serialization (and indeed might
be forced to do so in some cases to obtain namespace-well-formed XML). User agents may use a
combination of regular text and character references to represent Text nodes in the
DOM.

A node's relevant child nodes are those that apply given the following rules:

For Elements, if any of the elements in the serialization are in no namespace, the
default namespace in scope for those elements must be explicitly declared as the empty string,
that is, with xmlns="". (This doesn't apply in the Document case.) [XML][XMLNS]
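The namespace rule can be sketched in Python with a deliberately tiny, hypothetical tree model. `Elem` has no attributes or prefixes, so only default-namespace declarations are involved. A default namespace is declared whenever it differs from the one in scope, and "no namespace" is written as xmlns="". For an Element, nothing is assumed to be in scope. For a Document, the implicit default is no namespace, so a top-level element in no namespace needs no declaration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union
from xml.sax.saxutils import escape, quoteattr

@dataclass
class Elem:
    """Hypothetical minimal element: no attributes, no prefixes."""
    namespace: Optional[str]  # None means "in no namespace"
    local_name: str
    children: List[Union["Elem", str]] = field(default_factory=list)

_UNDECLARED = object()  # sentinel: no default namespace declared yet

def _serialize(node, default_ns) -> str:
    if isinstance(node, str):
        return escape(node)  # Text node: escape &, < and >
    decl = ""
    if node.namespace != default_ns:
        # Declare this element's default namespace; "no namespace" becomes xmlns="".
        decl = " xmlns=" + quoteattr(node.namespace or "")
    inner = "".join(_serialize(child, node.namespace) for child in node.children)
    return f"<{node.local_name}{decl}>{inner}</{node.local_name}>"

def serialize_children(children, is_document: bool = False) -> str:
    # Document case: no namespace is the implicit default.
    # Element case: nothing is assumed, so every top-level child declares.
    start = None if is_document else _UNDECLARED
    return "".join(_serialize(child, start) for child in children)

XHTML = "http://www.w3.org/1999/xhtml"
print(serialize_children([Elem(XHTML, "div", [Elem(None, "p", ["a<b"])])]))
```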

For the purposes of this section, an internal general parsed entity is considered XML
namespace-well-formed if a document consisting of an element with no namespace declarations whose
contents are the internal general parsed entity would itself be XML namespace-well-formed.
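That definition translates almost directly into a check: wrap the candidate in an element that carries no namespace declarations and ask a namespace-aware parser whether the result is well-formed. The following Python sketch uses ElementTree, whose expat parser reports unbound prefixes as errors. The wrapper name `wrapper` and the function name are hypothetical. A fragment that tries to close the wrapper early always leaves an extra root element, so such a fragment cannot pass by accident.

```python
import xml.etree.ElementTree as ET

def is_namespace_well_formed_entity(fragment: str) -> bool:
    # "wrapper" is an arbitrary element with no namespace declarations.
    try:
        ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    except ET.ParseError:
        return False
    return True

print(is_namespace_well_formed_entity('<svg:rect/>'))  # prefix never declared
```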

If any of the following error cases are found in the DOM subtree being serialized, then the
algorithm must throw an InvalidStateError exception instead of returning a
string:

A DocumentType node that has an external subset public identifier that contains
characters that are not matched by the XML PubidChar production. [XML]

A DocumentType node that has an external subset system identifier that contains
both a quotation mark (", U+0022) and an apostrophe (', U+0027), or that contains characters
that are not matched by the XML Char production. [XML]

A node with a local name containing a ":" (U+003A).

A node with a local name that does not match the XML Name production. [XML]

An Attr node with no namespace whose local name is the lowercase string "xmlns". [XMLNS]

An Element node with two or more attributes with the same local name and
namespace.

These are the only ways to make a DOM unserializable. The DOM enforces all the
other XML constraints; for example, trying to append two elements to a Document node
will throw a HierarchyRequestError exception.
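The error cases above can be sketched as Python checks. The character classes are transcribed from the NameStartChar, NameChar, PubidChar, and Char productions of XML 1.0 (Fifth Edition). `InvalidStateError` is a hypothetical stand-in for the DOM exception of that name. Attributes are modeled as plain (namespace, local name) pairs rather than real Attr nodes.

```python
import re
from typing import Iterable, Optional, Tuple

class InvalidStateError(Exception):
    """Stand-in for the DOM's "InvalidStateError" exception."""

# Character classes transcribed from the XML 1.0 (Fifth Edition) grammar.
_NAME_START = (":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
               "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
               "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF")
_NAME_CHAR = _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
NAME = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")
PUBID_CHARS = re.compile(r"[ \r\na-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")
XML_CHARS = re.compile("[\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*")

def check_doctype(public_id: str, system_id: str) -> None:
    if not PUBID_CHARS.fullmatch(public_id):
        raise InvalidStateError("public identifier has a non-PubidChar character")
    if ('"' in system_id and "'" in system_id) or not XML_CHARS.fullmatch(system_id):
        raise InvalidStateError("system identifier cannot be serialized")

def check_local_name(local_name: str) -> None:
    # Covers both the ":" case and the Name production case.
    if ":" in local_name or not NAME.fullmatch(local_name):
        raise InvalidStateError(f"invalid local name {local_name!r}")

def check_attributes(attributes: Iterable[Tuple[Optional[str], str]]) -> None:
    """attributes: (namespace, local name) pairs; None means no namespace."""
    seen = set()
    for namespace, local_name in attributes:
        check_local_name(local_name)
        if namespace is None and local_name == "xmlns":
            raise InvalidStateError("no-namespace attribute named xmlns")
        if (namespace, local_name) in seen:
            raise InvalidStateError(f"duplicate attribute {local_name!r}")
        seen.add((namespace, local_name))
```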

When the algorithm must produce a serialization of a template element, the string
returned must contain a serialization of the child nodes of the template element's template
contents (a DocumentFragment), rather than the template element's own children.

9.4 Parsing XHTML fragments

The XML fragment parsing algorithm either returns a Document or throws
a SyntaxError exception. Given a string input and an optional
context element context, the algorithm is as
follows:

Create a new XML parser.

If there is a context element, feed the
parser just created the string corresponding to the start tag of that element, declaring
all the namespace prefixes that are in scope on that element in the DOM, as well as declaring
the default namespace (if any) that is in scope on that element in the DOM.

A namespace prefix is in scope if the DOM lookupNamespaceURI() method
on the element would return a non-null value for that prefix.

The default namespace is the namespace for which the DOM isDefaultNamespace() method on the element would return true.

If there is a context element, no DOCTYPE is passed to the parser, so no external subset is
referenced and no entities other than the five predefined ones (&lt;, &gt;, &amp;, &quot;
and &apos;) will be recognized.
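The context-element case can be sketched in Python with ElementTree. The caller supplies the in-scope prefixes (what lookupNamespaceURI() would report) and the default namespace (the one for which isDefaultNamespace() is true). The function and exception names are hypothetical, and the result is returned as the synthetic context element rather than as DOM nodes. Because no DOCTYPE is ever fed, `&nbsp;` fails while `&amp;` works.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional
from xml.sax.saxutils import quoteattr

class FragmentSyntaxError(Exception):
    """Stand-in for the DOM's "SyntaxError" exception."""

def parse_xml_fragment(markup: str, context_name: str,
                       prefixes: Dict[str, str],
                       default_namespace: Optional[str]) -> ET.Element:
    """Parse markup as if it appeared inside the context element.

    Returns the synthetic context element; its .text and children hold the result.
    """
    decls = "".join(f" xmlns:{prefix}={quoteattr(uri)}" for prefix, uri in prefixes.items())
    if default_namespace is not None:
        decls += f" xmlns={quoteattr(default_namespace)}"
    parser = ET.XMLParser()  # a fresh parser; no DOCTYPE is ever fed to it
    try:
        parser.feed(f"<{context_name}{decls}>")  # start tag declaring in-scope namespaces
        parser.feed(markup)
        parser.feed(f"</{context_name}>")  # matching end tag
        return parser.close()
    except ET.ParseError as err:
        raise FragmentSyntaxError(str(err)) from err

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
ctx = parse_xml_fragment('hi <b>x</b><svg:rect/>', "div", {"svg": SVG}, XHTML)
print(ctx.text, [child.tag for child in ctx])
```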