Each Document object has a reload override
flag that is originally unset. The flag is set by the document.open() and document.write() methods in certain
situations. When the flag is set, the Document also has
a reload override buffer which is a Unicode string that
is used as the source of the document when it is reloaded.

When the user agent is to perform an overridden
reload, it must act as follows:

In the case of HTTP, the referrer IDL attribute will
match the Referer (sic) header
that was sent when fetching the current
page.

Typically user agents are configured to not report
referrers in the case where the referrer uses an encrypted protocol
and the current page does not (e.g. when navigating from an https: page to an http:
page).

Since the cookie attribute is accessible
across frames, the path restrictions on cookies are only a tool to
help manage which cookies are sent to which parts of the site, and
are not in any way a security feature.

Returns the date of the last modification to the document, as
reported by the server, in the form "MM/DD/YYYY hh:mm:ss", in the user's local
time zone.

If the last modification date is not known, the current time is
returned instead.

The lastModified
attribute, on getting, must return the date and time of the
Document's source file's last modification, in the
user's local time zone, in the following format:

The month component of the date.

A "/" (U+002F) character.

The day component of the date.

A "/" (U+002F) character.

The year component of the date.

A U+0020 SPACE character.

The hours component of the time.

A ":" (U+003A) character.

The minutes component of the time.

A ":" (U+003A) character.

The seconds component of the time.

All the numeric components above, other than the year, must be given as two ASCII
digits representing the number in base ten, zero-padded if necessary. The year must be
given as the shortest possible string of four or more ASCII digits representing the
number in base ten, zero-padded if necessary.

The Document's source file's last modification date
and time must be derived from relevant features of the networking
protocols used, e.g. from the value of the HTTP Last-Modified header of the
document, or from metadata in the file system for local files. If
the last modification date and time are not known, the attribute
must return the current date and time in the above format.
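The format above can be sketched in script as follows. This is only an illustrative model, not a user agent implementation: the function names formatLastModified and lastModifiedOrNow are hypothetical, and the caller is assumed to already hold the modification time as a JavaScript Date (or null when it is unknown).

```javascript
// Zero-pad a non-negative integer to at least `width` ASCII digits.
function pad(value, width) {
  return String(value).padStart(width, "0");
}

// Format a Date, in the local time zone, as "MM/DD/YYYY hh:mm:ss".
function formatLastModified(date) {
  const month = pad(date.getMonth() + 1, 2); // getMonth() is zero-based
  const day = pad(date.getDate(), 2);
  const year = pad(date.getFullYear(), 4); // four or more digits
  const hours = pad(date.getHours(), 2);
  const minutes = pad(date.getMinutes(), 2);
  const seconds = pad(date.getSeconds(), 2);
  return `${month}/${day}/${year} ${hours}:${minutes}:${seconds}`;
}

// Hypothetical helper: use the current time when the date is not known.
function lastModifiedOrNow(knownDate) {
  return formatLastModified(knownDate instanceof Date ? knownDate : new Date());
}
```

Because the local-time getters are used throughout, the same Date yields different strings for users in different time zones, which matches the "user's local time zone" requirement.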

Each document has a current document readiness. When a
Document object is created, it must have its
current document readiness set to the string "loading" if the document is associated with an
HTML parser, an XML parser, or an XSLT
processor, and to the string "complete"
otherwise. Various algorithms during page loading affect this value.
When the value is set, the user agent must fire a simple
event named readystatechange
at the Document object.

Can be set, to update the document's title. If there is no
head element,
the new value is ignored.

In SVG documents, the SVGDocument interface's
title attribute takes
precedence.

The title element of a document is the
first title element in the document (in tree order), if
there is one, or null otherwise.

The title attribute must,
on getting, run the following algorithm:

If the root element is an svg
element in the "http://www.w3.org/2000/svg"
namespace, and the user agent supports SVG, then return the value
that would have been returned by the IDL attribute of the same name
on the SVGDocument interface. [SVG]

On setting, the following algorithm must be run. Mutation events
must be fired as appropriate.

If the root element is an svg
element in the "http://www.w3.org/2000/svg"
namespace, and the user agent supports SVG, then the setter must
act as if it was the setter for the IDL attribute of the same name
on the Document interface defined by the SVG
specification. Stop the algorithm here. [SVG]

Otherwise, if the new value is the same as the body
element, do nothing. Abort these steps.

Otherwise, if the body element is not null, then
replace that element with the new value in the DOM, as if the root
element's replaceChild() method had been
called with the new value and the
incumbent body element as its two arguments respectively,
then abort these steps.

Returns a NodeList of elements in the
Document that have a name
attribute with the value name.

The getElementsByName(name) method takes a string name, and must return a live NodeList containing all the HTML elements
in that document that have a name attribute
whose value is equal to the name argument (in a
case-sensitive manner), in tree order.
When the method is invoked on a Document object again
with the same argument, the user agent may return the same object
that was returned by the earlier call. In other cases, a new
NodeList object must be returned.
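The matching rule can be illustrated on a simplified tree of plain objects. This is a sketch under stated assumptions: the node shape ({ namespace, attributes, children }) and the function name are hypothetical, and the result is a static array rather than a live NodeList.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Collect, in tree order (depth-first, parent before children), the HTML
// elements whose name attribute equals `name` exactly (case-sensitive).
function collectElementsByName(root, name) {
  const matches = [];
  (function visit(node) {
    if (node.namespace === HTML_NS && node.attributes.name === name) {
      matches.push(node);
    }
    node.children.forEach(visit);
  })(root);
  return matches;
}
```

Elements outside the HTML namespace are skipped even if they carry a matching name attribute, since only HTML elements are collected.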

3.2 Elements

3.2.1 Semantics

Elements, attributes, and attribute values in HTML are defined
(by this specification) to have certain meanings (semantics). For
example, the ol element represents an ordered list, and
the lang attribute represents the
language of the content.

These definitions allow HTML processors, such as Web browsers or
search engines, to present and use documents and applications in a
wide variety of contexts that the author might not have
considered.

As a simple example, consider a Web page written by an author
who only considered desktop computer Web browsers. Because HTML
conveys meaning, rather than presentation, the same page
can also be used by a small browser on a mobile phone, without any
change to the page. Instead of headings being in large letters as
on the desktop, for example, the browser on the mobile phone might
use the same size text for the whole page, but with the
headings in bold.

But it goes further than just differences in screen size: the
same page could equally be used by a blind user using a browser
based around speech synthesis, which instead of displaying the page
on a screen, reads the page to the user, e.g. using headphones.
Instead of large text for the headings, the speech browser might
use a different volume or a slower voice.

That's not all, either. Since the browsers know which parts of
the page are the headings, they can create a document outline that
the user can use to quickly navigate around the document, using
keys for "jump to next heading" or "jump to previous heading". Such
features are especially common with speech browsers, where users
would otherwise find quickly navigating a page quite difficult.

Even beyond browsers, software can make use of this information.
Search engines can use the headings to more effectively index a
page, or to provide quick links to subsections of the page from
their results. Tools can use the headings to create a table of
contents (that is in fact how this very specification's table of
contents is generated).

This example has focused on headings, but the same principle
applies to all of the semantics in HTML.

Authors must not use elements, attributes, or attribute values
for purposes other than their appropriate intended semantic purpose,
as doing so prevents software from correctly processing the
page.

For example, the following document is non-conforming, despite
being syntactically correct:

...because the data placed in the cells is clearly not tabular
data (and the cite element is misused). This would make
software that relies on these semantics fail: for example, a speech
browser that allowed a blind user to navigate tables in the
document would report the quote above as a table, confusing the
user; similarly, a tool that extracted titles of works from pages
would extract "Ernest" as the title of a work, even though it's
actually a person's name, not a title.

This next document fragment, intended to represent the heading
of a corporate site, is similarly non-conforming because the second
line is not intended to be a heading of a subsection, but merely a
subheading or subtitle (a subordinate heading for the same
section).

<body>
<h1>ABC Company</h1>
<h2>Leading the way in widget design since 1432</h2>
...

Authors must not use elements, attributes, or attribute values
that are not permitted by this specification or other
applicable specifications, as doing so makes it significantly
harder for the language to be extended in the future.

In the next example, there is a non-conforming attribute value
("carpet") and a non-conforming attribute ("texture"), which
is not permitted by this specification:

Through scripting and using other mechanisms, the values of
attributes, text, and indeed the entire structure of the document
may change dynamically while a user agent is processing it. The
semantics of a document at an instant in time are those represented
by the state of the document at that instant in time, and the
semantics of a document can therefore change over time. User agents
must update their presentation of the
document as this occurs.

HTML has a progress element that
describes a progress bar. If its "value" attribute is dynamically
updated by a script, the UA would update the rendering to show the
progress changing.

3.2.2 Elements in the DOM

The nodes representing HTML elements in the DOM
must implement, and expose to scripts, the
interfaces listed for them in the relevant sections of this
specification. This includes HTML elements in XML
documents, even when those documents are in another context
(e.g. inside an XSLT transform).

Elements in the DOM represent
things; that is, they have intrinsic meaning, also known as
semantics.

The HTMLElement interface holds methods and
attributes related to a number of disparate features, and the
members of this interface are therefore described in various
different sections of this specification.

These attributes are only defined by this specification as
attributes for HTML elements. When this specification
refers to elements having these attributes, elements from namespaces
that are not defined as having these attributes must not be
considered as being elements with these attributes.

For example, in the following XML fragment, the "bogus" element does not have a dir attribute as defined in this
specification, despite having an attribute with the literal name
"dir". Thus, the directionality
of the inner-most span element is 'rtl', inherited from the
div element indirectly through the "bogus" element.

In HTML, the xmlns attribute
has absolutely no effect. It is basically a talisman. It is allowed
merely to make migration to and from XHTML mildly easier. When
parsed by an HTML parser, the attribute ends up in no
namespace, not the "http://www.w3.org/2000/xmlns/"
namespace like namespace declaration attributes in XML do.

In XML, an xmlns attribute is
part of the namespace declaration mechanism, and an element cannot
actually have an xmlns attribute specified in no
namespace.

The XML specification also allows the use of the xml:space attribute in the XML
namespace on any element in an XML document. This attribute has no effect on
HTML elements, as the default behavior in HTML is to
preserve whitespace. [XML]

To enable assistive technology products to expose a more
fine-grained interface than is otherwise possible with HTML elements
and attributes, a set of annotations for
assistive technology products can be specified (the ARIA
role and aria-* attributes).

3.2.3.1 The id attribute

The value must be unique amongst all the IDs in the element's home
subtree and must contain at least one character. The value
must not contain any space
characters.
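As a rough illustration, the three constraints on the value can be checked as follows. The function name and the otherIds parameter (the IDs already used by other elements in the same home subtree) are hypothetical; the space characters are taken to be U+0020, U+0009, U+000A, U+000C, and U+000D, as defined elsewhere in this specification.

```javascript
// The HTML space characters: space, tab, LF, FF, CR.
const SPACE_CHARACTERS = /[ \t\n\f\r]/;

// Returns true if `value` is non-empty, contains no space characters,
// and does not collide with any ID in `otherIds`.
function isValidId(value, otherIds) {
  return (
    value.length > 0 &&
    !SPACE_CHARACTERS.test(value) &&
    !otherIds.includes(value)
  );
}
```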

An element's unique
identifier can be used for a variety of purposes, most
notably as a way to link to specific parts of a document using
fragment identifiers, as a way to target an element when scripting,
and as a way to style a specific element from CSS.

Identifiers are opaque strings. Particular meanings should not be
derived from the value of the id
attribute.

The title attribute
represents advisory information for the element, such
as would be appropriate for a tooltip. On a link, this could be the
title or a description of the target resource; on an image, it could
be the image credit or a description of the image; on a paragraph,
it could be a footnote or commentary on the text; on a citation, it
could be further information about the source; on interactive
content, it could be a label for, or instructions for, use of
the element; and so forth. The value is text.

Relying on the title
attribute is currently discouraged as many user agents do not expose
the attribute in an accessible manner as required by this
specification (e.g. requiring a pointing device such as a mouse to
cause a tooltip to appear, which excludes keyboard-only users and
touch-only users, such as anyone with a modern phone or tablet).

If this attribute is omitted from an element, then it implies
that the title attribute of the
nearest ancestor HTML element
with a title attribute set is also
relevant to this element. Setting the attribute overrides this,
explicitly stating that the advisory information of any ancestors is
not relevant to this element. Setting the attribute to the empty
string indicates that the element has no advisory information.

If the title attribute's value
contains "LF" (U+000A) characters, the content is split into
multiple lines. Each "LF" (U+000A) character represents a
line break.

Caution is advised with respect to the use of newlines in title attributes.

For instance, the following snippet actually defines an
abbreviation's expansion with a line break in it:

<p>My logs show that there was some interest in <abbr title="Hypertext
Transfer Protocol">HTTP</abbr> today.</p>

Some elements, such as link, abbr, and
input, define additional semantics for the title attribute beyond the semantics
described above.

The advisory information of an element is the value
that the following algorithm returns, with the algorithm being
aborted once a value is returned. When the algorithm returns the
empty string, then there is no advisory information.

If the element is a link, style,
dfn, abbr, or title element,
then: if the element has a title attribute,
return the value of that attribute, otherwise, return the empty
string.

Otherwise, if the element has a title attribute, then return its
value.

Otherwise, if the element has a parent element, then return
the parent element's advisory information.

Otherwise, return the empty string.
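The algorithm above can be sketched over a simplified element model. The object shape ({ localName, attributes, parent }) and the function name are assumptions made for illustration only.

```javascript
// Elements that never inherit advisory information from an ancestor.
const OWN_TITLE_ONLY = new Set(["link", "style", "dfn", "abbr", "title"]);

// Returns the advisory information of `el`; the empty string means
// that there is no advisory information.
function advisoryInformation(el) {
  const hasTitle = Object.prototype.hasOwnProperty.call(el.attributes, "title");
  if (OWN_TITLE_ONLY.has(el.localName)) {
    return hasTitle ? el.attributes.title : "";
  }
  if (hasTitle) {
    return el.attributes.title; // may itself be the empty string
  }
  if (el.parent) {
    return advisoryInformation(el.parent);
  }
  return "";
}
```

Note that an explicit title="" stops the walk up the tree, so an element can opt out of an ancestor's advisory information.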

User agents should inform the user when elements have
advisory information; otherwise, the information would
not be discoverable.

The lang attribute (in
no namespace) specifies the primary language for the element's
contents and for any of the element's attributes that contain
text. Its value must be a valid BCP 47 language tag, or the empty
string. Setting the attribute to the empty string indicates that the
primary language is unknown. [BCP47]

The attribute in no namespace with no prefix and
with the literal localname "xml:lang" has no
effect on language processing.

To determine the language of a node, user agents must
look at the nearest ancestor element (including the element itself
if the node is an element) that has a lang attribute in the
XML namespace set or is an HTML element and has a lang attribute in no namespace set. That
attribute specifies the language of the node (regardless of its
value).

If neither the node nor any of the node's ancestors, including
the root element, have either attribute set, but there
is a pragma-set default language set, then that is the
language of the node. If there is no pragma-set default
language set, then language information from a higher-level
protocol (such as HTTP), if any, must be used as the final fallback
language instead. In the absence of any such language information,
and in cases where the higher-level protocol reports multiple
languages, the language of the node is unknown, and the
corresponding language tag is the empty string.

If the resulting value is not a recognized language tag, then it
must be treated as an unknown language having the given language
tag, distinct from all other languages. For the purposes of
round-tripping or communicating with other services that expect
language tags, user agents should pass unknown language tags
through unmodified.

Thus, for instance, an element with lang="xyzzy" would be matched by the selector :lang(xyzzy) (e.g. in CSS), but it would not be
matched by :lang(abcde), even though both are
equally invalid. Similarly, if a Web browser and screen reader
working in unison communicated about the language of the element,
the browser would tell the screen reader that the language was
"xyzzy", even if it knew it was invalid, just in case the screen
reader actually supported a language with that tag after all.

If the resulting value is the empty string, then it must be
interpreted as meaning that the language of the node is explicitly
unknown.
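The lookup order described above can be modelled roughly as follows. This is a sketch, not the normative algorithm: each node is assumed to be a plain object with an optional lang string (standing in for whichever applicable lang attribute is set) and a parent reference, and the option names pragmaDefault and protocolLanguages are hypothetical.

```javascript
// Returns the language tag of `node`; the empty string means "unknown".
function languageOf(node, options = {}) {
  const { pragmaDefault = null, protocolLanguages = [] } = options;

  // 1. Nearest element (the node itself or an ancestor) with a lang
  //    attribute set wins, regardless of the attribute's value.
  for (let current = node; current; current = current.parent) {
    if (typeof current.lang === "string") return current.lang;
  }

  // 2. Otherwise, use the pragma-set default language, if any.
  if (pragmaDefault !== null) return pragmaDefault;

  // 3. Otherwise, use the higher-level protocol, but only when it
  //    reports exactly one language.
  if (protocolLanguages.length === 1) return protocolLanguages[0];

  // 4. Otherwise the language is unknown.
  return "";
}
```

An unrecognized tag such as "xyzzy" is returned unmodified, in line with the round-tripping advice above.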

User agents may use the element's language to determine proper
processing or rendering (e.g. in the selection of appropriate fonts
or pronunciations, for dictionary selection, or for the user
interfaces of form controls such as date pickers).

The lang IDL attribute
must reflect the lang
content attribute in no namespace.

The translate
attribute is an enumerated attribute that is used to
specify whether an element's attribute values and the values of its
Text node children are to be translated when the page
is localized, or whether to leave them unchanged.

The attribute's keywords are the empty string, yes, and no. The empty string
and the yes keyword map to the yes
state. The no keyword maps to the no
state. In addition, there is a third state, the inherit
state, which is the missing value default (and the invalid
value default).

When an element is in the translate-enabled state, the
element's attribute values and the values of its Text
node children are to be translated when the page is localized.

When an element is in the no-translate state, the
element's attribute values and the values of its Text
node children are to be left as-is when the page is localized, e.g.
because the element contains a person's name or the name of a
computer program.

The translate IDL
attribute must, on getting, return true if the element's
translation mode is translate-enabled, and
false otherwise. On setting, it must set the content attribute's
value to "yes" if the new value is true, and
set the content attribute's value to "no"
otherwise.

In this example, everything in the document is to be translated
when the page is localized, except the sample keyboard input and
sample program output:

<!DOCTYPE HTML>
<html> <!-- default on the root element is translate=yes -->
<head>
<title>The Bee Game</title> <!-- implied translate=yes inherited from ancestors -->
</head>
<body>
<p>The Bee Game is a text adventure game in English.</p>
<p>When the game launches, the first thing you should do is type
<kbd translate=no>eat honey</kbd>. The game will respond with:</p>
<pre><samp translate=no>Yum yum! That was some good honey!</samp></pre>
</body>
</html>
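The behavior in the example above can be modelled with a short sketch. Two points are assumptions rather than statements from the text above: that the yes and no states correspond to the translate-enabled and no-translate translation modes respectively, and that an element in the inherit state takes its parent's mode, defaulting to translate-enabled on the root (as the comments in the example suggest). The element shape ({ attributes, parent }) and all function names are hypothetical.

```javascript
// Map the content attribute's value to one of the three states.
// `undefined` stands for a missing attribute.
function translateState(value) {
  if (value === undefined) return "inherit"; // missing value default
  // Keywords are matched ASCII case-insensitively.
  const keyword = value.replace(/[A-Z]/g, (c) => c.toLowerCase());
  if (keyword === "" || keyword === "yes") return "yes";
  if (keyword === "no") return "no";
  return "inherit"; // invalid value default
}

// Assumed resolution of the translation mode (see lead-in).
function translationMode(el) {
  const state = translateState(el.attributes.translate);
  if (state === "yes") return "translate-enabled";
  if (state === "no") return "no-translate";
  return el.parent ? translationMode(el.parent) : "translate-enabled";
}

// Model of the IDL attribute's getter and setter.
function getTranslate(el) {
  return translationMode(el) === "translate-enabled";
}
function setTranslate(el, value) {
  el.attributes.translate = value ? "yes" : "no";
}
```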

3.2.3.6 The dir attribute

The dir attribute specifies the
element's text directionality. The attribute is an enumerated
attribute with the following keywords and states:

The ltr keyword, which maps to the ltr state

Indicates that the contents of the element are explicitly
directionally embedded left-to-right text.

The rtl keyword, which maps to the rtl state

Indicates that the contents of the element are explicitly
directionally embedded right-to-left text.

The auto keyword, which maps to the auto state

Indicates that the contents of the element are explicitly
embedded text, but that the direction is to be determined
programmatically using the contents of the element (as described
below).

The heuristic used by this state is very crude (it
just looks at the first character with a strong directionality, in
a manner analogous to the Paragraph Level determination in the
bidirectional algorithm). Authors are urged to only use this value
as a last resort when the direction of the text is truly unknown
and no better server-side heuristic can be applied. [BIDI]

For textarea and pre
elements, the heuristic is applied on a per-paragraph level.

The attribute has no invalid value default and no
missing value default.

The directionality of an element (any element, not just an HTML element) is either 'ltr' or 'rtl', and is determined as per the first appropriate set of steps from
the following list:

If the element is a textarea element and the dir attribute is in the auto state

If the element's value
contains a character of bidirectional character type AL or R, and
there is no character of bidirectional character type L anywhere
before it in the element's value, then the
directionality of the element is 'rtl'. Otherwise, the
directionality of the element is 'ltr'. [BIDI]
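The first-strong-character rule can be approximated in script. JavaScript regular expressions do not expose the Unicode Bidi_Class property, so this sketch substitutes an approximation: characters in the Hebrew or Arabic scripts stand in for types R and AL, and any other letter stands in for type L. A faithful implementation would consult the real bidirectional character types. The function name is hypothetical.

```javascript
// Approximation of bidirectional types R and AL (see lead-in).
const RTL_SCRIPT = /\p{Script=Hebrew}|\p{Script=Arabic}/u;
// Approximation of bidirectional type L: any remaining letter.
const LETTER = /\p{L}/u;

// Returns 'rtl' if an (approximate) R/AL character occurs before any
// (approximate) L character in `value`; otherwise returns 'ltr'.
function autoDirectionality(value) {
  for (const ch of value) { // iterates by code point
    if (RTL_SCRIPT.test(ch)) return "rtl";
    if (LETTER.test(ch)) return "ltr";
  }
  return "ltr";
}
```

Digits, punctuation, and whitespace are skipped, so a value that begins with numbers followed by Hebrew text is still classified as 'rtl'.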

Since the dir attribute is only defined for
HTML elements, it cannot be present on elements from other namespaces. Thus, elements
from other namespaces always just inherit their directionality from their parent element, or, if they don't have one,
default to 'ltr'.

The effect of this attribute is primarily on the presentation
layer. For example, the rendering section in this specification
defines a mapping from this attribute to the CSS 'direction' and
'unicode-bidi' properties, and CSS defines rendering in terms of
those properties.

Authors are strongly encouraged to use the dir attribute to indicate text direction
rather than using CSS, since that way their documents will continue
to render correctly even in the absence of CSS (e.g. as interpreted
by search engines).

Given a suitable style sheet and the default alignment styles
for the p element, namely to align the text to the
start edge of the paragraph, the resulting rendering could
be as follows:

As noted earlier, the auto
value is not a panacea. The final paragraph in this example is
misinterpreted as being right-to-left text, since it begins with an
Arabic character, which causes the "right?" to be to the left of
the Arabic text.

3.2.3.7 The class attribute

The attribute, if specified, must have a value that is a
set of space-separated tokens representing the various
classes that the element belongs to.

The classes that an HTML
element has assigned to it consist of all the classes
returned when the value of the class
attribute is split on
spaces. (Duplicates are ignored.)
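A minimal sketch of that splitting step, assuming the space characters are U+0020, U+0009, U+000A, U+000C, and U+000D (the function name is hypothetical):

```javascript
// Split a class attribute value on runs of space characters and drop
// duplicates, keeping the first occurrence of each token.
function classesOf(classAttribute) {
  const tokens = classAttribute
    .split(/[ \t\n\f\r]+/)
    .filter((token) => token !== "");
  return [...new Set(tokens)]; // Set preserves insertion order
}
```

For instance, classesOf("  note warning\tnote  ") yields the two classes "note" and "warning".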

Assigning classes to an element affects class
matching in selectors in CSS, the getElementsByClassName()
method in the DOM, and other such features.

There are no additional restrictions on the tokens authors can
use in the class attribute, but
authors are encouraged to use values that describe the nature of the
content, rather than values that describe the desired presentation
of the content.

3.2.3.8 The style attribute

In user agents that support CSS, the attribute's value must be
parsed when the attribute is added or has its value changed, according
to the rules given for CSS styling attributes. [CSSATTR]

Documents that use style
attributes on any of their elements must still be comprehensible and
usable if those attributes were removed.

In particular, using the style attribute to hide and show content,
or to convey meaning that is otherwise not included in the document,
is non-conforming. (To hide and show content, use the hidden attribute.)

The style IDL attribute
must return a CSSStyleDeclaration whose value
represents the declarations specified in the attribute. (If the
attribute is absent, the object represents an empty declaration.)
Mutating the CSSStyleDeclaration object must create a
style attribute on the element (if
there isn't one already) and then change its value to be a value
representing the serialized form of the
CSSStyleDeclaration object. The same object must be
returned each time. [CSSOM]

In the following example, the words that refer to colors are
marked up using the span element and the style attribute to make those words show
up in the relevant colors in visual media.

A custom data attribute is an attribute in no
namespace whose name starts with the string "data-", has at least one
character after the hyphen, is XML-compatible, and
contains no uppercase ASCII letters.

All attribute names on HTML elements in
HTML documents get ASCII-lowercased automatically, so
the restriction on ASCII uppercase letters doesn't affect such
documents.

Custom data attributes
are intended to store custom data private to the page or
application, for which there are no more appropriate attributes or
elements.

These attributes are not intended for use by software that is
independent of the site that uses the attributes.

For instance, a site about music could annotate list items
representing tracks in an album with custom data attributes
containing the length of each track. This information could then be
used by the site itself to allow the user to sort the list by track
length, or to filter the list for tracks of certain lengths.

<ol>
<li data-length="2m11s">Beyond The Sea</li>
...
</ol>

It would be inappropriate, however, for the user to use generic
software not associated with that music site to search for tracks
of a certain length by looking at this data.

This is because these attributes are intended for use by the
site's own scripts, and are not a generic extension mechanism for
publicly-usable metadata.

The dataset IDL
attribute provides convenient accessors for all the data-* attributes on an element. On
getting, the dataset IDL attribute
must return a DOMStringMap object, associated with the
following algorithms, which expose these attributes on their
element:

The algorithm for getting the list of name-value pairs

Let list be an empty list of name-value
pairs.

For each content attribute on the element whose first five
characters are the string "data-" and whose
remaining characters (if any) do not include any uppercase ASCII letters, add a name-value pair to list whose name is the attribute's name with the
first five characters removed and whose value is the attribute's
value.

Set the value of the attribute with the name name, to the value value,
replacing any previous value if the attribute already existed. If
setAttribute() would have thrown an
exception when setting an attribute with the name name, then this must throw the same
exception.

Notice how the hyphenated attribute name becomes camel-cased in
the API.
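The getting algorithm and the camel-casing it leads to can be sketched as follows. The input is assumed to be a plain object mapping attribute names to values, the function name is hypothetical, and the conversion step (a hyphen followed by a lowercase ASCII letter is removed and the letter uppercased) is this sketch's reading of the camel-casing mentioned above.

```javascript
// Build the list of [name, value] pairs exposed for an element's
// data-* attributes.
function dataNameValuePairs(attributes) {
  const list = [];
  for (const [name, value] of Object.entries(attributes)) {
    if (!name.startsWith("data-")) continue;
    const rest = name.slice(5); // drop the first five characters
    if (/[A-Z]/.test(rest)) continue; // skip names with uppercase ASCII letters
    // Assumed conversion: "-x" becomes "X" (see lead-in).
    const camelCased = rest.replace(/-([a-z])/g, (match, letter) =>
      letter.toUpperCase()
    );
    list.push([camelCased, value]);
  }
  return list;
}
```

With this model, an attribute named data-track-length surfaces under the name trackLength.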

Authors should carefully design such extensions so that when the
attributes are ignored and any associated CSS dropped, the page is
still usable.

User agents must not derive any implementation behavior from
these attributes or values. Specifications intended for user agents
must not define these attributes to have any meaningful values.

JavaScript libraries may use the custom data attributes, as they are considered to
be part of the page on which they are used. Authors of libraries
that are reused by many authors are encouraged to include their name
in the attribute names, to reduce the risk of clashes. Where it
makes sense, library authors are also encouraged to make the exact
name used in the attribute names customizable, so that libraries
whose authors unknowingly picked the same name can be used on the
same page, and so that multiple versions of a particular library can
be used on the same page even when those versions are not mutually
compatible.

For example, a library called "DoQuery" could use attribute
names like data-doquery-range, and a library
called "jJo" could use attribute names like data-jjo-range. The jJo library could also provide
an API to set which prefix to use (e.g. J.setDataPrefix('j2'), making the attributes have
names like data-j2-range).

3.2.4 Element definitions

Each element in this specification has a definition that includes
the following information:

Content model

A normative description of what content must be included as
children and descendants of the element.

Content attributes

A normative list of attributes that may be specified on the
element (except where otherwise disallowed).

DOM interface

A normative definition of a DOM interface that such elements
must implement.

This is then followed by a description of what the element
represents, along with any additional normative
conformance criteria that may apply to authors and implementations. Examples are sometimes
also included.

3.2.4.1 Attributes

Except where otherwise specified, attributes
on HTML elements may have any string value, including
the empty string. Except where explicitly stated, there is no
restriction on what text can be specified in such attributes.

3.2.5 Content models

Each element defined in this specification has a content model: a
description of the element's expected contents. An HTML element must have contents that match the
requirements described in the element's content model.

The space characters are
always allowed between elements. User agents represent these
characters between elements in the source markup as
Text nodes in the DOM. Empty Text nodes and
Text nodes consisting of just sequences of those
characters are considered inter-element whitespace.

Inter-element whitespace, comment nodes, and
processing instruction nodes must be ignored when establishing
whether an element's contents match the element's content model or
not, and must be ignored when following algorithms that define
document and element semantics.

Thus, an element A is said to be
preceded or followed by a second element B if A and B
have the same parent node and there are no other element nodes or
Text nodes (other than inter-element
whitespace) between them. Similarly, a node is the only
child of an element if that element contains no nodes
other than inter-element whitespace, comment nodes, and
processing instruction nodes.
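These definitions can be illustrated with a small model in which child nodes are plain objects carrying a type field ("element", "text", "comment", or "processing-instruction") and, for text nodes, a data string. The node shape and the function names are assumptions made for the sketch.

```javascript
// A Text node is inter-element whitespace if it is empty or consists
// only of space characters (space, tab, LF, FF, CR).
function isInterElementWhitespace(data) {
  return /^[ \t\n\f\r]*$/.test(data);
}

// Children that count when checking a content model.
function significantChildren(children) {
  return children.filter((child) => {
    if (child.type === "comment") return false;
    if (child.type === "processing-instruction") return false;
    if (child.type === "text" && isInterElementWhitespace(child.data)) {
      return false;
    }
    return true;
  });
}

// A node is the only child of `parent` if nothing else significant
// sits alongside it.
function isOnlyChild(node, parent) {
  const rest = significantChildren(parent.children);
  return rest.length === 1 && rest[0] === node;
}
```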

Authors must not use HTML elements anywhere except
where they are explicitly allowed, as defined for each element, or
as explicitly required by other specifications. For XML compound
documents, these contexts could be inside elements from other
namespaces, if those elements are defined as providing the relevant
contexts.

For example, the Atom specification defines a content element. When its type attribute has the value xhtml, the Atom specification requires that it
contain a single HTML div element. Thus, a
div element is allowed in that context, even though
this is not explicitly normatively stated by this specification. [ATOM]

In addition, HTML elements may be orphan nodes
(i.e. without a parent node).

For example, creating a td element and storing it
in a global variable in a script is conforming, even though
td elements are otherwise only supposed to be used
inside tr elements.

var data = {
name: "Banana",
cell: document.createElement('td'),
};

3.2.5.1 Kinds of content

Each element in HTML falls into zero or more categories that group elements with similar
characteristics together. The following broad categories are used in
this specification:

Other categories are also used for specific purposes, e.g. form
controls are specified using a number of categories to define common
requirements. Some elements have unique requirements and do not fit
into any particular category.

3.2.5.1.1 Metadata content

Metadata content is content that sets up the
presentation or behavior of the rest of the content, or that sets
up the relationship of the document with other documents, or that
conveys other "out of band" information.

As a general rule, elements whose content model allows any
phrasing content should have either at least one
descendant Text node that is not inter-element
whitespace, or at least one descendant element node that is
embedded content. For the purposes of this requirement,
nodes that are descendants of del elements must not be
counted as contributing to the ancestors of the del
element.

Most elements that are categorized as phrasing
content can only contain elements that are themselves categorized as
phrasing content, not any flow content.

Text nodes and attribute values must consist of
Unicode characters, must not
contain U+0000 characters, must not contain permanently undefined
Unicode characters (noncharacters), and must not contain control
characters other than space
characters.
This specification includes extra constraints on the exact value of
Text nodes and attribute values depending on their
precise context.
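The character restrictions above can be checked mechanically. The following is a minimal sketch, not a normative algorithm: the function names are hypothetical, "space characters" is taken to mean U+0020, U+0009, U+000A, U+000C, and U+000D, and unpaired surrogates are assumed not to count as Unicode characters.

```javascript
// Space characters as HTML defines them: space, tab, LF, FF, CR.
const SPACE_CHARACTERS = new Set([0x20, 0x09, 0x0a, 0x0c, 0x0d]);

// Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
function isNoncharacter(cp) {
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

// Control characters: the C0 range, DELETE, and the C1 range.
function isControl(cp) {
  return cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
}

// Returns the first disallowed code point, or null if the text is acceptable.
// (Hypothetical helper; U+0000 is caught by the control-character test.)
function findDisallowedCodePoint(text) {
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xd800 && cp <= 0xdfff) return cp; // unpaired surrogate (assumption)
    if (isNoncharacter(cp)) return cp;
    if (isControl(cp) && !SPACE_CHARACTERS.has(cp)) return cp;
  }
  return null;
}
```

A conformance checker could run such a check over every Text node and attribute value before applying the context-specific constraints.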

3.2.5.1.6 Embedded content

Embedded content is content that imports another
resource into the document, or content from another vocabulary that
is inserted into the document.

Elements that are from namespaces other than the HTML
namespace and that convey content but not metadata, are
embedded content for the purposes of the content models
defined in this specification. (For example, MathML, or SVG.)

Some embedded content elements can have fallback
content: content that is to be used when the external resource
cannot be used (e.g. because it is of an unsupported format). The
element definitions state what the fallback is, if any.

3.2.5.1.7 Interactive content

Interactive content is content that is specifically
intended for user interaction.

Certain elements in HTML have an activation
behavior, which means that the user can activate them. This
triggers a sequence of events dependent on the activation mechanism,
and normally culminating in a click
event, as described below.

The user agent should allow the user to manually trigger elements
that have an activation behavior, for instance using
keyboard or voice input, or through mouse clicks. When the user
triggers an element with a defined activation behavior
in a manner other than clicking it, the default action of the
interaction event must be to run synthetic click activation
steps on the element.

Each element has a click in progress flag,
initially set to false.

When a user agent is to run synthetic click activation
steps on an element, the user agent must run the following
steps:

If the element's click in progress flag
is set to true, then abort these steps.

When a user agent is to run pre-click activation steps
on an element, it must run the pre-click activation steps
defined for that element, if any.

When a user agent is to run canceled activation steps
on an element, it must run the canceled activation steps
defined for that element, if any.

When a user agent is to run post-click activation
steps on an element, it must run the activation
behavior defined for that element, if any. Activation
behaviors can refer to the click
event that was fired by the steps above leading up to this
point.
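The role of the click in progress flag can be illustrated as a re-entrancy guard. Only the first step of the synthetic click activation steps appears above, so the ordering of the remaining calls in this sketch is an assumption built from the pre-click, canceled, and post-click hooks just described. The element is modelled as a plain object, and dispatchClick is a hypothetical stand-in for firing the click event that returns false when the event is canceled.

```javascript
// Hypothetical model: `element` is a plain object carrying the flag and hooks.
function runSyntheticClickActivationSteps(element) {
  // Step stated in the text: abort if a click is already in progress.
  if (element.clickInProgress) return false;
  element.clickInProgress = true;
  try {
    // Assumed ordering: pre-click steps, then the click event, then either
    // the activation behavior or the canceled activation steps.
    element.preClickActivationSteps?.();
    const notCanceled = element.dispatchClick();
    if (notCanceled) element.activationBehavior?.();
    else element.canceledActivationSteps?.();
  } finally {
    element.clickInProgress = false;
  }
  return true;
}

// Usage: a click handler that tries to trigger a nested synthetic click.
const log = [];
const checkbox = {
  clickInProgress: false,
  preClickActivationSteps() { log.push("pre-click"); },
  dispatchClick() {
    log.push("click");
    log.push("nested ran: " + runSyntheticClickActivationSteps(checkbox));
    return true;
  },
  activationBehavior() { log.push("post-click"); },
  canceledActivationSteps() { log.push("canceled"); },
};
runSyntheticClickActivationSteps(checkbox);
```

The nested call is ignored because the flag is still true while the outer steps are running.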

3.2.5.1.8 Palpable content

As a general rule, elements whose content model allows any
flow content or phrasing content should
have at least one child node that is palpable content
and that does not have the hidden
attribute specified.

This requirement is not a hard requirement, however, as there are
many cases where an element can be empty legitimately, for example
when it is used as a placeholder which will later be filled in by a
script, or when the element is part of a template and would on most
pages be filled in but on some pages is not relevant.

Conformance checkers are encouraged to provide a mechanism for
authors to find elements that fail to fulfill this requirement, as
an authoring aid.

3.2.5.2 Transparent content models

Some elements are described as transparent; they have
"transparent" in the description of their content model. The content
model of a transparent element is derived from the
content model of its parent element: the elements required in the
part of the content model that is "transparent" are the same
elements as required in the part of the content model of the parent
of the transparent element in which the transparent element finds
itself.

In some cases, where transparent elements are nested
in each other, the process has to be applied iteratively.

Consider the following markup fragment:

<p><ins><map><a href="/">Apples</a></map></ins></p>

To check whether "Apples" is allowed inside the a
element, the content models are examined. The a
element's content model is transparent, as is the map
element's, as is the ins element's. The ins element is found in the
p element, whose content model is phrasing
content. Thus, "Apples" is allowed, as text is phrasing
content.

When a transparent element has no parent, then the part of its
content model that is "transparent" must instead be treated as
accepting any flow content.
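The iterative lookup described in this section can be sketched as a walk up the ancestor chain. This is an illustration only: elements are modelled as plain objects with hypothetical contentModel and parent properties, and additional restrictions that some transparent elements carry are ignored.

```javascript
// Resolve the content model that actually applies inside `element`.
function effectiveContentModel(element) {
  let current = element;
  while (current.contentModel === "transparent") {
    // A transparent element with no parent accepts any flow content.
    if (current.parent === null) return "flow";
    current = current.parent;
  }
  return current.contentModel;
}

// The fragment <p><ins><map><a href="/">Apples</a></map></ins></p>:
const p = { name: "p", contentModel: "phrasing", parent: null };
const ins = { name: "ins", contentModel: "transparent", parent: p };
const map = { name: "map", contentModel: "transparent", parent: ins };
const a = { name: "a", contentModel: "transparent", parent: map };

const orphanIns = { name: "ins", contentModel: "transparent", parent: null };
```

Here effectiveContentModel(a) resolves through map and ins to the p element's phrasing content model, while the orphaned ins element falls back to flow content.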

3.2.5.3 Paragraphs

The term paragraph as defined in this
section is used for more than just the definition of the
p element. The paragraph concept defined
here is used to describe how to interpret documents. The
p element is merely one of several ways of marking up a
paragraph.

A paragraph is typically a run of phrasing
content that forms a block of text with one or more sentences
that discuss a particular topic, as in typography, but can also
be used for more general thematic grouping. For instance, an address
is also a paragraph, as is a part of a form, a byline, or a stanza
in a poem.

In the following example, there are two paragraphs in a
section. There is also a heading, which contains phrasing content
that is not a paragraph. Note how the comments and
inter-element whitespace do not form paragraphs.

<section>
  <h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in this example.
  <p>This is the second.</p>
  <!-- This is not a paragraph. -->
</section>

Paragraphs in flow content are defined relative to
what the document looks like without the a,
ins, del, and map elements
complicating matters, since those elements, with their hybrid
content models, can straddle paragraph boundaries, as shown in the
first two examples below.

Generally, having elements straddle paragraph
boundaries is best avoided. Maintaining such markup can be
difficult.

The following example takes the markup from the earlier example
and puts ins and del elements around some
of the markup to show that the text was changed (though in this
case, the changes admittedly don't make much sense). Notice how
this example has exactly the same paragraphs as the previous one,
despite the ins and del elements —
the ins element straddles the heading and the first
paragraph, and the del element straddles the boundary
between the two paragraphs.

<section>
  <ins><h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in</ins> this example<del>.
  <p>This is the second.</p></del>
  <!-- This is not a paragraph. -->
</section>

Let view be a view of the DOM that replaces
all a, ins, del, and
map elements in the document with their contents. Then,
in view, for each run of sibling phrasing
content nodes uninterrupted by other types of content, in an
element that accepts content other than phrasing
content as well as phrasing content, let first be the first node of the run, and let last be the last node of the run. For each such run
that consists of at least one node that is neither embedded
content nor inter-element whitespace, a
paragraph exists in the original DOM from immediately before first to immediately after last. (Paragraphs can thus span across
a, ins, del, and
map elements.)
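The definition above can be sketched for the children of a single container that accepts both flow and phrasing content. This is a simplified model rather than an implementation: nodes are plain objects with a hypothetical kind of "text", "phrasing", "embedded", or "flow" (meaning non-phrasing content), comments are assumed to have been dropped from the input, and all function names are illustrative.

```javascript
// Elements that the view replaces with their contents.
const STRADDLING = new Set(["a", "ins", "del", "map"]);

// Build the view: splice the children of a, ins, del, and map into their parent.
function flattenView(nodes) {
  return nodes.flatMap((node) =>
    STRADDLING.has(node.name) ? flattenView(node.children) : [node]
  );
}

function isInterElementWhitespace(node) {
  return node.kind === "text" && !/[^ \t\n\f\r]/.test(node.text);
}

// A run counts only if some node is neither embedded content nor whitespace.
function formsParagraph(run) {
  return run.some((n) => n.kind !== "embedded" && !isInterElementWhitespace(n));
}

// Returns one array of nodes (first..last) per implicit paragraph.
function findImplicitParagraphs(children) {
  const paragraphs = [];
  let run = [];
  const closeRun = () => {
    if (run.length > 0 && formsParagraph(run)) paragraphs.push(run);
    run = [];
  };
  for (const node of flattenView(children)) {
    if (node.kind === "flow") closeRun();
    else run.push(node);
  }
  closeRun();
  return paragraphs;
}

// Children of the section element in the ins/del example above.
const text = (t) => ({ name: "#text", kind: "text", text: t });
const sectionChildren = [
  text("\n"),
  { name: "ins", children: [
    { name: "h1", kind: "flow" },
    text("\nThis is the "),
    { name: "em", kind: "phrasing" },
    text(" paragraph in"),
  ] },
  text(" this example"),
  { name: "del", children: [text(".\n"), { name: "p", kind: "flow" }] },
  text("\n"),
];
const paragraphs = findImplicitParagraphs(sectionChildren);
```

The sketch finds one implicit paragraph, running from the text after the heading to the full stop inside the del element. The second paragraph in that example is the p element itself, which this function treats only as a boundary.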

Conformance checkers may warn authors of cases where they have
paragraphs that overlap each other (this can happen with
object, video, audio, and
canvas elements, and indirectly through elements in
other namespaces that allow HTML to be further embedded therein,
like svg or math).

It is possible for paragraphs to overlap when using certain
elements that define fallback content. For example, in the
following section:

<section>
  <h1>My Cats</h1>
  You can play with my cat simulator.
  <object data="cats.sim">
    To see the cat simulator, use one of the following links:
    <ul>
      <li><a href="cats.sim">Download simulator file</a>
      <li><a href="http://sims.example.com/watch?v=LYds5xY4INU">Use online simulator</a>
    </ul>
    Alternatively, upgrade to the Mellblom Browser.
  </object>
  I'm quite proud of it.
</section>

There are five paragraphs:

The paragraph that says "You can play with my cat
simulator. object I'm quite proud of it.", where
object is the object element.

The paragraph that says "To see the cat simulator, use one of
the following links:".

The paragraph that says "Download simulator file".

The paragraph that says "Use online simulator".

The paragraph that says "Alternatively, upgrade to the Mellblom Browser.".

The first paragraph is overlapped by the other four. A user
agent that supports the "cats.sim" resource will only show the
first one, but a user agent that shows the fallback will
confusingly show the first sentence of the first paragraph as
if it was in the same paragraph as the second one, and will show
the last paragraph as if it was at the start of the second sentence
of the first paragraph.

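To avoid this confusion, explicit p elements can be
used, for example by wrapping each run of text that sits directly
inside the section and object elements in its own p element.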

Text content in HTML elements with
child Text nodes, and text in attributes of HTML
elements that allow free-form text, may contain characters in
the range U+202A to U+202E (the bidirectional-algorithm formatting
characters). However, the use of these characters is restricted so
that any embedding or overrides generated by these characters do not
start and end with different parent elements, and so that all such
embeddings and overrides are explicitly terminated by a U+202C POP
DIRECTIONAL FORMATTING character. This helps reduce incidences of
text being reused in a manner that has unforeseen effects on the
bidirectional algorithm. [BIDI]
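For a single run of text (one Text node or one attribute value), the termination requirement amounts to a balance check. The sketch below uses a hypothetical function name and is stricter than the prose in one respect: it also rejects a U+202C that has nothing to terminate, on the assumption that such a character would be closing an embedding opened under a different parent.

```javascript
// Returns true if every embedding or override opened in `text`
// (U+202A, U+202B, U+202D, U+202E) is closed by a U+202C in the same text.
function bidiFormattingIsBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp === 0x202a || cp === 0x202b || cp === 0x202d || cp === 0x202e) {
      depth++;
    } else if (cp === 0x202c) {
      if (depth === 0) return false; // nothing to pop (treated as an error here)
      depth--;
    }
  }
  return depth === 0;
}
```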

Authors are encouraged to use the dir attribute, the bdo element,
and the bdi element, rather than maintaining the
bidirectional-algorithm formatting characters manually. The
bidirectional-algorithm formatting characters interact poorly with
CSS.

3.2.7 WAI-ARIA

Authors may use the ARIA role
and aria-* attributes on HTML
elements, in accordance with the requirements described in
the ARIA specifications, except where these conflict with the
strong native semantics
described below. These
exceptions are intended to prevent authors from making assistive
technology products report nonsensical states that do not represent
the actual state of the document. [ARIA]

The ARIA attributes defined in the ARIA
specifications, and the strong native semantics and
default implicit ARIA semantics defined below, do not
have any effect on CSS pseudo-class matching, user interface
modalities that don't use assistive technologies, or the default
actions of user interaction events as described in this
specification.

ARIA State and Property attributes can be used on any element. They
are not always meaningful, however, and in such cases user agents
might not perform any processing aside from including them in the DOM.
State and property attributes are processed according to the
requirements of the sections Strong Native Semantics and Implicit ARIA semantics, as
well as [ARIA] and [ARIAIMPL].

3.2.7.3 Strong Native Semantics

The following table defines the strong native
semantics and corresponding default implicit ARIA
semantics that apply to HTML elements. Each
language feature (element or attribute) in a cell in the first
column implies the ARIA semantics (role, states, and/or properties)
given in the cell in the second column of the same row. When multiple rows apply to an element, the role from
the last row to define a role must be applied, and the states and
properties from all the rows must be combined.

spinbutton role, with the aria-readonly property set to "true" if the element has a readonly attribute, the aria-valuemax property set to the element's maximum, the aria-valuemin property set to the element's minimum, and, if the result of applying the rules for parsing floating-point number values to the element's value is a number, with the aria-valuenow property set to that number

progressbar role, with, if the progress bar is determinate, the aria-valuemax property set to the maximum value of the progress bar, the aria-valuemin property set to zero, and the aria-valuenow property set to the current value of the progress bar
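The two mappings above can be sketched as functions from a simplified element description to the implied ARIA semantics. The input shapes (readonly, min, max, value, determinate) are hypothetical, and Number.parseFloat is used only as a rough approximation of the rules for parsing floating-point number values.

```javascript
// Implied semantics for a number-style control (hypothetical input shape).
function spinbuttonSemantics(input) {
  const aria = {
    role: "spinbutton",
    "aria-valuemax": input.max,
    "aria-valuemin": input.min,
  };
  if (input.readonly) aria["aria-readonly"] = "true";
  // Approximation: parseFloat stands in for the HTML parsing rules.
  const parsed = Number.parseFloat(input.value);
  if (Number.isFinite(parsed)) aria["aria-valuenow"] = parsed;
  return aria;
}

// Implied semantics for a progress bar (hypothetical input shape).
function progressbarSemantics(progress) {
  const aria = { role: "progressbar" };
  if (progress.determinate) {
    aria["aria-valuemax"] = progress.max;
    aria["aria-valuemin"] = 0;
    aria["aria-valuenow"] = progress.value;
  }
  return aria;
}
```

An indeterminate progress bar therefore exposes only the role, and a spinbutton whose value does not parse as a number exposes no aria-valuenow property.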

3.2.7.4 Implicit ARIA Semantics

Some HTML elements have native semantics that can be
overridden. The following table lists these elements and their
default implicit ARIA semantics, along with the
restrictions that apply to those elements. Each language feature
(element or attribute) in a cell in the first column implies, unless
otherwise overridden, the ARIA semantic (role, state, or property)
given in the cell in the second column of the same row, but this
semantic may be overridden under the conditions listed in the cell
in the third column of that row. In addition, any element may be
given the presentation role,
regardless of the restrictions below.

The entry "no role", when
used as a strong native
semantic, means that no role other than presentation can be used.
When used as a default
implicit ARIA semantic, it means the user agent has no default
mapping to ARIA roles. (However, it probably will have its own
mappings to the accessibility layer.)

The WAI-ARIA specification neither requires nor forbids user
agents from enhancing native presentation and interaction behaviors
on the basis of WAI-ARIA markup. Even mainstream user agents might
choose to expose metadata or navigational features directly or via
user-installed extensions; for example, exposing required form
fields or landmark navigation. User agents are encouraged to
maximize their usefulness to users, including users without
disabilities.

Conformance checkers are encouraged to phrase errors such that
authors are encouraged to use more appropriate elements rather than
remove accessibility annotations. For example, if an a
element is marked as having the button role, a conformance
checker could say "Use a more appropriate element to represent a
button, for example a button element or an
input element" rather than "The button role cannot be used with
a elements".

These features can be used to make accessibility tools render
content to their users in more useful ways. For example, ASCII art,
which is really an image, appears to be text, and in the absence of
appropriate annotations would end up being rendered by screen
readers as a very painful reading of lots of punctuation. Using the
features described in this section, one can instead make the ATs
skip the ASCII art and just read the caption.

3.3 Interactions with XPath and XSLT

Implementations of XPath 1.0 that
operate on HTML documents parsed or created in the
manners described in this specification (e.g. as part of the document.evaluate() API) must act as if the
following edit was applied to the XPath 1.0 specification.

First, remove this paragraph:

A QName in the
node test is expanded into an expanded-name
using the namespace declarations from the expression context. This
is the same way expansion is done for element type names in start
and end-tags except that the default namespace declared with xmlns is not used: if the QName does
not have a prefix, then the namespace URI is null (this is the same
way attribute names are expanded). It is an error if the QName has a
prefix for which there is no namespace declaration in the
expression context.

Then, insert in its place the following:

A QName in the node test is expanded into an expanded-name using
the namespace declarations from the expression context. If the
QName has a prefix, then there must be a namespace
declaration for this prefix in the expression context, and the
corresponding namespace
URI is the one that is associated with this prefix. It is an error
if the QName has a prefix for which there is no namespace
declaration in the expression context.

If the QName has no prefix and the principal node type of the
axis is element, then the default element namespace is
used. Otherwise, if the QName has no prefix, the namespace URI is
null. The default element namespace is a member of the context for
the XPath expression. The value of the default element namespace
when executing an XPath expression through the DOM3 XPath API is
determined in the following way:

If the context node is from an HTML DOM, the default element
namespace is "http://www.w3.org/1999/xhtml".

Otherwise, the default element namespace URI is null.
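The amended expansion rule can be sketched as follows. The shape of the expression context (namespaces, principalNodeType, contextNode.isFromHtmlDom) is a hypothetical model chosen for this illustration and is not part of any XPath API.

```javascript
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Default element namespace for a context node (hypothetical flag).
function defaultElementNamespace(contextNode) {
  return contextNode.isFromHtmlDom ? HTML_NAMESPACE : null;
}

// Expand a QName from a node test into { namespaceURI, localName }.
function expandQName(qname, context) {
  const colon = qname.indexOf(":");
  if (colon !== -1) {
    const prefix = qname.slice(0, colon);
    if (!Object.prototype.hasOwnProperty.call(context.namespaces, prefix)) {
      throw new Error("No namespace declaration for prefix " + prefix);
    }
    return {
      namespaceURI: context.namespaces[prefix],
      localName: qname.slice(colon + 1),
    };
  }
  // Unprefixed: only the element axis picks up the default element namespace.
  const namespaceURI =
    context.principalNodeType === "element"
      ? defaultElementNamespace(context.contextNode)
      : null;
  return { namespaceURI, localName: qname };
}
```

With an HTML context node, an unprefixed node test such as div on the child axis expands into the HTML namespace, while the same name on the attribute axis keeps a null namespace URI.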

This is equivalent to adding the default element
namespace feature of XPath 2.0 to XPath 1.0, and using the HTML
namespace as the default element namespace for HTML documents. It
is motivated by the desire to have implementations be compatible
with legacy HTML content while still supporting the changes that
this specification introduces to HTML regarding the namespace used
for HTML elements, and by the desire to use XPath 1.0 rather than
XPath 2.0.

This change is a willful violation of
the XPath 1.0 specification, motivated by the desire to have
implementations be compatible with legacy content while still
supporting the changes that this specification introduces to HTML
regarding which namespace is used for HTML elements. [XPATH10]

XSLT 1.0 processors outputting
to a DOM when the output method is "html" (either explicitly or via
the defaulting rule in XSLT 1.0) are affected as follows:

If the transformation program outputs an element in no namespace,
the processor must, prior to constructing the corresponding DOM
element node, change the namespace of the element to the HTML
namespace, ASCII-lowercase the element's local name, and
ASCII-lowercase
the names of any non-namespaced attributes on the element.
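The adjustment above can be sketched on a simplified element description. The object shape (namespaceURI, localName, attributes) and the function names are hypothetical. A real processor would apply the same changes while constructing the DOM node.

```javascript
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Lowercase only A-Z, leaving every other character untouched.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

// Returns the description to use when building the DOM element node.
function adjustXsltOutputElement(element) {
  if (element.namespaceURI !== null) return element; // already namespaced
  return {
    namespaceURI: HTML_NAMESPACE,
    localName: asciiLowercase(element.localName),
    attributes: element.attributes.map((attr) =>
      attr.namespaceURI === null
        ? { ...attr, name: asciiLowercase(attr.name) }
        : attr
    ),
  };
}
```

ASCII-lowercasing is used rather than a locale-aware conversion so that non-ASCII letters in names are never altered.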

This requirement is a willful violation
of the XSLT 1.0 specification, required because this specification
changes the namespaces and case-sensitivity rules of HTML in a
manner that would otherwise be incompatible with DOM-based XSLT
transformations. (Processors that serialize the output are
unaffected.) [XSLT10]

This specification does not specify precisely how XSLT processing
interacts with the HTML parser infrastructure (for
example, whether an XSLT processor acts as if it puts any elements
into a stack of open elements). However, XSLT
processors must stop parsing if they successfully
complete, and must set the current document readiness
first to "interactive" and then to "complete" if they are aborted.

This specification does not specify how XSLT interacts with the
navigation algorithm, how it fits in
with the event loop, nor how error pages are to be
handled (e.g. whether XSLT errors are to replace an incremental XSLT
output, or are rendered inline, etc).

3.4.1 Opening the input stream

Causes the Document to be replaced in-place, as if
it was a new Document object, but reusing the
previous object, which is then returned.

If the type argument is omitted or has the
value "text/html", then the resulting
Document has an HTML parser associated with it, which
can be given data to parse using document.write(). Otherwise, all
content passed to document.write() will be parsed
as plain text.

If the replace argument is present and has
the value "replace", the existing entries in
the session history for the Document object are
removed.

Document objects have an
ignore-opens-during-unload counter, which is used to
prevent scripts from invoking the document.open() method (directly or
indirectly) while the document is being unloaded. Initially, the counter must be set
to zero.

When called with two or fewer arguments, the document.open() method must act as
follows:

This basically causes document.open() to be ignored
when it's called in an inline script found during the parsing of
data sent over the network, while still letting it have an effect
when called asynchronously or on a document that is itself being
spoon-fed using these APIs.

Remove all child nodes of the document, without firing any
mutation events.

Replace the Document's singleton objects with
new instances of those objects. (This includes in particular the
Window, Location, History,
ApplicationCache, and Navigator objects,
the various BarProp objects, the two
Storage objects, the various
HTMLCollection objects, and objects defined by other
specifications, like Selection and the document's
UndoManager. It also includes all the Web IDL
prototypes in the JavaScript binding, including the
Document object's prototype.)

If replace is false, then add a new
entry, just before the last entry, and associate with the new entry
the text that was parsed by the previous parser associated with the
Document object, as well as the state of the document
at the start of these steps. This allows the user to step backwards
in the session history to see the page before it was blown away by
the document.open() call.
This new entry does not have a Document object, so a
new one will be created if the session history is traversed to that
entry.

When called with three or more arguments, the open() method on the
Document object must call the open() method on the Window
object of the Document object, with the same
arguments as the original call to the open() method, and return whatever
that method returned. If the Document object has no
Window object, then the method must throw an
InvalidAccessError exception.
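The dispatch on argument count can be sketched with plain objects standing in for the Document and Window. The doc.window property and the documentOpen name are hypothetical, the error is modelled as an ordinary Error whose name is set to InvalidAccessError, and the in-place replacement steps for two or fewer arguments are not modelled.

```javascript
function documentOpen(doc, ...args) {
  if (args.length >= 3) {
    if (!doc.window) {
      const error = new Error("Document has no Window object");
      error.name = "InvalidAccessError";
      throw error;
    }
    // Forward the original arguments and return whatever window.open() returns.
    return doc.window.open(...args);
  }
  // Two or fewer arguments: the replacement steps are omitted in this sketch;
  // only the documented return value (the same Document object) is reproduced.
  return doc;
}
```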

The document.write() method has very idiosyncratic behavior. In
some cases, this method can affect the state of the HTML
parser while the parser is running, resulting in a DOM that
does not correspond to the source of the document (e.g. if the
string written is the string "<plaintext>" or "<!--"). In other cases, the call can clear the
current page first, as if document.open() had been called.
In yet more cases, the method is simply ignored, or throws an
exception. To make matters worse, the exact behavior of this
method can in some cases be dependent on network latency, which can lead to failures that are very hard to debug.
For all these reasons, use of this method is strongly
discouraged.

Document objects have an
ignore-destructive-writes counter, which is used in
conjunction with the processing of script elements to
prevent external scripts from being able to use document.write() to blow away the
document by implicitly calling document.open(). Initially, the
counter must be set to zero.

If there is no pending parsing-blocking script,
have the HTML parser process the characters that were
inserted, one at a time, processing resulting tokens as they are
emitted, and stopping when the tokenizer reaches the insertion
point or when the processing of the tokenizer is aborted by the
tree construction stage (this can happen if a script
end tag token is emitted by the tokenizer).

The document.writeln(...)
method, when invoked, must act as if the document.write() method had been
invoked with the same argument(s), plus an extra argument consisting
of a string containing a single line feed character (U+000A).
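The relationship between the two methods can be sketched with a stand-in object that records what is written. The recorder object is purely illustrative and does not model parsing.

```javascript
// writeln() forwards its arguments to write() and appends one U+000A.
function writeln(doc, ...text) {
  doc.write(...text, "\n");
}

// Illustrative stand-in for a Document: it only concatenates what is written.
const recorder = {
  buffer: "",
  write(...parts) {
    this.buffer += parts.join("");
  },
};

writeln(recorder, "<p>Hi", "</p>");
```

After this call, recorder.buffer holds the two strings followed by a single line feed.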