5. User interaction

5.1. The hidden attribute

All HTML elements may have the hidden content attribute set. The hidden attribute is a boolean attribute. When specified on an element, it
indicates that the element is not yet, or is no longer, directly relevant to the page’s current
state, or that it is being used to declare content to be reused by other parts of the page as
opposed to being directly accessed by the user. User agents should not render
elements that have the hidden attribute specified. This requirement may be
implemented indirectly through the style layer. For example, an HTML+CSS user agent could
implement these requirements using the rules suggested in §10 Rendering.

Because this attribute is typically implemented using CSS, it’s also possible to override it
using CSS. For instance, a rule that applies 'display: block' to all elements will cancel the
effects of the hidden attribute. Authors therefore have to take care when writing
their style sheets to make sure that the attribute is still styled as expected.

In the following skeletal example, the attribute is used to hide the Web game’s main screen
until the user logs in:

The hidden attribute must not be used to hide content just from one presentation —
if something is marked hidden, it is hidden from all presentations, including, for instance, screen readers.

Elements that are not themselves hidden must not hyperlink to elements that are hidden. The for attributes of label and output elements that are not
themselves hidden must similarly not refer to elements that are hidden. In both cases, such references would cause user confusion.

Elements and scripts may, however, refer to elements that are hidden in other contexts.

For example, it would be incorrect to use the href attribute to link to a
section marked with the hidden attribute. If the content is not applicable or relevant, then there
is no reason to link to it.

It would be fine, however, to use the ARIA aria-describedby attribute to
refer to descriptions that are themselves hidden. While hiding the descriptions
implies that they are not useful alone, they could be written in
such a way that they are useful in the specific context of being
referenced from the images that they describe.

Similarly, a canvas element with the hidden attribute could be used by a
scripted graphics engine as an off-screen buffer, and a form
control could refer to a hidden form element using its form attribute.

Accessibility APIs are encouraged to provide a way to expose
structured content while marking it as hidden in the default view.
Such content should not be perceivable to users in the normal document
flow in any modality, whether using Assistive Technology (AT) or
mainstream User Agents.

When such features are available, User Agents may use them to
expose the full semantics of hidden elements to AT when appropriate, if such content is referenced
indirectly by an ID reference or valid hash-name reference. This allows ATs to access the
structure of these hidden elements
upon user request, while keeping the content hidden in all
presentations of the normal document flow. Authors who wish to prevent
user-initiated viewing of a hidden element should not reference the element with such a mechanism.

Because some User Agents have flattened hidden content when
exposing such content to AT, authors should not reference hidden content which would lose essential
meaning when flattened.

Elements in a section hidden by the hidden attribute are still
active, e.g., scripts and form controls in such sections still execute and submit respectively.
Only their presentation to the user changes.

The hidden IDL attribute must reflect the
content attribute of the same name.
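The reflection requirement above can be illustrated without a browser. The following non-normative sketch assumes the usual convention for boolean attributes: the content attribute is present (with an empty value) when the IDL attribute is true and absent when it is false. ElementModel and showGameAfterLogin are hypothetical names used only for illustration.

```javascript
// Hypothetical stand-in for an element: it only models an attribute list.
class ElementModel {
  constructor() {
    this.attributes = new Map();
  }
  // Getting: true exactly when the content attribute is present.
  get hidden() {
    return this.attributes.has("hidden");
  }
  // Setting: add the attribute (empty value) or remove it.
  set hidden(value) {
    if (value) this.attributes.set("hidden", "");
    else this.attributes.delete("hidden");
  }
}

// Mirrors the login scenario: the game screen starts out hidden.
const login = new ElementModel();
const game = new ElementModel();
game.hidden = true;

function showGameAfterLogin() {
  login.hidden = true;
  game.hidden = false;
}

showGameAfterLogin();
console.log(login.hidden, game.hidden); // true false
```

Only the presence of the attribute matters in this model; its value is never inspected.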

5.2. Inert subtrees

This section does not define or create any content attribute
named "inert". This section merely defines an abstract concept of inertness.

A node (in particular elements and text nodes) can be marked as inert. When a node
is inert, then the user agent must act as if the node was absent for the purposes of
targeting user interaction events, may ignore the node for the purposes of text search user
interfaces (commonly known as "find in page"), and may prevent the user from selecting text in
that node. User agents should allow the user to override the restrictions on search and text
selection, however.

For example, consider a page that consists of just a single inert paragraph positioned in the middle of a body. If a user moves their pointing device
from the body over to the inert paragraph and clicks on the paragraph,
no mouseover event would be fired, and the mousemove and click events would
be fired on the body element rather than the paragraph.
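The targeting rule in this example can be sketched as follows. This is a non-normative illustration that assumes a hypothetical hit-test result (hitStack) listing the nodes under the pointer, innermost first; interactionTarget is an invented helper, not a platform API.

```javascript
// `hitStack` is a hypothetical hit-test result: the nodes under the
// pointer, innermost first.
function interactionTarget(hitStack) {
  // Inert nodes are treated as absent, so the first non-inert node wins.
  return hitStack.find((node) => !node.inert) || null;
}

const body = { name: "body", inert: false };
const paragraph = { name: "p", inert: true };

const target = interactionTarget([paragraph, body]);
console.log(target.name); // body
```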

When a node is inert, it generally cannot be focused. Inert nodes that are commands will also get disabled.

An entire Document can be marked as blocked by a modal dialog subject. While a Document is so marked, every node that is in the Document, with the exception of the subject element and its
descendants, must be marked inert. (The elements excepted by this paragraph can
additionally be marked inert through other means; being part of a modal dialog does not
"protect" a node from being marked inert.)
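As a non-normative sketch of the rule above, the following walks a plain-object tree and collects every node that would be marked inert while the document is blocked. The node helper and the tree shape are assumptions made for illustration; no real DOM is involved.

```javascript
// Hypothetical plain-object tree; real DOM nodes are not used here.
function node(name, children = []) {
  const n = { name, children, parent: null };
  for (const child of children) child.parent = n;
  return n;
}

// Returns the names of all nodes that would be marked inert while the
// document rooted at `root` is blocked by the modal dialog `subject`.
function inertNodesWhileBlocked(root, subject) {
  const inert = [];
  (function walk(current) {
    if (current === subject) return; // subject and its descendants are excepted
    inert.push(current.name);
    current.children.forEach(walk);
  })(root);
  return inert;
}

const field = node("input");
const dialog = node("dialog", [field]);
const link = node("a");
const body = node("body", [link, dialog]);
const html = node("html", [body]);

console.log(inertNodesWhileBlocked(html, dialog)); // [ 'html', 'body', 'a' ]
```

Note that the ancestors of the subject are marked inert as well; only the subject element and its descendants are excepted.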

5.3. Activation

Certain elements in HTML have an activation behavior, which means that the user
can activate them. This triggers a sequence of events dependent on the activation mechanism, and
normally culminating in a click event, as described below.

The user agent should allow the user to manually trigger elements that have an activation
behavior, for instance using keyboard or voice input, or through mouse clicks. When the
user triggers an element with a defined activation behavior in a manner other than
clicking it, the default action of the interaction event must be to run synthetic click
activation steps on the element.

Each element has a click in progress flag, initially set to false.

When a user agent is to run synthetic click activation steps on an element, the user
agent must run the following steps:

If the element’s click in progress flag is set to true, then abort
these steps.

When a pointing device is clicked, the user agent must run authentic click activation
steps instead of firing the click event. When a user agent is to run authentic click activation steps for a given event event, it must
follow these steps:

Let target be the element designated by the user (the target of event).

If target is a canvas element, run the canvas MouseEvent rerouting steps. If this changes event’s
target, then let target be the new target.

The algorithms above don’t run for arbitrary synthetic events dispatched by author script. The click() method can be used to make the run synthetic click activation steps algorithm happen programmatically.

Click-focusing behavior (e.g., the focusing of a text field when the user clicks in one) typically
happens before the click, when the mouse button is first depressed, and is therefore not
discussed here.

Given an element target, the nearest activatable element is the element
returned by the following algorithm:

When a user agent is to run pre-click activation steps on an element, it must run
the pre-click activation steps defined for that element, if any.

When a user agent is to run canceled activation steps on an element, it must run the canceled activation steps defined for that element, if any.

When a user agent is to run post-click activation steps on an element, it must run
the activation behavior defined for that element, if any. Activation behaviors can
refer to the click event that was fired by the steps above
leading up to this point.

5.4. Focus

5.4.1. Introduction

This section is non-normative.

An HTML user interface typically consists of multiple interactive widgets, such as form
controls, scrollable regions, links, dialog boxes, browser tabs, and so forth. These widgets form
a hierarchy, with some (e.g., browser tabs) containing others (e.g., links, form
controls).

When interacting with an interface using a keyboard, key input is channeled from the system,
through the hierarchy of interactive widgets, to an active widget, which is said to be focused.

Consider an HTML application running in a browser tab running in a graphical environment.
Suppose this application had a page with some text fields and links, and was currently showing a
modal dialog, which itself had a text field and a button.

The hierarchy of focusable widgets, in this scenario, would include the browser window, which
would have, amongst its children, the browser tab containing the HTML application. The tab itself
would have as its children the various links and text fields, as well as the dialog. The dialog
itself would have as its children the text field and the button.

If the widget with focus in this example was the text field in the dialog box, then key
input would be channeled from the graphical system to ① the Web browser, then to ②
the tab, then to ③ the dialog, and finally to ④ the text field.

5.4.2. Data model

The term focusable area is used to refer to regions of the interface that can become
the target of keyboard input. Focusable areas can be elements, parts of elements, or other regions
managed by the user agent.

The following table describes what objects can be focusable
areas. The cells in the left column describe objects that can be focusable areas; the cells in the right column describe the DOM
anchors for those elements. (The cells that span both columns are non-normative examples.)

In the following example, the area element creates two shapes, one on each
image. The DOM anchor of the first shape is the first img element, and the DOM anchor of the second shape is the second img element.

<map id=wallmap><area alt="Enter Door" coords="10,10,100,200" href="door.html"></map>
...
<img src="images/innerwall.jpeg" alt="There is a white wall here, with a door." usemap="#wallmap">
...
<img src="images/outerwall.jpeg" alt="There is a red wall here, with a door." usemap="#wallmap">

5.4.3. The tabindex attribute

The tabindex content attribute allows authors to
indicate that an element is supposed to be focusable, and
whether it is supposed to be reachable using sequential focus navigation and, if so,
what is to be the relative order of the element for the purposes of sequential focus navigation.
The name "tab index" comes from the common use of the "tab" key to navigate through the focusable
elements. The term "tabbing" refers to moving forward through the focusable elements that can be
reached using sequential focus navigation.

When the attribute is omitted, the user agent applies defaults. (There is no way to make an
element that is being rendered be not focusable at all without disabling it or making it inert.)

Each element can have a tabindex focus flag set, as defined
below. This flag is a factor that contributes towards determining whether an element is a focusable area, as described in the previous section.

If the tabindex attribute is specified on an element, it
must be parsed using the rules for parsing integers. The attribute’s values, or lack
thereof, must be interpreted as follows:

One valid reason to ignore the platform conventions and always allow an element
to be focused (by setting its tabindex focus flag) would be if the user’s only
mechanism for activating an element is through a keyboard action that triggers the focused
element.

One valid reason to ignore the requirement that sequential focus navigation not
allow the author to lead to the element would be if the user’s only mechanism for moving the
focus is sequential focus navigation. For instance, a keyboard-only user would be unable to
click on a text field with a negative tabindex, so that
user’s user agent would be well justified in allowing the user to tab to the control
regardless.

before any focusable area whose DOM anchor is an element whose tabindex attribute has been omitted or whose value, when parsed,
returns an error,

before any focusable area whose DOM anchor is an element whose tabindex attribute has a value equal to or less than zero,

after any focusable area whose DOM anchor is an element whose tabindex attribute has a value greater than zero but less than
the value of the tabindex attribute on candidate,

after any focusable area whose DOM anchor is an element whose tabindex attribute has a value equal to the value of the tabindex attribute on candidate but that is
earlier in the document in tree order than candidate,

before any focusable area whose DOM anchor is an element whose tabindex attribute has a value equal to the value of the tabindex attribute on candidate but that is
later in the document in tree order than candidate, and

before any focusable area whose DOM anchor is an element whose tabindex attribute has a value greater than the value of the tabindex attribute on candidate.
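The relative ordering described by the list above can be sketched as a sort. This non-normative example makes several assumptions: elements are plain records listed in tree order, a hypothetical focusableByDefault flag stands in for platform conventions when the attribute is omitted or unparsable, elements with a value of zero are placed in tree order, and elements with a negative value are left out of sequential focus navigation. parseTabindex is a simplified take on the rules for parsing integers.

```javascript
// Simplified integer parsing: optional leading whitespace, optional sign,
// then digits. Returns null when the attribute is omitted or unparsable.
function parseTabindex(value) {
  if (value === null || value === undefined) return null;
  const match = /^[ \t\n\f\r]*([+-]?)([0-9]+)/.exec(value);
  if (!match) return null;
  const n = parseInt(match[2], 10);
  return match[1] === "-" && n !== 0 ? -n : n;
}

// `elements` is listed in tree order. Each entry is a hypothetical record:
// { id, tabindex (string or null), focusableByDefault (boolean) }.
function sequentialFocusOrder(elements) {
  const positive = [];
  const rest = [];
  elements.forEach((el, treeIndex) => {
    const value = parseTabindex(el.tabindex);
    if (value === null) {
      // Omitted or unparsable: fall back to platform defaults (assumed flag).
      if (el.focusableByDefault) rest.push(el);
    } else if (value > 0) {
      positive.push({ el, value, treeIndex });
    } else if (value === 0) {
      rest.push(el);
    }
    // Negative values: focusable, but left out of sequential navigation.
  });
  // Lower positive values first; ties are broken by tree order.
  positive.sort((a, b) => a.value - b.value || a.treeIndex - b.treeIndex);
  return [...positive.map((p) => p.el.id), ...rest.map((el) => el.id)];
}

const order = sequentialFocusOrder([
  { id: "search", tabindex: null, focusableByDefault: true },
  { id: "skip", tabindex: "2", focusableByDefault: false },
  { id: "logo", tabindex: "-1", focusableByDefault: false },
  { id: "menu", tabindex: "1", focusableByDefault: false },
  { id: "panel", tabindex: "0", focusableByDefault: false },
  { id: "help", tabindex: "2", focusableByDefault: true },
  { id: "text", tabindex: "abc", focusableByDefault: false },
]);
console.log(order); // [ 'menu', 'skip', 'help', 'search', 'panel' ]
```

A user agent may deviate from this ordering for the reasons given in the notes above, for instance to let keyboard-only users reach a control with a negative value.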

This means that an element that is only focusable because of its tabindex attribute will fire a click event in response to a non-mouse activation (e.g., hitting the
"enter" key while the element is focused).

The tabIndex IDL attribute must reflect the value of the tabindex content
attribute. Its default value is 0 for elements that are focusable and -1 for elements that
are not focusable.

Most current browsers instead give the tabIndex IDL attribute a value of 0 for some list of elements that are by default a focusable area, and -1 for other elements, if there is no tabindex content attribute set. This behavior is not well-defined and will hopefully be improved in the future.
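The getter behavior stated by the specification text (as opposed to the browser behavior just noted) can be sketched as follows. This is a non-normative illustration which assumes that reflecting means parsing the content attribute and falling back to the default when it is absent or unparsable; range checks are omitted, and both function names are hypothetical.

```javascript
// Simplified integer parsing; returns null on error.
function parseInteger(value) {
  const match = /^[ \t\n\f\r]*([+-]?)([0-9]+)/.exec(value);
  if (!match) return null;
  const n = parseInt(match[2], 10);
  return match[1] === "-" && n !== 0 ? -n : n;
}

// `attr` is the raw tabindex content attribute value, or null when absent.
// `focusable` states whether the element is focusable (assumed input).
function tabIndexIdlValue(attr, focusable) {
  const parsed = attr === null ? null : parseInteger(attr);
  if (parsed !== null) return parsed;
  return focusable ? 0 : -1; // default value
}

console.log(tabIndexIdlValue("3", false)); // 3
console.log(tabIndexIdlValue(null, true)); // 0
console.log(tabIndexIdlValue("abc", false)); // -1
```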

If the last entry in old chain and the last entry in new chain are the same, pop the last entry from old chain and the last entry from new chain and redo this step.

For each entry entry in old chain, in order, run
these substeps:

If entry is an input element, and the change event applies to the element, and the element does not have a
defined activation behavior, and the user has changed the element’s value or its list of selected files while the control was focused
without committing that change, then fire a simple event that bubbles named change at the element.

If entry is an element, let blur event target be entry.

If entry is a Document object, let blur
event target be that Document object’s Window object.

Otherwise, let blur event target be null.

If entry is the last entry in old chain, and entry is an Element, and the last entry in new
chain is also an Element, then let related blur target be the last entry in new chain. Otherwise, let related blur
target be null.

If blur event target is not null, fire a focus event named blur at blur event target, with related blur target as the related target.

In some cases, e.g., if entry is an area element’s shape, a scrollable region, or a viewport, no event is fired.

Apply any relevant platform-specific conventions for focusing new focus
target. (For example, some platforms select the contents of a text field when that field is
focused.)

For each entry entry in new chain, in reverse
order, run these substeps:

If entry is an element, let focus event target be entry.

If entry is a Document object, let focus
event target be that Document object’s Window object.

Otherwise, let focus event target be null.

If entry is the last entry in new chain, and entry is an Element, and the last entry in old
chain is also an Element, then let related focus target be the last entry in old chain. Otherwise, let related
focus target be null.

If focus event target is not null, fire a focus event named focus at focus event target, with related focus target as the related target.

In some cases, e.g., if entry is an area element’s shape, a scrollable region, or a viewport, no event is fired.
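The blur and focus substeps above can be sketched as a function that returns the events that would be fired. This non-normative example makes several assumptions: chains are plain arrays ordered from the focused area outward (so the containing Document comes last), entries are hypothetical records with a kind field, elements are used as focus event targets in the same way as for blur, and the change event and platform-convention steps are left out.

```javascript
const last = (list) => list[list.length - 1];
const isElement = (entry) => entry !== undefined && entry.kind === "element";

// Elements receive the event themselves, Documents forward it to their
// Window, and anything else (area shape, scrollable region, viewport) gets none.
function eventTargetFor(entry) {
  if (entry.kind === "element") return entry.name;
  if (entry.kind === "document") return entry.window;
  return null;
}

function focusUpdateEvents(oldChainInput, newChainInput) {
  const oldChain = [...oldChainInput];
  const newChain = [...newChainInput];

  // Drop the shared tail of the two chains.
  while (oldChain.length > 0 && newChain.length > 0 && last(oldChain) === last(newChain)) {
    oldChain.pop();
    newChain.pop();
  }

  const events = [];

  // Blur events: old chain, in order.
  for (const entry of oldChain) {
    const target = eventTargetFor(entry);
    const related =
      entry === last(oldChain) && isElement(entry) && isElement(last(newChain))
        ? last(newChain).name
        : null;
    if (target !== null) events.push({ type: "blur", target, relatedTarget: related });
  }

  // Focus events: new chain, in reverse order.
  for (const entry of [...newChain].reverse()) {
    const target = eventTargetFor(entry);
    const related =
      entry === last(newChain) && isElement(entry) && isElement(last(oldChain))
        ? last(oldChain).name
        : null;
    if (target !== null) events.push({ type: "focus", target, relatedTarget: related });
  }
  return events;
}

const doc = { kind: "document", name: "document", window: "window" };
const nameField = { kind: "element", name: "input#name" };
const okButton = { kind: "element", name: "button#ok" };
const viewport = { kind: "viewport", name: "viewport" };

console.log(focusUpdateEvents([nameField, doc], [okButton, doc]));
console.log(focusUpdateEvents([viewport, doc], [okButton, doc]));
```

In the first call the shared Document is popped from both chains, so a blur event is fired at the text field and a focus event at the button, each with the other as its related target. In the second call the viewport produces no blur event, and the focus event has no related target.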

When a user agent is required to fire a focus event named e at
an element t and with a given related target r, the user
agent must create a trusted FocusEvent object, initialize it to have the given name e, to not bubble, to not be
cancelable, and to have the relatedTarget attribute initialized to r, the view attribute initialized to the Window object of the Document object of t, and the detail attribute initialized to 0, and must then dispatch the newly created FocusEvent object
at the specified target element t.

For example, if direction is backward, then the last
focusable control before the browser’s rendering area would be the control to focus.

If the user agent has no focusable controls (a kiosk-mode browser, for instance),
then the user agent may instead restart these steps with the starting point being the top-level browsing context itself.

Returns true if key events are being routed through or to the document; otherwise, returns
false. Roughly speaking, this corresponds to the document, or a document nested inside this
one, being focused.

Moves the focus to the viewport. Use of this method is discouraged; if you want to focus the viewport, call the focus() method on the Document’s root element.

Do not use this method to hide the focus ring if you find the focus ring unsightly. Instead,
use a CSS rule to override the outline property, and provide a different way to show what
element is focused. Be aware that if an alternative focusing style isn’t made available, the
page will be significantly less usable for people who primarily navigate pages using a keyboard,
or those with reduced vision who use focus outlines to help them navigate the page.

For example, to hide the outline from links and instead use a yellow background to indicate
focus, you could use:

Do not use this method to hide the focus ring. Do not use any
other method that hides the focus ring from keyboard users, in
particular do not use a CSS rule to override the outline property. Removal of the focus ring leads to serious accessibility
issues for users who navigate and interact with interactive
content using the keyboard.

The activeElement attribute on Document objects must return the value returned by the following steps:

The blur() method on the Window object, when invoked, provides a hint to the user agent that the script believes the user probably
is not currently interested in the contents of the browsing context of the Window object on which the method was invoked, but that the contents might become
interesting again in the future.

User agents are encouraged to ignore calls to this blur() method entirely.

Historically, the focus() and blur() methods actually affected the system-level focus of the
system widget (e.g., tab or window) that contained the browsing context, but hostile
sites widely abuse this behavior to the user’s detriment.

The focus() method on elements, when invoked, must
run the following algorithm:

The blur() method, when invoked, should run the unfocusing steps for the element on which the method was called. User agents may
selectively or uniformly ignore calls to this method for usability reasons.

For example, if the blur() method is unwisely
being used to remove the focus ring for aesthetic reasons, the page would become unusable by
keyboard users. Ignoring calls to this method would thus allow keyboard users to interact with the
page.

5.5. Assigning keyboard shortcuts

5.5.1. Introduction

This section is non-normative.

Each element that can be activated or focused can be assigned a shortcut key combination to
activate it, using the accesskey attribute.

The exact shortcut is determined by the user agent, potentially using information about the user’s
preferences, what keyboard shortcuts already exist on the platform, and what other shortcuts have
been specified on the page, as well as the value of the accesskey attribute.

A valid value for accesskey consists of a single character, such as a letter or digit.

User agents can provide users with a list of the keyboard shortcuts, but authors are encouraged
to do so also.

In this example, an author has provided a button that can be invoked using a shortcut key,
and suggested "C" as a memorable and useful shortcut.

<input type=button value=Collect onclick="collect()" accesskey="C" id=c>

5.5.2. The accesskey attribute

All HTML elements may have the accesskey content attribute set. The accesskey attribute’s value is used
by the user agent as a guide for creating a keyboard shortcut that activates or focuses the
element.

If specified, the value must be a single printable character: a string exactly one Unicode code point in length.

Authors should not use " ", nor characters that normally require a modifier key to
generate, as a value of accesskey.
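The length requirement above is stated in Unicode code points, which differs from a string's length in UTF-16 code units. The following non-normative sketch shows one way an authoring tool might check a value; both function names are hypothetical. It does not test whether the character is printable or whether it needs a modifier key, since the latter depends on the keyboard layout.

```javascript
function isSingleCodePoint(value) {
  // Array.from iterates a string by code points, not UTF-16 code units.
  return Array.from(value).length === 1;
}

// Also applies the advice against using a space character.
function isAdvisableAccessKey(value) {
  return isSingleCodePoint(value) && value !== " ";
}

console.log(isAdvisableAccessKey("C")); // true
console.log(isAdvisableAccessKey("ab")); // false
console.log(isAdvisableAccessKey(" ")); // false
console.log(isSingleCodePoint("\u{1F600}")); // true
```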

In the following example, a variety of links are given with access keys so that keyboard users
familiar with the site can more quickly navigate to the relevant pages:

5.5.3. Processing model

An element’s assigned access key is a key combination derived from the element’s accesskey content attribute, or assigned by the user agent, optionally
based on a user preference. Initially, an element must not have an assigned access key.

Whenever an element’s accesskey attribute is set, changed,
or removed, the user agent must update the element’s assigned access key by running
the following steps:

If the element has no accesskey attribute, then skip
to the fallback step below.

The user agent may assign a key combination based on stored user preferences as the
element’s assigned access key and then abort these steps.

Once a user agent has selected and assigned an access key for an element, the user agent should
not change the element’s assigned access key unless the accesskey content
attribute is changed or the element is moved to another Document.

When the user presses the key combination corresponding to the assigned access key for an element, if the element defines a command, the
command’s Hidden State facet is false (visible),
the command’s Disabled State facet is also false
(enabled), the element is in a Document that has an associated browsing context, and neither the element nor any of its ancestors has a hidden attribute specified, then the user agent must trigger the Action of the command.

User agents might expose elements that have
an accesskey attribute in other ways as well, e.g., in a menu
displayed in response to a specific key combination.

5.6. Editing

5.6.1. Making document regions editable: The contenteditable content attribute

The contenteditable content attribute is an enumerated attribute whose keywords are the empty string, true,
and false. The empty string and the true keyword map
to the true state. The false keyword maps to the false state.
In addition, there is a third state, the inherit state, which is the missing value default (and the invalid value default).

The true state indicates that the element is editable. The inherit state
indicates that the element is editable if its parent is. The false state indicates that the
element is not editable.

The contentEditable IDL attribute, on
getting, must return the string "true" if the content attribute is set to
the true state, "false" if the content attribute is set to the false state,
and "inherit" otherwise. On setting, if the new value is an ASCII
case-insensitive match for the string "inherit" then the content
attribute must be removed, if the new value is an ASCII case-insensitive match for
the string "true" then the content attribute must be set to the string
"true", if the new value is an ASCII case-insensitive match for
the string "false" then the content attribute must be set to the string
"false", and otherwise the attribute setter must throw a
"SyntaxError" DOMException.
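The getter and setter rules above can be modeled outside a browser. In this non-normative sketch, EditableModel is a hypothetical stand-in for an element whose attr field holds the raw content attribute value (null when absent), and a plain Error named "SyntaxError" stands in for the DOMException.

```javascript
// ASCII-only lowercasing, for ASCII case-insensitive matching.
const asciiLower = (s) => s.replace(/[A-Z]/g, (c) => c.toLowerCase());

// Maps a raw content attribute value (or null when absent) to a state.
function contentEditableState(attr) {
  if (attr === null) return "inherit"; // missing value default
  const keyword = asciiLower(attr);
  if (keyword === "" || keyword === "true") return "true";
  if (keyword === "false") return "false";
  return "inherit"; // invalid value default
}

class EditableModel {
  constructor() {
    this.attr = null;
  }
  get contentEditable() {
    return contentEditableState(this.attr);
  }
  set contentEditable(value) {
    const keyword = asciiLower(String(value));
    if (keyword === "inherit") this.attr = null; // remove the attribute
    else if (keyword === "true") this.attr = "true";
    else if (keyword === "false") this.attr = "false";
    else {
      // Stand-in for a "SyntaxError" DOMException.
      const error = new Error(`Invalid contentEditable value: ${value}`);
      error.name = "SyntaxError";
      throw error;
    }
  }
}

const el = new EditableModel();
el.contentEditable = "TRUE";
console.log(el.attr, el.contentEditable); // true true
el.contentEditable = "Inherit";
console.log(el.attr, el.contentEditable); // null inherit
```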

The isContentEditable IDL attribute, on
getting, must return true if the element is either an editing host or editable, and false otherwise.

5.6.2. Making entire documents editable: The designMode IDL attribute

Documents have a designMode, which can be either enabled or
disabled.

document . designMode [ = value ]

Returns "on" if the document is editable, and "off" if it isn’t.

Can be set, to change the document’s current state. This focuses the document and resets the
selection in that document.

The designMode IDL attribute on the Document object takes two values, "on" and "off". On setting, the new value must be compared in an ASCII
case-insensitive manner to these two values; if it matches the "on"
value, then designMode must be enabled, and if it
matches the "off" value, then designMode must be disabled. Other values must be
ignored.

On getting, if designMode is enabled, the IDL
attribute must return the value "on"; otherwise it is disabled, and the
attribute must return the value "off".

The last state set must persist until the document is destroyed or the state is changed.
Initially, documents must have their designMode disabled.
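The setting and getting rules above can be sketched with a small model. This is a non-normative illustration; DocumentModel and its designModeEnabled field are hypothetical, and the focusing and selection side effects are not modeled.

```javascript
// ASCII-only lowercasing, for ASCII case-insensitive matching.
const asciiLower = (s) => s.replace(/[A-Z]/g, (c) => c.toLowerCase());

class DocumentModel {
  constructor() {
    this.designModeEnabled = false; // initially disabled
  }
  get designMode() {
    return this.designModeEnabled ? "on" : "off";
  }
  set designMode(value) {
    const keyword = asciiLower(String(value));
    if (keyword === "on") this.designModeEnabled = true;
    else if (keyword === "off") this.designModeEnabled = false;
    // Any other value is ignored.
  }
}

const doc = new DocumentModel();
console.log(doc.designMode); // off
doc.designMode = "ON";
console.log(doc.designMode); // on
doc.designMode = "maybe";
console.log(doc.designMode); // on
```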

When the designMode changes from being disabled to
being enabled, the user agent must immediately reset the document’s active range’s
start and end boundary points to be at the start of the Document and then run the focusing steps for the root element of the Document, if any.

5.6.3. Best practices for in-page editors

Authors are encouraged to set the white-space property on editing
hosts and on markup that was originally created through these editing mechanisms to the
value pre-wrap. Default HTML whitespace handling is not well suited to WYSIWYG editing, and line
wrapping will not work correctly in some corner cases if white-space is left at its default
value.

As an example of problems that occur if the default normal value is used instead, consider
the case of the user typing "yellow␣␣ball", with two spaces (here
represented by "␣") between the words. With the editing rules in place for the default
value of white-space (normal), the resulting markup will either consist of
"yellow&nbsp; ball" or "yellow &nbsp;ball"; i.e.,
there will be a non-breaking space between the two words in addition to the regular space. This
is necessary because the normal value for white-space requires adjacent regular spaces to be
collapsed together.

In the former case, "yellow⍽" might wrap to the next line ("⍽"
being used here to represent a non-breaking space) even though "yellow" alone might
fit at the end of the line; in the latter case, "⍽ball", if wrapped to the
start of the line, would have visible indentation from the non-breaking space.

When white-space is set to pre-wrap, however, the editing rules will instead simply put
two regular spaces between the words, and should the two words be split at the end of a line, the
spaces would be neatly removed from the rendering.

5.6.4. Editing APIs

The definition of the terms active range, editing host, and editable, the user interface requirements of elements that are editing hosts or editable, the execCommand(), queryCommandEnabled(), queryCommandIndeterm(), queryCommandState(), queryCommandSupported(), and queryCommandValue() methods, text selections, and the delete the selection algorithm are being specified in the various developing
HTML Editing specification drafts [EDITING]. The interaction of editing and undo/redo
features is being specified in the UndoManager and DOM Transaction specification. [UNDO]

5.6.5. Spelling and grammar checking

User agents can support the checking of spelling and grammar of editable text, either in form
controls (such as the value of textarea elements), or in elements in an editing
host (e.g., using contenteditable).

For each element, user agents must establish a default behavior, either through
defaults or through preferences expressed by the user. There are
three possible default behaviors for each element:

true-by-default

The element will be checked for spelling and grammar if its contents are editable and
spellchecking is not explicitly disabled through the spellcheck attribute.

false-by-default

The element will never be checked for spelling and grammar unless spellchecking is
explicitly enabled through the spellcheck attribute.

inherit-by-default

The element’s default behavior is the same as its parent element’s. Elements that have no
parent element cannot have this as their default behavior.

The spellcheck attribute is an enumerated
attribute whose keywords are the empty string, true and false. The empty string and the true keyword map to the true state. The false keyword maps to the false state. In
addition, there is a third state, the default state, which is the missing value default (and the invalid value default).

The true state indicates that the element is to have its spelling and
grammar checked. The default state indicates that the element is to act according to a
default behavior, possibly based on the parent element’s own spellcheck state, as defined below. The false state
indicates that the element is not to be checked.

element . spellcheck [ = value ]

Returns true if the element is to have its spelling and grammar checked; otherwise, returns
false.

Can be set, to override the default and set the spellcheck content attribute.

element . forceSpellCheck()

Forces the user agent to report spelling and grammar errors on the element (if checking is
enabled), even if the user has never focused the element. (If the method is not invoked, user
agents can hide errors in text that wasn’t just entered by the user.)

The spellcheck IDL attribute is not affected
by user preferences that override the spellcheck content
attribute, and therefore might not reflect the actual spellchecking state.

On setting, if the new value is true, then the element’s spellcheck content attribute must be set to the literal string
"true", otherwise it must be set to the literal string "false".

User agents must only consider the following pieces of text as checkable for the purposes of
this feature:

The value of input elements whose type attributes are in the Text, Search, URL, or E-mail states and that are mutable (i.e., that do not have the readonly attribute specified and that are not disabled).

The value of textarea elements that do not
have a readonly attribute and that are not disabled.

For text that is part of a Text node, the element with which the text is
associated is the element that is the immediate parent of the first character of the word,
sentence, or other piece of text. For text in attributes, it is the attribute’s element. For the
values of input and textarea elements, it is the element itself.

To determine if a word, sentence, or other piece of text in an applicable element (as defined
above) is to have spelling- and grammar-checking enabled, the user agent must use the following
algorithm:

If the user has disabled the checking for this text, then the checking is disabled.

Otherwise, if the user has forced the checking for this text to always be enabled, then the
checking is enabled.

Otherwise, if the element with which the text is associated has a spellcheck content attribute, then: if that attribute is in the true state, then checking is enabled; otherwise, if that attribute is in the false state, then checking is disabled.

Otherwise, if there is an ancestor element with a spellcheck content attribute that is not in the default state, then: if the nearest such ancestor’s spellcheck content attribute is in the true state, then checking is enabled; otherwise, checking is
disabled.

Otherwise, if the element’s parent element has its checking enabled, then checking
is enabled.

Otherwise, checking is disabled.
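The algorithm above can be sketched over plain records. This non-normative example makes several assumptions: each record has a spellcheck field (raw attribute value, or null when absent), a parent field, and an optional defaultBehavior field; a present attribute in the default state falls through to the ancestor step; and the true-by-default and false-by-default behaviors defined earlier in this section are consulted before the parent element. All names are hypothetical.

```javascript
// ASCII-only lowercasing, for ASCII case-insensitive matching.
const asciiLower = (s) => s.replace(/[A-Z]/g, (c) => c.toLowerCase());

// Maps a raw attribute value (or null when absent) to a state.
function spellcheckState(attr) {
  if (attr === null || attr === undefined) return "default";
  const keyword = asciiLower(attr);
  if (keyword === "" || keyword === "true") return "true";
  if (keyword === "false") return "false";
  return "default"; // invalid value default
}

// `userPref` is "disabled", "forced", or null (no user override).
function checkingEnabled(el, userPref = null) {
  if (userPref === "disabled") return false;
  if (userPref === "forced") return true;

  const own = spellcheckState(el.spellcheck);
  if (own !== "default") return own === "true";

  // Nearest ancestor whose attribute is not in the default state.
  for (let ancestor = el.parent; ancestor; ancestor = ancestor.parent) {
    const state = spellcheckState(ancestor.spellcheck);
    if (state !== "default") return state === "true";
  }

  // Assumed use of the default behaviors defined earlier in this section.
  if (el.defaultBehavior === "true-by-default") return true;
  if (el.defaultBehavior === "false-by-default") return false;

  return el.parent ? checkingEnabled(el.parent) : false;
}

// An element with spellcheck="false" inside an editable container.
const div = { spellcheck: null, parent: null, defaultBehavior: "true-by-default" };
const a = { spellcheck: "false", parent: div };
console.log(checkingEnabled(a)); // false

// A leading space makes the value invalid, so the ancestor's value is used.
const p = { spellcheck: "true", parent: null };
const label = { spellcheck: null, parent: p };
const b = { spellcheck: " false", parent: label, defaultBehavior: "false-by-default" };
console.log(checkingEnabled(b)); // true
```

User overrides take precedence over everything else, so checkingEnabled(b, "disabled") returns false regardless of the markup.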

If the checking is enabled for a word/sentence/text, the user agent should indicate spelling
and grammar errors in that text. User agents should take into account the other semantics given in
the document when suggesting spelling and grammar corrections. User agents may use the language of
the element to determine what spelling and grammar rules to use, or may use the user’s preferred
language settings. User agents should use input element attributes such as pattern to ensure that the resulting value is valid, where
possible.

If checking is disabled, the user agent should not indicate spelling or grammar errors for that
text.

Even when checking is enabled, user agents may opt to not report spelling or grammar errors in
text that the user agent deems the user has no interest in having checked (e.g., text that was
already present when the page was loaded, or that the user did not type, or text in controls that
the user has not focused, or in parts of e-mail addresses that the user agent is not confident
were misspelt). The forceSpellCheck() method,
when invoked on an element, must override this behavior, forcing the user agent to consider all
spelling and grammar errors in text in that element for which checking is enabled to be of
interest to the user.

The element with ID "a" in the following example would be the one used to determine if the
word "Hello" is checked for spelling errors. In this example, it would not be.

The element with ID "b" in the following example would have checking enabled (the leading
space character in the attribute’s value on the input element causes the attribute
to be ignored, so the ancestor’s value is used instead, regardless of the default).

This specification does not define the user interface for spelling and grammar
checkers. A user agent could offer on-demand checking, could perform continuous checking while the
checking is enabled, or could use other interfaces.

5.7. Drag and drop

This section defines an event-based drag-and-drop mechanism.

This specification does not define exactly what a drag-and-drop operation actually
is.

On a visual medium with a pointing device, a drag operation could be the default action of a mousedown event that is followed by a series of mousemove events, and the drop could be triggered by the mouse
being released.

When using an input modality other than a pointing device, users would probably have to
explicitly indicate their intention to perform a drag-and-drop operation, stating what they wish
to drag and where they wish to drop it, respectively.

However it is implemented, drag-and-drop operations must have a starting point (e.g., where the
mouse was clicked, or the start of the selection or element that was selected for the drag), may
have any number of intermediate steps (elements that the mouse moves over during a drag, or
elements that the user picks as possible drop points as they cycle through possibilities), and must
either have an end point (the element above which the mouse button was released, or the element
that was finally selected), or be canceled. The end point must be the last element selected as a
possible drop point before the drop occurs (so if the operation is not canceled, there must be at
least one element in the middle step).

5.7.1. Introduction

This section is non-normative.

To make an element draggable is simple: give the element a draggable attribute, and set an event listener for dragstart that stores the data being dragged.

The event handler typically needs to check that it’s not a text selection that is being
dragged, and then needs to store data into the DataTransfer object and set the
allowed effects (copy, move, link, or some combination).

For example:

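<p>What fruits do you like?</p>
<ol ondragstart="dragStartHandler(event)">
 <li draggable="true" data-value="fruit-apple">Apples</li>
 <li draggable="true" data-value="fruit-orange">Oranges</li>
 <li draggable="true" data-value="fruit-pear">Pears</li>
</ol>
<script>
  var internalDNDType = 'text/x-example'; // set this to something specific to your site
  // The data-value strings are assumed here; they mirror the values the drop handler tests for.
  function dragStartHandler(event) {
    if (event.target instanceof HTMLLIElement) {
      // use the element’s data-value attribute as the value being moved
      event.dataTransfer.setData(internalDNDType, event.target.dataset.value);
      event.dataTransfer.effectAllowed = 'move'; // only allow moves
    } else {
      event.preventDefault(); // don’t allow a text selection to be dragged
    }
  }
</script>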

To accept a drop, the drop target has to have a dropzone attribute and listen to the drop event.

The value of the dropzone attribute specifies what kind of
data to accept (e.g., "string:text/plain" to accept any text strings, or
"file:image/png" to accept a PNG image file) and what kind of feedback to
give (e.g., "move" to indicate that the data will be moved).

Instead of using the dropzone attribute, a drop
target can handle the dragenter event (to report whether or
not the drop target is to accept the drop) and the dragover event (to specify what feedback is to be shown to the user).
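A minimal sketch of that alternative is given here, assuming the same internalDNDType string as the surrounding examples. The handler names are illustrative only; they would be attached through ondragenter and ondragover attributes (or equivalent listeners) on the drop target.

```javascript
// Assumed to match the type string used by the drag source.
var internalDNDType = 'text/x-example';

// dragenter: canceling the event reports that this target accepts the drop.
function dragEnterHandler(event) {
  var items = event.dataTransfer.items;
  for (var i = 0; i < items.length; i += 1) {
    var item = items[i];
    if (item.kind == 'string' && item.type == internalDNDType) {
      event.preventDefault();
      return;
    }
  }
}

// dragover: choose the feedback to show, then cancel the event so that
// the drop remains possible.
function dragOverHandler(event) {
  event.dataTransfer.dropEffect = 'move';
  event.preventDefault();
}
```

If neither handler cancels its event, the target is not treated as accepting the drop, which is why both call preventDefault().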

The drop event allows the actual drop to be performed. This
event needs to be canceled, so that the dropEffect attribute’s value can be used by the source
(otherwise it’s reset).

For example:

<p>Drop your favorite fruits below:</p>
<ol dropzone="move string:text/x-example" ondrop="dropHandler(event)">
 <!-- don’t forget to change the "text/x-example" type to something
 specific to your site -->
</ol>
<script>
  var internalDNDType = 'text/x-example'; // set this to something specific to your site
  function dropHandler(event) {
    var li = document.createElement('li');
    var data = event.dataTransfer.getData(internalDNDType);
    if (data == 'fruit-apple') {
      li.textContent = 'Apples';
    } else if (data == 'fruit-orange') {
      li.textContent = 'Oranges';
    } else if (data == 'fruit-pear') {
      li.textContent = 'Pears';
    } else {
      li.textContent = 'Unknown Fruit';
    }
    event.target.appendChild(li);
  }
</script>

To remove the original element (the one that was dragged) from the display, the dragend event can be used.

For our example here, that means updating the original markup to handle that event:
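One possible shape for such a handler is sketched below. It assumes the list element carries an additional ondragend="dragEndHandler(event)" attribute; the handler name is illustrative.

```javascript
// dragend fires on the source node once the operation has finished.
// If the drop target reported a "move", take the dragged item out of the list.
function dragEndHandler(event) {
  if (event.dataTransfer.dropEffect == 'move') {
    // remove the dragged element
    event.target.parentNode.removeChild(event.target);
  }
}
```

Nothing is removed when the drag was canceled or resulted in a copy or link, since dropEffect is then something other than "move".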

5.7.2. The drag data store

The data that underlies a drag-and-drop operation, known as the drag data store,
consists of the following information:

A drag data store item list, which is a list of items representing the dragged
data, each consisting of the following information:

The drag data item kind

The kind of data:

Plain Unicode string

Text.

File

Binary data with a file name.

The drag data item type string

A Unicode string giving the type or format of the data, generally given by a MIME
type. Some values that are not MIME types are
special-cased for legacy reasons. The API does not enforce the use of MIME types; other values can be used as well. In all cases, however, the values
are all converted to ASCII lowercase by the API.

The dropEffect attribute controls
the drag-and-drop feedback that the user is given during a drag-and-drop operation. When the DataTransfer object is created, the dropEffect attribute is set to a string value. On
getting, it must return its current value. On setting, if the new value is one of "none", "copy", "link", or "move", then the attribute’s current value must be
set to the new value. Other values must be ignored.

The effectAllowed attribute is
used in the drag-and-drop processing model to initialize the dropEffect attribute during the dragenter and dragover events. When the DataTransfer object is
created, the effectAllowed attribute is set
to a string value. On getting, it must return its current value. On setting, if the drag data
store’s mode is the read/write mode and the new value is one of "none", "copy", "copyLink", "copyMove", "link", "linkMove", "move", "all", or "uninitialized", then the attribute’s
current value must be set to the new value. Otherwise it must be left unchanged.
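The setter behavior of these two attributes can be pictured with a small stand-in class. This is a sketch only: DataTransferModel is not a real interface, the mode strings other than "read/write" are placeholders, and the initial values used here ("none" and "uninitialized") are assumptions, since the text above only says each attribute is set to a string value.

```javascript
const DROP_EFFECTS = ["none", "copy", "link", "move"];
const EFFECTS_ALLOWED = [
  "none", "copy", "copyLink", "copyMove", "link",
  "linkMove", "move", "all", "uninitialized"
];

class DataTransferModel {
  constructor(mode = "read/write") {
    this.mode = mode;                      // drag data store mode
    this._dropEffect = "none";             // assumed initial value
    this._effectAllowed = "uninitialized"; // assumed initial value
  }

  get dropEffect() { return this._dropEffect; }
  set dropEffect(value) {
    // Unknown values are ignored.
    if (DROP_EFFECTS.includes(value)) this._dropEffect = value;
  }

  get effectAllowed() { return this._effectAllowed; }
  set effectAllowed(value) {
    // Only writable in read/write mode, and only with a known value.
    if (this.mode === "read/write" && EFFECTS_ALLOWED.includes(value)) {
      this._effectAllowed = value;
    }
  }
}
```

Note the asymmetry: dropEffect accepts any of its four values regardless of mode, whereas effectAllowed additionally requires the read/write mode.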

If there are any items in the drag data store item list whose kind is File, then add an entry to the list L consisting of the string "Files". (This value can be
distinguished from the other values because it is not lowercase.)
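For illustration, building the list L might look like the sketch below. Items are modelled as plain objects with hypothetical kind and type properties, and the step that adds the type string of each Plain Unicode string item is an assumption, since only the "Files" step appears above.

```javascript
// items: array of { kind: "Plain Unicode string" | "File", type: string }
function typesList(items) {
  const list = [];
  // Assumed earlier step: one entry per string item, using its type string.
  for (const item of items) {
    if (item.kind === "Plain Unicode string") list.push(item.type);
  }
  // A single "Files" entry, however many files are being dragged.
  if (items.some((item) => item.kind === "File")) list.push("Files");
  return list;
}
```

Because type strings are always lowercase, a script can test for list.includes("Files") without risk of colliding with a real type string.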

The clearData() method does not
affect whether any files were included in the drag, so the types attribute’s list might still not be empty after
calling clearData() (it would still contain the
"Files" string if any files were included in the drag).

The files attribute must return a live FileList sequence consisting of File objects
representing the files found by the following steps.
Furthermore, for a given FileList object and a given underlying file, the same File object must be used each time.

Start with an empty list L.

If the DataTransfer object is no longer associated with a drag data
store, the FileList is empty. Abort these steps; return the empty list L.

The kind attribute must return the
empty string if the DataTransferItem object is in the disabled mode; otherwise
it must return the string given in the cell from the second column of the following table from the
row whose cell in the first column contains the drag data item kind of the item
represented by the DataTransferItem object:
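The table is not reproduced in this text. As a sketch, assuming it pairs the Plain Unicode string kind with "string" and the File kind with "file" (the strings used for kind elsewhere in this section), the getter could be modelled as below. The item shape and the mode property are hypothetical.

```javascript
// item: { mode: "disabled" | other, kind: "Plain Unicode string" | "File" }
function itemKindString(item) {
  if (item.mode === "disabled") return "";
  // Assumed table contents: first column is the drag data item kind,
  // second column the string returned to script.
  const table = {
    "Plain Unicode string": "string",
    "File": "file"
  };
  return table[item.kind];
}
```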

Although, for consistency with other event interfaces, the DragEvent interface has a constructor, it is not particularly useful. In particular, there’s no way to
create a useful DataTransfer object from script, as DataTransfer objects
have a processing and security model that is coordinated by the browser during drag-and-drops.

The dataTransfer attribute of the DragEvent interface must return the value it was initialized to. It represents the
context information for the event.

When a user agent is required to fire a DND event named e at an element,
using a particular drag data store, and optionally with a specific related
target, the user agent must run the following steps:

If no specific related target was provided, set related target to
null.

Let window be the Window object of the Document object of the specified target element.

Set the dropEffect attribute to "none" if e is dragstart, drag, dragexit, or dragleave; to the value corresponding to the current
drag operation if e is drop or dragend; and to a value based on the effectAllowed attribute’s value and the
drag-and-drop source, as given by the following table, otherwise (i.e., if e is dragenter or dragover):

Where the table above provides possibly appropriate alternatives, user agents may instead use the listed alternative values if
platform conventions dictate that the user has requested those alternate effects.

For example, Windows platform conventions are such that dragging while
holding the "alt" key indicates a preference for linking the data, rather than moving or copying
it. Therefore, on a Windows system, if "link" is an option according to
the table above while the "alt" key is depressed, the user agent could select that instead of
"copy" or "move".

Create a trusted DragEvent object
and initialize it to have the given name e, to bubble, to be cancelable unless e is dragexit, dragleave, or dragend, and to have the view attribute initialized to window, the detail attribute initialized to zero, the mouse and key
attributes initialized according to the state of the input devices as they would be for user
interaction events, the relatedTarget attribute initialized to related target, and the dataTransfer attribute initialized to dataTransfer, the DataTransfer object created above.

If there is no relevant pointing device, the object must have its screenX, screenY, clientX, clientY, and button attributes set to 0.
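The per-event choices in these steps can be summarized in code. The sketch below is illustrative only: dndEventSettings is an invented helper, and because the table for dragenter and dragover is not reproduced in this text, that lookup is left to a caller-supplied function rather than guessed at.

```javascript
// name: the DND event name
// currentDragOperation: "none", "copy", "link", or "move"
// lookupFromTable: callback standing in for the effectAllowed/source table
function dndEventSettings(name, currentDragOperation, lookupFromTable) {
  let dropEffect;
  if (["dragstart", "drag", "dragexit", "dragleave"].includes(name)) {
    dropEffect = "none";
  } else if (name === "drop" || name === "dragend") {
    dropEffect = currentDragOperation;
  } else {
    dropEffect = lookupFromTable(); // dragenter or dragover
  }

  return {
    dropEffect: dropEffect,
    bubbles: true,
    // Every DND event is cancelable except these three.
    cancelable: !["dragexit", "dragleave", "dragend"].includes(name),
    detail: 0
  };
}
```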

5.7.5. Drag-and-drop processing model

When the user attempts to begin a drag operation, the user agent must run the following steps.
User agents must act as if these steps were run even if the drag actually started in another
document or application and the user agent was not aware that the drag was occurring until it
intersected with a document under the user agent’s purview.

Determine what is being dragged, as follows:

If the drag operation was invoked on a selection, then it is the selection that is being
dragged.

Otherwise, if the drag operation was invoked on a Document, it is the first
element, going up the ancestor chain, starting at the node that the user tried to drag, that has
the IDL attribute draggable set to true. If there is no such
element, then nothing is being dragged; abort these steps, and the drag-and-drop operation is never
started.

Otherwise, the drag operation was invoked outside the user agent’s purview. What is being
dragged is defined by the document or application where the drag was started.

img elements and a elements with an href attribute have their draggable attribute set to true by default.
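The ancestor walk for a drag invoked on a Document could be sketched as follows. The function name is illustrative; it relies only on the nodeType, parentNode, and draggable properties of the nodes it is given.

```javascript
// Starting at the node the user tried to drag, return the first element
// going up the ancestor chain whose draggable IDL attribute is true,
// or null if nothing is being dragged.
function findDraggedElement(startNode) {
  for (let node = startNode; node; node = node.parentNode) {
    const isElement = node.nodeType === 1; // 1 is Node.ELEMENT_NODE
    if (isElement && node.draggable === true) return node;
  }
  return null;
}
```

A null result corresponds to the case where the drag-and-drop operation is never started.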

If it is a selection that is being dragged, then the source node is the Text node that the user started the drag on (typically the Text node
that the user originally clicked). If the user did not specify a particular node, for example if
the user just told the user agent to begin a drag of "the selection", then the source
node is the first Text node containing a part of the selection.

Otherwise, if it is an element that is being dragged, then the source node is
the element that is being dragged.

Otherwise, the source node is part of another document or application. When this
specification requires that an event be dispatched at the source node in this case,
the user agent must instead follow the platform-specific conventions relevant to that
situation.

Multiple events are fired on the source node during the course of the drag-and-drop
operation.

Determine the list of dragged nodes, as follows:

If it is a selection that is being dragged, then the list of dragged nodes contains, in tree order, every node that is partially or completely included in the
selection (including all their ancestors).

Dragging files can currently only happen from outside a browsing
context, for example from a file system manager application.

If the drag was initiated outside of the application, the user agent must add items to the drag data store item list as appropriate for the data being dragged, honoring
platform conventions where appropriate; however, if the platform conventions do not use MIME types to label dragged data, the user agent must make a
best-effort attempt to map the types to MIME types, and, in any case, all the drag data item type strings must be converted to ASCII
lowercase.

User agents may also add one or more items representing the selection or dragged element(s)
in other forms, e.g., as HTML.

Update the drag data store default feedback as appropriate for the user agent
(if the user is dragging the selection, then the selection would likely be the basis for this
feedback; if the user is dragging an element, then that element’s rendering would be used; if
the drag began outside the user agent, then the platform conventions for determining the drag
feedback should be used).

From the moment that the user agent is to initiate the drag-and-drop operation,
until the end of the drag-and-drop operation, device input events (e.g., mouse and keyboard events)
must be suppressed.

During the drag operation, the element directly indicated by the user as the drop target is
called the immediate user selection. (Only elements can be selected by the user; other
nodes must not be made available as drop targets.) However, the immediate user
selection is not necessarily the current target element, which is the element
currently selected for the drop part of the drag-and-drop operation.

In addition, there is also a current drag operation, which can take on the values
"none", "copy", "link", and "move". Initially, it has the value
"none". It is updated by the user agent
as described in the steps below.

User agents must, as soon as the drag operation is initiated and every 350ms (±200ms) thereafter for as long as the drag
operation is ongoing, queue a task to perform the following steps in sequence:

If the user agent is still performing the previous iteration of the sequence (if any) when
the next iteration becomes due, abort these steps for this iteration (effectively "skipping
missed frames" of the drag-and-drop operation).

Otherwise, if the current target element is not a DOM element, use
platform-specific mechanisms to determine what drag operation is being performed (none, copy,
link, or move), and set the current drag operation accordingly.

No operation allowed, dropping here will cancel the drag-and-drop operation.

Otherwise, if the user ended the drag-and-drop operation (e.g., by releasing the mouse button
in a mouse-driven drag-and-drop interface), or if the drag event was canceled, then this will be the last iteration. Run the following steps, then stop the
drag-and-drop operation:

If the current drag operation is "none" (no drag operation), or, if the user
ended the drag-and-drop operation by canceling it (e.g., by hitting the Escape key),
or if the current target element is null, then the drag operation failed. Run
these substeps:

The drag was canceled. If the platform conventions dictate that this be represented to
the user (e.g., by animating the dragged selection going back to the source of the
drag-and-drop operation), then do so.
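The "skipping missed frames" rule at the start of the sequence amounts to a re-entrancy guard. A minimal sketch follows, assuming a hypothetical runSteps function that performs the rest of the iteration; in a user agent the returned function would be queued roughly every 350ms, for example with setInterval(iterate, 350).

```javascript
// Wrap one iteration of the drag sequence so that an iteration which
// becomes due while the previous one is still running is skipped.
function makeIterationGuard(runSteps) {
  let running = false;
  return function iterate() {
    if (running) return false; // previous iteration still in progress: skip
    running = true;
    try {
      runSteps();
    } finally {
      running = false; // always release the guard, even if a step throws
    }
    return true;
  };
}
```

The return value only reports whether the iteration ran; it has no counterpart in the processing model itself.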

User agents are encouraged to consider how to react to drags near the edge of scrollable
regions. For example, if a user drags a link to the bottom of the viewport on a long page, it
might make sense to scroll the page so that the user can drop the link lower on the page.

This model is independent of which Document object the nodes involved are from;
the events are fired as described above and the rest of the processing model runs as described
above, irrespective of how many documents are involved in the operation.

Not shown in the above table: all these events bubble, and the effectAllowed attribute always has the value it had after the dragstart event, defaulting to
"uninitialized" in the dragstart event.

5.7.7. The draggable attribute

All HTML elements may have the draggable content attribute set. The draggable attribute is an enumerated attribute. It has three states. The first
state is true and it has the keyword true. The second state is false and it has the keyword false. The third state is auto; it has no keywords but
it is the missing value default.

The true state means the element is draggable; the false state means that it is not.
The auto state uses the default behavior of the user agent.

An element with a draggable attribute should also have a title attribute
that names the element for the purpose of non-visual interactions.

The draggable IDL attribute can be set, to override the default and set the draggable content attribute.

The draggable IDL attribute, whose value depends on the content
attribute in the way described below, controls whether or not the element is draggable.
Generally, only text selections are draggable, but elements whose draggable IDL
attribute is true become draggable as well.

If an element’s draggable content attribute has the state true, the draggable IDL attribute must return true.

Otherwise, if the element’s draggable content attribute has the state false,
the draggable IDL attribute must return false.

Otherwise, the element’s draggable content attribute has the state auto. If
the element is an img element, an object element that represents an image, or an a element with an href content
attribute, the draggable IDL attribute must return true; otherwise, the draggable IDL attribute must return false.

If the draggable IDL attribute is set to the value false, the draggable content attribute must be set to the literal value "false".
If the draggable IDL attribute is set to the value true, the draggable content attribute must be set to the literal value "true".
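These getter and setter rules can be modelled on plain objects. In the sketch below, elements have hypothetical tag, attrs, and representsImage properties, and an unrecognized attribute value is treated like the auto state, which is an assumption.

```javascript
// el: { tag: string, attrs: { [name]: string }, representsImage?: boolean }
function getDraggable(el) {
  const raw = el.attrs.draggable;
  const keyword = typeof raw === "string" ? raw.toLowerCase() : null;
  if (keyword === "true") return true;   // true state
  if (keyword === "false") return false; // false state

  // auto state: fall back to the defaults for images and links.
  if (el.tag === "img") return true;
  if (el.tag === "object" && el.representsImage === true) return true;
  if (el.tag === "a" && "href" in el.attrs) return true;
  return false;
}

// Setting the IDL attribute writes the literal keyword to the content attribute.
function setDraggable(el, value) {
  el.attrs.draggable = value ? "true" : "false";
}
```

Setting the IDL attribute to false on an img element therefore overrides its default, because the content attribute then takes the false state.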

For each value in keywords, if any, in the order that they were found in value, run the following steps.

Let keyword be the keyword.

If keyword is one of "copy", "move", or
"link", then run the following substeps:

If operation is still unspecified, then let operation be the
string given by keyword.

Skip to the step labeled end of keyword below.

If keyword does not contain a U+003A COLON character (:), or if the first such
character in keyword is either the first character or the last character in
the string, then skip to the step labeled end of keyword below.

Let kind code be the substring of keyword from the first character
in the string to the last character in the string that is before the first U+003A COLON
character (:) in the string, converted to ASCII lowercase.

Jump to the appropriate step from the list below, based on the value of kind code:

If kind code is the string "string"

Let kind be Plain Unicode string.

If kind code is the string "file"

Let kind be File.

Otherwise

Skip to the step labeled end of keyword below.

Let type be the substring of keyword from the first character after
the first U+003A COLON character (:) in the string, to the last character in the string, converted to ASCII lowercase.
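The keyword-processing steps above could be sketched as below. This covers only the steps quoted here: splitting the attribute value on spaces is assumed, and what is subsequently done with each kind and type pair, as well as any default for an unspecified operation, is outside the sketch. The function and property names are illustrative.

```javascript
// Lowercase only the ASCII letters A-Z.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

function parseDropzoneKeywords(value) {
  const keywords = value.split(/[ \t\n\f\r]+/).filter(Boolean); // assumed split
  let operation = null; // "unspecified"
  const accepted = [];

  for (const keyword of keywords) {
    if (keyword === "copy" || keyword === "move" || keyword === "link") {
      if (operation === null) operation = keyword; // first one wins
      continue;
    }
    const colon = keyword.indexOf(":");
    if (colon === -1 || colon === 0 || colon === keyword.length - 1) continue;

    const kindCode = asciiLowercase(keyword.slice(0, colon));
    let kind;
    if (kindCode === "string") kind = "Plain Unicode string";
    else if (kindCode === "file") kind = "File";
    else continue;

    const type = asciiLowercase(keyword.slice(colon + 1));
    accepted.push({ kind: kind, type: type });
  }
  return { operation: operation, accepted: accepted };
}
```

For the value "move string:text/x-example", this yields the operation "move" and a single Plain Unicode string entry of type "text/x-example".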

In this example, a div element is made into a drop target for image files using the dropzone attribute. Images dropped into the target are then displayed.

<div dropzone="copy file:image/png file:image/gif file:image/jpeg" ondrop="receive(event, this)">
 <p>Drop an image here to have it displayed.</p>
</div>
<script>
  function receive(event, element) {
    var data = event.dataTransfer.items;
    for (var i = 0; i < data.length; i += 1) {
      if ((data[i].kind == 'file') && (data[i].type.match('^image/'))) {
        var img = new Image();
        // createObjectURL() is a static method of URL, not of window
        img.src = URL.createObjectURL(data[i].getAsFile());
        element.appendChild(img);
      }
    }
  }
</script>

5.7.9. Security risks in the drag-and-drop model

User agents must not make the data added to the DataTransfer object during the dragstart event available to scripts until the drop event, because
otherwise, if a user were to drag sensitive information from one document to a second document,
crossing a hostile third document in the process, the hostile document could intercept the data.

For the same reason, user agents must consider a drop to be successful only if the user
specifically ended the drag operation — if any scripts end the drag operation, it must be
considered unsuccessful (canceled) and the drop event must not be fired.

User agents should take care to not start drag-and-drop operations in response to script
actions. For example, in a mouse-and-window environment, if a script moves a window while the
user has a mouse button depressed, the user agent would not consider that to start a drag. This is
important because otherwise user agents could cause data to be dragged from sensitive sources and
dropped into hostile documents without the user’s consent.

User agents should filter potentially active (scripted) content (e.g., HTML) when it is dragged
and when it is dropped, using a safelist of known-safe features. Similarly, relative URLs should be turned into absolute URLs to avoid references changing in unexpected ways. This
specification does not specify how this is performed.

Consider a hostile page providing some content and getting the user to select and drag and
drop (or indeed, copy and paste) that content to a victim page’s contenteditable region. If the browser does not ensure that only safe content is dragged, potentially unsafe
content such as scripts and event handlers in the selection, once dropped (or pasted) into the
victim site, get the privileges of the victim site. This would thus enable a cross-site
scripting attack.
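Since the filtering mechanism is left unspecified, the following is only one possible sketch of safelist-based filtering. It works on a simplified node tree of plain objects rather than real DOM nodes, and the safelists, node shape, and function name are all assumptions chosen for illustration. Relative URLs are resolved against a base URL with the URL constructor.

```javascript
// Illustrative safelists; a real user agent would choose its own.
const SAFE_TAGS = new Set(["p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "span"]);
const SAFE_ATTRS = new Set(["href", "title"]);

// node: { type: "text", value } or { type: "element", tag, attrs, children }
function filterDraggedContent(node, baseURL) {
  if (node.type === "text") return { type: "text", value: node.value };
  if (!SAFE_TAGS.has(node.tag)) return null; // e.g. script: dropped with its contents

  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    if (!SAFE_ATTRS.has(name)) continue; // drops event handlers such as onclick
    if (name === "href") {
      let url;
      try {
        url = new URL(value, baseURL); // relative becomes absolute
      } catch (e) {
        continue; // unparsable URL: drop the attribute
      }
      if (url.protocol !== "http:" && url.protocol !== "https:") continue;
      attrs[name] = url.href;
    } else {
      attrs[name] = value;
    }
  }

  const children = (node.children || [])
    .map((child) => filterDraggedContent(child, baseURL))
    .filter((child) => child !== null);
  return { type: "element", tag: node.tag, attrs: attrs, children: children };
}
```

Building a new tree from known-safe pieces, rather than deleting known-bad ones from the original, means that any feature the safelist does not mention is excluded by default.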