5 Loading Web pages

This section describes features that apply most directly to Web browsers. Having said that,
except where specified otherwise, the requirements defined in this section do apply to
all user agents, whether they are Web browsers or not.

5.1 Browsing contexts

A browsing context is an environment in which Document objects are
presented to the user.

5.1.1 Nested browsing contexts

Certain elements (for example, iframe elements) can instantiate further browsing contexts. These are called nested browsing contexts. If a browsing context P has a
Document D with an element E that nests
another browsing context C inside it, then C is said to be
nested through D, and E is said to be the browsing context container of C.
If the browsing context container element E is in the Document D, then P is
said to be the parent browsing context of C and C is said to be a child browsing context of P.
Otherwise, the nested browsing context C has no parent
browsing context.

5.1.2 Auxiliary browsing contexts

It is possible to create new browsing contexts that are related to a top-level browsing
context without being nested through an element. Such browsing contexts are called auxiliary browsing contexts. Auxiliary browsing contexts
are always top-level browsing contexts.

5.1.2.1 Navigating auxiliary browsing contexts in the DOM

The opener IDL attribute on the Window
object, on getting, must return the WindowProxy object of the browsing
context from which the current browsing context was created (its opener
browsing context), if there is one, if it is still available, and if the current
browsing context has not disowned its opener; otherwise, it must return null.
On setting, if the new value is null then the current browsing context must disown its opener; if the new value is anything else then the
user agent must
call the [[DefineOwnProperty]] internal method of the Window object, passing the
property name "opener" as the property key, and the Property Descriptor {
[[Value]]: value, [[Writable]]: true, [[Enumerable]]: true,
[[Configurable]]: true } as the property descriptor,
where value is the new value.
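
This example is non-normative. The getting and setting behavior described above can be sketched with a plain object standing in for a Window. The names makeWindowModel and openerProxy are hypothetical, and the "still available" condition is not modeled; this is only an illustration of the accessor being shadowed by a data property, not how any user agent implements it.

```javascript
// Hypothetical stand-in for a Window object; not a real browser API.
function makeWindowModel(openerProxy) {
  let disowned = false;
  const win = {};
  Object.defineProperty(win, "opener", {
    configurable: true,
    enumerable: true,
    get() {
      // Return the opener only if there is one and it has not been disowned.
      return !disowned && openerProxy ? openerProxy : null;
    },
    set(value) {
      if (value === null) {
        disowned = true; // setting null disowns the opener
        return;
      }
      // Any other value: define an own data property named "opener",
      // using the descriptor given in the text above.
      Object.defineProperty(win, "opener", {
        value,
        writable: true,
        enumerable: true,
        configurable: true,
      });
    },
  });
  return win;
}
```

After a non-null assignment, the model simply returns whatever value was stored, because the accessor has been replaced by an ordinary data property.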

5.1.3 Secondary browsing contexts

User agents may support secondary browsing
contexts, which are browsing contexts that form part
of the user agent's interface, apart from the main content area.

Each unit of related browsing contexts is then further divided into the smallest
number of groups such that every member of each group has an active document with an
effective script origin that, through appropriate manipulation of the document.domain attribute, could be made to be the same as
other members of the group, but could not be made the same as members of any other group. Each
such group is a unit of related similar-origin browsing contexts.

These values have different meanings based on whether the page is sandboxed or not, as
summarized in the following (non-normative) table. In this table, "current" means the
browsing context that the link or script is in, "parent" means the parent
browsing context of the one the link or script is in, "master" means the nearest
ancestor browsing context of the one the link or script is in, "top" means the top-level
browsing context of the one the link or script is in, "new" means a new top-level
browsing context or auxiliary browsing context is to be created, subject to
various user preferences and user agent policies, "none" means that nothing will happen, and
"maybe new" means the same as "new" if the "allow-popups" keyword is also specified on the
sandbox attribute (or if the user overrode the
sandboxing), and the same as "none" otherwise.

The task in which the algorithm is running was queued by an algorithm that was allowed to show a popup,
and the chain of such algorithms started within a user-agent defined timeframe.

For example, if a user clicked a button, it might be acceptable for a popup
to result from that after 4 seconds, but it would likely not be acceptable for a popup to result
from that after 4 hours.

The rules for choosing a browsing context given a browsing context name are as
follows. The rules assume that they are being applied in the context of a browsing
context, as part of the execution of a task.

If the given browsing context name is the empty string or _self, then
the chosen browsing context must be the current one.

If the given browsing context name is _self, then this is an
explicit self-navigation override.

If the given browsing context name is _parent, then the chosen
browsing context must be the parent browsing context of the current one,
unless there isn't one, in which case the chosen browsing context must be the current browsing
context.

If the given browsing context name is _top, then the chosen browsing
context must be the top-level browsing context of the current one, if there is one,
or else the current browsing context.

If the given browsing context name is not _blank and there exists a
browsing context whose name is the same as the given
browsing context name, and the current browsing context is familiar with that
browsing context, and the user agent determines that the two browsing contexts are related
enough that it is ok if they reach each other, then that browsing context must be the chosen
one. If there are multiple matching browsing contexts, the user agent should select one in some
arbitrary consistent manner, such as the most recently opened, most recently focused, or most
closely related.

If the browsing context is chosen by this step to be the current browsing context, then this
is also an explicit self-navigation override.

Otherwise, a new browsing context is being requested, and what happens depends on the user
agent's configuration and abilities — it is determined by the rules given for the first
applicable option from the following list:

If the algorithm is not allowed to show a popup and the
user agent has been configured to not show popups (i.e. the user agent has a "popup blocker"
enabled)

There is no chosen browsing context. The user agent may inform the user that a popup has
been blocked.

The user agent may offer to create a new top-level browsing context or reuse
an existing top-level browsing context. If the user picks one of those options,
then the designated browsing context must be the chosen one (the browsing context's name isn't
set to the given browsing context name). The default behaviour (if the user agent doesn't
offer the option to the user, or if the user declines to allow a browsing context to be used)
must be that there must not be a chosen browsing context.

If this case occurs, it means that an author has explicitly sandboxed the
document that is trying to open a link.

If the user agent has been configured such that in this instance it will
create a new browsing context, and the browsing context is being requested as part of following a hyperlink whose link
types include the noreferrer keyword

A new top-level browsing context must be created. If the given browsing
context name is not _blank, then the new top-level browsing context's
name must be the given browsing context name (otherwise, it has no name). The chosen browsing
context must be this new browsing context. The creation of such a browsing context
is a new start for session storage.

If the user agent has been configured such that in this instance it will create a new
browsing context, and the noreferrer keyword doesn't
apply

A new auxiliary browsing context must be created, with the opener
browsing context being the current one. If the given browsing context name is not _blank, then the new auxiliary browsing context's name must be the given
browsing context name (otherwise, it has no name). The chosen browsing context must be this new
browsing context.
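
This example is non-normative. The rules above can be approximated by the following sketch. The object shapes ({ name, parent, top }), the env parameter, and its fields (findReachable, allowedToShowPopup, popupBlockerEnabled, noreferrer) are all assumptions made for illustration. The sketch skips the option in which the user agent offers the user a choice, assumes the user agent is configured to create new browsing contexts, and represents "no name" as the empty string.

```javascript
// Hypothetical model: a context is { name, parent, top }, where parent and
// top are other contexts or null. None of these names come from the spec.
function chooseBrowsingContext(name, current, env) {
  if (name === "" || name === "_self") {
    return { chosen: current, created: null };
  }
  if (name === "_parent") {
    return { chosen: current.parent || current, created: null };
  }
  if (name === "_top") {
    return { chosen: current.top || current, created: null };
  }
  if (name !== "_blank") {
    // env.findReachable stands in for the name lookup plus the "familiar
    // with" and "related enough" checks, which are left to the user agent.
    const existing = env.findReachable(name, current);
    if (existing) return { chosen: existing, created: null };
  }
  // Otherwise a new browsing context is being requested.
  if (!env.allowedToShowPopup && env.popupBlockerEnabled) {
    return { chosen: null, created: null }; // popup blocked
  }
  // noreferrer links get a top-level context with no opener;
  // everything else gets an auxiliary context opened by the current one.
  const kind = env.noreferrer ? "top-level" : "auxiliary";
  const fresh = {
    name: name === "_blank" ? "" : name,
    parent: null,
    top: null,
    opener: kind === "auxiliary" ? current : null,
  };
  return { chosen: fresh, created: kind };
}
```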

For historical reasons, Window objects must also have a writable, configurable,
non-enumerable property named HTMLDocument whose value is the
Document interface object.

5.2.1 Security

This section describes a security model that is underdefined, imperfect, and
does not match implementations. Work is ongoing to attempt to resolve this, but in the meantime,
please do not rely on this section for precision. Implementors are urged to send their feedback on
how cross-origin cross-global access to Window and Location objects
should work. See bug 20701.

For members that return objects (including function objects), each distinct effective
script origin that is not the same as the Window object's
Document's effective script origin must be provided with a separate set
of objects. These objects must have the prototype chain appropriate for the script for which the
objects are created (not those that would be appropriate for scripts whose global
object, as specified by their settings object,
is the Window object in question).

For instance, if two frames containing Documents from different origins access the same Window object's postMessage() method, they will get distinct objects that
are not equal.

5.2.2 APIs for creating and navigating browsing contexts by name

Opens a window to show url (defaults to about:blank), and
returns it. The target argument gives the name of the new window. If a
window exists with that name already, it is reused. The replace argument,
if true, means that whatever page is currently open in that window will be removed from the
window's session history. The features argument is ignored.

The third argument, features, has no defined effect and is mentioned for
historical reasons only. User agents may interpret this argument as instructions to set the size
and position of the browsing context, but are encouraged to instead ignore the argument
entirely.

The fourth argument, replace, specifies whether or not the new page will
replace the page currently loaded in the browsing
context, when target identifies an existing browsing context (as opposed to
leaving the current page in the browsing context's session history).

For example, suppose there is a user agent that supports control-clicking a
link to open it in a new tab. If a user clicks in that user agent on an element whose onclick handler uses the window.open() API to open a page in an iframe, but, while doing so, holds
the control key down, the user agent could override the selection of the target browsing context
to instead target a new tab.

The visible attribute, on getting, must return either
true or a value determined by the user agent to most accurately represent the visibility state of
the user interface element that the object represents, as described below. On setting, the new
value must be discarded.

The following BarProp objects exist for each Document object in a
browsing context. Some of the user interface elements represented by these objects
might have no equivalent in some user agents; for those user agents, except when otherwise
specified, the object must act as if it was present and visible (i.e. its visible attribute must return true).

The location bar BarProp object

Represents the user interface element that contains a control that displays the
URL of the active document, or some similar interface concept.

The menu bar BarProp object

Represents the user interface element that contains a list of commands in menu form, or some
similar interface concept.

The personal bar BarProp object

Represents the user interface element that contains links to the user's favorite pages, or
some similar interface concept.

The scrollbar BarProp object

Represents the user interface element that contains a scrolling mechanism, or some similar
interface concept.

The status bar BarProp object

Represents a user interface element found immediately below or after the document, as
appropriate for the user's media. If the user agent has no such user interface element, then the
object may act as if the corresponding user interface element was absent (i.e. its visible attribute may return false).

The toolbar BarProp object

Represents the user interface element found immediately above or before the document, as
appropriate for the user's media. If the user agent has no such user interface element, then the
object may act as if the corresponding user interface element was absent (i.e. its visible attribute may return false).

For historical reasons, the status attribute
on the Window object must, on getting, return the last string it was set to, and on
setting, must set itself to the new value. When the Window object is created, the
attribute must be set to the empty string. It does not do anything else.

In the following example, the variable x is set to the
WindowProxy object returned by the window accessor
on the global object. All of the expressions following the assignment return true, because in
every respect, the WindowProxy object acts like the underlying Window
object.

var x = window;
x instanceof Window; // true
x === this; // true

5.3 Origin

Origins are the fundamental currency of the Web's security model. Two actors in the Web
platform that share an origin are assumed to trust each other and to have the same authority.
Actors with differing origins are considered potentially hostile to each other, and are
isolated from each other to varying degrees.

For example, if Example Bank's Web site, hosted at bank.example.com, tries to examine the DOM of Example Charity's Web site, hosted
at charity.example.org, a SecurityError exception will be
raised.

The origin of a resource and the effective script origin of a resource
are both either opaque identifiers or tuples consisting of a scheme component, a host component, a
port component, and optionally extra data.

The extra data could include the certificate of the site when using encrypted
connections, to ensure that if the site's secure certificate changes, the origin is considered to
change as well.

Apply the IDNA ToUnicode algorithm to each component of the host part of the
origin tuple, and append the results — each component, in the same order,
separated by "." (U+002E) characters — to result. [RFC3490]

If the port part of the origin tuple gives a port that is different from the
default port for the protocol given by the scheme part of the origin tuple, then
append a ":" (U+003A) character and the given port, in base ten, to result.

Return result.

The ASCII serialization of an origin is the string obtained by applying the
following algorithm to the given origin:

If the origin in question is not a scheme/host/port tuple, then return the
literal string "null" and abort these steps.

Apply the IDNA ToASCII algorithm to the host part of the origin tuple, with both
the AllowUnassigned and UseSTD3ASCIIRules flags set, and append the results to result.

If ToASCII fails to convert one of the components of the string, e.g. because it is too long
or because it contains invalid characters, then return the empty string and abort these steps.
[RFC3490]

If the port part of the origin tuple gives a port that is different from the
default port for the protocol given by the scheme part of the origin tuple, then
append a ":" (U+003A) character and the given port, in base ten, to result.

Return result.
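
This example is non-normative. The ASCII serialization steps might be sketched as follows. The tuple shape ({ scheme, host, port }), the DEFAULT_PORTS table, and the function name are assumptions. The sketch also assumes that result starts out as the scheme followed by "://" (those steps are not shown in the text above) and that the host has already been converted with IDNA ToASCII, so the ToASCII failure case is not modeled.

```javascript
// Assumed default ports; a real user agent knows many more schemes.
const DEFAULT_PORTS = { http: 80, https: 443, ftp: 21 };

// origin is either an opaque identifier (any non-tuple value) or a
// tuple-like object { scheme, host, port } (hypothetical shape).
function asciiSerializeOrigin(origin) {
  const isTuple =
    origin !== null &&
    typeof origin === "object" &&
    "scheme" in origin &&
    "host" in origin &&
    "port" in origin;
  if (!isTuple) return "null";

  // Assumption: result is seeded with the scheme and "://".
  let result = origin.scheme + "://";
  // Assumption: host is already the output of IDNA ToASCII.
  result += origin.host;
  // Append the port only when it differs from the scheme's default.
  if (origin.port !== DEFAULT_PORTS[origin.scheme]) {
    result += ":" + origin.port.toString(10);
  }
  return result;
}
```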

Two origins are said to be the same origin if the
following algorithm returns true:

Let A be the first origin being compared, and B be the second origin being compared.

If A and B are both opaque identifiers, and their
value is equal, then return true.

Otherwise, if either A or B or both are opaque
identifiers, return false.

If A and B have scheme components that are not
identical, return false.

If A and B have host components that are not
identical, return false.

If A and B have port components that are not
identical, return false.

If either A or B has additional data, but that
data is not identical for both, return false.

Return true.
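
This example is non-normative. The comparison above translates almost directly into code. In this sketch, opaque identifiers are modeled as any non-object value (for example a string), tuples are modeled as objects of the hypothetical shape { scheme, host, port, extra }, and the optional extra data is compared with strict equality; all of these representations are assumptions.

```javascript
// Opaque identifiers are modeled as non-object values (e.g. strings);
// tuples as { scheme, host, port, extra } objects (hypothetical shape).
function isOpaque(origin) {
  return origin === null || typeof origin !== "object";
}

function isSameOrigin(a, b) {
  // Both opaque: same origin only if the identifiers are equal.
  if (isOpaque(a) && isOpaque(b)) return a === b;
  // Exactly one opaque: never the same origin.
  if (isOpaque(a) || isOpaque(b)) return false;
  if (a.scheme !== b.scheme) return false;
  if (a.host !== b.host) return false;
  if (a.port !== b.port) return false;
  // Extra data, if either side has any, must be identical on both.
  const hasExtra = a.extra !== undefined || b.extra !== undefined;
  if (hasExtra && a.extra !== b.extra) return false;
  return true;
}
```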

5.3.1 Relaxing the same-origin restriction

Can be set to a value that removes subdomains, to change the effective script
origin to allow pages on other subdomains of the same domain (if they do the same thing)
to access each other. (Can't be set in sandboxed iframes.)

The domain attribute on
Document objects must be initialized to the document's domain, if it has
one, and the empty string otherwise. If the document's domain starts with a "[" (U+005B) character and ends with a "]" (U+005D) character, it is
an IPv6 address; these square brackets must be omitted when initializing the attribute's
value.

On getting, the attribute must return its current value, unless the Document has
no browsing context, in which case it must return the empty string.

If the new value is an IPv4 or IPv6 address, let new value be the new
value. Otherwise, apply the IDNA ToASCII algorithm to the new value, with both the
AllowUnassigned and UseSTD3ASCIIRules flags set, and let new value be the
result of the ToASCII algorithm.

If ToASCII fails to convert one of the components of the string, e.g. because it is too long
or because it contains invalid characters, then throw a SecurityError exception and
abort these steps. [RFC3490]

If new value is not exactly equal to the current value of the document.domain attribute, then run these substeps:

If the current value is an IPv4 or IPv6 address, throw a SecurityError
exception and abort these steps.

If new value, prefixed by a "." (U+002E), does not exactly
match the end of the current value, throw a SecurityError exception and abort
these steps.

If the new value is an IPv4 or IPv6 address, it cannot
match the current value in this way and thus an exception will be thrown
here.

If new value matches a suffix in the Public Suffix List, or, if new value, prefixed by a "." (U+002E), matches the end of a suffix in
the Public Suffix List, then throw a SecurityError exception and abort these
steps. [PSL]

Suffixes must be compared after applying the IDNA ToASCII algorithm to them, with both the
AllowUnassigned and UseSTD3ASCIIRules flags set, in an ASCII case-insensitive
manner. [RFC3490]
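
This example is non-normative. The checks performed when the attribute is set might be sketched as follows. The function and class names, the crude IP address test, and the caller-supplied publicSuffixes array (standing in for the Public Suffix List) are assumptions. The sketch assumes the new value has already been through IDNA ToASCII, and treats the Public Suffix List check as one of the substeps that run only when the value actually changes.

```javascript
// Stand-in for the DOM SecurityError exception.
class SecurityError extends Error {}

// Very rough IP address check, for illustration only.
function looksLikeIpAddress(value) {
  return /^\d{1,3}(\.\d{1,3}){3}$/.test(value) || value.includes(":");
}

// Returns the value the attribute would be set to, or throws SecurityError.
// publicSuffixes is a caller-supplied list of ASCII suffixes.
function validateDomainChange(currentValue, newValue, publicSuffixes) {
  if (newValue === currentValue) return newValue;
  if (looksLikeIpAddress(currentValue)) {
    throw new SecurityError("current value is an IP address");
  }
  // The new value, prefixed by ".", must match the end of the current value.
  if (!currentValue.endsWith("." + newValue)) {
    throw new SecurityError("new value is not a parent domain");
  }
  // Reject public suffixes and anything that is the tail of one,
  // comparing ASCII case-insensitively.
  const candidate = newValue.toLowerCase();
  for (const suffix of publicSuffixes) {
    const s = suffix.toLowerCase();
    if (candidate === s || s.endsWith("." + candidate)) {
      throw new SecurityError("new value is a public suffix");
    }
  }
  return newValue;
}
```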

The domain of a Document is the host part
of the document's origin, if the value of that origin is a
scheme/host/port tuple. If it isn't, then the document does not have a domain.

The domain attribute is used to enable
pages on different hosts of a domain to access each other's DOMs.

Do not use the document.domain
attribute when using shared hosting. If an untrusted third party is able to host an HTTP server at
the same IP address but on a different port, then the same-origin protection that normally
protects two different sites on the same host will fail, as the ports are ignored when comparing
origins after the document.domain attribute has been
used.

5.4 Sandboxing

A sandboxing flag set is a set of zero or more of the following flags, which are
used to restrict the abilities that potentially untrusted resources have:

First, it can be used to allow content from the same site to be sandboxed to disable
scripting, while still allowing access to the DOM of the sandboxed content.

Second, it can be used to embed content from a third-party site, sandboxed to prevent that
site from opening pop-up windows, etc, without preventing the embedded page from
communicating back to its originating site, using the database APIs to store data, etc.

This flag is relaxed by the same keyword as scripts, because when scripts are
enabled these features are trivially possible anyway, and it would be unfortunate to force
authors to use script to do them when sandboxed rather than allowing them to use the
declarative features.

Pages can add state
objects to the session history. These are then returned to the
script when the user (or script) goes back in the history, thus enabling authors to use the
"navigation" metaphor even in one-page applications.

State objects are intended to be used for two main purposes:
first, storing a preparsed description of the state in the URL so that in the simple
case an author doesn't have to do the parsing (though one would still need the parsing for
handling URLs passed around by users, so it's only a minor
optimization), and second, so that the author can store state that one wouldn't store in the URL
because it only applies to the current Document instance and it would have to be
reconstructed if a new Document were opened.

An example of the latter would be something like keeping track of the precise coordinate from
which a pop-up div was made to animate, so that if the user goes back, it can be
made to animate to the same location. Or alternatively, it could be used to keep a pointer into a
cache of data that would be fetched from the server based on the information in the
URL, so that when going back and forward, the information doesn't have to be fetched
again.

An entry with persisted user state is one that also has user-agent defined state.
This specification does not specify what kind of state can be stored.

For example, some user agents might want to persist the scroll position, or the
values of form controls.

User agents that persist the value of form controls are encouraged to also persist
their directionality (the value of the element's dir attribute).
This prevents values from being displayed incorrectly after a history traversal when the user had
originally entered the values with an explicit, non-default directionality.

Entries that consist of state objects share the same
Document as the entry for the page that was active when they were added.

Contiguous entries that differ just by fragment identifier also share the same
Document.

All entries that share the same Document (and that are therefore
merely different states of one particular document) are contiguous by definition.

User agents may discard the Document
objects of entries other than the current entry that are not referenced from any
script, reloading the pages afresh when the user or script navigates back to such pages. This
specification does not specify when user agents should discard Document objects and
when they should cache them.

Entries that have had their Document objects discarded must, for the purposes of
the algorithms given below, act as if they had not. When the user or script navigates back or
forwards to a page which has no in-memory DOM objects, any other entries that shared the same
Document object with it must share the new object as well.

Entries in the joint session history are ordered chronologically by the time they
were added to their respective session histories. Each entry
has an index; the earliest entry has index 0, and the subsequent entries are numbered with
consecutively increasing integers (1, 2, 3, etc).

All the getters and setters for attributes, and all the methods, defined on the
History interface, when invoked on a History object associated with a
Document that is not fully active, must throw a
SecurityError exception instead of operating as described below.

The state attribute of the
History interface must return the last value it was set to by the user agent.
Initially, its value must be null.

When the go(delta) method is
invoked, if the argument to the method was omitted or has the value zero, the user agent must act
as if the location.reload() method was called instead.
Otherwise, the user agent must traverse the history by a delta whose value is the
value of the method's argument.

If there is an ongoing attempt to navigate the specified browsing context
that has not yet matured (i.e. it has not passed the
point of making its Document the active document), then cancel that
attempt to navigate the browsing context.

The title is purely advisory. User agents might use the title
in the user interface.

User agents may limit the number of state objects added to the session history per page. If a
page hits the UA-defined limit, user agents must remove the entry immediately after the first
entry for that Document object in the session history after having added the new
entry. (Thus the state history acts as a FIFO buffer for eviction, but as a LIFO buffer for
navigation.)
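
This example is non-normative. The eviction rule above can be illustrated with a small array-based model. The entry shape ({ doc, state }), the function name, and the interpretation of the limit as a maximum number of state entries per Document (not counting the Document's first entry) are assumptions.

```javascript
// entries: array of { doc, state } in session-history order (hypothetical
// shape). Appends a state entry for doc; if doc then has more than `limit`
// state entries, removes the entry immediately after doc's first entry.
function pushStateEntry(entries, doc, state, limit) {
  entries.push({ doc, state });
  const indices = [];
  entries.forEach((entry, i) => {
    if (entry.doc === doc) indices.push(i);
  });
  // The first entry for the Document is the page itself, not a pushed state.
  const stateCount = indices.length - 1;
  if (stateCount > limit) {
    entries.splice(indices[1], 1); // evict the oldest pushed state
  }
  return entries;
}
```

With a limit of 2, pushing a third state drops the oldest pushed state while the first entry for the Document stays in place, which is the FIFO eviction behavior described above.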

Consider a game where the user can navigate along a line, such that the user is always at some
coordinate, and such that the user can bookmark the page corresponding to a particular
coordinate, to return to it later.

A static page implementing the x=5 position in such a game could look like the following:

<!DOCTYPE HTML>
<!-- this is http://example.com/line?x=5 -->
<title>Line Game - 5</title>
<p>You are at coordinate 5 on the line.</p>
<p>
<a href="?x=6">Advance to 6</a> or
<a href="?x=4">retreat to 4</a>?
</p>

The problem with such a system is that each time the user clicks, the whole page has to be
reloaded. Here instead is another way of doing it, using script:
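
This example is non-normative, and the script for the example is not reproduced in this text. As a rough sketch only, its core logic might resemble the following. The names createLineGame, setupPage, and render are hypothetical, and the history object is passed in as a parameter so that the logic can be exercised outside a browser; in a page it would be window.history, and render would update the title, the displayed coordinate, and the two links.

```javascript
// Sketch of the script side of the line game. `history` and `render` are
// injected; in a page they would be window.history and a DOM-updating
// function. All names here are hypothetical.
function createLineGame(startPage, history, render) {
  let currentPage = startPage;

  function setupPage(page) {
    currentPage = page;
    render(currentPage);
  }

  return {
    // Move by d and record the new position as a session history entry.
    go(d) {
      setupPage(currentPage + d);
      history.pushState(
        currentPage,
        "Line Game - " + currentPage,
        "?x=" + currentPage
      );
    },
    // Intended as the popstate handler: restore the stored position.
    // A null state is taken to mean the entry the page was loaded with.
    onpopstate(event) {
      setupPage(event.state === null ? startPage : event.state);
    },
    get currentPage() {
      return currentPage;
    },
  };
}
```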

In systems without script, this still works like the previous example. However, users that
do have script support can now navigate much faster, since there is no network access
for the same experience. Furthermore, contrary to the experience the user would have with just a
naïve script-based approach, bookmarking and navigating the session history still work.

In the example above, the data argument to the pushState() method is the same information as would be sent
to the server, but in a more convenient form, so that the script doesn't have to parse the URL
each time the user navigates.

Applications might not use the same title for a session history entry as the
value of the document's title element at that time. For example, here is a simple
page that shows a clock in the title element. Clearly, when navigating backwards to
a previous state the user does not go back in time, and therefore it would be inappropriate to
put the time in the session history title.

In the task in which the algorithm is running, the event
listener for a trusted click event is being handled.

If mode is normal navigation, then act as if the assign() method had been called with value
as its argument. Otherwise, act as if the replace()
method had been called with value as its argument.

5.5.3.1 Security

This section describes a security model that is underdefined, imperfect, and
does not match implementations. Work is ongoing to attempt to resolve this, but in the meantime,
please do not rely on this section for precision. Implementors are urged to send their feedback on
how cross-origin cross-global access to Window and Location objects
should work. See bug 20701.

5.5.4 Implementation notes for session history

This section is non-normative.

The History interface is not meant to place restrictions on how implementations
represent the session history to the user.

For example, session history could be implemented in a tree-like manner, with each page having
multiple "forward" pages. This specification doesn't define how the linear list of pages in the
history object is derived from the actual session history as
seen from the user's perspective.

Similarly, a page containing two iframes has a history object distinct from the iframes' history objects, despite the fact that typical Web browsers present the
user with just one "Back" button, with a session history that interleaves the navigation of the
two inner frames and the outer page.

Security: It is suggested that to avoid letting a page "hijack" the history
navigation facilities of a UA by abusing pushState(),
the UA provide the user with a way to jump back to the previous page (rather than just going back
to the previous state). For example, the back button could have a drop down showing just the pages
in the session history, and not showing any of the states. Similarly, an aural browser could have
two "back" commands, one that goes back to the previous state, and one that jumps straight back to
the previous page.

In addition, a user agent could ignore calls to pushState() that are invoked on a timer, or from event
listeners that are not triggered in response to a clear user action, or that are invoked in rapid
succession.

5.6 Browsing the Web

5.6.1 Navigating across documents

Certain actions cause the browsing context to navigate to a new resource.
A user agent may provide various ways for the user to explicitly cause a browsing context to
navigate, in addition to those defined in this specification.

A resource has a URL, but that might not be the only information necessary
to identify it. For example, a form submission that uses HTTP POST would also have the HTTP method
and payload. Similarly, an iframe srcdoc document needs to know the data it is to use.

Navigation always involves a source browsing context, which is the browsing context that
was responsible for starting the navigation.

When a browsing context is navigated to a new resource, the user
agent must run the following steps:

The handle redirects step later in
this algorithm can in certain cases jump back to the step labeled fragment identifiers. Since, between those two steps,
this algorithm goes from operating synchronously in the context of the calling task to operating asynchronously independent of the event
loop, some of the intervening steps need to be able to handle both being synchronous and
being asynchronous. The gone async flag is thus used to make these steps
aware of which mode they are operating in.

If the new resource is to be handled using a mechanism that does not affect the browsing
context, e.g. ignoring the navigation request altogether because the specified scheme is not one
of the supported protocols, then abort these steps and proceed with that mechanism
instead.

If the new resource is to be handled by displaying some sort of inline content, e.g. an error
message because the specified scheme is not one of the supported protocols, or an inline prompt
to allow the user to select a registered
handler for the given scheme, then display the inline
content and abort these steps.

In the case of a registered handler being used, the algorithm will be reinvoked
with a new URL to handle the request.

Process results: If the result of executing the script is void (there is no return
value), then the result of obtaining the resource for the URL is equivalent to an HTTP resource with an HTTP
204 No Content response.

Otherwise, the result of obtaining the resource for the URL is equivalent to an HTTP resource with a 200 OK
response whose Content-Type metadata is
text/html and whose response body is the return value converted to a string
value.
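
This example is non-normative. The mapping from a script's result to an equivalent HTTP response can be sketched as below. The function name and the shape of the returned object are assumptions, and a void result is modeled as the JavaScript value undefined.

```javascript
// Maps the result of executing a script to a response-like object
// (hypothetical shape). A void result is modeled as undefined.
function responseForScriptResult(value) {
  if (value === undefined) {
    return { status: 204, statusText: "No Content" };
  }
  return {
    status: 200,
    statusText: "OK",
    contentType: "text/html",
    body: String(value), // the return value converted to a string
  };
}
```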

For example, imagine an HTML page with an associated application cache
displaying an image and a form, where the image is also used by several other application
caches. If the user right-clicks on the image and chooses "View Image", then the user agent
could decide to show the image from any of those caches, but it is likely that the most useful
cache for the user would be the one that was used for the aforementioned HTML page. On the
other hand, if the user submits the form, and the form does a POST submission, then the user
agent will not use an application cache at all; the submission will be made to the
network.

If gone async is false, return to whatever algorithm invoked the
navigation steps and continue running these steps asynchronously.

Let gone async be true.

Handle redirects: If fetching the resource results in a redirect, and either the URL of the target
of the redirect has the same origin as the original resource, or the resource is
being obtained using the POST method or a safe method (in HTTP terms), return to the step labeled fragment identifiers with the new
resource, except that if the URL of the target of the redirect does not have a
fragment identifier and the URL of the resource that led to the redirect does, then
the fragment identifier of the resource that led to the redirect must be propagated to the
URL of the target of the redirect.

So for instance, if the original URL was "http://example.com/#!sample" and "http://example.com/" is
found to redirect to "https://example.com/", the URL of the new resource
will be "https://example.com/#!sample".

Otherwise, if fetching the resource results in a redirect but the URL of the
target of the redirect does not have the same origin as the original resource and
the resource is being obtained using a method that is neither the POST method nor a safe method
(in HTTP terms), then abort these steps. The user agent may indicate to the user that the
navigation has been aborted for security reasons.
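The two redirect branches above reduce to a single predicate. The following is a minimal sketch under stated assumptions: the helper names origin and may_follow_redirect are hypothetical, an origin is approximated as a (scheme, host, port) tuple, and the set of safe methods is the one HTTP defines (GET, HEAD, OPTIONS, TRACE).

```python
from urllib.parse import urlsplit

# Methods that HTTP defines as safe.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
# Assumption: only these default ports are needed for the illustration.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Approximate an origin as (scheme, host, port)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def may_follow_redirect(original_url: str, target_url: str, method: str) -> bool:
    """True if the redirect is followed, False if navigation is aborted."""
    same_origin = origin(original_url) == origin(target_url)
    method = method.upper()
    return same_origin or method == "POST" or method in SAFE_METHODS


print(may_follow_redirect("http://example.com/a", "https://other.example/b", "PUT"))
```

A cross-origin redirect of a PUT request is the kind of case that falls into the "abort these steps" branch; the same request redirected within one origin would be followed.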

Wait for one or more bytes to be available or for the user agent to establish that the
resource in question is empty. During this time, the user agent may allow the user to cancel this
navigation attempt or start other navigation attempts.

Fallback in prefer-online mode: If the resource was not fetched from an
application cache, and was to be fetched using HTTP GET or equivalent, and there are relevant application caches that are identified by a URL with the
same origin as the URL in question, and that have this URL as one of their entries,
excluding entries marked as foreign, and whose
mode is prefer-online, and the user didn't cancel the
navigation attempt during the earlier step, and the navigation attempt failed (e.g. the server
returned a 4xx or 5xx status code or
equivalent, or there was a DNS error), then:

If candidate is not marked as foreign, then the user agent must discard the failed
load and instead continue along these steps using candidate as the resource.
The user agent may indicate to the user that the original page load failed, and that the page
used was a previously cached resource.

If candidate is not marked as foreign, then the user agent must discard the failed
load and instead continue along these steps using candidate as the resource.
The document's address, if appropriate, will still be the originally requested URL,
not the fallback URL, but the user agent may indicate to the user that the original page load
failed, that the page used was a fallback resource, and what the URL of the fallback resource
actually is.

Resource handling: If the resource's out-of-band metadata (e.g. HTTP headers), not
counting any type information (such as the Content-Type HTTP
header), requires some sort of processing that will not affect the browsing context, then
perform that processing and abort these steps.

Such processing might be triggered by, amongst other things, the following:

HTTP status codes (e.g. 204 No Content or 205 Reset Content)

Network errors (e.g. the network interface being unavailable)

Cryptographic protocol failures (e.g. an incorrect TLS certificate)

Responses with HTTP Content-Disposition headers
specifying the attachment disposition type must be handled as a
download.

HTTP 401 responses that do not include a challenge recognized by the user agent must be
processed as if they had no challenge, e.g. rendering the entity body as if the response had
been 200 OK.

User agents may show the entity body of an HTTP 401 response even when the response does
include a recognized challenge, with the option to login being included in a non-modal fashion,
to enable the information provided by the server to be used by the user before authenticating.
Similarly, user agents should allow the user to authenticate (in a non-modal fashion) against
authentication challenges included in other responses such as HTTP 200 OK responses, effectively
allowing resources to present HTTP login forms without requiring their use.

If the user agent has been configured to process resources of the given type using some mechanism other than rendering the content in a browsing
context, then skip this step. Otherwise, if the type is one of the
following types, jump to the appropriate entry in the following list, and process the resource as
described there:

Follow the steps given in the XML document section. If
that section determines that the content is not to be displayed as a generic XML
document, then proceed to the next step in this overall set of steps. Otherwise, once the steps given in the XML document section have completed,
abort this navigate algorithm.

"text/plain"

Follow the steps given in the plain text file section,
and then, once they have completed, abort this navigate algorithm.

Follow the steps given in the media section, and then,
once they have completed, abort this navigate algorithm.

A type that will use an external application to render the content in the browsing
context

Follow the steps given in the plugin section, and then,
once they have completed, abort this navigate algorithm.

An explicitly supported XML type is one for which the user agent is configured to
use an external application to render the content (either a plugin rendering
directly in the browsing context, or a separate application), or one for which the
user agent has dedicated processing rules (e.g. a Web browser with a built-in Atom feed viewer
would be said to explicitly support the application/atom+xml MIME type), or one for
which the user agent has a dedicated handler (e.g. one registered using registerContentHandler()).

Setting the document's address: If there is no
override URL, then any Document created by these steps must have its
address set to the URL that was
originally to be fetched, ignoring any other data that was used to
obtain the resource (e.g. the entity body in the case of a POST submission is not part of
the document's address, nor is the URL of the fallback resource in the case of the
original load having failed and that URL having been found to match a fallback namespace). However, if there is
an override URL, then any Document created by these steps must have
its address set to that URL
instead.

Set the document's referrer to the address of the resource from which
Request-URIs are obtained as determined when the fetch algorithm obtained the
resource, if that algorithm was used and determined such a value; otherwise, set it to the
empty string.

Non-document content: If, given type, the new resource is to be
handled by displaying some sort of inline content, e.g. a native rendering of the content, an
error message because the specified type is not supported, or an inline prompt to allow the user
to select a registered handler for the
given type, then display the inline content, and then abort
these steps.

In the case of a registered handler being used, the algorithm will be reinvoked
with a new URL to handle the request.

Otherwise, the document's type is such that the resource will not
affect the browsing context, e.g. because the resource is to be handed to an external application
or because it is an unknown type that will be processed as a download. Process the
resource appropriately.

Some of the sections below, to which the above algorithm defers in certain cases, require the
user agent to update the session history with the new page. When a user agent is
required to do this, it must queue a task (associated with the Document
object of the current entry, not the new one) to run the following steps:

If this instance of the navigation algorithm is canceled while
this step is running the unload a document algorithm, then the unload a
document algorithm must be allowed to run to completion, but this instance of the navigation algorithm must not run beyond this step. (In particular, for
instance, the cancelation of this algorithm does not abort any event dispatch or script
execution occurring as part of unloading the document or its descendants.)

If the navigation was initiated for entry update of an entry

Replace the Document of the entry being updated, and any other entries
that referenced the same document as that entry, with the new Document.

This can only happen if the entry being updated is not the current
entry, and can never happen with replacement enabled. (It happens when the
user tried to traverse to a session history entry that no longer had a Document
object.)

Fragment identifier loop: Spin the event loop for a user-agent-defined
amount of time, as desired by the user agent implementor. (This is intended to allow the user
agent to optimize the user experience in the face of performance concerns.)

If the Document object has no parser, or its parser has stopped parsing, or the user agent has reason to believe the user is no longer
interested in scrolling to the fragment identifier, then abort these steps.

The input byte stream converts bytes into characters for use in the
tokenizer. This process relies, in part, on character encoding
information found in the real Content-Type metadata of the
resource; the "sniffed type" is not used for this purpose.

When no more bytes are available, the user agent must queue a task for the parser
to process the implied EOF character, which eventually causes a load event to be fired.

The actual HTTP headers and other metadata, not the headers as mutated or implied by the
algorithms given in this specification, are the ones that must be used when determining the
character encoding according to the rules given in the above specifications. Once the character
encoding is established, the document's character encoding must be set to that
character encoding.

Because the processing of the manifest
attribute happens only once the root element is parsed, any URLs referenced by processing
instructions before the root element (such as <?xml-stylesheet?> and
<?xbl?> PIs) will be fetched from the network and cannot be
cached.

User agents may examine the namespace of the root Element node of this
Document object to perform namespace-based dispatch to alternative processing tools,
e.g. determining that the content is actually a syndication feed and passing it to a feed handler.
If such processing is to take place, abort the steps in this section, and jump to the next step (labeled non-document content) in the
navigate steps above.

Otherwise, then, with the newly created Document, the user agent must update
the session history with the new page. User agents may do this before the complete document
has been parsed (thus achieving incremental rendering), and must do this before any scripts
are to be executed.

Error messages from the parse process (e.g. XML namespace well-formedness errors) may be
reported inline by mutating the Document.

The rules for how to convert the bytes of the plain text document into actual characters, and
the rules for actually rendering the text to the user, are defined in RFC 2046, RFC 3676, and
subsequent versions thereof. [RFC2046][RFC3676]

User agents may add content to the head element of the Document, e.g.
linking to a style sheet or an XBL binding, providing script, giving the document a
title, etc.

In particular, if the user agent supports the Format=Flowed
feature of RFC 3676 then the user agent would need to apply extra styling to cause the text to
wrap correctly and to handle the quoting feature. This could be performed using, e.g., an XBL
binding or a CSS extension.

For each body part obtained from the resource, the user agent must run a new instance of the
navigate algorithm, starting from the resource handling step, using the new
body part as the resource being navigated, with replacement enabled if a previous
body part from the same resource resulted in a Document object being created, and otherwise using the same setup as the
navigate attempt that caused this section to be invoked in the first place.

For the purposes of algorithms processing these body parts as if they were complete stand-alone
resources, the user agent must act as if there were no more bytes for those resources whenever the
boundary following the body part is reached.

Thus, load events (and for that matter unload events) do fire for each body part loaded.

5.6.6 Page load processing model for media

When an image, video, or audio resource is to be loaded in a browsing context, the
user agent should create a Document object, mark it as being an HTML document, set its content type to the sniffed MIME type of the resource
(type in the navigate algorithm), append an html
element to the Document, append a head element and a body
element to the html element, append an element host element for
the media, as described below, to the body element, and set the appropriate attribute
of the element host element, as described below, to the address of the image,
video, or audio resource.

The element host element to create for the media is the element given in
the table below in the second cell of the row whose first cell describes the media. The
appropriate attribute to set is the one given by the third cell in that same row.

User agents may add content to the head element of the Document, or
attributes to the element host element, e.g. to link to a style sheet or an
XBL binding, to provide a script, to give the document a title, to make the media
autoplay, etc.

Append a new entry at the end of the History object representing the new
resource and its Document object and related state. Its URL must be set
to the address to which the user agent was navigating. The title
must be left unset.

When the user agent is required to scroll to the fragment identifier and the
indicated part of the document, if any, is being rendered, the user agent must
either change the scrolling position of the document using the following algorithm, or perform
some other action such that the indicated part of the document is brought to the
user's attention. If there is no indicated part, or if the indicated part is not being
rendered, then the user agent must do nothing. The aforementioned algorithm is as
follows:

The indicated part of the document is the one that the fragment identifier, if any,
identifies. The semantics of the fragment identifier in terms of mapping it to a specific DOM Node
are defined by the specification that defines the MIME type used by the
Document (for example, the processing of fragment identifiers for XML MIME types is the responsibility of RFC3023). [RFC3023]

Let decoded fragid be the result of applying the UTF-8
decoder algorithm to fragid bytes. If the UTF-8 decoder
emits a decoder error, abort the decoder and instead jump to the step labeled no
decoded fragid.

If there is an element in the DOM that has an ID exactly
equal to decoded fragid, then the first such element in tree order is
the indicated part of the document; stop the algorithm here.

No decoded fragid: If there is an a element in the DOM that has a name attribute whose value is exactly equal to fragid (not decoded fragid), then the first such
element in tree order is the indicated part of the document; stop the algorithm
here.
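The lookup order in the steps shown here (ID match on the decoded fragment first, then name match on the undecoded one) can be sketched as follows. This is a simplified model, not an implementation: the Element class and find_indicated_part function are hypothetical, the document is represented as a flat list already in tree order, and Python's strict UTF-8 decoding stands in for the UTF-8 decoder algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    id: Optional[str] = None
    name: Optional[str] = None


def find_indicated_part(elements: Sequence[Element], fragid: str,
                        fragid_bytes: bytes) -> Optional[Element]:
    """Return the first matching element in tree order, or None."""
    try:
        decoded = fragid_bytes.decode("utf-8")  # strict: raises on malformed input
    except UnicodeDecodeError:
        decoded = None  # corresponds to jumping to "no decoded fragid"
    if decoded is not None:
        for element in elements:
            if element.id == decoded:
                return element
    # Name matching uses the original fragid, not the decoded one.
    for element in elements:
        if element.tag == "a" and element.name == fragid:
            return element
    return None


doc = [Element("a", name="caf%C3%A9"), Element("p", id="caf\u00e9")]
print(find_indicated_part(doc, "caf%C3%A9", b"caf\xc3\xa9"))
```

In the usage above, the p element wins even though the a element comes first in tree order, because the ID step runs before the name step.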

5.6.10 History traversal

When a user agent is required to traverse the history to a specified
entry, optionally with replacement enabled, and optionally with the
asynchronous events flag set, the user agent must act as follows.

If there is no longer a Document object for the entry in question,
navigate the browsing
context to the resource for that entry to perform an entry update of that
entry, and abort these steps. The "navigate" algorithm reinvokes this "traverse"
algorithm to complete the traversal, at which point there is a Document
object and so this step gets skipped. The navigation must be done using the same source
browsing context as was used the first time this entry was created. (This can never
happen with replacement enabled.)

If the resource was obtained using a non-idempotent action, for example a POST
form submission, or if the resource is no longer available, for example because the computer is
now offline and the page wasn't cached, navigating to it again might not be possible. In this
case, the navigation will result in a different page than previously; for example, it might be
an error message explaining the problem or offering to resubmit the form.

If the specified entry has a URL whose fragment identifier differs
from that of the current entry's when compared in a case-sensitive
manner, and the two share the same Document object, then let hash
changed be true, and let old URL be the URL of the current
entry and new URL be the URL of the specified
entry. Otherwise, let hash changed be false.
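As a rough illustration of this step, the comparison can be written as below. The function name compare_entries and the same_document parameter are hypothetical; the sketch assumes the caller already knows whether the two entries share a Document object.

```python
from urllib.parse import urldefrag


def compare_entries(current_url: str, specified_url: str, same_document: bool):
    """Return (hash_changed, old_url, new_url) for a traversal."""
    # Python string comparison is case-sensitive, as the step requires.
    fragments_differ = (urldefrag(current_url).fragment
                        != urldefrag(specified_url).fragment)
    if same_document and fragments_differ:
        return True, current_url, specified_url
    return False, None, None


print(compare_entries("http://example.com/a#top", "http://example.com/a#Top", True))
```

Because the comparison is case-sensitive, "#top" and "#Top" count as different fragment identifiers.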

If the traversal was initiated with replacement enabled, remove the entry
immediately before the specified entry in the session history.

If the entry is an entry with persisted user state, the user agent may update
aspects of the document and its rendering, for instance the scroll position or values of form
fields, that it had previously recorded.

This can even include updating the dir attribute
of textarea elements or input elements whose type attribute is in either the Text state or the Search state, if the persisted state includes the
directionality of user input in such controls.

If the asynchronous events flag is not set, then run the following steps
synchronously. Otherwise, the asynchronous events flag is set; queue a task
to run the following substeps.

If state changed is true, fire a trusted
event with the name popstate at the Window
object of the Document, using the PopStateEvent interface, with the
state attribute initialized to the value of state. This event must bubble but not be cancelable and has no default
action.

The state attribute must return the
value it was initialized to. When the object is created, this attribute must be initialized to
null. It represents the context information for the event, or null, if the state represented is
the initial state of the Document.

The hashchange event is fired when navigating
to a session history entry whose URL differs from that of the previous
one only in the fragment identifier.

The oldURL attribute must return the
value it was initialized to. When the object is created, this attribute must be initialized to
null. It represents context information for the event, specifically the URL of the session
history entry that was traversed from.

The newURL attribute must return the
value it was initialized to. When the object is created, this attribute must be initialized to
null. It represents context information for the event, specifically the URL of the session
history entry that was traversed to.

For the pageshow event, returns false if the page is
newly being loaded (and the load event will fire). Otherwise,
returns true.

For the pagehide event, returns false if the page is
going away for the last time. Otherwise, returns true, meaning that (if nothing conspires to
make the page unsalvageable) the page might be reused if the user navigates back to this
page.

If any event listeners were triggered by the earlier dispatch step, then set the
Document's salvageable state to
false.

If the returnValue attribute of the
event object is not the empty string, or if the event was canceled, then the
user agent should ask the user to confirm that they wish to unload the document.

The prompt shown by the user agent may include the string of the returnValue attribute, or some leading subset
thereof. (A user agent may want to truncate the string to 1024 characters for display, for
instance.)

If the user did not confirm the page navigation, then the user agent refused to allow
the document to be unloaded.

If this algorithm was invoked by another instance of the "prompt to unload a document"
algorithm (i.e. through the steps below that invoke this algorithm for all descendant browsing
contexts), then jump to the step labeled end.

When a user agent is to unload a document, it must run the following steps. These
steps are passed an argument, recycle, which is either true or false,
indicating whether the Document object is going to be re-used. (This is set by the
document.open() method.)

If this algorithm was invoked by another instance of the "unload a document" algorithm
(i.e. by the steps below that invoke this algorithm for all descendant browsing contexts), then
jump to the step labeled end.

The returnValue attribute
represents the message to show the user. When the event is created, the attribute must be set to
the empty string. On getting, it must return the last value it was set to. On setting, the
attribute must be set to the new value.

5.6.12 Aborting a document load

If a Document is aborted, the user agent must
run the following steps:

Cancel any instances of the fetch algorithm in the context of
this Document, discarding any tasks queued for them, and discarding any further data received from the
network for them. If this resulted in any instances of the fetch
algorithm being canceled or any queued tasks or any network data getting discarded, then set the
Document's salvageable state to
false.

5.7 Offline Web applications

5.7.1 Introduction

This section is non-normative.

In order to enable users to continue interacting with Web applications and documents even when
their network connection is unavailable — for instance, because they are traveling outside
of their ISP's coverage area — authors can provide a manifest which lists the files that are
needed for the Web application to work offline and which causes the user's browser to keep a copy
of the files for use offline.

If the user tries to open the "clock.html" page while offline, though,
the user agent (unless it happens to have it still in the local cache) will fail with an
error.

The author can instead provide a manifest of the three files, say "clock.appcache":

EXAMPLE offline/clock/clock2.appcache

With a small change to the HTML file, the manifest (served as text/cache-manifest)
is linked to the application:

EXAMPLE offline/clock/clock2.html

Now, if the user goes to the page, the browser will cache the files and make them available
even when the user is offline.

Authors are encouraged to include the main page in the manifest also, but in
practice the page that referenced the manifest is automatically cached even if it isn't explicitly
mentioned.

With the exception of the "no-store" directive, HTTP cache headers and restrictions on
caching pages served over TLS (encrypted, using https:) are overridden by
manifests. Thus, pages will not expire from an application cache before the user agent has updated
it, and even applications served over TLS can be made to work offline.

5.7.1.1 Supporting offline caching for legacy applications

This section is non-normative.

The application cache feature works best if the application logic is separate from the
application and user data, with the logic (markup, scripts, style sheets, images, etc) listed in
the manifest and stored in the application cache, with a finite number of static HTML pages for
the application, and with the application and user data stored in Web Storage or a client-side
Indexed Database, updated dynamically using Web Sockets, XMLHttpRequest, server-sent
events, or some other similar mechanism.

This model results in a fast experience for the user: the application immediately loads, and
fresh data is obtained as fast as the network will allow it (possibly while stale data shows).

Legacy applications, however, tend to be designed so that the user data and the logic are mixed
together in the HTML, with each operation resulting in a new HTML page from the server.

For example, consider a news application. The typical architecture of such an application,
when not using the application cache feature, is that the user fetches the main page, and the
server returns a dynamically-generated page with the current headlines and the user interface
logic mixed together.

A news application designed for the application cache feature, however, would instead have the
main page just consist of the logic, and would then have the main page fetch the data separately
from the server, e.g. using XMLHttpRequest.

The mixed-content model does not work well with the application cache feature: since the
content is cached, it would result in the user always seeing the stale data from the previous time
the cache was updated.

While there is no way to make the legacy model work as fast as the separated model, it
can at least be retrofitted for offline use using the prefer-online application cache mode. To do so, list all the static
resources used by the HTML page you want to have work offline in an application cache manifest, use the manifest attribute to select that manifest from the HTML file,
and then add the following lines at the bottom of the manifest:

SETTINGS:
prefer-online
NETWORK:
*

This causes the application cache to only be used for master entries when the user is offline, and causes the
application cache to be used as an atomic HTTP cache (essentially pinning resources listed in the
manifest), while allowing all resources not listed in the manifest to be accessed normally when
the user is online.

5.7.1.2 Event summary

This section is non-normative.

When the user visits a page that declares a manifest, the browser will try to update the cache.
It does this by fetching a copy of the manifest and, if the manifest has changed since the user
agent last saw it, redownloading all the resources it mentions and caching them anew.

As this is going on, a number of events get fired on the ApplicationCache object
to keep the script updated as to the state of the cache update, so that the user can be notified
appropriately. The events are as follows:

The user agent is downloading resources listed by the manifest.
The event object's total attribute returns the total number of files to be downloaded.
The event object's loaded attribute returns the number of files processed so far.

The manifest was a 404 or 410 page, so the attempt to cache the application has been aborted.

Last event in sequence.

The manifest hadn't changed, but the page referencing the manifest failed to download properly.

A fatal error occurred while fetching the resources listed in the manifest.

The manifest changed while the update was being run.

The user agent will try fetching the files again momentarily.

These events are cancelable; their default action is for the user agent to show download
progress information. If the page shows its own update UI, canceling the events will prevent the
user agent from showing redundant progress information.

5.7.2 Application caches

An application cache is a set of cached resources consisting of:

One or more resources (including their out-of-band metadata, such as HTTP headers, if any),
identified by URLs, each falling into one (or more) of the following categories:

Master entries

These are documents that were added to the cache because a browsing
context was navigated to that document and the document
indicated that this was its cache, using the manifest
attribute.

A URL in the list can be flagged with multiple different types, and thus an
entry can end up being categorized as multiple entries. For example, an entry can be a manifest
entry and an explicit entry at the same time, if the manifest is listed within the manifest.

Zero or more fallback namespaces, each of
which is mapped to a fallback entry.

These are used as prefix match patterns, and declare URLs for which the user
agent will ignore the application cache, instead fetching them normally (i.e. from the network
or local HTTP cache as appropriate).

An online whitelist wildcard
flag, which is either open or blocking.

The open state indicates that any URL not listed as cached is to
be implicitly treated as being in the online
whitelist namespaces; the blocking state indicates that URLs not listed
explicitly in the manifest are to be treated as unavailable.

A cache mode flag, which is either in the fast state or the prefer-online state.

Each application cache has a completeness flag, which is either complete or
incomplete.

Multiple application caches in different application cache groups can contain the same resource,
e.g. if the manifests all reference that resource. If the user agent is to select an application cache from a list of relevant application caches that contain a resource, the
user agent must use the application cache that the user most likely wants to see the resource
from, taking into account the following:

which application cache was most recently updated,

which application cache was being used to display the resource from which the user decided to
look at the new resource, and

which application cache the user prefers.

A URL matches a fallback namespace if
there exists a relevant application cache whose manifest's URL has the same origin as the
URL in question, and that has a fallback
namespace that is a prefix match for the URL being examined. If multiple
fallback namespaces match the same URL, the longest one is the one that matches. A URL looking for
a fallback namespace can match more than one application cache at a time, but only matches one
namespace in each cache.

If a manifest http://example.com/app1/manifest declares that http://example.com/resources/images is a fallback namespace, and the user
navigates to HTTP://EXAMPLE.COM:80/resources/images/cat.png, then the user
agent will decide that the application cache identified by http://example.com/app1/manifest contains a namespace with a match for that
URL.
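The matching rule and the example above can be sketched as follows. This is a simplified illustration, not the normative URL comparison: normalize, origin, and match_fallback_namespace are hypothetical helpers, normalization is limited to lowercasing the scheme and host and dropping the default port, and the origin is approximated by scheme plus host and port.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop the default port and any fragment."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    return urlunsplit((scheme, netloc, parts.path or "/", parts.query, ""))


def origin(url: str) -> tuple:
    parts = urlsplit(normalize(url))
    return (parts.scheme, parts.netloc)


def match_fallback_namespace(url: str, manifest_url: str,
                             namespaces: Iterable[str]) -> Optional[str]:
    """Return the longest namespace that prefix-matches url, or None."""
    if origin(url) != origin(manifest_url):
        return None
    target = normalize(url)
    matches = [ns for ns in namespaces if target.startswith(normalize(ns))]
    return max(matches, key=lambda ns: len(normalize(ns)), default=None)


print(match_fallback_namespace(
    "HTTP://EXAMPLE.COM:80/resources/images/cat.png",
    "http://example.com/app1/manifest",
    ["http://example.com/resources/images"],
))
```

The uppercase scheme and host and the explicit port 80 in the navigated URL do not prevent the match, which is the point of the example in the text.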

5.7.3 The cache manifest syntax

5.7.3.1 Some sample manifests

This section is non-normative.

This example manifest requires two images and a style sheet to be cached and whitelists a CGI
script.

CACHE MANIFEST
# the above line is required
# this is a comment
# there can be as many of these anywhere in the file
# they are all ignored
# comments can have spaces before them
# but must be alone on the line
# blank lines are ignored too
# these are files that need to be cached they can either be listed
# first, or a "CACHE:" header could be put before them, as is done
# lower down.
images/sound-icon.png
images/background.png
# note that each file has to be put on its own line
# here is a file for the online whitelist -- it isn't cached, and
# references to this file will bypass the cache, always hitting the
# network (or trying to, if the user is offline).
NETWORK:
comm.cgi
# here is another set of files to cache, this time just the CSS file.
CACHE:
style/default.css

The following manifest defines a catch-all error page that is displayed for any page on the
site while the user is offline. It also specifies that the online whitelist wildcard flag is open, meaning that accesses to resources on other sites will not be blocked.
(Resources on the same site are already not blocked because of the catch-all fallback
namespace.)

So long as all pages on the site reference this manifest, they will get cached locally as they
are fetched, so that subsequent hits to the same page will load the page immediately from the
cache. Until the manifest is changed, those pages will not be fetched from the server again. When
the manifest changes, then all the files will be redownloaded.

Subresources, such as style sheets, images, etc, would only be cached using the regular HTTP
caching semantics, however.

This is a willful violation of RFC 2046, which requires all text/* types to only allow CRLF line breaks. This requirement, however, is
outdated; the use of CR, LF, and CRLF line breaks is commonly supported and indeed sometimes CRLF
is not supported by text editors. [RFC2046]

The first line of an application cache manifest must consist of the string "CACHE", a single
U+0020 SPACE character, the string "MANIFEST", and either a U+0020 SPACE character, a "tab" (U+0009) character, a "LF" (U+000A) character, or a "CR" (U+000D) character. The first line may optionally be preceded by a "BOM" (U+FEFF)
character. If any other text is found on the first line, it is ignored.
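A check for the first-line rule might look like the sketch below. The function name has_valid_first_line is hypothetical, and the sketch follows the authoring requirement literally: the magic string has to be followed by one of the four listed characters, so a file that ends immediately after "CACHE MANIFEST" is rejected here.

```python
def has_valid_first_line(text: str) -> bool:
    """Check the opening of a manifest against the first-line rule."""
    if text.startswith("\ufeff"):  # optional BOM
        text = text[1:]
    magic = "CACHE MANIFEST"
    if not text.startswith(magic):
        return False
    # The character after the magic string must be SPACE, tab, LF, or CR.
    following = text[len(magic):len(magic) + 1]
    return following != "" and following in " \t\n\r"


print(has_valid_first_line("CACHE MANIFEST\nclock.css\n"))
print(has_valid_first_line("CACHE MANIFESTS\n"))
```

Anything after the separator character on the first line is ignored, so a first line such as "CACHE MANIFEST v2" passes the check.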

Subsequent lines, if any, must all be one of the following:

A blank line

Blank lines must consist of zero or more U+0020 SPACE and
"tab" (U+0009) characters only.

A comment

Comment lines must consist of zero or more U+0020 SPACE and "tab" (U+0009)
characters, followed by a single "#" (U+0023) character, followed by zero or more
characters other than "LF" (U+000A) and "CR" (U+000D) characters.

Comments must be on a line on their own. If they were to be included on a line
with a URL, the "#" would be mistaken for part of a fragment identifier.

A section header

Section headers change the current section. There are four possible section headers:

CACHE:

Switches to the explicit section.

FALLBACK:

Switches to the fallback section.

NETWORK:

Switches to the online whitelist section.

SETTINGS:

Switches to the settings section.

Section header lines must consist of zero or more U+0020 SPACE and "tab" (U+0009) characters, followed by one of the names above (including the ":" (U+003A) character) followed by zero or more U+0020 SPACE and "tab" (U+0009)
characters.

When the current section is the explicit
section, data lines must consist of zero or more U+0020 SPACE and "tab" (U+0009) characters, a valid URL identifying a resource other than the
manifest itself, and then zero or more U+0020 SPACE and "tab" (U+0009)
characters.

When the current section is the fallback
section, data lines must consist of zero or more U+0020 SPACE and "tab" (U+0009) characters, a valid URL identifying a resource other than the
manifest itself, one or more U+0020 SPACE and "tab" (U+0009) characters,
another valid URL identifying a resource other than the manifest itself, and then
zero or more U+0020 SPACE and "tab" (U+0009) characters.

When the current section is the online
whitelist section, data lines must consist of zero or more U+0020 SPACE and "tab" (U+0009) characters, either a single "*" (U+002A) character or a valid URL identifying a resource
other than the manifest itself, and then zero or more U+0020 SPACE and "tab" (U+0009) characters.

When the current section is the settings
section, data lines must consist of zero or more U+0020 SPACE and "tab" (U+0009) characters, a setting,
and then zero or more U+0020 SPACE and "tab" (U+0009) characters.
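
The data line shapes above differ mainly in how many whitespace-separated items a line carries. The following non-normative sketch checks only that count. The names EXPECTED_TOKENS and data_line_token_count_ok are hypothetical. Whether each item is a valid URL is not checked, and the settings section is left out because the form of a setting is not described here.

```python
import re

# Number of SPACE/tab-separated items on a data line, per section.
EXPECTED_TOKENS = {
    "explicit": 1,          # one URL
    "fallback": 2,          # namespace URL, then fallback page URL
    "online whitelist": 1,  # "*" or one URL
}


def data_line_token_count_ok(section: str, line: str) -> bool:
    """Check that line has the item count the given section expects."""
    stripped = line.strip(" \t")  # leading and trailing SPACE/tab are allowed
    tokens = re.split(r"[ \t]+", stripped) if stripped else []
    return len(tokens) == EXPECTED_TOKENS[section]
```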

URLs that are to be fallback pages associated with fallback namespaces, and those namespaces themselves,
must be given in fallback sections, with
the namespace being the first URL of the data line, and the corresponding fallback page being the
second URL. All the other pages to be cached must be listed in explicit sections.

Namespaces that the user agent is to put into the online whitelist must all be specified in online whitelist sections. (This is needed for
any URL that the page is intending to use to communicate back to the server.) To specify that all
URLs are automatically whitelisted in this way, a "*" (U+002A) character may be specified
as one of the URLs.

Relative URLs must be given relative to the manifest's own
URL. All URLs in the manifest must have the same scheme as
the manifest itself (either explicitly or implicitly, through the use of relative URLs). [URL]

URLs in manifests must not have fragment identifiers (i.e. the U+0023 NUMBER SIGN character
isn't allowed in URLs in manifests).
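
An authoring tool could check the two URL rules above (same scheme as the manifest, no fragment identifier) along the following lines. This is a non-normative sketch: manifest_url_problems is a hypothetical helper, and Python's urljoin is used as a stand-in for the URL resolution that this specification relies on, with which it may not agree in every edge case.

```python
from urllib.parse import urljoin, urlsplit


def manifest_url_problems(manifest_url: str, entry: str) -> list:
    """Return a list of rule violations for one URL written in a manifest."""
    problems = []
    if "#" in entry:
        problems.append("fragment identifiers are not allowed")
    # Relative URLs are resolved against the manifest's own URL.
    resolved = urljoin(manifest_url, entry)
    if urlsplit(resolved).scheme != urlsplit(manifest_url).scheme:
        problems.append("scheme differs from the manifest's scheme")
    return problems
```

For a manifest at https://example.com/app/site.manifest (an illustrative address), the relative entry img/a.png passes both checks, while http://example.com/a.png is reported because its scheme differs.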

Let position be a pointer into input, initially
pointing at the first character.

If the characters starting from position are "CACHE", followed by a
U+0020 SPACE character, followed by "MANIFEST", then advance position to the
next character after those. Otherwise, this isn't a cache manifest; abort this algorithm with a
failure while checking for the magic signature.

If the character at position is neither a U+0020 SPACE character, a
"tab" (U+0009) character, "LF" (U+000A) character, nor a "CR" (U+000D) character, then this isn't a cache manifest; abort this algorithm with a
failure while checking for the magic signature.

This is a cache manifest. The algorithm cannot fail beyond
this point (though bogus lines can get ignored).

Collect a sequence of characters that are not "LF" (U+000A)
or "CR" (U+000D) characters, and ignore those characters. (Extra text on the first
line, after the signature, is ignored.)

Let mode be "explicit".

Start of line: If position is past the end of input, then jump to the last step. Otherwise, collect a sequence of
characters that are "LF" (U+000A), "CR" (U+000D), U+0020 SPACE, or
"tab" (U+0009) characters.

Drop any trailing U+0020 SPACE and "tab" (U+0009) characters at the end
of line.

If line is the empty string, then jump back to the step labeled start
of line.

If the first character in line is a "#" (U+0023) character,
then jump back to the step labeled start of line.

If line equals "CACHE:" (the word "CACHE" followed by a ":" (U+003A) character), then set mode to "explicit" and jump back to the step labeled
start of line.

If line equals "FALLBACK:" (the word "FALLBACK" followed by a ":" (U+003A) character), then set mode to "fallback" and jump back to the step
labeled start of line.

If line equals "NETWORK:" (the word "NETWORK" followed by a ":" (U+003A) character), then set mode to "online whitelist" and jump back to
the step labeled start of line.

If line equals "SETTINGS:" (the word "SETTINGS" followed by a ":" (U+003A) character), then set mode to "settings" and jump back to the step
labeled start of line.

If line ends with a ":" (U+003A) character, then set mode to "unknown" and jump back to the step labeled start of line.

This is either a data line or it is syntactically incorrect.

Let position be a pointer into line, initially
pointing at the start of the string.

Let tokens be a list of strings, initially empty.

While position doesn't point past the end of line:

Let current token be an empty string.

While position doesn't point past the end of line and the character at position is neither a U+0020 SPACE
nor a "tab" (U+0009) character, add the character at position to current token and advance position to the next character in input.

Add current token to the tokens list.

While position doesn't point past the end of line and the character at position is either a U+0020 SPACE
or a "tab" (U+0009) character, advance position to the
next character in input.

Process tokens as follows:

If mode is "explicit"

Resolve the first item in tokens,
relative to base URL, with the URL character encoding set to UTF-8;
ignore the rest.

If this fails, then jump back to the step labeled start of line.

If the resulting parsed URL has a different scheme component than base URL (the
manifest's URL), then jump back to the step labeled start of line.

Let new URL be the result of applying the URL serializer algorithm to the resulting parsed
URL, with the exclude fragment flag set.

Add new URL to the explicit URLs.

If mode is "fallback"

Let part one be the first token in tokens, and let
part two be the second token in tokens.

Resolve part one and part two, relative to base URL, with the URL character
encoding set to UTF-8.

If either fails, then jump back to the step labeled start of line.

If the absolute URL corresponding to either part one or
part two does not have the same origin as the manifest's URL,
then jump back to the step labeled start of line.

Let part one be the result of applying the URL serializer algorithm to the first resulting
parsed URL, with the exclude fragment flag set.

Let part two be the result of applying the URL serializer algorithm to the second resulting
parsed URL, with the exclude fragment flag set.

If part one is already in the fallback URLs mapping
as a fallback namespace, then jump back to
the step labeled start of line.
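
The line-level part of the steps above (skipping blank lines and comments, switching mode on section headers, and splitting data lines into tokens) can be sketched as follows. This is a non-normative illustration under stated assumptions: tokenize_manifest_body is a hypothetical name, the input is the text after the signature line, and line is taken to be the run of characters up to the next LF or CR. URL resolution and the per-mode processing of tokens are not shown.

```python
import re

SECTION_MODES = {
    "CACHE:": "explicit",
    "FALLBACK:": "fallback",
    "NETWORK:": "online whitelist",
    "SETTINGS:": "settings",
}


def tokenize_manifest_body(body: str) -> list:
    """Return a (mode, tokens) pair for every data line in body."""
    mode = "explicit"
    data_lines = []
    # Splitting on runs of CR/LF and stripping SPACE/tab stands in for the
    # pointer-based "start of line" steps (an assumed simplification).
    for raw in re.split(r"[\r\n]+", body):
        line = raw.strip(" \t")
        if line == "" or line.startswith("#"):
            continue  # blank line or comment
        if line in SECTION_MODES:
            mode = SECTION_MODES[line]  # recognized section header
            continue
        if line.endswith(":"):
            mode = "unknown"  # unrecognized section header
            continue
        data_lines.append((mode, re.split(r"[ \t]+", line)))
    return data_lines
```

Lines that follow an unrecognized header are still returned, tagged with the mode "unknown", so that a caller can ignore them.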

The resource that declares the manifest (with the manifest attribute) will always get taken from the cache,
whether it is listed in the cache or not, even if it is listed in an online whitelist namespace.

5.7.4 Downloading or updating an application cache

When the user agent is required (by other parts of this specification) to start the
application cache download process for an absolute URL purported to
identify a manifest, or for an application
cache group, potentially given a particular cache host, and potentially given
a master resource, the user agent must run the steps
below. These steps are always run asynchronously, in parallel with the event loop tasks.

Some of these steps have requirements that only apply if the user agent shows caching
progress. Support for this is optional. Caching progress UI could consist of a progress bar
or message panel in the user agent's interface, or an overlay, or something else. Certain events
fired during the application cache download process allow the script to override the
display of such an interface. (Such events are delayed until after the load event has fired.)
The goal of this is to allow Web applications to provide more
seamless update mechanisms, hiding from the user the mechanics of the application cache mechanism.
User agents may display user interfaces independent of this, but are encouraged to not show
prominent update progress notifications for applications that cancel the relevant events.

Optionally, wait until the permission to start the application cache download
process has been obtained from the user and until the user agent is confident that the
network is available. This could include doing nothing until the user explicitly opts-in to
caching the site, or could involve prompting the user for permission. The algorithm might never
get past this point. (This step is particularly intended to be used by user agents running on
severely space-constrained devices or in highly privacy-sensitive environments.)

Atomically, so as to avoid race conditions, perform the following substeps:

The MIME type of the resource is ignored — it is assumed to
be text/cache-manifest. In the future, if new manifest formats are supported, the
different types will probably be distinguished on the basis of the file signatures (for the
current format, that is the "CACHE MANIFEST" string at the top of the
file).

If fetching the manifest fails due to a 404 or 410 response or equivalent, then run these substeps:

Mark cache group as obsolete. This cache group no
longer exists for any purpose other than the processing of Document objects
already associated with an application cache in the cache
group.

Otherwise, if fetching the manifest fails in some other way (e.g. the server returns
another 4xx or 5xx response or equivalent, or
there is a DNS error, or the connection times out, or the user cancels the download, or the
parser for manifests fails when checking the magic signature), or if the server returned a
redirect, then run the cache failure steps. [HTTP]
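
The two failure branches above can be summarized in a small decision function. This is a non-normative sketch: manifest_fetch_outcome, its parameters, and the returned labels are hypothetical, and the set of status codes treated as redirects is an assumption of the example.

```python
# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def manifest_fetch_outcome(status: int,
                           network_error: bool = False,
                           signature_ok: bool = True) -> str:
    """Classify the result of fetching a manifest."""
    if network_error:
        return "cache failure"  # DNS error, timeout, or user cancellation
    if status in (404, 410):
        return "obsolete"  # the cache group is marked as obsolete
    if status in REDIRECT_STATUSES or 400 <= status <= 599:
        return "cache failure"  # redirect, or another 4xx or 5xx response
    if not signature_ok:
        return "cache failure"  # the magic signature check failed
    return "continue"
```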

If the download failed (e.g. the server returns a 4xx or 5xx response or equivalent, or there is a DNS error, the
connection times out, or the user cancels the download), or if the resource is labeled with
the "no-store" cache directive, then create a task to
fire a simple event that is cancelable named error at the ApplicationCache singleton of
the Document for this entry, if there still is one, and append it to task list. The default action of this event must be, if the user agent
shows caching progress, the display of some sort of user interface indicating to
the user that the user agent failed to save the application for offline use.

Otherwise, associate the Document for this entry with cache; store the resource for this entry in cache, if it
isn't already there, and categorize its entry as a master entry. If applying the URL parser
algorithm to the resource's URL results in a parsed URL that has a
non-null fragment component, the URL
used for the entry in cache must instead be the absolute URL
obtained from applying the URL serializer
algorithm to the parsed URL with the exclude fragment flag set
(application caches never include fragment identifiers).
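
The effect of the exclude fragment flag on the stored URL can be illustrated as follows. This is a non-normative sketch: cache_entry_url is a hypothetical helper, and Python's urldefrag is used as a stand-in for the URL parser and URL serializer referenced above.

```python
from urllib.parse import urldefrag


def cache_entry_url(resource_url: str) -> str:
    """Return the URL under which a master entry would be stored."""
    # Drop the fragment, if any; the rest of the URL is left unchanged.
    return urldefrag(resource_url).url
```

A document loaded from https://example.com/app/index.html#inbox (an illustrative address) would thus be stored under https://example.com/app/index.html.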

If the resource URL being processed was flagged as neither an "explicit entry" nor a
"fallback entry", then the user agent may skip this URL.

This is intended to allow user agents to expire resources not listed in the
manifest from the cache. Generally, implementors are urged to use an approach that expires
lesser-used resources first.

For each cache host associated with an application cache in
cache group, queue a post-load task to fire a trusted
event with the name progress, which does not
bubble, which is cancelable, and which uses the ProgressEvent interface, at the
ApplicationCache singleton of the cache host. The lengthComputable attribute must be set to
true, the total attribute must be set to the
number of files in file list, and the loaded attribute must be set to the number of files in
file list that have been either downloaded or skipped so far. The default
action of these events must be, if the user agent shows caching progress, the
display of some sort of user interface indicating to the user that a file is being downloaded
in preparation for updating the application. [PROGRESS-EVENTS]
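
The values carried by each progress event can be computed as in the following non-normative sketch. ProgressFields and progress_fields are hypothetical names, and the set finished is assumed to hold the URLs from file list that have been downloaded or skipped so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgressFields:
    length_computable: bool
    loaded: int
    total: int


def progress_fields(file_list: list, finished: set) -> ProgressFields:
    """Compute the attribute values for one progress event."""
    # Count only the entries of file_list that are already finished.
    loaded = sum(1 for url in file_list if url in finished)
    return ProgressFields(length_computable=True,
                          loaded=loaded,
                          total=len(file_list))
```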

Fetch the resource, from the origin of the
URL manifest URL, with the synchronous flag set and
the manual redirect flag set. If this is an upgrade attempt, then use the newest application cache in cache group as an HTTP cache, and honor HTTP caching semantics (such as
expiration, ETags, and so forth) with respect to that cache. User agents may also have other
caches in place that are also honored.

If the resource in question is already being downloaded for other reasons then
the existing download process can sometimes be used for the purposes of this step, as defined
by the fetching algorithm.

An example of a resource that might already be being downloaded is a large
image on a Web page that is being seen for the first time. The image would get downloaded to
satisfy the img element on the page, as well as being listed in the cache
manifest. According to the rules for fetching, that image need only
be downloaded once, and it can be used both for the cache and for the rendered Web page.

If the previous step fails (e.g. the server returns a 4xx or 5xx response or equivalent, or there is a DNS error, or the
connection times out, or the user cancels the download), or if the server returned a redirect,
or if the resource is labeled with the "no-store" cache directive, then run the first
appropriate step from the following list: [HTTP]

If the URL being processed was flagged as an "explicit entry" or a "fallback entry"

If these steps are being run in parallel for any other URLs in file
list, then abort these steps for those other URLs. Run the cache failure
steps.

Redirects are fatal because they are either indicative of a network problem
(e.g. a captive portal); or would allow resources to be added to the cache under URLs that
differ from any URL that the networking model will allow access to, leaving orphan entries;
or would allow resources to be stored under URLs different than their true URLs. All of
these situations are bad.

Copy the resource and its metadata from the newest application cache in cache group whose completeness
flag is complete, and act as if that was the fetched resource, ignoring the
resource obtained from the network.

User agents may warn the user of these errors as an aid to development.

These rules make errors for resources listed in the manifest fatal, while
making it possible for other resources to be removed from caches when they are removed from
the server, without errors, and making non-manifest resources survive server-side errors.
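
The rules summarized in the note above can be sketched as a decision function for a resource whose fetch failed. This is non-normative: failed_entry_action and its return labels are hypothetical, and reading "removed from the server" as a 404 or 410 response is an assumption of this sketch.

```python
from typing import Optional


def failed_entry_action(flags: set, status: Optional[int]) -> str:
    """Choose what to do with a resource whose fetch failed."""
    if flags & {"explicit entry", "fallback entry"}:
        # Errors for resources listed in the manifest are fatal.
        return "cache failure"
    if status in (404, 410):
        # Assumed meaning of "removed from the server": drop the resource.
        return "skip"
    # Other resources survive server-side errors by reusing the old copy.
    return "copy from newest complete cache"
```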

Except for the "no-store" directive, HTTP caching rules that would cause a
file to be expired or otherwise not cached are ignored for the purposes of the
application cache download process.

Otherwise, the fetching succeeded. Store the resource in the new
cache.

If the user agent is not able to store the resource (e.g. because of quota restrictions),
the user agent may prompt the user or try to resolve the problem in some other manner (e.g.
automatically pruning content in other caches). If the problem cannot be resolved, the user
agent must run the cache failure steps.

If the URL being processed was flagged as an "explicit entry" in file
list, then categorize the entry as an explicit
entry.

If the URL being processed was flagged as a "fallback entry" in file
list, then categorize the entry as a fallback
entry.

If the URL being processed was flagged as a "master entry" in file
list, then categorize the entry as a master
entry.

As an optimization, if the resource is an HTML or XML file whose root element is an
html element with a manifest attribute
whose value doesn't match the manifest URL of the application cache being processed, then the
user agent should mark the entry as being foreign.

If the download failed (e.g. the server returns a 4xx or 5xx response or equivalent, or there is a DNS error, the
connection times out, or the user cancels the download), or if the resource is labeled with the
"no-store" cache directive, then run these substeps:

Otherwise, store the resource for this entry in new cache, if it isn't
already there, and categorize its entry as a master
entry.

Fetch the resource from manifest URL again, with
the synchronous flag set, and let second manifest be that resource.
HTTP caching semantics should again be honored for this request.

Since caching can be honored, authors are encouraged to avoid setting the cache
headers on the manifest in such a way that the user agent would simply not contact the network
for this second request; otherwise, the user agent would not notice if the cache had changed
during the cache update process.

If the previous step failed for any reason, or if the fetching attempt involved a redirect,
or if second manifest and manifest are not byte-for-byte
identical, then schedule a rerun of the entire algorithm with the same parameters after a short
delay, and run the cache failure steps.
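
The check performed on the second fetch of the manifest can be sketched as follows. This is a non-normative illustration: update_must_restart is a hypothetical name, and a failed second fetch is represented by None.

```python
from typing import Optional


def update_must_restart(manifest: bytes,
                        second_manifest: Optional[bytes],
                        redirected: bool = False) -> bool:
    """Return True if the update has to be rerun after a short delay."""
    if second_manifest is None or redirected:
        return True  # the second fetch failed or involved a redirect
    # The two copies have to be byte-for-byte identical.
    return second_manifest != manifest
```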

Otherwise, store manifest in new cache, if it's not
there already, and categorize its entry as the
manifest.

Create a task to fire a simple event that
is cancelable named error at the
ApplicationCache singleton of the Document for this entry, if there
still is one, and append it to task list. The default action of these
events must be, if the user agent shows caching progress, the display of some sort
of user interface indicating to the user that the user agent failed to save the application for
offline use.

Each Document has a list of pending application cache download process
tasks that is used to delay events fired by the algorithm above until the document's load event has fired. When the Document is created, the
list must be empty.

When the steps above say to queue a post-load task task, where
task is a task that dispatches an event on a
target ApplicationCache object target, the user agent must run
the appropriate steps from the following list:

5.7.5 The application cache selection algorithm

When the application cache selection algorithm
is invoked with a Document document and optionally a
manifest URL manifest URL, the user agent must run the first
applicable set of steps from the following list:

If there are relevant application caches that
are identified by a URL with the same origin as the URL of document, and that have this URL as one of their entries, excluding entries
marked as foreign, then the user agent should use
the most appropriate application cache of those
that match as an HTTP cache for any subresource loads. User agents may also have other caches in
place that are also honored.

Fetch the resource normally. If this results in a redirect to a resource with
another origin (indicative of a captive portal), or a 4xx or 5xx status code or equivalent, or if there were network errors (but
not if the user canceled the download), then instead get, from the cache, the resource of the
fallback entry corresponding to the fallback namespace f. Abort
these steps.
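
The conditions under which the fallback entry is used in place of the network response can be sketched as follows. This is a non-normative illustration: should_use_fallback and origin are hypothetical helpers, and the origin comparison is simplified to scheme, host, and port, with default ports assumed for http and https only.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return a simplified (scheme, host, port) origin for url."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def should_use_fallback(request_url: str,
                        status: int,
                        redirect_url: Optional[str] = None,
                        network_error: bool = False,
                        user_canceled: bool = False) -> bool:
    """Decide whether to serve the fallback entry from the cache."""
    if user_canceled:
        return False  # a canceled download does not trigger the fallback
    if network_error:
        return True
    if redirect_url is not None and origin(redirect_url) != origin(request_url):
        return True  # cross-origin redirect, e.g. a captive portal
    return 400 <= status <= 599
```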

The above algorithm ensures that so long as the online whitelist wildcard flag is blocking, resources that are not present in the manifest will always fail to load (at least, after the
application cache has been primed the first time), making the testing of offline
applications simpler.

5.7.7 Expiring application caches

As a general rule, user agents should not expire application caches, except on request from the
user, or after having been left unused for an extended period of time.

Application caches and cookies have similar implications with respect to privacy (e.g. if the
site can identify the user when providing the cache, it can store data in the cache that can be
used for cookie resurrection). Implementors are therefore encouraged to expose application caches
in a manner related to HTTP cookies, allowing caches to be expunged together with cookies and
other origin-specific data.

For example, a user agent could have a "delete site-specific data" feature that
clears all cookies, application caches, local storage, databases, etc, from an origin all at
once.

5.7.8 Disk space

User agents should consider applying constraints on disk usage of application caches, and care should be taken to ensure that the restrictions cannot
be easily worked around using subdomains.

User agents should allow users to see how much space each domain is using, and may offer the
user the ability to delete specific application caches.

For predictability, quotas should be based on the uncompressed size of data stored.

How quotas are presented to the user is not defined by this specification. User
agents are encouraged to provide features such as allowing a user to indicate that certain sites
are trusted to use more than the default quota, e.g. by asynchronously presenting a user interface
while a cache is being updated, or by having an explicit whitelist in the user agent's
configuration interface.

Calling this method is not usually necessary, as user agents will generally take care of
updating application caches automatically.

The method can be useful in situations such as long-lived applications. For example, a Web
mail application might stay open in a browser tab for weeks at a time. Such an application could
want to test for updates each day.

Switches to the most recent application cache, if there is a newer one. If there isn't,
throws an InvalidStateError exception.

This does not cause previously-loaded resources to be reloaded; for example, images do not
suddenly get reloaded and style sheets and scripts do not get reparsed or reevaluated. The only
change is that subsequent requests for cached resources will obtain the newer copies.

The updateready event will fire before this
method can be called. Once it fires, the Web application can, at its leisure, call this method
to switch the underlying cache to the one with the more recent updates. To make proper use of
this, applications have to be able to bring the new features into play; for example, reloading
scripts to enable new features.

The status attribute, on getting, must
return the current state of the application cache that the
ApplicationCache object's cache host is associated with, if any. This
must be the appropriate value from the following list:

5.7.10 Browser state

Returns false if the user agent is definitely offline (disconnected from the network).
Returns true if the user agent might be online.

The events online and offline are fired when the value of this attribute changes.

The navigator.onLine attribute must return
false if the user agent will not contact the network when the user follows links or when a script
requests a remote page (or knows that such an attempt would fail), and must return true
otherwise.