5.6 Offline Web applications

Status: Last call for comments

5.6.1 Introduction

This section is non-normative.

In order to enable users to continue interacting with Web
applications and documents even when their network connection is
unavailable — for instance, because they are traveling outside
of their ISP's coverage area — authors can provide a manifest
which lists the files that are needed for the Web application to
work offline and which causes the user's browser to keep a copy of
the files for use offline.

When the user visits a page that references such a manifest, the
browser will cache the listed files and make them available even
when the user is offline.

Authors are encouraged to include the main page in
the manifest also, but in practice the page that referenced the
manifest is automatically cached even if it isn't explicitly
mentioned.

HTTP cache headers and restrictions on caching pages
served over TLS (encrypted, using https:) are
overridden by manifests. Thus, pages will not expire from an
application cache before the user agent has updated it, and even
applications served over TLS can be made to work offline.

5.6.1.1 Event summary

This section is non-normative.

When the user visits a page that declares a manifest, the browser
will try to update the cache. It does this by fetching a copy of the
manifest and, if the manifest has changed since the user agent last
saw it, redownloading all the resources it mentions and caching them
anew.

As this is going on, a number of events get fired on the
ApplicationCache object to keep the script updated as
to the state of the cache update, so that the user can be notified
appropriately. The events are as follows:

Resources that were listed in the cache's manifest in an explicit
section. Explicit entries can also be marked as foreign, which means that
they have a manifest
attribute but that it doesn't point at this cache's manifest.

A URL in the list can be flagged with multiple
different types, and thus an entry can end up being categorized as
multiple entries. For example, an entry can be a manifest entry
and an explicit entry at the same time, if the manifest is listed
within the manifest.

Multiple application
caches in different application cache groups can contain the same
resource, e.g. if the manifests all reference that resource. If the
user agent is to select an
application cache from a list of relevant application caches that contain a
resource, the user agent must use the application cache that the
user most likely wants to see the resource from, taking into account
the following:

which application cache was most recently updated,

which application cache was being used to display the
resource from which the user decided to look at the new resource,
and

which application cache the user prefers.

A URL matches a
fallback namespace if there exists a relevant
application cache whose manifest's URL has the
same origin as the URL in question, and that has a
fallback namespace
that is a prefix match for the URL being examined. If
multiple fallback namespaces match the same URL, the longest one is
the one that matches. A URL looking for a fallback namespace can
match more than one application cache at a time, but only matches
one namespace in each cache.

If a manifest http://example.com/app1/manifest declares that
http://example.com/resources/images is a
fallback namespace, and the user navigates to HTTP://EXAMPLE.COM:80/resources/images/cat.png,
then the user agent will decide that the application cache
identified by http://example.com/app1/manifest contains a
namespace with a match for that URL.
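
The matching rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the user agent's actual algorithm: the helper names (origin, normalize, match_fallback_namespace) and the dictionary layout are hypothetical, and URL normalization is reduced to lowercasing the scheme and host and making default ports explicit.

```python
from urllib.parse import urlsplit

# Assumption: only http and https default ports are known to this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return (scheme, host, port) with case and default ports normalized."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def normalize(url):
    """Rebuild the URL from its normalized origin plus path and query."""
    parts = urlsplit(url)
    scheme, host, port = origin(url)
    text = f"{scheme}://{host}:{port}{parts.path}"
    return text + ("?" + parts.query if parts.query else "")

def match_fallback_namespace(url, caches):
    """Map each same-origin manifest URL to its longest matching namespace.

    caches maps a manifest URL to a list of fallback namespace URLs.
    """
    target = normalize(url)
    matches = {}
    for manifest_url, namespaces in caches.items():
        if origin(manifest_url) != origin(url):
            continue  # the manifest must share an origin with the URL
        hits = [ns for ns in namespaces if target.startswith(normalize(ns))]
        if hits:
            # Only one namespace per cache matches: the longest one.
            matches[manifest_url] = max(hits, key=lambda ns: len(normalize(ns)))
    return matches
```

With the manifest from the example above, the URL HTTP://EXAMPLE.COM:80/resources/images/cat.png is matched against http://example.com/resources/images because both normalize to the same origin and the namespace is a prefix of the URL.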

5.6.3 The cache manifest syntax

5.6.3.1 A sample manifest

This section is non-normative.

This example manifest requires two images and a style sheet to be
cached and whitelists a CGI script.

CACHE MANIFEST
# the above line is required
# this is a comment
# there can be as many of these anywhere in the file
# they are all ignored
# comments can have spaces before them
# but must be alone on the line
# blank lines are ignored too
# these are files that need to be cached; they can either be listed
# first, or a "CACHE:" header could be put before them, as is done
# lower down.
images/sound-icon.png
images/background.png
# note that each file has to be put on its own line
# here is a file for the online whitelist -- it isn't cached, and
# references to this file will bypass the cache, always hitting the
# network (or trying to, if the user is offline).
NETWORK:
comm.cgi
# here is another set of files to cache, this time just the CSS file.
CACHE:
style/default.css

The following manifest defines a catch-all error page that is
displayed for any page on the site while the user is offline. It
also specifies that the online whitelist
wildcard flag is open, meaning that accesses
to resources on other sites will not be blocked. (Resources on the
same site are already not blocked because of the catch-all fallback
namespace.)

So long as all pages on the site reference this manifest, they
will get cached locally as they are fetched, so that subsequent hits
to the same page will load the page immediately from the
cache. Until the manifest is changed, those pages will not be
fetched from the server again. When the manifest changes, then all
the files will be redownloaded.

Subresources, such as style sheets, images, etc, would only be
cached using the regular HTTP caching semantics, however.

This is a willful violation of two
aspects of RFC 2046, which requires all text/*
types to support an open-ended set of character encodings and only
allows CRLF line breaks. These requirements, however, are outdated;
UTF-8 is now widely used, such that supporting other encodings is no
longer necessary, and use of CR, LF, and CRLF line breaks is
commonly supported and indeed sometimes CRLF is not
supported by text editors. [RFC2046]

The first line of an application cache manifest must consist of
the string "CACHE", a single U+0020 SPACE character, the string
"MANIFEST", and either a U+0020 SPACE character, a U+0009 CHARACTER
TABULATION (tab) character, a U+000A LINE FEED (LF) character, or a
U+000D CARRIAGE RETURN (CR) character. The first line may optionally
be preceded by a U+FEFF BYTE ORDER MARK (BOM) character. If any
other text is found on the first line, it is ignored.

Subsequent lines, if any, must all be one of the following:

A blank line

Blank lines must consist of zero or more U+0020 SPACE and
U+0009 CHARACTER TABULATION (tab) characters only.

A comment

Comment lines must consist of zero or more U+0020 SPACE and
U+0009 CHARACTER TABULATION (tab) characters, followed by a single
U+0023 NUMBER SIGN character (#), followed by zero or more
characters other than U+000A LINE FEED (LF) and U+000D CARRIAGE
RETURN (CR) characters.

Comments must be on a line on their own. If they
were to be included on a line with a URL, the "#" would be
mistaken for part of a fragment identifier.

A section header

Section headers change the current section. There are three
possible section headers:

CACHE:

Switches to the explicit section.

FALLBACK:

Switches to the fallback section.

NETWORK:

Switches to the online whitelist section.

Section header lines must consist of zero or more U+0020 SPACE
and U+0009 CHARACTER TABULATION (tab) characters, followed by one
of the names above (including the U+003A COLON character (:))
followed by zero or more U+0020 SPACE and U+0009 CHARACTER
TABULATION (tab) characters.

When the current section is the explicit
section, data lines must consist of zero or more U+0020
SPACE and U+0009 CHARACTER TABULATION (tab) characters, a
valid URL identifying a resource other than the
manifest itself, and then zero or more U+0020 SPACE and U+0009
CHARACTER TABULATION (tab) characters.

When the current section is the fallback
section, data lines must consist of zero or more U+0020
SPACE and U+0009 CHARACTER TABULATION (tab) characters, a
valid URL identifying a resource other than the
manifest itself, one or more U+0020 SPACE and U+0009 CHARACTER
TABULATION (tab) characters, another valid URL
identifying a resource other than the manifest itself, and then
zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab)
characters.

When the current section is the online whitelist
section, data lines must consist of zero or more U+0020
SPACE and U+0009 CHARACTER TABULATION (tab) characters, either a
single U+002A ASTERISK character (*) or a valid
URL identifying a resource other than the manifest itself,
and then zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION
(tab) characters.

URLs that are to be fallback pages associated with fallback namespaces, and
those namespaces themselves, must be given in fallback sections,
with the namespace being the first URL of the data line, and the
corresponding fallback page being the second URL. All the other
pages to be cached must be listed in explicit
sections.

Namespaces that the user agent is to put into the online whitelist
must all be specified in online whitelist
sections. (This is needed for any URL that the page is
intending to use to communicate back to the server.) To specify that
all URLs are automatically whitelisted in this way, a U+002A
ASTERISK character (*) may be specified as one of the
URLs.

Relative URLs must be given relative to the manifest's own
URL. All URLs in the manifest must have the same <scheme> as the manifest itself
(either explicitly or implicitly, through the use of relative
URLs).

URLs in manifests must not have fragment identifiers (i.e. the
U+0023 NUMBER SIGN character isn't allowed in URLs in
manifests).

5.6.3.3 Parsing cache manifests

When a user agent is to parse a manifest, it means
that the user agent must run the following steps:

The user agent must decode the byte stream corresponding with
the manifest to be parsed, treating it as UTF-8. Bytes or sequences
of bytes that are not valid UTF-8 sequences must be interpreted as
a U+FFFD REPLACEMENT CHARACTER.

Let position be a pointer into input, initially pointing at the first
character.

If position is pointing at a U+FEFF BYTE
ORDER MARK (BOM) character, then advance position to the next character.

If the characters starting from position
are "CACHE", followed by a U+0020 SPACE character, followed by
"MANIFEST", then advance position to the next
character after those. Otherwise, this isn't a cache manifest;
abort this algorithm with a failure while checking for the magic
signature.

If the character at position is not
a U+0020 SPACE character, a U+0009 CHARACTER TABULATION (tab)
character, a U+000A LINE FEED (LF) character, or a U+000D CARRIAGE
RETURN (CR) character, then this isn't a cache manifest; abort this
algorithm with a failure while checking for the magic
signature.

This is a cache manifest. The algorithm cannot fail beyond
this point (though bogus lines can get ignored).

Collect a sequence of characters that are
not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)
characters, and ignore those characters. (Extra text on the first
line, after the signature, is ignored.)
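
The decoding and signature steps above can be sketched as follows. This is an illustrative reading, not a reference implementation: the function name check_signature and its return shape are assumptions, and treating a file that ends immediately after "MANIFEST" as a failure is one interpretation of the step that inspects the character following the signature.

```python
def check_signature(data: bytes):
    """Return (text, position after the first line), or None on failure."""
    # Invalid UTF-8 sequences become U+FFFD REPLACEMENT CHARACTER.
    text = data.decode("utf-8", errors="replace")
    pos = 0
    # Skip an optional leading BOM.
    if text.startswith("\ufeff"):
        pos += 1
    magic = "CACHE MANIFEST"
    if not text.startswith(magic, pos):
        return None  # failure while checking for the magic signature
    pos += len(magic)
    # Assumption: end of input right after the signature counts as failure.
    if pos >= len(text) or text[pos] not in " \t\n\r":
        return None
    # Extra text on the first line, after the signature, is ignored.
    while pos < len(text) and text[pos] not in "\n\r":
        pos += 1
    return text, pos
```

The returned position points at the line break that ends the first line (or past the end of the input), which is where the "start of line" step would resume.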

Let mode be "explicit".

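Start of line: If position is
past the end of input, then jump to the last
step. Otherwise, collect a sequence of characters that
are U+000A LINE FEED (LF), U+000D CARRIAGE RETURN (CR), U+0020
SPACE, or U+0009 CHARACTER TABULATION (tab) characters.

Now, collect a sequence of characters that are
not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)
characters, and let the result be line.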

Drop any trailing U+0020 SPACE and U+0009 CHARACTER
TABULATION (tab) characters at the end of line.

If line is the empty string, then jump
back to the step labeled "start of line".

If the first character in line is a
U+0023 NUMBER SIGN character (#), then jump back to the step
labeled "start of line".

If line equals "CACHE:" (the word
"CACHE" followed by a U+003A COLON character (:)), then set mode to "explicit" and jump back to the step
labeled "start of line".

If line equals "FALLBACK:" (the word
"FALLBACK" followed by a U+003A COLON character (:)), then set mode to "fallback" and jump back to the step
labeled "start of line".

If line equals "NETWORK:" (the word
"NETWORK" followed by a U+003A COLON character (:)), then set mode to "online whitelist" and jump back to the step
labeled "start of line".

If line ends with a U+003A COLON
character (:), then set mode to "unknown" and
jump back to the step labeled "start of line".

This is either a data line or it is syntactically
incorrect.
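
The line-classification steps above can be sketched in Python. This is a simplified sketch: the name data_lines is hypothetical, and the input is assumed to be the decoded text that follows the first line of the manifest. Splitting on runs of CR and LF and then stripping spaces and tabs stands in for the pointer-based "start of line" step.

```python
import re

# Section headers and the mode each one selects.
HEADERS = {
    "CACHE:": "explicit",
    "FALLBACK:": "fallback",
    "NETWORK:": "online whitelist",
}

def data_lines(body):
    """Return a list of (mode, line) pairs for every candidate data line."""
    mode = "explicit"  # the initial mode
    result = []
    for raw in re.split(r"[\r\n]+", body):
        line = raw.strip(" \t")
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        if line in HEADERS:
            mode = HEADERS[line]
        elif line.endswith(":"):
            mode = "unknown"  # unrecognized section header
        else:
            result.append((mode, line))
    return result
```

Lines collected under the "unknown" mode are kept here only to show the mode switch; a parser following the steps above would go on to ignore them.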

Let position be a pointer into line, initially pointing at the start of the
string.

Let tokens be a list of strings,
initially empty.

While position doesn't point past the end
of line:

Let current token be an empty
string.

While position doesn't point past the
end of line and the character at position is neither a U+0020 SPACE nor a U+0009
CHARACTER TABULATION (tab) character, add the character at position to current token and
advance position to the next character in
line.

Add current token to the tokens list.

While position doesn't point past the
end of line and the character at position is either a U+0020 SPACE or a U+0009
CHARACTER TABULATION (tab) character, advance position to the next character in line.
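
The tokenization loop above translates almost directly into code. The following sketch uses the hypothetical name tokenize and assumes that line has already had its leading and trailing spaces and tabs removed by the earlier steps.

```python
def tokenize(line):
    """Split a data line into tokens separated by spaces and tabs."""
    tokens = []
    pos = 0
    while pos < len(line):
        current = ""
        # Collect characters up to the next space or tab.
        while pos < len(line) and line[pos] not in " \t":
            current += line[pos]
            pos += 1
        tokens.append(current)
        # Skip the run of spaces and tabs that follows the token.
        while pos < len(line) and line[pos] in " \t":
            pos += 1
    return tokens
```

For a fallback-section line such as "/ /offline.html" (an illustrative value), this yields two tokens: the namespace and the fallback page.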

Process tokens as follows:

If mode is "explicit"

Resolve the first item in
tokens, relative to base
URL; ignore the rest.

If this fails, then jump back to the step labeled "start of
line".

If the resulting absolute URL has a different
<scheme> component than
the manifest's URL (compared in an ASCII
case-insensitive manner), then jump back to the step
labeled "start of line". If the manifest's <scheme> is https: or another scheme intended for encrypted
data transfer, and the resulting absolute URL does
not have the same origin as the manifest's URL,
then jump back to the step labeled "start of line".
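
The checks applied to an explicit-section data line can be sketched as follows. The names resolve_explicit and _origin are hypothetical, and several simplifications are assumed: urllib.parse.urljoin stands in for the specification's URL resolution and never reports failure, only https is treated as a scheme intended for encrypted data transfer, and only http and https default ports are known.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def _origin(url):
    """Return (scheme, host, port) with default ports made explicit."""
    p = urlsplit(url)
    return (p.scheme, p.hostname, p.port or DEFAULT_PORTS.get(p.scheme))

def resolve_explicit(tokens, manifest_url):
    """Return the absolute URL to cache, or None if the line is ignored."""
    if not tokens:
        return None
    # Only the first token is used; the rest are ignored.
    absolute = urljoin(manifest_url, tokens[0])
    manifest_scheme = urlsplit(manifest_url).scheme
    # urlsplit lowercases the scheme, so this comparison is case-insensitive.
    if urlsplit(absolute).scheme != manifest_scheme:
        return None
    # For https manifests, the resource must also be same-origin.
    if manifest_scheme == "https" and _origin(absolute) != _origin(manifest_url):
        return None
    return absolute
```

Returning None corresponds to jumping back to the step labeled "start of line" without recording the URL.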

5.6.4 Downloading or updating an application cache

When the user agent is required (by other parts of this
specification) to start the application cache download
process for an absolute URL purported to identify
a manifest, or for an
application cache group, potentially given a particular
cache host, and potentially given a master resource, the user
agent must run the steps below. These steps are always run
asynchronously, in parallel with the event loop tasks.

Some of these steps have requirements that only apply if the user
agent shows caching progress. Support for this is
optional. Caching progress UI could consist of a progress bar or
message panel in the user agent's interface, or an overlay, or
something else. Certain events fired during the application
cache download process allow the script to override the display
of such an interface. The goal of this is to allow Web applications
to provide more seamless update mechanisms, hiding from the user the
mechanics of the application cache mechanism. User agents may
display user interfaces independent of this, but are encouraged to
not show prominent update progress notifications for applications
that cancel the relevant events.

Optionally, wait until the permission to start the
application cache download process has been obtained
from the user and until the user agent is confident that the
network is available. This could include doing nothing until the
user explicitly opts in to caching the site, or could involve
prompting the user for permission. The algorithm might never get
past this point. (This step is particularly intended to be used by
user agents running on severely space-constrained devices or in
highly privacy-sensitive environments.)

Atomically, so as to avoid race conditions, perform the
following substeps:

Otherwise, if fetching the manifest fails in some other
way (e.g. the server returns another 4xx or 5xx response or equivalent, or
there is a DNS error, or the connection times out, or the user
cancels the download, or the parser for manifests fails when
checking the magic signature), or if the server returned a
redirect, or if the resource is labeled with a MIME
type other than text/cache-manifest, then run
the cache failure steps.

If the download failed (e.g. the connection times out, or the
user cancels the download), then create a task to fire a simple
event that is cancelable named error at the
ApplicationCache singleton of the cache
host that is the Document for this entry, if there
still is one, and add it to task list. The
default action of this event must be, if the user agent
shows caching progress, the display of some sort of
user interface indicating to the user that the user agent failed
to save the application for offline use.

Otherwise, associate the Document for this entry
with cache; store the resource for this
entry in cache, if it isn't already there,
and categorize its entry as a master entry. If the
resource's URL has a <fragment> component, it must
be removed from the entry in cache
(application caches never include fragment identifiers).

If any URL is in file list more than
once, then merge the entries into one entry for that URL, that
entry having all the flags that the original entries had.
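
The merge described above amounts to taking the union of flags per URL. A minimal sketch, assuming file list is represented as (url, flag) pairs; the name merge_file_list and this representation are illustrative only.

```python
def merge_file_list(entries):
    """Merge duplicate URLs, keeping every flag the originals carried.

    entries is an iterable of (url, flag) pairs; the result maps each
    URL to the set of its flags, in first-seen order.
    """
    merged = {}
    for url, flag in entries:
        merged.setdefault(url, set()).add(flag)
    return merged
```

A URL that appears once as an "explicit entry" and once as a "master entry" ends up as a single entry carrying both flags.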

For each URL in file list, run the
following steps. These steps may be run in parallel for two or
more of the URLs at a time.

If the resource URL being processed was flagged as neither an
"explicit entry" nor a "fallback entry", then the user agent
may skip this URL.

This is intended to allow user agents to expire
resources not listed in the manifest from the cache. Generally,
implementors are urged to use an approach that expires
lesser-used resources first.

For each cache host associated with an
application cache in cache
group, queue a post-load task to fire an event
with the name progress, which does not
bubble, which is cancelable, and which uses the
ProgressEvent interface, at the
ApplicationCache singleton of the cache
host. The lengthComputable
attribute must be set to true, the total attribute must be
set to the number of files in file list, and
the loaded
attribute must be set to the number of files in file list that have been either downloaded or
skipped so far. The default action of these events must be, if
the user agent shows caching progress, the display
of some sort of user interface indicating to the user that a file
is being downloaded in preparation for updating the
application. [PROGRESS]

Fetch the resource, from the origin
of the URL manifest URL. If
this is an upgrade
attempt, then use the newest application
cache in cache group as an HTTP
cache, and honor HTTP caching semantics (such as expiration,
ETags, and so forth) with respect to that cache. User agents may
also have other caches in place that are also honored.

If the resource in question is already being
downloaded for other reasons then the existing download process
can sometimes be used for the purposes of this step, as defined
by the fetching algorithm.

An example of a resource that might already
be being downloaded is a large image on a Web page that is being
seen for the first time. The image would get downloaded to
satisfy the img element on the page, as well as
being listed in the cache manifest. According to the rules for
fetching, that image need only be
downloaded once, and it can be used both for the cache and for
the rendered Web page.

If the previous step fails (e.g. the server returns a 4xx or
5xx response or
equivalent, or there is a DNS error, or the connection
times out, or the user cancels the download), or if the server
returned a redirect, then run the first appropriate step from
the following list:

If the URL being processed was flagged as an "explicit
entry" or a "fallback entry"

Redirects are fatal because they are either
indicative of a network problem (e.g. a captive portal); or
would allow resources to be added to the cache under URLs that
differ from any URL that the networking model will allow
access to, leaving orphan entries; or would allow resources to
be stored under URLs different than their true URLs. All of
these situations are bad.

Copy the resource and its metadata from the newest application
cache in cache group whose completeness flag
is complete, and act as if that was the fetched
resource, ignoring the resource obtained from the network.

User agents may warn the user of these errors as an aid to
development.

These rules make errors for resources listed in
the manifest fatal, while making it possible for other resources
to be removed from caches when they are removed from the server,
without errors, and making non-manifest resources survive
server-side errors.

Otherwise, the fetching succeeded. Store the resource in
the new cache.

If the URL being processed was flagged as an "explicit
entry" in file list, then categorize the
entry as an explicit
entry.

If the URL being processed was flagged as a "fallback
entry" in file list, then categorize the
entry as a fallback
entry.

If the URL being processed was flagged as a "master
entry" in file list, then categorize the
entry as a master
entry.

As an optimization, if the resource is an HTML or XML file
whose root element is an html element with a manifest attribute whose value
doesn't match the manifest URL of the application cache being
processed, then the user agent should mark the entry as being
foreign.

For each cache host associated with an
application cache in cache group,
queue a post-load task to fire an event with the name
progress, which does
not bubble, which is cancelable, and which uses the
ProgressEvent interface, at the
ApplicationCache singleton of the cache
host. The lengthComputable
attribute must be set to true, the total and the loaded attributes must be
set to the number of files in file
list. The default action of these events must be, if the user
agent shows caching progress, the display of some sort
of user interface indicating to the user that all the files have
been downloaded. [PROGRESS]

Otherwise, store the resource for this entry in new cache, if it isn't already there, and
categorize its entry as a master entry.

Fetch the resource from manifest
URL again, and let second manifest be
that resource.

If the previous step failed for any reason, or if the fetching
attempt involved a redirect, or if second
manifest and manifest are not
byte-for-byte identical, then schedule a rerun of the entire
algorithm with the same parameters after a short delay, and run
the cache failure steps.
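
The second fetch guards against the manifest changing while its resources were being downloaded. The check can be sketched as follows; confirm_manifest is a hypothetical name, and refetch is an assumed callable that returns (body, was_redirected) or raises OSError on a network failure.

```python
def confirm_manifest(manifest: bytes, refetch):
    """Return "store" if the manifest is unchanged, else "failure".

    "failure" stands for scheduling a rerun after a short delay and
    running the cache failure steps.
    """
    try:
        second_manifest, was_redirected = refetch()
    except OSError:
        return "failure"  # the second fetch failed for some reason
    if was_redirected or second_manifest != manifest:
        return "failure"  # redirected, or not byte-for-byte identical
    return "store"
```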

Otherwise, store manifest in new cache, if it's not there already, and
categorize its entry as the manifest.

Create a task to
fire a simple event that is cancelable named error at the
ApplicationCache singleton of the
Document for this entry, if there still is one, and
add it to task list. The default action of
these events must be, if the user agent shows caching
progress, the display of some sort of user interface
indicating to the user that the user agent failed to save the
application for offline use.

Each Document has a list of pending application
cache download process tasks that is used to delay events
fired by the algorithm above until the document's load event has fired. When the
Document is created, the list must be empty.

When the steps above say to queue a post-load task task, where task is a task that dispatches an event on a
target ApplicationCache object target, the user agent must run the appropriate steps
from the following list:

5.6.5 The application cache selection algorithm

When the application cache
selection algorithm is invoked with a
Document document and optionally a
manifest URL manifest URL, the user
agent must run the first applicable set of steps from the following
list:

Fetch the resource normally. If this results in a
redirect to a resource with another origin
(indicative of a captive portal), or a 4xx or 5xx status code
or equivalent,
or if there were network errors (but not if the user canceled the
download), then instead get, from the cache, the resource of the
fallback entry
corresponding to the fallback namespace f. Abort these steps.

The above algorithm ensures that so long as the
online
whitelist wildcard flag is blocking,
resources that are not present in the manifest will always fail
to load (at least, after the application cache has been
primed the first time), making the testing of offline applications
simpler.
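
The conditions under which the fetch-with-fallback step described above substitutes the fallback entry can be sketched as a small predicate. The function name and its parameters are assumptions made for illustration; a real user agent would derive these values from its networking layer.

```python
def should_use_fallback(status=None, cross_origin_redirect=False,
                        network_error=False, user_canceled=False):
    """Decide whether the fallback entry replaces the fetched resource."""
    if user_canceled:
        return False  # a canceled download never triggers the fallback
    if cross_origin_redirect or network_error:
        return True   # e.g. a captive portal, a DNS error, or a timeout
    # 4xx and 5xx status codes also trigger the fallback.
    return status is not None and 400 <= status <= 599
```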

5.6.7 Expiring application caches

As a general rule, user agents should not expire application
caches, except on request from the user, or after having been left
unused for an extended period of time.

Application caches and cookies have similar implications with
respect to privacy (e.g. if the site can identify the user when
providing the cache, it can store data in the cache that can be used
for cookie resurrection). Implementors are therefore encouraged to
expose application caches in a manner related to HTTP cookies,
allowing caches to be expunged together with cookies and other
origin-specific data.

For example, a user agent could have a "delete
site-specific data" feature that clears all cookies, application
caches, local storage, databases, etc, from an origin all at
once.

Switches to the most recent application cache, if there is a
newer one. If there isn't, throws an
INVALID_STATE_ERR exception.

This does not cause previously-loaded resources to be reloaded;
for example, images do not suddenly get reloaded and style sheets
and scripts do not get reparsed or reevaluated. The only change is
that subsequent requests for cached resources will obtain the
newer copies.

The status
attribute, on getting, must return the current state of the
application cache that the
ApplicationCache object's cache host is
associated with, if any. This must be the appropriate value from the
following list:

5.6.9 Browser state

Returns false if the user agent is definitely offline
(disconnected from the network). Returns true if the user agent
might be online.

The navigator.onLine
attribute must return false if the user agent will not contact the
network when the user follows links or when a script requests a
remote page (or knows that such an attempt would fail), and must
return true otherwise.