Abstract

This document defines APIs for off-line serving of requests to HTTP resources using
static and dynamic responses.

Status of this Document

This section describes the status of this document at the time of
its publication. Other documents may supersede this document. A list
of current W3C publications and the latest revision of this technical
report can be found in the W3C technical
reports index at http://www.w3.org/TR/.

This document is the 29 October 2009 First Public Working Draft of the
DataCache API specification.
If you wish to make comments regarding this document, please send them to
public-webapps@w3.org
(subscribe, archives)
with “[DataCache]” at the start of the subject line,
or submit them using our public bug database.

The latest stable version of the editor's draft of this specification is
always available on the W3C CVS server. Change tracking for this document is
available at the following location:

Publication as a Working Draft does not imply endorsement by the
W3C Membership. This is a draft document and may be updated, replaced
or obsoleted by other documents at any time. It is inappropriate to cite
this document as other than work in progress.

The standard HTTP caches built into existing user agents
are under no obligation to locally store a cacheable resource
and do not provide any guarantees about off-line serving of
HTTP resources. An application cache [HTML5]
can hold static representations of a set of pre-defined resources
that can be served locally. However, applications cannot
alter this set of resources programmatically. Moreover, an
application cache cannot satisfy requests other than
GET and HEAD.

To address these limitations, this specification introduces data caches.
Instead of a static manifest resource listing the resources
to be cached, a data cache can be modified programmatically.
Web applications can add resources to a data cache or remove them from it,
and the user agent can then statically serve a cached resource when that
resource is requested. This specification also provides embedded local servers
to dynamically serve off-line representations of resources, for example
in response to unsafe HTTP methods such as POST.

Using data caches and embedded local servers, applications can obtain
locally cached or served data and complete requests to
resources whether or not the requests can be serviced immediately
by a remote server. Applications can replay locally satisfied requests
to the server thus enabling responsive and robust Web
applications in the presence of low connectivity conditions.

This specification does not introduce a new programming model
for Web applications as data caches and embedded local servers are
transparently pressed into action by the user agent, depending
on system conditions. This means that existing applications
can be used unchanged in environments that are not affected
by network unreliability. Applications need to be altered to use the APIs
specified in this document only if they require improved
responsiveness. Such applications can seamlessly switch
between on-line and off-line operation without needing explicit
user action.

2. Conformance Requirements

Everything in this specification is normative except for diagrams,
examples, notes and sections marked as being informative.

A user agent must behave as described in this specification
in order to be considered conformant.

User agents may implement algorithms given in this
specification in any way desired, so long as the end result is
indistinguishable from the result that would be obtained by the
specification's algorithms.

A conforming DataCache user agent must also be a
conforming implementation of the IDL fragments
of this specification, as described in the
“Web IDL” specification. [WEBIDL]

Note

This specification uses both the terms "conforming user agent(s)"
and "user agent(s)" to refer to this product class.

The serve policy applies
only to resources that do not require interception, whereas the
intercept policy
can only be used for resources that require interception. Both policies
improve availability and responsiveness. However, both may affect
data freshness. The review policy
can only be used when the user agent is able to communicate with
the server. This policy improves data freshness at the cost of
reduced responsiveness. User agents may choose freely from among
these options, e.g., using information about the system condition,
network state or battery power level, or a user preference.
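
The constraints on the three policies can be illustrated with a minimal sketch. It is not part of the API defined here: the function name choosePolicy, the shape of the resource and conditions objects, and the preferLocal flag (standing in for system condition, battery level, or user preference) are all assumptions made for illustration.

```javascript
// Hypothetical model of policy selection. None of these names are defined
// by this specification; they only illustrate the constraints in the text.
function choosePolicy(resource, method, conditions) {
  // A resource requires interception for a method when that method is
  // listed among its dynamic methods.
  const requiresInterception = resource.dynamicMethods.includes(method);

  // The review policy is only available while the server is reachable.
  // preferLocal is an assumed flag for "favor responsiveness over freshness".
  if (conditions.online && !conditions.preferLocal) {
    return "review";
  }

  // Otherwise answer locally: intercept when interception is required,
  // serve when it is not.
  return requiresInterception ? "intercept" : "serve";
}
```

A user agent following this sketch would answer an off-line PUT to a resource with a dynamic PUT method using the intercept policy, and an off-line GET to the same resource using the serve policy.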

4.1.1. Examples

An application can use a data
cache to capture the
representation of a resource, i.e., cache an off-line
representation in the data cache.

Example

Using a data cache, an application captures a resource
as part of an atomic cache transaction. Once the resource is captured successfully,
the application places the captured representation into service.

In this example, a local request handler can produce a dynamic response
to an application request when the network is not accessible. The
response can be prepared using the data cache, for example. The
benefit of this technique is that developers don't need to alter
their applications to accommodate off-line use cases. Instead, user
agents can transparently introduce an application-specific off-line
handler to deal with situations when the network is not available.

Later, when that application issues a GET request for
the cached resource, either through page navigation or an XMLHttpRequest
[XMLHttpRequest], and the
user agent is off-line, the user agent invokes the local off-line
handler, which produces a dynamic response.

ECMAScript

var req = new XMLHttpRequest();
req.open('GET', uri);
...
req.send();

Applications can also use an off-line handler to locally intercept
requests that modify data, e.g., unsafe HTTP requests such as PUT.
Requests to resources that are
managed in a
data cache can be handled in one of two ways. When the
user agent is off-line, they are intercepted and served by an
interceptor. When the user agent is online, they are served by
the server, with a reviewer
analyzing the result of the online request. A user agent can switch
between these two behaviors automatically based on system conditions.

Example

For example, an application worker script that wishes to allow off-line
updates to uri captures that resource in a data cache.

When the application makes a PUT request to
uri and the user agent is off-line,
the user agent asks the intercept function to
process the request and respond to it. If the user agent is online
when the request arrives, then it sends the request to
a host and asks the review function to process the
response received from it.
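
The switch between the intercept function and the review function can be modeled in plain ECMAScript. The sketch below is a simulation under stated assumptions, not the API itself: createLocalServer, the sendToHost callback, and the plain response object are hypothetical, while the request fields method, target, and bodyText follow the HttpRequest attributes described in this document.

```javascript
// Hypothetical model of an embedded local server. In a user agent the
// handlers would be registered with navigator.registerOfflineHandler();
// here they are wired together by hand so the sketch runs anywhere.
function createLocalServer(interceptHandler, reviewHandler) {
  return {
    // sendToHost is an assumed stand-in for the real network round trip.
    handle(request, online, sendToHost) {
      if (!online) {
        // Off-line: the interceptor produces the response locally.
        const response = { status: null, body: null };
        interceptHandler(request, response);
        return response;
      }
      // Online: the host answers, then the reviewer inspects the result.
      const response = sendToHost(request);
      if (reviewHandler) {
        reviewHandler(request, response);
      }
      return response;
    },
  };
}

// Local copy of the data that off-line updates are applied to.
const store = new Map();

const server = createLocalServer(
  function intercept(request, response) {
    store.set(request.target, request.bodyText);
    response.status = 200;
    response.body = "saved locally";
  },
  function review(request, response) {
    // Keep the local copy in step with what the host accepted.
    if (response.status === 200) {
      store.set(request.target, request.bodyText);
    }
  }
);
```

In this model the application code issuing the PUT request is the same in both cases; only the user agent's choice of handler differs.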

All the managed resources have been cached for serving off-line
requests to them, and the script can use
swapCache() to switch to the new cache.

(Last event in sequence.)

obsolete

The server returned a 401 error in response to a
request for fetching a managed resource's representation.

(Last event in sequence.)

error

A fatal error occurred while fetching a managed resource's representation.

(Last event in sequence.)

4.2. Data Caches

A data cache is a programmable
HTTP cache that can be manipulated by a Web application to serve off-line
representations of resources.

A data cache is a set of
managed resources
consisting of the entity (i.e., headers and body) for each
such resource that falls into one of the following categories:

Static entries

A resource whose representation is cached for off-line serving.
A request to a static entry
can only be handled using the serve
policy.

Dynamic entries

A resource whose representation is produced locally
by an application upon request. Each
dynamic entry
identifies one or more dynamic protocol
methods for which a response can be obtained locally using an
interceptor.

Each data cache has a
completeness flag, which is
either incomplete or complete.

Multiple effective data caches may
contain the representation of a given resource. If the user agent is to
select a data cache from a list of
relevant data caches
that contain the representation of a required resource, then the user
agent must use the data cache
that the user most likely wants to see the representation from,
taking into account the following:

4.2.2.2. Capturing resources

When the user agent is required to add a
resource to be captured, given the URI of the resource to
capture, a cache host,
a cache transaction,
optionally a list of dynamic methods, optionally content to serve
as the static representation of the resource, and optionally
content type for the representation, the user agent must perform
the following steps:

Fetch the representation of the resource identified by the
absolute URI. Use the transaction's
data cache as an HTTP cache, and honor HTTP caching
semantics (such as expiration, ETags, and so forth) with
respect to that cache. User agents may also have other
caches in place that are also honored.

Note

If the resource in question is already being fetched
for other reasons, then the existing download process can
sometimes be used for the purposes of this step.

If the previous step fails (e.g. the server returns a 4xx
or 5xx response or equivalent, or there is a DNS error, or
the connection times out, or the user cancels the download),
or if the server returned a redirect, then run the
capture failure
steps.

Note

Redirects are fatal because they are either indicative of
a network problem (e.g. a captive portal); or would allow
resources to be added to the cache under URIs that differ
from any URI that the networking model will allow access
to, leaving orphan entries; or would allow resources to
be stored under URIs different than their true URIs. All
of these situations are bad.

Otherwise, the fetching succeeded. Store in cache a
managed resource
comprising the resulting absolute URI, its fetched
representation, and the list of dynamic methods passed to these
steps.
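
The steps above can be sketched as follows. This is a simplified model, not a conforming implementation: captureResource and the synchronous fetchFn callback are hypothetical, the cache is a plain Map, and HTTP caching semantics (expiration, ETags, 304 responses) are deliberately left out.

```javascript
// Status codes treated as redirects in this sketch.
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

// Hypothetical model of the resource capturing process.
// fetchFn is an assumed synchronous stand-in for the fetch step: it
// returns { status, headers, body } or throws on a network-level failure
// (DNS error, timeout, user cancellation).
function captureResource(cache, uri, fetchFn, dynamicMethods = []) {
  let result;
  try {
    result = fetchFn(uri);
  } catch (networkError) {
    // Capture failure steps: nothing is stored.
    return { ok: false, reason: "network error" };
  }

  // 4xx and 5xx responses, and redirects, are all fatal.
  if (REDIRECT_STATUSES.has(result.status) || result.status >= 400) {
    return { ok: false, reason: "status " + result.status };
  }

  // Fetching succeeded: store the managed resource under its URI together
  // with its representation and the list of dynamic methods.
  cache.set(uri, {
    uri: uri,
    headers: result.headers,
    body: result.body,
    dynamicMethods: dynamicMethods,
  });
  return { ok: true };
}
```

Because a redirect is handled as a failure, the cache in this model never holds an entry under a URI other than the one that was requested.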

4.2.3. Expiring data caches

As a general rule, user agents should not expire data
caches, except on request from the user, or after having been left
unused for an extended period of time.

Implementors are encouraged to expose data caches in a
manner related to HTTP cookies, allowing caches to be expired
together with cookies and other origin-specific data. Data
caches and cookies have similar implications with respect to privacy
(e.g. if the site can identify the user when providing the cache, it
can store data in the cache that can be used for cookie
resurrection).

This method takes one to three parameters - low watermark version and
item callback, and an optional success callback. When
called, this method must immediately return and asynchronously perform the
following steps:

Let additions be a list of entities with their URIs and
let removals be a list of URIs.

This method takes one or two arguments - a uri of the
resource to capture and an
optional list of dynamic methods. When called, this
method must immediately return and asynchronously perform the steps
in the resource
capturing process with the first argument being uri,
second being the current window or worker object,
the third being this CacheTransaction object's associated
cache transaction,
the fourth being dynamic methods.

capture()

This method takes two to four arguments - a uri of the
resource to capture, an
entity body, optionally a content type, and an
optional list of dynamic methods. When called, this
method must immediately return and asynchronously perform the steps
in the resource
capturing process with the first argument being uri,
second being the current window or worker object,
the third being this CacheTransaction object's associated
cache transaction,
the fourth being dynamic methods, the fifth being
entity body, and the sixth being content type.

release()

This method takes one argument - uri of the resource to
release. When called, this method must immediately return and
asynchronously perform the steps in the
resource
releasing process with the first argument being uri,
second being the current window or worker object,
and the third being this CacheTransaction object's associated
cache transaction.

This method takes two arguments - a uri and an item
callback. When called, this method must immediately return and
asynchronously perform the following steps:

If the resource identified by uri does not exist in this
CacheTransaction
object's associated data cache, then
the method must raise the NOT_FOUND_ERR
exception and terminate these steps.

This attribute, on getting, must return the list of protocol methods,
if any, of the managed
resource associated with this CacheItem object.

headers

This attribute, on getting, must return the headers as a native ordered
dictionary data type from the
managed resource
associated with this CacheItem object. In the JavaScript binding, this must be
Object. Each header must have one property (or dictionary
entry), with those properties enumerating in the order that the headers
were recorded in the
managed resource. Each property must have the name of the
header and its value as recorded in the
managed resource.

readyState

This attribute, on getting, must return the current state of the
managed resource. This
must be the appropriate value from the following list:

The navigator.registerOfflineHandler()
method takes two or three arguments
- a uri for identifying the namespace of resources that are to
be intercepted, an intercept handler function, and an optional
review handler function. This method allows applications to
register themselves as possible handlers for particular URI namespaces.
Upon being invoked, this method should create a new
embedded local server called
server with its interceptor
function set to intercept handler and its
reviewer function set to review
handler and add server to the
cache host.

The user agent must invoke the
InterceptHandler,
with an HttpRequest
object request based on the application's resource request and an
MutableHttpResponse
object response. When the send() method is invoked on response, the
user agent must respond to request with the headers, body, and
status specified in response.

The user agent must invoke the
ReviewHandler,
with an HttpRequest
object request based on the application's resource request and an
HttpResponse
object response based on the host's response to that request.

This attribute, on getting, must return the HTTP method,
in upper-case characters, present on this
HttpRequest object.

target

This attribute, on getting, must return the URI of this
HttpRequest object.

bodyText

This attribute, on getting, must return the entity body of this
HttpRequest
object, if the body has a Content-Type of either
text/* or application/xml.

headers

This attribute, on getting, must return the headers as a native ordered
dictionary data type from this
HttpRequest object.
In the JavaScript binding, this must be
Object. Each header must have one property (or dictionary
entry), with those properties enumerating in the order that the headers
were present in the request. Each property must have the name of the
header and its value as present in the request.

This method takes two arguments - name and value
of a response header. Upon calling, this method must store the header
with name and value in this
MutableHttpResponse
object. If a value is already associated with this header, then
this method must append value to it.

setStatus()

This method takes two arguments - code and message
of the response status. Upon calling, this method must store the status
with code and message in this
MutableHttpResponse
object, replacing any previous values for both.

setResponseText()

This method takes a single argument - body of the response entity.
Upon calling, this method must store the entity body in this
MutableHttpResponse
object, replacing any previous value.

send()

Upon calling, this method must dispatch this
MutableHttpResponse
object. Further changes to the object must not be allowed after dispatch.
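
The behavior of these methods can be illustrated with a small model class. It is a sketch under assumptions rather than the interface itself: the class name MutableHttpResponseModel and the dispatch callback are hypothetical, the header-setting method is called setResponseHeader here only because its name is not shown above, and joining appended header values with ", " is an assumed reading of "append".

```javascript
// Hypothetical model of the MutableHttpResponse behavior described above.
class MutableHttpResponseModel {
  constructor(dispatch) {
    this.headers = Object.create(null); // no inherited property names
    this.statusCode = null;
    this.statusMessage = null;
    this.bodyText = null;
    this.sent = false;
    this.dispatch = dispatch; // assumed callback that delivers the response
  }

  assertMutable() {
    if (this.sent) {
      throw new Error("response has already been sent");
    }
  }

  // Assumed method name. A repeated header is appended, not replaced.
  setResponseHeader(name, value) {
    this.assertMutable();
    this.headers[name] =
      name in this.headers ? this.headers[name] + ", " + value : value;
  }

  // Replaces any previous status code and message.
  setStatus(code, message) {
    this.assertMutable();
    this.statusCode = code;
    this.statusMessage = message;
  }

  // Replaces any previous entity body.
  setResponseText(body) {
    this.assertMutable();
    this.bodyText = body;
  }

  // Dispatches the response; later changes are rejected.
  send() {
    this.assertMutable();
    this.sent = true;
    this.dispatch(this);
  }
}
```

Throwing an error on changes after send() is one possible way to disallow them; the text above does not say how a user agent reports such an attempt.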

This attribute, on getting, must return the status code of this
HttpResponse
object.

statusMessage

This attribute, on getting, must return the status message of this
HttpResponse
object.

bodyText

This attribute, on getting, must return the entity body of this
HttpResponse
object, if the body has a Content-Type of either
text/* or application/xml.

headers

This attribute, on getting, must return the headers as a native ordered
dictionary data type from this
HttpResponse object.
In the JavaScript binding, this must be
Object. Each header must have one property (or dictionary
entry), with those properties enumerating in the order that the headers
were present in the response. Each property must have the name of the
header and its value as present in the response.

4.3.2. Changes to the networking model

When a cache host is associated
with one or more effective
data caches, any and all loads for resources related to that
cache host other than those for
child browsing contexts must go through the
following steps instead of immediately invoking the mechanisms
appropriate to that resource's scheme:

Let request be the resource request for fetching the resource.

If request includes the HTTP header X-Bypass-DataCache
and the value of that header is true, then fetch the resource
normally, and abort these steps.

If request is using the HTTP GET mechanism
and resource is not defined in cache with a
dynamic GET
method, then get the entity for resource from the cache
(instead of fetching it), and abort these steps.

If request is using the HTTP HEAD mechanism
and resource is not defined in cache with a
dynamic HEAD
method, then get the entity headers for resource from the cache
(instead of fetching it), and abort these steps.

If resource is not defined in cache with the
dynamic method that
request is using, then fetch a representation of
resource normally and abort these steps.
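
The decision steps above can be sketched as a single routing function. This is an illustrative model only: routeRequest, the Map-based cache, and the returned labels are hypothetical. Two behaviors are assumptions that the steps shown do not state: a resource missing from the cache is fetched normally, and a request that passes every check is handed to the embedded local server. Header names are matched case-sensitively for brevity.

```javascript
// Hypothetical model of the networking-model checks listed above.
// Returns a label describing where the request would be satisfied.
function routeRequest(request, cache) {
  // An explicit bypass header always goes to the network.
  if (request.headers["X-Bypass-DataCache"] === "true") {
    return "network";
  }

  const entry = cache.get(request.target);
  if (!entry) {
    return "network"; // assumption: unmanaged resources are fetched normally
  }

  const isDynamic = entry.dynamicMethods.includes(request.method);

  // GET and HEAD are served from the cache unless declared dynamic.
  if (request.method === "GET" && !isDynamic) {
    return "cache-entity";
  }
  if (request.method === "HEAD" && !isDynamic) {
    return "cache-headers";
  }

  // Any other method that is not declared dynamic is fetched normally.
  if (!isDynamic) {
    return "network";
  }

  // Assumption: the remaining case is handled by the local server.
  return "local-server";
}
```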

5. Security Considerations

Apart from the security-related requirements made throughout
this specification, implementations may, at their discretion,
choose not to expose certain headers, such as HttpOnly cookies.

Applications need to verify the identity of their users before
allowing access to private resources they
manage. Typically a host verifies the identity of its users by
checking whether the user possesses valid credentials including a
shared secret, e.g., a password. Applications design their own user
interface to provide a means for users to supply their credentials
for authentication. In Web applications, this is typically performed
using HTML forms and it provides applications with a great deal of
control over the authentication user interface. Also, applications
typically do not verify the user's credentials for every request.
Instead, applications verify a token stored on the client as a result
of authentication. This token is a session identifier often stored in
an HTTP cookie [RFC2965].
This approach is highly scalable since just the session identifier
and not credentials are validated for every HTTP request. Use of
session identifiers gives the data source wide latitude over
terminating the authorization and restricting access to certain
scopes. It also allows users to share authorization but not their
credentials with a variety of less-trustworthy applications.

The token approach also enables off-line authentication without
storing any credentials locally. The token produced by the host is
used by the user agent to authenticate requests served locally and for
capturing resources from the host. If a data cache is to store
private resources, it must be
created with a cookie name. Once such a
data cache is created, the user agent must serve or intercept requests
to its captured resources only if the cookie used to secure the cache
is still present in the current browsing context.

Failing this, the user agent must make the request to the host as
would be the case if the resource were not captured locally. If the
user agent receives an HTTP 401 status from a host while capturing a
resource using a required cookie, then it must automatically destroy
the data cache originating that capture attempt.
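
The cookie-based rules above can be sketched as two small checks. The sketch is a model under assumptions, not normative behavior: the function names, the cookieName and id fields, and the use of a Set of cookie names and a Map of caches are all hypothetical.

```javascript
// Hypothetical model of the cookie checks described above.
// dataCache.cookieName is null for a cache that holds no private resources.

// Decide whether requests may be served or intercepted locally.
function mayHandleLocally(dataCache, cookieNames) {
  if (dataCache.cookieName === null) {
    return true;
  }
  // The securing cookie must still be present in the browsing context;
  // otherwise the request goes to the host as if nothing were captured.
  return cookieNames.has(dataCache.cookieName);
}

// React to the host's status for a capture made with a required cookie.
function afterCaptureResponse(allCaches, dataCache, status) {
  if (dataCache.cookieName !== null && status === 401) {
    // The host rejected the token: destroy the originating cache.
    allCaches.delete(dataCache.id);
    return "destroyed";
  }
  return "kept";
}
```

In this model, logging out (which removes the session cookie) immediately stops local serving of private resources, even though the cached data is only destroyed once the host answers a capture with 401.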