The WebSockets Protocol
draft-ferg-hybi-websockets-latest

Abstract

The WebSocket protocol enables a bidirectional stream of
messages between a client and a server. Messages consist of
a sequence of binary frames over TCP. The protocol uses HTTP
for its handshake, upgrading to the bidirectional binary frames
defined in this document.

Status of this Memo

By submitting this Internet-Draft,
each author represents that any applicable patent or other IPR claims of which
he or she is aware have been or will be disclosed,
and any of which he or she becomes aware will be disclosed,
in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite
them other than as “work in progress.”

Techniques that enable bidirectional communication over HTTP
have recently become more pervasive. They reduce the need to poll
the server continuously by using HTTP hanging requests
and multiple connections between the client and the server
[I-D.loreto-http-bidirectional].

The goal of HyBi is to provide an efficient and clean two-way
communication channel between client and server.

The communication channel will:

allow each side to send data, independently of the other,
whenever it is ready to do so;

rely on a single TCP connection for traffic in both
directions;

reduce the high overhead produced by HTTP headers
in each request/response.

The WebSocket protocol begins with a handshake using a
WebSocket HELLO control frame from the client and a
HELLO control frame from the server. After a peer has
sent its hello, it may send messages and control frames
until the final CLOSE control frame.

The client handshake is a HELLO control frame that
looks like the following:

2.
General Requirements

2.1.
Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

An implementation is not compliant if it fails to satisfy one or more
of the MUST or REQUIRED level requirements for the protocols it
implements. An implementation that satisfies all the MUST or
REQUIRED level and all the SHOULD level requirements for its
protocols is said to be "unconditionally compliant"; one that
satisfies all the MUST level requirements but not all the SHOULD
level requirements for its protocols is said to be "conditionally
compliant."

2.3.
Terminology

connection: A transport layer virtual circuit established
between a client and a server for the purpose of communication.

control-frame: A frame used to control connection
behavior outside of the application data stream.

frame: The basic unit of WebSocket communication,
consisting of a structured sequence of octets matching the
syntax defined in the actual protocol and transmitted on
the established communication channel.

The client sends the HELLO control frame with
"WebSocket/1.0" as the version. The
client MUST send the following HELLO headers:

A "url" header with the resource URL. The URL must
be an absolute URL.

An "origin" header with /origin/ as its value.

A "protocol" header with /protocol/ as
its value.

[Ed: we could add a "nonce" to handle
certain replay attacks.]

The client may send additional HELLO headers.
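
The required client HELLO headers listed above can be sketched
as a small helper. This is an illustration only: the function name
build_client_hello_headers, the dictionary representation, and the
use of urllib.parse to test for an absolute URL are assumptions,
and the wire encoding of the HELLO frame is not shown. The default
protocol of "text" follows the definition of /protocol/ in this
document.

```python
from urllib.parse import urlsplit

WEBSOCKET_VERSION = "WebSocket/1.0"  # version string required by the text


def build_client_hello_headers(url, origin, protocol="text", extra=None):
    """Return (version, headers) for a client HELLO (hypothetical helper)."""
    parts = urlsplit(url)
    # Assumption: "absolute" means the URL has both a scheme and a host.
    if not parts.scheme or not parts.netloc:
        raise ValueError("url header must be an absolute URL")
    headers = dict(extra or {})  # additional HELLO headers are permitted
    # Required headers are set last so extras cannot override them.
    headers.update({"url": url, "origin": origin, "protocol": protocol})
    return WEBSOCKET_VERSION, headers
```

For example, build_client_hello_headers("ws://example.com/chat",
"http://example.com/page.html") would return the version string and
a dictionary with the three required headers; both URLs here are
placeholders.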

After sending the HELLO, the client may send
data and control frames even before receiving the
server's response.

The first data from the server MUST be a HELLO control
frame from the server. The client MUST close the connection
if it detects any errors in the HELLO control frame.
The server's HELLO frame must satisfy the following:

The version MUST be "WebSocket/1.0".

The "status" header MUST exist with value "200".

The "origin" header MUST exist with value /origin/.

The "protocol" header MUST exist with value /protocol/.

[Ed: we could add a "digest" here with a digest of a
nonce.]
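
The server HELLO checks above could be expressed roughly as
follows. This is a sketch under assumptions: the HELLO frame is
taken to be already parsed into a version string and a dictionary
of headers, and the names HandshakeError and validate_server_hello
are hypothetical.

```python
class HandshakeError(Exception):
    """Raised when the server HELLO fails a check (hypothetical name)."""


def validate_server_hello(version, headers, origin, protocol):
    """Check a parsed server HELLO against the four requirements above."""
    if version != "WebSocket/1.0":
        raise HandshakeError("unexpected version: %r" % (version,))
    # origin and protocol are the values the client sent in its own HELLO.
    expected = {"status": "200", "origin": origin, "protocol": protocol}
    for name, value in expected.items():
        if name not in headers:
            raise HandshakeError("missing %r header" % name)
        if headers[name] != value:
            raise HandshakeError("bad %r header: %r" % (name, headers[name]))
```

A client using this sketch would close the connection whenever
HandshakeError is raised, since the client MUST close the
connection on any error in the server's HELLO.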

The client sends and receives messages and control frames
following the stream protocol.

The client may send a CLOSE control frame at any time.
After sending the CLOSE, it MUST NOT send any more data.

When the client receives a CLOSE control frame, it MUST
stop reading from the stream.

/origin/ is the URL of the source page that
initiates the WebSockets connection for
browser clients, for example an HTML page
that launches the client JavaScript. If the server
does not launch the client in this fashion, /origin/
is unset.

/protocol/ is the application protocol name. If
not specified by the application, use "text".

5.
Stream Protocol

5.1.
Stream syntax

Once the handshake has been established, the message
stream is symmetrical. Each side sends and reads
a sequence of data messages split into frames, interleaved with any
control frames.

A stream is a sequence of binary data messages,
where each data message is a sequence of partial data
frames. Control frames may appear between data messages
to control the connection.

If the application message data is Unicode text, as in a
JavaScript browser, the sender MUST encode the text as UTF-8.

Because the sending side may use a fixed-sized buffer,
it may split a message into any number of
non-final data frames followed by the final data frame.
Short messages will fit into a single final-frame.
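
The splitting rule can be sketched as a generator that cuts a
message into buffer-sized chunks and marks the last one as final.
The name split_message and the (is_final, chunk) pair are
assumptions made for illustration; the frame header encoding is
not shown, and treating an empty message as one empty final frame
is also an assumption.

```python
def split_message(payload, buffer_size):
    """Yield (is_final, chunk) pairs covering payload in order."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    if isinstance(payload, str):
        payload = payload.encode("utf-8")  # text MUST be encoded as UTF-8
    # Offset of the last chunk; an empty message still yields one final chunk.
    last_start = max(len(payload) - 1, 0) // buffer_size * buffer_size
    for start in range(0, last_start + 1, buffer_size):
        chunk = payload[start:start + buffer_size]
        yield start == last_start, chunk
```

Because the chunks are cut on byte boundaries, a multi-byte UTF-8
character may be split across two frames in this sketch; the
receiver would need to reassemble the whole message before decoding.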

The syntax of each frame is defined in the next section.

When closing a connection, the client and server MUST send
a CLOSE control frame. No bytes may be sent or read after
the CLOSE control frame.

A receiver MUST close the connection if it detects any errors
while reading, including any illegal frame syntax, too-long
frame lengths, or any unknown control frame. The client
and server MUST NOT attempt to recover from frame errors.

The stream syntax is defined by the
following grammar. The frame section below defines
the grammar of the frames themselves.

5.3.
Stream Close

Because WebSockets needs to distinguish an intentional
close from a dropped connection, the client or server MUST
send a CLOSE control frame at the end of the stream. Either
side may choose to close the connection gracefully at
any time.

When the client or server wishes to close the stream
gracefully, it MUST send a CLOSE control frame. After
sending the CLOSE, no other data may be sent and the TCP
socket must also be closed.
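
The close rules might be tracked with a small state object such as
the one below. This is a minimal sketch: the class name StreamEnd,
its method names, and the "graceful"/"dropped" labels are
assumptions, not terms defined by this document.

```python
class StreamEnd:
    """Tracks CLOSE frames on one side of a stream (hypothetical helper)."""

    def __init__(self):
        self.close_sent = False
        self.close_received = False

    def on_send(self, is_close=False):
        # After sending CLOSE, no other data may be sent.
        if self.close_sent:
            raise RuntimeError("CLOSE already sent; no further data allowed")
        self.close_sent = is_close

    def on_receive_close(self):
        self.close_received = True

    def on_eof(self):
        """Classify the end of the TCP stream once the socket reports EOF."""
        return "graceful" if self.close_received else "dropped"
```

An end of stream seen without a preceding CLOSE is reported as
"dropped", which is the distinction the CLOSE control frame exists
to make.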

5.4.
Connection Keepalive

Because a TCP connection may drop without notification to
either client or server, either by network failure or
by TCP router timeouts, the WebSocket protocol defines
a pair of keepalive control frames. By defining a pair
of control frames, WebSockets avoids circular ping cascades.

Either the client or server may send a PING-REQUEST
control frame to determine if the connection is still alive.
The peer MUST respond with a PING-RESPONSE.

If the PING-REQUEST sending peer does not receive a
response within a reasonable time, it may close the
connection. The client may establish a new connection, but
recovery of the original stream is not defined by WebSockets,
and must be defined by the application or sub-protocol.
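
One way to apply the "reasonable time" rule is to record when a
PING-REQUEST was sent and compare against a timeout. The sketch
below assumes a hypothetical KeepaliveMonitor class and an
application-chosen timeout; this document does not specify a
timeout value. The clock is injectable so the logic can be
exercised without waiting.

```python
import time


class KeepaliveMonitor:
    """Tracks one outstanding PING-REQUEST (hypothetical helper)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.ping_sent_at = None  # None means no ping is outstanding

    def on_ping_sent(self):
        self.ping_sent_at = self.clock()

    def on_response_received(self):
        self.ping_sent_at = None

    def should_close(self):
        """True if a ping has gone unanswered for longer than the timeout."""
        if self.ping_sent_at is None:
            return False
        return self.clock() - self.ping_sent_at > self.timeout_seconds
```

When should_close() returns True the peer may close the
connection; any reconnection and stream recovery remain the
responsibility of the application or sub-protocol.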

6.
Control Frames

Each control frame has an opcode in the range %x00-7F, followed
by any control data for the opcode. [Ed: the code %x7F is reserved
to allow for opcodes beyond 127, where the full opcode
is encoded as the first bytes of the payload. In practice,
this will never be defined.]

Except for the opcodes defined here (0-6), the codes are reserved
by the specification. Applications MUST NOT define their own
control frames.

6.6.
PING-REQUEST (op=5)

The PING-REQUEST may be sent by either the client or
the server to check if the connection is still valid.
The receiving end MUST respond with a PING-RESPONSE
control frame.

Because the WebSocket connection is long-lived, intermediaries
like home routers might close idle connections without notifying
either end. Clients and servers may use the PING-REQUEST
control frame to check the status of the connection.

It is recommended that clients and servers do not send
PING-REQUEST unless specifically configured to do so
by the application.

PING-REQUEST does not have a payload.

The PING-REQUEST control frame is the
following 4 bytes:

%x80.05.00.00

[Ed: the working group has also discussed
asymmetrical heartbeats as an alternative to the ping-style.
For the heartbeat to work, the timeouts would need to
be negotiated in HELLO or HEADER.]

6.7.
PING-RESPONSE (op=6)

The PING-RESPONSE is a response control frame to
the PING-REQUEST. When a peer receives a
PING-REQUEST control frame, it MUST send a
PING-RESPONSE, to let the other end know the connection
is still available.

PING-RESPONSE does not have a payload.

The PING-RESPONSE control frame is 4 bytes as follows:

%x80.06.00.00
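
Since neither ping frame carries a payload, the request/response
rule can be sketched with two byte constants. Only the two literal
4-byte sequences come from this document; the function name
reply_to_control_frame is an assumption, and the sketch does not
interpret the individual bytes.

```python
PING_REQUEST = bytes([0x80, 0x05, 0x00, 0x00])   # %x80.05.00.00
PING_RESPONSE = bytes([0x80, 0x06, 0x00, 0x00])  # %x80.06.00.00


def reply_to_control_frame(frame):
    """Return the frame to send back, or None if no reply is required."""
    # Exact matching works here only because pings have no payload.
    if frame == PING_REQUEST:
        return PING_RESPONSE
    return None
```

A PING-RESPONSE does not itself trigger a reply in this sketch,
which is how a pair of distinct frames avoids a ping cascade.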

[Ed: the working group has also discussed
asymmetrical heartbeats as an alternative to the ping-style.]

8.
Security Considerations

8.1.
HTTP

Many, if not most, of the security issues related to HTTP
are also present in WebSockets, because WebSockets uses HTTP
for its handshake, and because many WebSockets clients and
servers will also be HTTP clients and servers.

8.2.
Browser Scripting attacks

Compromised HTTP sites or improperly designed HTTP applications
can allow arbitrary JavaScript code to execute on a browser.
The hijacked script might attempt to use an HTTP request
for a WebSocket server, or might attempt to use a WebSocket
request for an HTTP server.

The script may also use a WebSocket request for an entirely
different server than the requesting page. The risk can be
minimized by servers checking the "origin"
header, but this may not be sufficient.
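
A server-side check of the "origin" header might look like the
sketch below. The allow-list policy, the function name
origin_allowed, and the parsed-headers dictionary are assumptions;
as noted above, such a check reduces the risk but may not be
sufficient on its own.

```python
def origin_allowed(headers, allowed_origins):
    """Return True if the HELLO "origin" header is in the allow-list."""
    origin = headers.get("origin")
    # A missing origin header is rejected in this sketch.
    return origin is not None and origin in allowed_origins
```

A server using this sketch would refuse the handshake when
origin_allowed() returns False, for example when a script loaded
from an unrelated site attempts to connect.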

Hijacked clients may also attempt to open a WebSocket
connection using an HTTP/XML connection from the browser,
attempting to spoof a valid WebSocket connection. WebSocket
servers should be written to minimize these risks.

Hijacked clients may open a WebSocket connection to a
non-WebSocket HTTP service.

Author's Address

Full Copyright Statement

This document is subject to the rights,
licenses and restrictions contained in BCP 78,
and except as set forth therein,
the authors retain all their rights.

This document and the information contained herein are provided
on an “AS IS” basis and THE CONTRIBUTOR,
THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST
AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology
described in this document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights.
Information on the procedures with respect to
rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available,
or the result of an attempt made to obtain a general license or
permission for the use of such proprietary rights by implementers or
users of this specification can be obtained from the IETF on-line IPR
repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention
any copyrights, patents or patent applications, or other
proprietary rights that may cover technology that may be required
to implement this standard.
Please address the information to the IETF at ietf-ipr@ietf.org.