
The upcoming HTML5 specification includes many powerful and exciting features which turn web browsers into a fully capable rich internet application (RIA) client platform. Part 1[3] of this article series presented an overview of the history of the web, and investigated the new HTML5 Server-Sent Events[4] communication standard.

2.2 WebSockets

The upcoming HTML5 standard also includes WebSockets[5]. WebSockets enables establishing a bidirectional communication channel. In contrast to Server-Sent Events, the WebSocket protocol is not built on top of HTTP. However, the WebSocket protocol defines the HTTP handshake behaviour used to switch an existing HTTP connection to a lower-level WebSocket connection. WebSockets does not try to simulate a server push channel over HTTP. It just defines a framing protocol on top of TCP. In this way WebSockets enables two-way communication natively.

Like the Server-Sent Events specification, WebSockets specifies an API as well as a wire protocol. The WebSockets API specification defines a new JavaScript interface, WebSocket.

A WebSocket connection is established by creating a new WebSocket instance. The constructor takes one or two arguments. The WebSocketURL argument specifies the URL to connect to. A WebSocketURL starts with the new scheme ws for a plain WebSocket connection or wss for a secured WebSocket connection. Optionally, a second parameter, protocol, can be set, which defines the sub-protocol to be used (over the WebSocket protocol). As with EventSource, an onmessage handler can be assigned to a WebSocket, which will be called each time a message is received. Data is sent by calling the send() method.
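To illustrate how the two URL schemes map onto a connection, here is a minimal Python sketch that splits a WebSocket URL into its parts. The function name parse_websocket_url and the example URLs are hypothetical. The default ports assume that ws and wss reuse the standard HTTP and HTTPS ports.

```python
from urllib.parse import urlsplit

# Assumption: ws shares port 80 with HTTP and wss shares port 443 with HTTPS.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def parse_websocket_url(url):
    """Split a ws:// or wss:// URL into (secure, host, port, resource)."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError("not a WebSocket URL: " + url)
    if not parts.hostname:
        raise ValueError("missing host: " + url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # The resource is what the client asks for in the HTTP upgrade request.
    resource = parts.path or "/"
    if parts.query:
        resource += "?" + parts.query
    return parts.scheme == "wss", parts.hostname, port, resource


if __name__ == "__main__":
    print(parse_websocket_url("ws://example.com/chat"))
    print(parse_websocket_url("wss://example.com:8443"))
```

A client would use the secure flag to decide whether to wrap the TCP connection in SSL before sending the upgrade request.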

When a new WebSocket is created, the underlying user agent first establishes an ordinary HTTP(S) connection to the host of the URL. An HTTP upgrade is then performed on this new connection. The HTTP specification defines the Upgrade header field for this purpose. The Upgrade header is intended to provide a simple mechanism for transitioning from HTTP to other, incompatible protocols. The WebSocket specification uses this capability of HTTP to switch the newly created HTTP connection to a WebSocket connection. By adding the optional WebSocket-Protocol header, the client requests a specific sub-protocol.
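The upgrade exchange described above can be sketched in a few lines of Python. This is an illustration only, not a complete client: the helper names are hypothetical, and the exact header set (in particular Host and Origin alongside Upgrade, Connection and WebSocket-Protocol) follows the early protocol draft discussed here and is an assumption. Later revisions of the protocol changed the handshake.

```python
def build_upgrade_request(host, resource, origin, protocol=None):
    """Build the HTTP request that asks the server to switch to WebSocket."""
    lines = [
        "GET " + resource + " HTTP/1.1",
        "Upgrade: WebSocket",
        "Connection: Upgrade",
        "Host: " + host,
        "Origin: " + origin,
    ]
    if protocol is not None:
        # Optional header requesting a specific sub-protocol.
        lines.append("WebSocket-Protocol: " + protocol)
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def is_upgrade_accepted(response_head):
    """Return True if the response head signals a switch to WebSocket."""
    text = response_head.decode("ascii", errors="replace")
    status_line, _, header_block = text.partition("\r\n")
    fields = status_line.split(" ", 2)
    if len(fields) < 2 or fields[1] != "101":
        return False
    headers = {}
    for line in header_block.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip().lower()
    return (headers.get("upgrade") == "websocket"
            and "upgrade" in headers.get("connection", ""))


if __name__ == "__main__":
    request = build_upgrade_request(
        "example.com", "/chat", "http://example.com", "sample")
    print(request.decode("ascii"))
```

Once is_upgrade_accepted() returns True for the response head, the same TCP connection carries only WebSocket frames.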

After the HTTP response header has been received, data is transmitted according to the WebSocket protocol. From this point on, only WebSocket frames are transferred over the wire. A frame can be sent at any time in either direction. The WebSocket protocol defines two types of frames: a text frame and a binary frame. Each text frame starts with a 0x00 byte and ends with a 0xFF byte. The text is transferred UTF-8 encoded between the start and the end byte. A text frame therefore requires only 2 additional bytes for packaging purposes. Figure 3 shows a text frame for the string "GetDate" and the string "Sat Mar 13 14:00:25 CET 2010".
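The text frame layout described above is simple enough to sketch directly. The function names below are hypothetical. Because the byte 0xFF never occurs in valid UTF-8 data, it can serve as an unambiguous terminator.

```python
def encode_text_frame(text):
    """Wrap UTF-8 encoded text between a 0x00 start and a 0xFF end byte."""
    return b"\x00" + text.encode("utf-8") + b"\xff"


def decode_text_frame(frame):
    """Strip the start and end bytes and decode the UTF-8 payload."""
    if len(frame) < 2 or frame[0] != 0x00 or frame[-1] != 0xFF:
        raise ValueError("not a text frame")
    return frame[1:-1].decode("utf-8")


if __name__ == "__main__":
    for message in ("GetDate", "Sat Mar 13 14:00:25 CET 2010"):
        frame = encode_text_frame(message)
        # The framing overhead is always exactly two bytes.
        print(len(message.encode("utf-8")), len(frame), frame)
```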

Binary data can be transferred by using a binary frame. A binary frame starts with 0x80. In contrast to the text frame, the binary frame does not use a terminator. Instead, the start byte of a binary frame is followed by the length bytes. The number of length bytes depends on how many bytes are required to encode the length. Figure 4 shows a binary frame for a small payload, which requires one length byte, as well as a larger binary frame, which requires two length bytes.
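The following Python sketch shows one way the binary frame could be encoded. The text does not spell out how the length bytes are built. The sketch assumes the scheme used by early drafts of the protocol, in which each length byte carries 7 bits of the length and a set high bit signals that another length byte follows. The function names are hypothetical.

```python
def encode_length(n):
    """Encode n in 7-bit groups, most significant group first.

    Assumption: every length byte except the last has its high bit set.
    """
    if n < 0:
        raise ValueError("length must not be negative")
    groups = [n & 0x7F]              # last byte: high bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def encode_binary_frame(payload):
    """Start byte 0x80, then the length bytes, then the raw payload."""
    return b"\x80" + encode_length(len(payload)) + payload


def decode_binary_frame(frame):
    """Read the length bytes and return exactly that many payload bytes."""
    if not frame or frame[0] != 0x80:
        raise ValueError("not a binary frame")
    length, pos = 0, 1
    while True:
        if pos >= len(frame):
            raise ValueError("truncated length")
        byte = frame[pos]
        pos += 1
        length = length * 128 + (byte & 0x7F)
        if not byte & 0x80:
            break
    payload = frame[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload


if __name__ == "__main__":
    print(encode_binary_frame(b"abc"))             # one length byte
    print(encode_binary_frame(bytes(200))[:3])     # two length bytes
```

Under this assumption, payloads of up to 127 bytes need one length byte and payloads of up to 16383 bytes need two.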

Because JavaScript cannot work with binary data represented as a byte array, the binary frame type is limited to languages other than JavaScript. In addition to the binary and text frames, new frame types can be introduced by future releases of the WebSocket protocol specification. WebSocket's framing is designed to support new frame types. A connection can be closed at any time. No extra end-of-connection byte or frame exists.

The overhead involved in managing a WebSocket is minimal. Comet protocols such as Bayeux[7] and BOSH[8], for instance, use some "hacks" to break the HTTP request-response barrier. This forces such protocols to implement complex session and connection management. Because WebSockets is not implemented on top of HTTP, it does not run into trouble caused by HTTP protocol limitations.

On the other hand, WebSockets does almost nothing for reliability. Unlike Server-Sent Events, it does not include reconnect handling or guaranteed message delivery. Furthermore, as a non-HTTP based protocol, WebSocket cannot make use of the built-in reliability features of HTTP. For instance, HTTP supports auto-retry execution strategies in case of network errors. Because a GET method can be executed at any time without side effects, browsers and modern HTTP clients automatically re-execute a GET method if a network error occurs. By definition, a GET method must not change the server-side resource state and is therefore safe.

This means reliability has to be implemented at the application (sub-protocol) level when using WebSockets. The same is true for the "keep alive" message approach used to prevent a proxy server from dropping the connection after a short period of inactivity. Additionally, sharing a WebSocket between different pages often causes trouble. In contrast to Server-Sent Events, a WebSocket also includes an upstream channel, which is difficult to share. For instance, concurrent writes and reads have to be synchronized, which is not a simple task. This general challenge also affects bidirectional Comet protocols such as Bayeux or BOSH. When using WebSockets, the per-server connection limitation has to be considered carefully.
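To give an idea of what reliability at the sub-protocol level can look like, here is a minimal Python sketch of a sender that numbers its messages, remembers them until they are acknowledged, and replays the unacknowledged ones after a reconnect. Everything in it is hypothetical: the class name, the "id:payload" message format, and the idea that the peer answers with acknowledgements are assumptions made for illustration and are not part of the WebSocket protocol.

```python
class ReliableSender:
    """Keeps outgoing messages until the peer acknowledges them."""

    def __init__(self, send):
        self._send = send      # callable that writes one text message
        self._next_id = 1
        self._pending = {}     # message id -> payload awaiting an ack

    def send(self, payload):
        """Send a payload tagged with a fresh message id."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = payload
        self._send("%d:%s" % (msg_id, payload))
        return msg_id

    def on_ack(self, msg_id):
        """Forget a message once the peer has confirmed it."""
        self._pending.pop(msg_id, None)

    def on_reconnect(self, send):
        """Switch to the new connection and replay unconfirmed messages."""
        self._send = send
        for msg_id in sorted(self._pending):
            self._send("%d:%s" % (msg_id, self._pending[msg_id]))


if __name__ == "__main__":
    first_connection, second_connection = [], []
    sender = ReliableSender(first_connection.append)
    acked = sender.send("hello")
    sender.send("world")
    sender.on_ack(acked)
    sender.on_reconnect(second_connection.append)
    print(first_connection, second_connection)
```

With this approach the receiver may see a message twice, so it would have to discard duplicates by message id.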

As with the same-origin policy that web browsers apply to browser-side code, a browser will contact a WebSocket server only if the web page was loaded from the same domain. This does not apply to stand-alone WebSocket clients such as the one shown in Listing 6. Such clients contact the WebSocket server directly, without same-origin policy limitations.

Listing 7 shows an example WebSocket Server implementation. The server handler implements two interfaces: the IHttpRequestHandler handles ordinary HTTP requests and the IWebSocketHandler handles WebSocket connections. In the case of a standard HTTP request (without the upgrade request) the IHttpRequestHandler's onRequest() method will be called. If the client opens a WebSocket, the server will handle the HTTP upgrade and call the IWebSocketHandler's onConnect() method. Each time a WebSocket message is received, the IWebSocketHandler's onMessage() method is called.

Within the onConnect() method some preconditions can be checked. For instance, if a required sub-protocol is not supported, the example server will return an error status. Furthermore, the origin header will be checked. As is the case with the referer header, the origin header is set by the browser automatically. The origin header is defined by the HTTP Origin Header RFC, which is in draft. In contrast to the referer header, the origin header includes only the domain name of the page's source. Each time embedded code makes a request, the browser adds the origin header, which contains the origin page's domain. Web servers can block requests that send invalid origin headers.

As shown in Listing 7, the origin header is checked against an internal whitelist to reject unwanted requests. This technique guards against the situation where an attacker copies a JavaScript fragment from a publicly available page and embeds this code fragment into their own page. In this case the browser would set the origin header to the domain of the attacker’s page and the upgrade request could be rejected. This technique helps to defend against Cross-Site Request Forgery attacks[9]. The origin header specification is independent of the WebSocket protocol specification. However, the WebSocket protocol defines a WebSocket-Origin header which has to be included in the WebSocket upgrade response.
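The whitelist check itself is small. The Python sketch below is not the code of Listing 7. It only illustrates the idea, using a hypothetical function name and a hypothetical set of allowed origins. Rejecting requests that carry no origin header at all is a design choice of this sketch. A server that also has to serve stand-alone clients might decide differently.

```python
# Hypothetical whitelist of origins allowed to open a WebSocket.
ALLOWED_ORIGINS = frozenset({"http://example.com", "https://example.com"})


def is_origin_allowed(headers, allowed=ALLOWED_ORIGINS):
    """Return True if the request's Origin header is on the whitelist."""
    for name, value in headers.items():
        if name.lower() == "origin":      # header names are case-insensitive
            return value.strip().lower() in allowed
    return False                          # sketch choice: no Origin, no upgrade


if __name__ == "__main__":
    print(is_origin_allowed({"Origin": "http://example.com"}))
    print(is_origin_allowed({"Origin": "http://attacker.example"}))
```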

Because a WebSocket connection is established over an HTTP connection, the WebSocket protocol also works with HTTP proxy servers. When using a visible proxy server, the browser always communicates with the proxy server, which forwards the HTTP requests and responses. If the browser is configured to use an HTTP proxy and a WebSocket is opened, the browser first opens a tunnel to the proxy server. By sending an HTTP/1.1 CONNECT request, as shown in Figure 5, the browser asks the HTTP proxy to make a TCP connection to a dedicated (WebSocket) server. Once this connection has been established, the role of the HTTP proxy is “downsized” to that of a simple TCP proxy to the WebSocket server. Using this proxied connection, the browser sends the WebSocket upgrade request to the WebSocket server.
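As a rough illustration of the tunnel setup, the following Python sketch builds the CONNECT request a client would send to the proxy and checks the proxy's answer. The helper names and the host name are hypothetical, and a real request may carry additional headers such as proxy credentials.

```python
def build_connect_request(host, port):
    """Ask an HTTP proxy to open a TCP tunnel to host:port."""
    target = "%s:%d" % (host, port)
    request = "CONNECT " + target + " HTTP/1.1\r\nHost: " + target + "\r\n\r\n"
    return request.encode("ascii")


def tunnel_established(response_head):
    """A 2xx status in reply to CONNECT means the tunnel is open."""
    status_line = response_head.split(b"\r\n", 1)[0]
    fields = status_line.decode("ascii", errors="replace").split(" ", 2)
    return len(fields) >= 2 and len(fields[1]) == 3 and fields[1].startswith("2")


if __name__ == "__main__":
    print(build_connect_request("websocket.example.com", 80).decode("ascii"))
    print(tunnel_established(b"HTTP/1.1 200 Connection established\r\n\r\n"))
```

After a successful reply, the client sends the WebSocket upgrade request through the tunnel as if it were talking to the WebSocket server directly.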

Even if the browser is not explicitly configured to use an HTTP proxy, the connection to the WebSocket server may still pass through transparent HTTP proxies. This depends on the current network infrastructure. Under some circumstances, such transparent HTTP proxies cause trouble for WebSockets. The Connection and Upgrade headers are hop-by-hop headers by definition. The HTTP specification says that hop-by-hop headers have to be removed by an intermediary if a request is forwarded to the next hop. In the case of the WebSocket upgrade request, a transparent HTTP proxy will remove the Connection: Upgrade header, which results in the WebSocket server receiving a corrupt WebSocket upgrade request. Today, most HTTP proxies are not familiar with the WebSocket protocol.
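The effect of hop-by-hop header removal can be reproduced with a small simulation. The Python sketch below does not model any particular proxy product. The function names are hypothetical, and the header list is the set of hop-by-hop headers named in the HTTP/1.1 specification.

```python
# Hop-by-hop headers as listed in the HTTP/1.1 specification.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
})


def forward_headers(headers):
    """Return the headers an intermediary passes on to the next hop."""
    listed = set()
    for name, value in headers.items():
        if name.lower() == "connection":
            # Headers named in Connection are hop-by-hop as well.
            listed.update(token.strip().lower() for token in value.split(","))
    return {
        name: value for name, value in headers.items()
        if name.lower() not in HOP_BY_HOP and name.lower() not in listed
    }


def is_upgrade_request(headers):
    """Check for the headers that mark a WebSocket upgrade request."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    return (lowered.get("upgrade") == "websocket"
            and "upgrade" in lowered.get("connection", ""))


if __name__ == "__main__":
    request = {
        "Host": "example.com",
        "Upgrade": "WebSocket",
        "Connection": "Upgrade",
        "Origin": "http://example.com",
    }
    forwarded = forward_headers(request)
    print(is_upgrade_request(request), is_upgrade_request(forwarded))
```

The forwarded request still looks like a valid HTTP GET, but the server can no longer recognise it as an upgrade request.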

Using secured WebSockets can avoid this effect. When creating a secured WebSocket connection, the browser opens an SSL connection to the WebSocket server. In this case intermediaries are not able to interpret or modify the data.

Conclusion

With WebSockets, writing highly interactive real-time web applications becomes a simple task. The WebSocket API is very easy to understand and to use. The underlying WebSocket protocol is highly efficient: there is minimal overhead involved in managing a WebSocket. Because the WebSocket protocol runs on top of TCP, it does not have to rely on "hacks" as popular Comet protocols like Bayeux or BOSH do. Simulating a bidirectional channel over HTTP leads to complex and less efficient protocols. The overhead of the classic Comet protocols is especially high if only a small amount of data is transferred, such as tiny notification events. This is not true for WebSockets.

Furthermore, WebSockets fit well into the existing Web infrastructure. For instance, WebSockets use the same ports that standard HTTP connections use. To establish a new WebSocket connection, the WebSocket protocol makes use of the connection management capabilities of the HTTP protocol. WebSockets support highly efficient bidirectional communication by using the existing Web infrastructure without adding new requirements or components.

On the other hand, WebSockets do little for reliability. This has to be handled at the application (sub-protocol) level. In contrast to Server-Sent Events, the WebSocket protocol does not include reconnect handling or guaranteed message delivery. The current WebSocket protocol represents a low-level communication channel only.

In contrast to WebSockets, the Server-Sent Events protocol includes powerful features to reconnect and synchronize messages. High reliability is a built-in feature of Server-Sent Events. Furthermore, as with WebSockets, the overhead involved in managing a Server-Sent Events stream is very low. However, Server-Sent Events support a unidirectional server push channel only. Creating a Server-Sent Events connection opens a server-to-client event stream. Often Server-Sent Events will satisfy the requirements of a server-push situation, but this depends on the concrete use case.

What do WebSockets and Server-Sent Events mean for popular Comet protocols such as Bayeux and BOSH? The HTML5 communication standards have the potential to replace the classic Comet protocols and become the dominant server-push technology, at least for new applications. On the other hand, the cometd[10] community, for instance, has started implementing cometd 2.0, which will support the WebSocket protocol as a new transport type. cometd is the most popular Bayeux implementation.

Gregor Roth[1] works as a software architect at United Internet group, a leading European Internet Service Provider to which GMX, 1&1, and Web.de belong. His areas of interest include software and system architecture, enterprise architecture management, object-oriented design, distributed computing, and development methodologies.