Crouching tiger, hidden dragon

No, it’s not about the movie. HTML5 has had quite some time now to stumble, mature and prosper. But I feel that many of its features have gone completely (or to a large extent) unnoticed. Web apps today are built with utmost care, targeted at reducing latency and being aesthetically pleasing. At the same time, if one delves into the technology stack used, one might find a set of robust and powerful engineering paradigms. One such paradigm which has become widespread today is the (unconditional) usage of full-duplex communication using WebSockets.

Traditionally, the web works with the client making an HTTP request to a server, and the server responding with an HTTP response. The point to be noted is that the initiative is always taken by the client, and data is then exchanged over a half-duplex HTTP connection. This is mostly acceptable in scenarios where the data is not dynamic or doesn’t change frequently. However, if we do have a volatile data source, then we need to constantly update the data shown to the user.

HTTP/1.0

The request-response architecture was initially handled via HTTP/1.0. It worked quite smoothly when webpages were simple static HTML pages with a single resource. However, HTTP/1.0 had a very non-optimized way of handling requests: it made a separate connection for every request. When the web scaled with a plethora of data sources, this one-to-one mapping between connection and request failed to cope with the scale.

HTTP/1.1

Hence a revised version of HTTP/1.0 was introduced, which enabled reusable connections, HTTP pipelining, keep-alive connections, chunked transfer encoding etc. Now the browser could initiate a connection for the HTML content and then reuse the same connection to fetch the CSS and JavaScript (if properly handled). Hence we can conclude that latency dropped drastically compared to creating a TCP connection for each request. HTTP/1.1 also introduced the concept of a Host header, which lets a single server on a specific port handle requests for several websites/resources. This led to a substantial reduction in hosting costs, as numerous websites could now be pushed onto a single server.

What we really need to understand about HTTP is that it follows the request-response paradigm, i.e. it’s not full-duplex: communication proceeds with the client initiating a request and the server responding. The server doesn’t contact a client on its own. However, we can create the illusion of it doing so by:

a. Polling – With polling, a client repeatedly sends new requests to a server. If the server has no new data, it sends an appropriate indication and closes the connection. The client then waits a bit and sends another request (after one second, for example).

b. Long polling – With long polling, a client sends a request to a server. If the server has no new data, it simply holds the connection open and waits until data is available. Once the server has a message for the client, it sends it back over that connection. The connection is then closed.
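One round of the plain-polling loop can be sketched generically like this (the `getUpdate` callback is a purely hypothetical stand-in for the actual HTTP request; a server with no news is assumed to answer with `null`):

```javascript
// One round of plain polling, sketched generically. `getUpdate`
// performs the request (hypothetical stand-in for XHR/fetch) and
// returns null when the server has no new data.
function pollOnce(getUpdate, onData) {
    var data = getUpdate();
    if (data !== null) {
        onData(data); // the server had something new for us
    }
    // a real client would now wait a bit (e.g. setTimeout) and poll again
}

// Usage with a stubbed server response:
var received = [];
pollOnce(function () { return "fresh"; }, function (d) { received.push(d); });
// received is now ["fresh"]
```

Long polling looks identical from the client’s side; the difference is simply that the server delays its answer until it has data.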

WebSocket Protocol

I am not going to show you how to write WebSocket code, because I assume we’ve acquired quite a bit of expertise in that by now. Let’s instead try to answer the simple (yet important) questions of what exactly a WebSocket is and where it fits in. WebSocket (ws) is a protocol which allows full-duplex communication. The RFC defines it as:

The WebSocket Protocol enables two-way communication between a client running untrusted code in a controlled environment to a remote host that has opted-in to communications from that code. The security model used for this is the origin-based security model commonly used by web browsers.

The WebSocket protocol is a single-socket, independent TCP-based protocol. Note that it works both in standard unsecured mode (ws) and over TLS (wss). Being a single-socket connection, it reuses the same connection for server-to-client and client-to-server traffic, thereby reducing communication latency.

HTTP and WebSockets

Although WebSocket is an independent protocol, its inception is through an HTTP request. There are two parts to the protocol:

a. Handshake – an HTTP request is made to the server with all common headers along with a special header called the Upgrade header.
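As a sketch, the client’s opening handshake looks roughly like this (the path, host and sub-protocol are illustrative; the key is the sample value from RFC 6455):

```http
GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Protocol: chat
Sec-WebSocket-Version: 13
```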

Note that the Upgrade header can mean either upgrading the existing protocol version or switching the protocol altogether. In the handshake it instructs the server to switch the protocol to WebSocket. If the server does understand the WebSocket protocol, it accepts the switch and responds accordingly.
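Assuming the server accepts, its response is a sketch like this (the Accept value is the one derived from the RFC 6455 sample key):

```http
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat
```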

The 101 status code in the response means that the server has accepted the request for a protocol upgrade. For the reader’s information, a similar protocol upgrade occurs when communication needs to happen with HTTP over TLS. Let’s walk through the new headers introduced:

a. Sec-WebSocket-Key – a header used in the opening handshake. It provides information the server uses to validate the WebSocket request.

b. Sec-WebSocket-Protocol – this header is initially sent from the client with a list of sub-protocols, and is then sent back by the server with its selected sub-protocol.

c. Sec-WebSocket-Version – this header is sent from the client, indicating the version of the protocol used. Note that this header is also sent back to the client by the server if the server doesn’t support the version stated by the client.

d. Sec-WebSocket-Accept – this header is sent from the server to the client, indicating that the server is willing to initiate the WebSocket connection.

The interesting fact is that the Sec-WebSocket-Accept header is computed from the Sec-WebSocket-Key value. So, if I were to code it:

So basically we take the Sec-WebSocket-Key header and trim it to remove leading and trailing whitespace. We then concatenate it with the GUID, hash the result with SHA-1 and finally base64-encode it to obtain an ASCII string.

Please note that the code snippet above should be considered a sample Node.js implementation of the algorithm used to compute the Sec-WebSocket-Accept header.

b. Data transfer – once the handshake completes successfully, the protocol switches from HTTP to WebSocket (ws) and data can flow in full-duplex mode on the same channel. At this point the HTTP connection breaks down and is replaced by the WebSocket connection over the same underlying TCP/IP connection. The WebSocket connection uses the same ports as HTTP (80) and HTTPS (443) by default. Data is exchanged in the WebSocket protocol as messages. Each message consists of one or more frames, and each frame has an associated type, e.g. text frames, binary frames or control frames (frames which do not carry application data but protocol-level signalling data instead).

This is indeed a very elegant and powerful way of communicating data. The cardinal properties here are that WebSocket:

a. creates a full-duplex communication channel

b. has reduced latency as a single connection is used to communicate data

c. is conducive to supporting highly interactive applications

The set of applications that are a perfect fit for these yardsticks includes fast multiplayer games, instant chat applications (real-time communication – RTC) etc. These applications rely completely on unparalleled responsiveness.

However, there is another category of applications which do not demand such a high degree of responsiveness, e.g. notification push, monitoring services, broadcast services, twitter feed capture (our favorite one) etc. What I’ve observed is that WebSocket is often used to implement these applications, and I beg to differ on this choice of protocol. In this set of applications the major challenge is that, without polling, we want to receive constant updates from the server, i.e. simply subscribe to the server for updated data. There isn’t much interaction involved from the client’s perspective, however. So if by some mechanism we could ask the server to push new data to the client whenever it becomes available, our purpose would be served. We do not need a full-duplex bidirectional communication channel for these applications; all client interactions can be handled via XMLHttpRequest.

Server Sent Events (SSE)

SSE, also known as EventSource, is an API which lets an application subscribe to events/updates from a server. This subscription is obviously half-duplex: fresh data is sent from the server to the client without the client polling for it. Unlike WebSocket, which is an independent new protocol, SSE is transmitted over traditional HTTP. There’s no protocol switch or complex list of request headers. Let’s take an example.

Client side (JavaScript API):

if (window.EventSource) {
    var source = new EventSource("myServer.php");

    source.addEventListener('open', function(evt) {
        // connection was opened
    }, false);

    source.addEventListener('error', function(evt) {
        if (evt.target.readyState === EventSource.CLOSED) {
            // connection was closed
        }
    }, false);

    source.addEventListener('message', function(evt) {
        var data = evt.data;
        console.log(data);
    }, false);
} else {
    // EventSource not supported. Resort to polling/long polling for retrieving data
}

If you have worked with WebSockets, you may see substantial similarity between their implementation and the one above. SSE, although not full-duplex, comes with a very handy feature: if the connection between the server and client is closed, the browser automatically initiates a re-connection to the server within a stipulated interval. The server implementation may mandate this reconnection timeout. Before delving into the server-side implementation, we need to understand the data format of SSE. The data from the server needs to adhere to the following format.

1. The Content-Type header should be set to text/event-stream.

2. The response should contain a "data: " line followed by the response payload, with the stream terminated by two "\n" characters. A sample response could look like:

data: hello world\n\n

3. For sending larger data, we just repeat the process above, with each line of the payload on a new "data: " line.

For example:

data: abinash\ndata: mohapatra\n\n

This would be received at the client as abinash\nmohapatra, and one could then reassemble the response with:

evt.data.split('\n').join(' '); // "abinash mohapatra"

Apart from providing auto-reconnect, SSE also lets the server define a unique ID for each event. This event ID is available in the message event as the evt.lastEventId property. The browser keeps track of these event IDs, so that if the connection is dropped during an event stream, the browser will initiate a reconnect as usual and then let the server know the last event ID it received via a special HTTP header, Last-Event-ID. The server attaches IDs as follows:

id: 123456\n

data: abinash\ndata: mohapatra\n\n

Further, the server can also control the reconnection timeout (in milliseconds):

retry: 4000\n

data: abinash\n

The above retry value makes the browser initiate a reconnect 4 seconds after a connection dropout. And just as WebSockets allow us to have user-defined events, so does SSE.
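For instance, the server can tag a message with a custom event name (the name userlogon below is purely hypothetical):

```
event: userlogon
data: {"user": "abinash"}

```

The client then subscribes with source.addEventListener('userlogon', handler) instead of listening for the generic message event.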

Below is a very simple example of an EventSource server. Note that the Content-Type header is set to text/event-stream. Further, the W3C spec advises setting the Cache-Control header to no-cache to bypass any caches for event-resource requests; user agents should ignore HTTP cache headers in the response and never cache event sources.

<?php
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');

/** some update occurs and then the following code is triggered */
$eventId = rand(1000, 1000000);

echo "id: " . $eventId . PHP_EOL;
echo 'data: {"message": "hello world"}' . PHP_EOL;
echo PHP_EOL; // the empty line marks the end of the event
?>

SSE and Security

Note that the server just broadcasts the event, so at the client it’s imperative, from a security perspective, to keep track of the event source. Just like with the HTML5 postMessage API, we add a thin layer of control/security by filtering events by their source, i.e. by origin. By checking the origin attribute of the evt object, we can block unwanted/unsafe sources from pushing data into our application.

source.addEventListener('message', function(evt) {
    // note that the origin includes the scheme, e.g. "https://"
    if (evt.origin === 'https://myReliableSite.com') {
        var data = evt.data;
        console.log(data);
    } else {
        console.warn('event data from unknown origin!');
    }
}, false);

Conclusion

In my opinion, the use of SSE (the hidden dragon) is completely worth it in cases where there isn’t much user interaction (most of which could be handled via XHR anyway), and it should be voraciously consumed. Further, one should consider whether one really needs a connection-oriented transport-layer protocol (not to forget TCP’s three-way handshake and head-of-line blocking). One could use UDP datagrams instead if the sole purpose of the application is to transmit as many packets as possible rather than ensuring order and reliability. A couple of options exist for communicating over connectionless protocols: e.g., WebRTC provides APIs for sending data (raw data as well as streams) over connection-oriented as well as connectionless protocols, and Node.js provides APIs to create servers and receive/transmit UDP datagrams.