HTML5 Web Socket in Essence

HTML5 WebSocket defines a bi-directional, full-duplex communication channel that operates over a single TCP connection. This article discusses its impressive performance, the principles of the WebSocket protocol and its handshake mechanism, and develops a WebSocket application in practice (Team Poker).

Introduction

HTML5 WebSocket defines a bi-directional, full-duplex communication channel that operates through a single TCP socket over the Web. It provides an efficient, low-latency and low-cost connection between web client and server, and on top of it developers can build scalable, real-time web applications. Below is the official definition of WebSocket, copied from the IETF WebSocket protocol page:

The WebSocket protocol enables two-way communication between a user agent running untrusted code running in a controlled environment to a remote host that has opted-in to communications from that code. The security model used for this is the Origin-based security model commonly used by Web browsers. The protocol consists of an initial handshake followed by basic message framing, layered over TCP. The goal of this technology is to provide a mechanism for browser-based applications that need two-way communication with servers that does not rely on opening multiple HTTP connections (e.g. using XMLHttpRequest or <iframe>s and long polling).

This article walks through the basic concepts of WebSocket and the problems it sets out to solve, explains the protocol in essence, looks at some experimental demos, develops a simple WebSocket application in practice (Team Poker), and describes the current open issues of WebSocket. I sincerely hope it is systematic and easy to follow, moving from the surface to the depths, so that readers not only learn what WebSocket is at a high level but also understand it in depth. Any thoughts, suggestions or criticism you have after reading this article will help me improve, and I would appreciate it if you could leave a comment.

Background

In traditional web applications, developers who wanted real-time interaction with the server had to employ tricky techniques such as Ajax polling and Comet (a.k.a. Ajax push, Full Duplex Ajax, HTTP Streaming, etc.). These technologies either fire HTTP requests at the server periodically or hold an HTTP connection open for a long time, which "contain lots of additional, unnecessary header data and introduce latency" and result in "an outrageously high price tag". websocket.org explained the problems exhaustively and compared the performance of Ajax polling and WebSocket in detail. They built two simple web pages: one communicated with the server periodically over traditional HTTP, and the other used HTML5 WebSocket. In the test, each HTTP request/response header was approximately 871 bytes, while the WebSocket overhead was far smaller: 2 bytes per message once the connection was established. As the number of transfers grows, the result is:

"HTML5 Web Sockets can provide a 500:1 or — depending on the size of the HTTP headers — even a 1000:1 reduction in unnecessary HTTP header traffic and 3:1 reduction in latency". --WebSocket.org

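To make the two figures above concrete, here is a small back-of-the-envelope calculation in JavaScript. The 871-byte and 2-byte overheads come from the test described above; the client count and message rate are hypothetical values chosen only for illustration, and the function name is my own.

```javascript
// Per-message overhead figures quoted from the websocket.org test.
const HTTP_HEADER_BYTES = 871;
const WEBSOCKET_FRAME_BYTES = 2;

// Overhead traffic in bits per second for a given population of clients.
function overheadBitsPerSecond(clients, overheadBytes, messagesPerSecond) {
  return clients * overheadBytes * messagesPerSecond * 8;
}

// Hypothetical load: 1,000 clients, each exchanging one message per second.
const clients = 1000;
const rate = 1;

console.log(overheadBitsPerSecond(clients, HTTP_HEADER_BYTES, rate));     // 6968000
console.log(overheadBitsPerSecond(clients, WEBSOCKET_FRAME_BYTES, rate)); // 16000
```

With these inputs the header overhead alone is roughly 7 Mbit/s for polling versus 16 kbit/s for WebSocket. The exact ratio depends on the real header size, which is why the quote gives a range.
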
WebSocket In Essence

WebSocket was created to replace polling and long polling (Comet) and to give HTML5 web applications the ability to communicate in real time. A browser-based web application can open a WebSocket connection through a JavaScript API and then exchange data frames with the server over a single TCP connection.

This is achieved by a new protocol, the WebSocket Protocol, which is essentially an independent TCP-based protocol. To establish a WebSocket connection, the client/browser sends an HTTP request with an "Upgrade: WebSocket" header, which signals a protocol upgrade request. The HTTP server interprets the handshake key(s) and returns a handshake response (the handshake mechanism is described in detail below). After that the connection is established: figuratively speaking, the 'sockets' have been plugged in at both the client and server ends. Both sides can then send and receive data independently and simultaneously, with no more redundant header information, and the connection is not closed until one side sends a close signal. That is why WebSocket is bi-directional and full duplex. In addition, in contrast to the request/response paradigm of HTTP, WebSocket layers a framing mechanism on top of TCP in which the per-frame overhead is as little as 2 bytes.
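To illustrate the 2-byte framing overhead, here is a minimal Node.js sketch that builds an unmasked text frame. It assumes the frame layout later standardized in RFC 6455 (the hixie-76 draft discussed below instead wraps text between a 0x00 and a 0xFF byte, which is also 2 bytes of overhead). The function name is my own, and frames sent from client to server additionally carry a 4-byte masking key that this sketch omits.

```javascript
// Build an unmasked (server-to-client) text frame, RFC 6455 layout assumed.
function encodeTextFrame(text) {
  const payload = Buffer.from(text, 'utf8');
  const len = payload.length;
  let header;
  if (len < 126) {
    // 2-byte header: FIN bit + text opcode (0x81), then the 7-bit length.
    header = Buffer.from([0x81, len]);
  } else if (len < 65536) {
    // Length marker 126 means a 16-bit big-endian length follows.
    header = Buffer.alloc(4);
    header[0] = 0x81;
    header[1] = 126;
    header.writeUInt16BE(len, 2);
  } else {
    // Length marker 127 means a 64-bit big-endian length follows.
    header = Buffer.alloc(10);
    header[0] = 0x81;
    header[1] = 127;
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  return Buffer.concat([header, payload]);
}

console.log(encodeTextFrame('Hi')); // <Buffer 81 02 48 69>
```

For any payload shorter than 126 bytes the header is exactly 2 bytes, so a short message such as a single vote costs only its own length plus 2.
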

Now it is time to delve into the protocol. Let's start with draft-hixie-thewebsocketprotocol-76, the version currently supported by browsers (Chrome 6+, Firefox 4+, Opera 11) and by many WebSocket servers (please refer to the Browser/Server Support section below for details). A typical WebSocket request/response example is shown below:

The entire process can be described as follows: the client raises a "special" HTTP request that asks to "Upgrade" the connection protocol to "WebSocket", for host "example.com" and path "/demo", and carries three "handshake" fields: Sec-WebSocket-Key1, Sec-WebSocket-Key2 and the 8 bytes ({^n:ds[4U}) that follow the header fields. These are random tokens that the WebSocket server uses to construct a 16-byte security hash at the end of its handshake, proving that it has read the client's handshake.
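How the server derives that 16-byte hash can be sketched in Node.js as follows. Under draft 76, each of the two key fields is reduced to a 32-bit number (its digits concatenated, divided by the number of spaces it contains), the two numbers are written big-endian in front of the 8 trailing bytes, and the MD5 digest of those 16 bytes is the response. The function names are my own. The two sample key strings are assumed to be those of the draft's published example; only the 8-byte token appears in the text above.

```javascript
const crypto = require('crypto');

// Reduce one Sec-WebSocket-Key1/Key2 value to its 32-bit number:
// concatenate all digits, then divide by the count of space characters.
function keyToNumber(key) {
  const digits = key.replace(/[^0-9]/g, '');
  const spaces = (key.match(/ /g) || []).length;
  if (digits === '' || spaces === 0) {
    throw new Error('malformed handshake key');
  }
  const value = Number(digits);
  if (value % spaces !== 0) {
    throw new Error('key number is not a multiple of its space count');
  }
  return value / spaces;
}

// MD5 over: 4 bytes (key1 number) + 4 bytes (key2 number) + 8 token bytes.
function hixie76Challenge(key1, key2, tokenBytes) {
  if (tokenBytes.length !== 8) {
    throw new Error('the trailing token must be exactly 8 bytes');
  }
  const input = Buffer.alloc(16);
  input.writeUInt32BE(keyToNumber(key1), 0);
  input.writeUInt32BE(keyToNumber(key2), 4);
  tokenBytes.copy(input, 8);
  return crypto.createHash('md5').update(input).digest(); // 16 bytes
}

// Assumed sample keys; each contains two consecutive spaces, written
// explicitly here so that they survive copy and paste.
const SAMPLE_KEY1 = '4 @1' + '  ' + '46546xW%0l 1 5';
const SAMPLE_KEY2 = '12998 5 Y3 1' + '  ' + '.P00';
const SAMPLE_KEY3 = Buffer.from('^n:ds[4U', 'latin1');

console.log(hixie76Challenge(SAMPLE_KEY1, SAMPLE_KEY2, SAMPLE_KEY3).toString('latin1'));
```

The spaces matter: a server or proxy that normalizes whitespace in headers would compute the wrong number, which is part of what makes the scheme a proof that the handshake was read byte for byte.
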

The WebSocket protocol is NOT finalized; it is still being improved and standardized by the IETF Hypertext Bidirectional (HyBi) Working Group. At the time of writing, the latest version is "draft-ietf-hybi-thewebsocketprotocol-09", last updated by Ian Fette on June 7, 2011. Compared with version 76, both the request and response headers have changed, and so has the handshake mechanism: the Sec-WebSocket-Key1 and Sec-WebSocket-Key2 combination is replaced by a single Sec-WebSocket-Key. It is therefore incompatible with draft-hixie-thewebsocketprotocol-76 (however, Chrome and Firefox Aurora will support it, and the Microsoft Interoperability Strategy Team is also experimenting with it; see the Browser/Server Support section below for details).

WebSocket request/response in the latest draft-ietf-hybi-thewebsocketprotocol-09:

The Sec-WebSocket-Key is a base64-encoded, randomly generated 16-byte value; in the case above it is the encoding of "WebSocket rocks!". The server reads the key and concatenates it with the magic GUID "258EAFA5-E914-47DA-95CA-C5AB0DC85B11", giving "V2ViU29ja2V0IHJvY2tzIQ==258EAFA5-E914-47DA-95CA-C5AB0DC85B11". It then computes the SHA-1 hash of that string, "540b8681a34307fade550a467c317c297799c79a", and finally base64-encodes the raw 20-byte hash (not its hexadecimal text) and returns the result in the "Sec-WebSocket-Accept" header.
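The same steps can be sketched in a few lines of Node.js using the built-in crypto module. This is an illustrative equivalent, not the C# listing from this article, and the function name is my own.

```javascript
const crypto = require('crypto');

// The fixed GUID that the server appends to the client's key.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// SHA-1 over (key + GUID), then base64 of the raw digest bytes.
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

// The key used in this article: base64 of "WebSocket rocks!".
console.log(computeAcceptKey('V2ViU29ja2V0IHJvY2tzIQ=='));
```

Note that the key is used exactly as received, in its base64 text form; the server never decodes it back to the original 16 bytes.
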

I've written C# code below to demonstrate how to compute the Sec-WebSocket-Accept conforming to draft-ietf-hybi-thewebsocketprotocol-09:

Browser/Server Support

WebSocket is not designed only for browser/server communication; client applications can use it as well. However, I expect the browser to remain the major platform for the WebSocket protocol, given the emerging handsets and the coming cloud. At the time of writing, protocol version draft-hixie-thewebsocketprotocol-76 is supported by Safari 5+ and Google Chrome 6+, by Mozilla Firefox 4+ (disabled by default) and by Opera 11 (disabled by default). IE 9/10 do not support it, but on unsupported browsers we can fall back to a Flash shim by adopting web-socket-js.

The excellent Can I use site tracks support for new HTML5 features across all popular browsers; the screenshot below shows WebSocket support:

Please note that the screenshot above refers to draft-hixie-thewebsocketprotocol-76; it does not indicate support for draft-ietf-hybi-thewebsocketprotocol-09. As far as I have observed, browser support is summarized below:

The url attribute is the WebSocket server URL, whose scheme is usually "ws" (for unencrypted, plain-text WebSocket) or "wss" (for secure WebSocket). The send method sends data to the server once connected, and close sends the close signal. In addition there are four important events: onopen, onmessage, onerror and onclose. I borrowed a nice picture from nettuts, below.

onopen: When a socket has opened, i.e. after TCP three-way handshake and WebSocket handshake.
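A minimal sketch of wiring up the four events might look like this. The helper name, the URL and the 'hello' message are hypothetical. The WebSocket constructor is passed in as a parameter so that the same function works in a browser (pass the global WebSocket) and can be exercised with a stub elsewhere.

```javascript
// Open a socket and attach handlers for the four events described above.
function openSocket(url, WebSocketImpl, log) {
  const socket = new WebSocketImpl(url);

  socket.onopen = () => {
    log('open');
    socket.send('hello'); // sending is only valid once the socket is open
  };
  socket.onmessage = (event) => log('message: ' + event.data);
  socket.onerror = () => log('error');
  socket.onclose = () => log('close');

  return socket;
}

// In a browser (hypothetical server address):
//   const socket = openSocket('ws://localhost:8888/', WebSocket, console.log);
//   ...
//   socket.close(); // send the close signal when the connection is no longer needed
```

Assigning the handlers immediately after construction is safe because the events are only dispatched asynchronously, after the current script has finished running.
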

Develop WebSocket In Action - Team Poker Demo

Estimating user story effort with Planning Poker cards is well known and widely used in Agile/Scrum development. The Program Manager or Scrum Master prepares user stories beforehand, holds a meeting with the stakeholders and has them play cards to represent their estimates for each story. The higher the card value, the harder the story is to implement; the lower the value, the easier. "0" indicates "no effort" or "has been done", and "?" indicates "mission impossible" or "unclear requirement".

There is in fact a website, http://pokerplanning.com, that does exactly the work described above. My co-workers and I have used it several times. However, we found that it gets slower and slower as more team members join the game or after several rounds of voting, and we experienced the worst outcome: no one could vote any more because everyone's voting page got stuck. I strongly suspected that the main reason was its Ajax polling strategy, used to make sure everyone sees real-time voting status, and tracking its network activity suggested I was right.

I believe HTML5 WebSocket will solve the problem! So I developed a simple demo (which I named Team Poker) that currently has only the limited, basic functionality described below:

1. A user can log in to the poker room after entering his/her nickname.

2. Everyone gets notified when a user plays a card.

3. Everyone gets notified when a new player joins.

4. A newly joined player can see the current participants and voting status.

5. All participants can see the game result after the admin finishes a round.

The login screenshot for requirement #1:

Screenshot of the status updates for new participants and newly played cards (requirements #2, #3 and #4):

All participants can see the final poker game result (requirement #5). Here I added a CSS3 3D rotation effect: the cards of people who voted "?" or "0" gradually bubble up. I hope this design helps to spot the people whose opinions differ most from the rest of the team. The screenshot is shown below; please take a look at my video at the beginning for a more vivid view.

Please note that the Team Poker demo concentrates on demonstrating the power of WebSocket and lacks functionality such as moderator/team member roles (currently "Wayne" is simply hard-coded as moderator), user story customization, storing game status on the server side, etc. However, I have shared all the source code at the beginning of this article and, in addition, uploaded it to GitHub: https://github.com/WayneYe/TeamPoker. I hope some people will make it better and more productive. Will you fork it with me, dear reader? :)

OK, now it is coding time. Since all clients need to be notified of other clients' changes (a new player joining or a new card played) and, in addition, a newly joined player needs to know the current status, I defined two communication contracts:

ServerStatus: stores the currently connected client WebSocket instances, the players and the current voting status. They are kept in three global arrays, [{Players}], [{VoteValue}], and are broadcast to all clients whenever a new client message is received.

On the server side, one important task is to maintain all active client WebSocket connections so that the server can "broadcast" messages to every client, and to remove closed clients so that it does not send messages to "dead" ones. Other than this, the logic is very simple: validate the message type sent from the client, update the players/vote status repository, and then broadcast to all clients:
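The logic just described can be sketched independently of any particular WebSocket library. This is an illustration, not the Team Poker source: the class name, the message type names ('join', 'vote') and the JSON shapes are hypothetical, and a "client" is assumed to be any object with a send(string) method that throws when its connection is dead.

```javascript
class PokerRoom {
  constructor() {
    this.clients = new Set(); // active client connections
    this.players = [];        // [{ name }]
    this.votes = [];          // [{ name, value }]
  }

  join(client) { this.clients.add(client); }

  // Call this from the connection's close handler.
  leave(client) { this.clients.delete(client); }

  // Validate an incoming message, update state, then broadcast.
  // Returns false (and broadcasts nothing) for invalid messages.
  handle(client, raw) {
    let msg;
    try {
      msg = JSON.parse(raw);
    } catch (err) {
      return false;
    }
    if (msg === null || typeof msg !== 'object' || typeof msg.name !== 'string') {
      return false;
    }
    if (msg.type === 'join') {
      if (!this.players.some((p) => p.name === msg.name)) {
        this.players.push({ name: msg.name });
      }
    } else if (msg.type === 'vote') {
      // Keep only the latest vote per player.
      this.votes = this.votes.filter((v) => v.name !== msg.name);
      this.votes.push({ name: msg.name, value: msg.value });
    } else {
      return false;
    }
    this.broadcast();
    return true;
  }

  // Send the full status to every client, dropping any that fail.
  broadcast() {
    const status = JSON.stringify({ players: this.players, votes: this.votes });
    for (const client of this.clients) {
      try {
        client.send(status);
      } catch (err) {
        this.clients.delete(client); // "dead" client: stop sending to it
      }
    }
  }
}
```

Broadcasting the whole status rather than a delta keeps the contract simple: a newly joined player receives the current participants and votes from the very first message, at the cost of slightly larger frames.
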

After going through the code, let's see what happens underneath. The screenshot below was captured while I was developing the Team Poker WebSocket demo and records the entire WebSocket communication. In this picture, 192.168.1.2 is the host of the TeamPoker page that fires the WebSocket request, and 192.168.1.6 is the Node.js-based WebSocket server, which listens on port 8888 and runs on Ubuntu 11.04.

All packets behind WebSocket connection:

WebSocket request & response headers:

So, do you see the power of WebSocket?

Data transfer is done within one TCP connection lifecycle.

No extra headers after the handshake. You might notice the "length" column, which shows each packet's size: it is less than 100 bytes on average in my case and depends only on the size of the data actually transferred.

Ajax polling and Comet, with header information on every HTTP request/response, cannot achieve the same level of performance as WebSocket. Both create new HTTP (TCP) connections to transfer data, and each exchange is considerably larger than a WebSocket frame, especially when cookies are stored in the headers or when long headers such as "User-Agent", "If-Modified-Since", "If-Match", "X-Powered-By", etc. are present.

One thing that deserves mention is the TCP keep-alive signals: we should consider closing the WebSocket connection as soon as we no longer need it, otherwise bandwidth is wasted.

Open Issues

Adam Barth and his co-workers found a security vulnerability in WebSocket. He pointed out that many transparent proxies do not recognize the HTTP "Upgrade" mechanism and treat the WebSocket packets that follow the handshake as subsequent HTTP packets; as a result, attackers can poison the proxy's HTTP cache (you can refer to their exhaustive description). They suggest using a CONNECT-based handshake instead: most proxies appear to understand the semantics of CONNECT requests better than the semantics of the Upgrade mechanism, and after simulating a CONNECT-based handshake they found no way to poison the proxy's HTTP cache.

Because of this security issue, Firefox 4.0 and Opera 11 disable WebSocket by default; we can enable it in about:config.

Conclusion

WebSocket is a revolutionary feature in HTML5. It defines a full-duplex communication channel that operates through a single socket over the Web, and real-time data transfer has never been so easy and efficient, with relatively low bandwidth and server cost compared to Ajax polling or Comet. It is not yet standardized and has the security issue mentioned in the section above, so for now it is not recommended for enterprise solutions or data-sensitive applications; even so, developers should learn it and watch it. The only thing that never changes is change: the WebSocket protocol draft version numbers change fast, as you might have noticed while reading this article. I hope it becomes normative and standardized soon!

Server: 192.168.250.195
Windows 7 Home 32 bits
Node.js: 4.4.5.0
When I run the server on Node.js, it shows
"<node> sys is deprecated. Use util instead". It looks fine, since port 8888 is open and listening.

Then I opened the browser and tried to log in to the server, and found that the server rejected the client connection. The WebSocket message is as follows:

Great article about WebSocket.
How did you keep the TCP connection alive? Did you send some heartbeat signal?
I once wrote a WebSocket-based app with Jetty. It works fine, but the client end loses the connection after some idle time, which really bothers me.

Hi, in my post I mentioned that the heartbeat packets hold the connection open. However, keeping the TCP connection alive is architecturally invisible from the JS layer; it is done at a lower level. IMHO, in HTML5 WebSocket both the client JS and the server side can request to disconnect at any time, and keeping the connection alive is a separate concern.

I'm of the belief that if IE does not implement a technology, it doesn't exist. Not because I'm a Microsoft fan, but because they have historically had so much control of the market. That's why I'm discouraged to see that they are "experimenting" with the technology.

What's to experiment? Sockets have been in use for so long and are a well-defined technology. Who cares that it's a browser instantiating the socket?