Intro to WebSockets

We're going to do a little three-part introduction to WebSockets, which
provide a way to communicate back and forth with the web server without
all the overhead of a standard HTTP
connection.

And, in the course of it, we'll be writing a simple chat server and
client.

We're going to be piggybacking on the previous blog entry, on writing
A NodeJS Webserver. In particular,
that webserver will be used for generic webserving on this project, and
will also be what the WebSockets run on.

WebSockets: Your Friends in Cyberspace

How's that for a catchy 90s title? No? Whatever.

WebSockets are named after the famous Unix sockets API.
With this API, you could do a number of things, one of which was
communicate over the Internet. Because of this, many other network
communication APIs were called "sockets", even though they were only
indirectly related.

How's it different than a regular HTTP request?

As outlined in the webserver blog,
HTTP requests tend to be one-off request-response transmissions, after
which the communication is ended. There's a lot of overhead setting up
and tearing down these connections. (This is mitigated on the TCP layer
with HTTP
Persistent Connections.) And there's also a lot of overhead with HTTP
headers that can be avoided.

And, perhaps most importantly, HTTP is not designed for the
server to send an unsolicited response. HTTP wants the client to say
something, and then the server to say something back. But what if you want
the server to be able to send things on its own? In the past, this has
led to hackish solutions, like using a Flash proxy, or
using long
polling, or abusing the HTTP persistent connection.

So how do WebSockets solve those problems?

You might not believe it, because you're naturally overly-skeptical,
but a WebSocket connection begins life as a regular HTTP connection. But
after a couple milliseconds, it's had enough of that nonsense and
requests a connection upgrade to the WebSockets protocol. This protocol
does away with those pesky HTTP headers, and, of course, keeps the
connection open to avoid teardown overhead.

Then, if the server agrees that all this is a good idea, the
connection is established and the fun can begin!
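To make the upgrade a little more concrete, here's a minimal sketch of the server's half of the handshake as laid out in RFC 6455. The client's upgrade request carries a random Sec-WebSocket-Key header; the server hashes that key together with a fixed GUID and sends the result back in a 101 Switching Protocols response. The function names are made up for this example, and in practice a WebSocket library does all of this for you.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455; every WebSocket server uses this value.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key.
// (computeAcceptValue is a hypothetical helper name, not a library call.)
function computeAcceptValue(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

// Build the raw HTTP response that agrees to the upgrade.
function buildUpgradeResponse(secWebSocketKey) {
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    'Sec-WebSocket-Accept: ' + computeAcceptValue(secWebSocketKey),
    '', // blank line ends the headers
    '',
  ].join('\r\n');
}

// Sample key taken from RFC 6455.
console.log(buildUpgradeResponse('dGhlIHNhbXBsZSBub25jZQ=='));
```

After that response goes out, both sides stop speaking HTTP on the connection and start exchanging WebSocket frames.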

Once it's set up, the server can send messages out over the socket,
and the client on the browser will get an event letting it know data has
arrived. This is how the server pushes unrequested data.

Likewise, the client can also send messages over the socket, and the
server will handle them in whichever way it sees fit. For this toy
project, we'll be coding the server in NodeJS, and it also gets an event
on an incoming message. (Other servers might handle messages in callbacks
or with other mechanisms, depending on the server and language
involved.)
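On the browser side, that event-driven flow might look something like the sketch below. It uses the standard browser WebSocket events (open and message) and the send() method; the helper name, the greeting text, and the URL in the usage comment are placeholders invented for this example.

```javascript
// Wire up a browser-style WebSocket. `socket` is expected to be a WebSocket
// instance, or anything with the same addEventListener/send methods.
// (setUpChatSocket and onText are hypothetical names for this sketch.)
function setUpChatSocket(socket, onText) {
  // Fires once the upgrade has succeeded and the socket is usable.
  socket.addEventListener('open', () => {
    socket.send('Hello, server!');
  });

  // Fires every time the server pushes data, whether we asked for it or not.
  socket.addEventListener('message', (event) => {
    onText(event.data);
  });

  return socket;
}

// Browser usage (the URL is a placeholder):
//   setUpChatSocket(new WebSocket('ws://localhost:8080/'), console.log);
```

Passing the socket in, rather than creating it inside the function, is just a convenience so the wiring can be tried out with a stand-in object.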

This isn't a difference, but a sameness: the
WebSocket continues to use the same port as the web server (normally
port 80), which helps it get through firewalls which might not have
other ports open. Very convenient. Almost too convenient... like
they planned it that way...

Yeah, but is it well-supported?

WebSockets, like many other web technologies, is 100% supported on
all browsers and servers that support it completely. Among browsers and
servers that do not support it, penetration remains low.

Thanks, I'll be here all night. Really.

To find out if your browser supports it, try the fantastic website CanIUse. As you can
see, it's pretty well-supported as long as you rightfully don't include
IE 9 or earlier.

If you need support on earlier browsers, then you'll have to use
some kind of library that supports one or more of those hackish
solutions I mentioned earlier. One of the more popular ones is Socket.IO, which will actually use
WebSockets if they're available.

But this blog is about learning a specific tech as opposed to being
as practical as possible, so we'll just stick with WebSockets.

On the server side, there are libraries for NodeJS (we'll be using this websocket
package, itself one of many), Go (if you're
writing a server in Go), and undoubtedly many other languages and
servers.

A Practical Example

Let's set up a client and server, and have the client politely ask if
it can set up a WebSocket with the server.

What gives, AngryServer?

Well, that was anticlimactic. What happened?

It could have been a number of things. Maybe the server just plain
doesn't support WebSockets. Or maybe the server detected that the
request was coming from an unexpected website. Or maybe the client has
asked for an unsupported protocol.

Let's tackle these in turn.

What if the server doesn't support WebSockets?

In that case, when the client asks for an upgrade to WebSockets, the
server merely replies "no", and that's the end of it.

What if the request was from an unexpected site?

This one's a little bit trickier to explain, but in a nutshell,
WebSockets are not constrained by the same-origin
policy. This means someone could write JavaScript, host it on
another server, and then initiate a WebSocket connection to
your server. You might, or might not, want to allow this.

When the connection is being established, the server has an
opportunity to see where the page that holds the connector's JS is
hosted. If it's hosted somewhere the server doesn't like, the connection
can be denied.

Although the origin of the script can be spoofed by non-web-browser
clients, this is still very useful against cross-site
scripting attacks. A malicious user might, for example, inject
script in someone else's blog comments that, when read by an
unsuspecting third party, would cause that third party's browser to hit
your WebSocket server and do... bad things with it.
(The exact nature of the badness depends on what data your server's
handing out.)

A common case here would be to only allow WebSocket connections from
sites that you own.
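A rough sketch of that "only my own sites" check follows. Browsers send the page's origin in the Origin header of the upgrade request, so the server can compare it against a whitelist. The hostnames and the function name here are placeholders, and how you actually get at the header depends on the server library you use.

```javascript
// Origins we're willing to accept WebSocket connections from.
// (These hostnames are placeholders for this sketch.)
const ALLOWED_ORIGINS = new Set([
  'http://example.com',
  'https://example.com',
]);

// `origin` is the value of the Origin header on the upgrade request.
// Non-browser clients may omit it or lie about it, so a missing
// origin is treated as not allowed.
function isOriginAllowed(origin) {
  return typeof origin === 'string' && ALLOWED_ORIGINS.has(origin);
}

console.log(isOriginAllowed('https://example.com'));  // our own site
console.log(isOriginAllowed('http://evil.example'));  // someone else's
```

If the check fails, the server simply refuses the upgrade and no WebSocket is ever established.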

What if the request asked for an unsupported protocol?

(Image: a protocol droid. This has nothing to do
with anything we're about to discuss.)

What is a protocol? It's just an agreement from both sides
about the details of which language they're going to speak. In this
case, we're going to identify the protocol by a name you just make up.
For a chat program, I might call it beej-chat-protocol.

Even though both sides are speaking the WebSockets protocol deep
down, you can specify higher-level protocols for your own use. This is
completely free-form and up to you.

Why would you bother? Why not just code the client and server the
same way and not name the protocol at all?

Well, let's say you wrote a client and server that both spoke the
same chat protocol. Everything's running swimmingly. But then you
discover that a customer wants some features that your chat program
doesn't support.

So you think, "No problem. We'll just update the server and client to
support that." But then it turns out you have another customer
who likes the old chat program and doesn't want it updated.

Time for cleverness. What you do is define two protocols,
beej-chat-protocol and beej-chat-protocol-v2, and
program the server to know both of them. Then the old client can keep
using beej-chat-protocol just as always, and the new client can
start using beej-chat-protocol-v2.

When the connection is establishing, the client tells the server the
protocols it knows, and the server replies with the protocol that both
are going to use. Or the server can shut down the connection if the
client doesn't request any protocols it knows.
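Here's one way the server's side of that negotiation could be sketched. The client's list arrives in the Sec-WebSocket-Protocol header (in the browser, it's the optional second argument to the WebSocket constructor). The function name and the "prefer the newest protocol we know" policy are choices made for this example, not anything the WebSocket spec demands.

```javascript
// Protocols this server can speak, newest first.
const SUPPORTED_PROTOCOLS = ['beej-chat-protocol-v2', 'beej-chat-protocol'];

// Pick the first protocol we support that the client also offered.
// Returns null when there's no overlap, in which case the server
// would refuse the connection.
// (chooseProtocol is a hypothetical helper name.)
function chooseProtocol(requestedProtocols) {
  const requested = new Set(requestedProtocols);
  const match = SUPPORTED_PROTOCOLS.find((name) => requested.has(name));
  return match === undefined ? null : match;
}

// An old client only knows the original protocol.
console.log(chooseProtocol(['beej-chat-protocol']));

// A new client offers both, and gets upgraded to v2.
console.log(chooseProtocol(['beej-chat-protocol', 'beej-chat-protocol-v2']));

// A client speaking something else entirely gets turned away.
console.log(chooseProtocol(['some-other-protocol']));
```

Whatever the server picks is echoed back to the client in the handshake response, so both ends know which dialect they've agreed on.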

Designing a protocol is a topic unto itself, but basically, for a given protocol, we
need to define the data that is transmitted and received for any action
that can be taken on the part of the client, server, or user.

When the user types "Hello!" into the chat, what data, exactly,
is transmitted to the server?

WebSockets can send binary data and text data. Some protocols will
use binary. Some will use text. Binary tends to be more terse and thus
faster to transmit, but really, it's all up to you.
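To put a rough number on "more terse", here's a small sketch that encodes the same chat line both ways and compares the sizes. Both layouts are invented purely for this comparison (the JSON field names, the one-byte type code, and the length-prefixed binary format are all assumptions), so treat it as an illustration rather than a benchmark.

```javascript
// The same chat line encoded two ways. Both layouts are made up for this
// comparison; neither is part of any standard.
const username = 'beej';
const message = 'Hello!';

// Text: a JSON string, which would go out as a WebSocket text message.
const asText = JSON.stringify({
  type: 'chat-message',
  payload: { username, message },
});

// Binary: 1 byte type code, 1 byte username length, then the raw bytes
// of the username followed by the raw bytes of the message.
const CHAT_MESSAGE = 1; // hypothetical type code
const nameBytes = Buffer.from(username, 'utf8');
const messageBytes = Buffer.from(message, 'utf8');
const asBinary = Buffer.concat([
  Buffer.from([CHAT_MESSAGE, nameBytes.length]),
  nameBytes,
  messageBytes,
]);

console.log('text bytes:  ', Buffer.byteLength(asText, 'utf8'));
console.log('binary bytes:', asBinary.length);
```

The binary version is much smaller, but it's also harder to read in a debugger and harder to extend, which is part of why text formats stay popular.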

For the chat program, we'll keep it simple. We'll say that all the
data that's transferred will be in the form of a JSON string. (So it will
be transmitted as text, not binary.)

The JSON will have a type property (a string), and a
payload property (an object). The structure of the payload will
depend on the type.

(None of this is written in stone tablets, anywhere. I'm just making
it up. That's how protocols are created.)

Here's a sample transmission from the beej-chat-protocol
that represents a message from a user:
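A minimal sketch of what such a packet could look like. Only the type and payload properties come from the description above; the payload field names (username and message) are assumptions made for this example.

```javascript
// Build a chat-message packet. The payload fields are hypothetical.
const packet = {
  type: 'chat-message',
  payload: {
    username: 'beej',
    message: 'Hello!',
  },
};

// WebSocket text messages carry strings, so serialize before sending...
const wireText = JSON.stringify(packet);
console.log(wireText);

// ...and parse on the receiving side to get the object back.
const received = JSON.parse(wireText);
console.log(received.type, received.payload.message);
```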

The beej-chat-protocol will also define messages of type
chat-join and chat-leave, but we'll leave those
definitions to your imagination, given the chat-message
description, above.

Once the protocol is defined, it must be implemented on both the
client and server so they're speaking the same language, oui?

Now we're cookin' with gas!

Normal communications

Once the WebSocket is connected, then data can be sent from the
client to the server (or vice-versa), in the format specified in the
protocol.

(Again, the protocol's not written in stone anywhere, so technically
it merely very much should be in the format specified by the
protocol. The computer won't burst into flame if you don't obey the
protocol, but you might cause yourself debugging pain later. And other
developers will scowl when they see your code.)

But at this point, everything's pretty simple. The client can build
up a JSON packet, convert it to a string, and send that to the server.

When it arrives, the server will receive an event with the data
attached. It turns it back into a JS Object from the JSON string, looks
at the type and decides what to do with the payload.

For example, it might receive a packet of type chat-message,
and it knows that it should broadcast that packet out to all connected
clients so they can display the chat message.
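The server-side handling described above could be sketched roughly like this. The clients set, the function name, and the decision to quietly ignore anything that isn't valid JSON are all assumptions for the example; the only thing each client object needs is a send() method that takes a string.

```javascript
// Every connected client. Each entry just needs a send(text) method.
const clients = new Set();

// Handle one incoming text message from a client. Returns true if the
// packet was understood and acted on, false otherwise.
// (handleIncoming is a hypothetical name for this sketch.)
function handleIncoming(rawText) {
  let packet;
  try {
    packet = JSON.parse(rawText);
  } catch (err) {
    return false; // not JSON, so not our protocol; ignore it
  }
  if (packet === null || typeof packet !== 'object') {
    return false; // valid JSON, but not a packet object
  }

  switch (packet.type) {
    case 'chat-message': {
      // Relay the packet to everyone, including the sender.
      const outgoing = JSON.stringify(packet);
      for (const client of clients) {
        client.send(outgoing);
      }
      return true;
    }
    default:
      return false; // unknown type
  }
}
```

In a real server you'd add each connection to clients when it's accepted, remove it when it closes, and call handleIncoming from the connection's message event.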

At that point, the same thing happens in reverse. The server sends
the message, and the client gets an event saying the message has
arrived. The client looks at the type and decides what to do
with the payload. If it's a chat-message, for example,
it would display the message on the screen.

Closing the Connection

The connection can be closed from either the server or the client
side. It can be explicit (where the server or client deliberately closes
the connection in code), or implicit (when the server crashes, or the
browser tab is closed).

And there are events on the client and server for the end of a
connection, similar to how there are events for regular data. You can
listen for those, and do the Right Thing when they occur.

Errors

Similar to regular communications and close events, there's an error
event that can be caught and handled. Often, a close event follows on
its heels.
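On the browser side, listening for those two events might look something like this sketch. The error and close events, and the code and wasClean fields on the close event, are part of the standard browser WebSocket API; the helper name and the log format are made up for the example.

```javascript
// Attach close and error handlers to a browser-style WebSocket.
// (watchConnection and log are hypothetical names for this sketch.)
function watchConnection(socket, log) {
  socket.addEventListener('error', () => {
    // Browsers deliberately give very little detail about what went wrong.
    log('WebSocket error');
  });

  socket.addEventListener('close', (event) => {
    // event.code says why the connection ended; event.wasClean is false
    // if it dropped without a proper closing handshake.
    log('closed: code=' + event.code + ' clean=' + event.wasClean);
  });
}

// Browser usage (the URL is a placeholder):
//   watchConnection(new WebSocket('ws://localhost:8080/'), console.log);
```

The close handler is usually the place to tidy up: tell the user they've been disconnected, or schedule a reconnection attempt.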

In Conclusion

That's the overview of how these beasts work. In the next episode
we'll write some actual client-side code!