WebSockets: Writing the Server

We're going to do a little three-part introduction to WebSockets, which
provide a way to communicate back and forth with the web server without
all the overhead of a standard HTTP
connection.

And, in the course of it, we'll be writing a simple chat server and
client.

We're going to be piggybacking on the previous blog entry, Writing A
NodeJS Webserver. In particular, that webserver will be used for generic
webserving on this project, and will also be what the WebSockets run on.

You can see the browsers looking for the favicon.ico file to
show as the page icon in the task bar. I didn't include one, so those
requests 404.

The remote IP address is showing up in the logs as "::1".
What's that? Turns out it's the IPv6 loopback address, the equivalent of
IPv4's 127.0.0.1, also known as localhost.

Stopping the Server

Just hit ^C (that's CTRL-C for the uninitiated) in
the window running the server. It'll quit in a fiery blaze of Viking
glory.

The Server, Overall

We're going to make use of the exact same HTTP server from the zeroth
part of this series. It can be run standalone (as we did then), or
it can be included by another NodeJS script and used more like a library
(like we're going to do now).

There's another script in the beej-websockets-demo directory
called server.js. This is the main one to run. It sets up the
HTTP server, and then starts up the WebSockets server on top of it.

The httpServer is just a reference to the standard HTTP
server code we made before. Except this time we've loaded it as a NodeJS
module with require(). It's not doing anything at first, so we
call .start() to fire it off, and save the returned server
instance.

Then we take that server and pass it in to the
WebSockets server with wsServer.start(). It's not the
best-architected code in the world, but it's good for this tutorial. At
least we got to reuse the HTTP server without modification, right?

What's that #!/usr/bin/env node at the top of the file? That's the
shebang. When you launch the file from Bash or some other POSIX shell,
it tells the system what command is supposed to run all the following
text. In this case, that's node. The upshot is that you can just run it
outright:

Unix shell

$ chmod 755 server.js    # make executable
$ ./server.js

And BAM! It's running.

This works great at a Unix command line on any Unix variant,
including OSX.

Windows, not so much. As far as I know, Windows users have to run it
explicitly:

Windows shell

C:\beej-websockets-demo> node server.js

The Server, WebSockets Madness

You'll notice the server.js code, above, pulled in both
httpserver.js (which we wrote earlier) and
wsserver.js. It's high time we talked about the second of
these.

This WebSockets implementation piggybacks on top of an existing HTTP
server. So we first launch an HTTP server, and then we say, "Hey,
WebSockets Server: attach to this HTTP server we already made."

This way they're both running in the same process, which can be
convenient. No reason you couldn't write it standalone and run them in
different processes, but I didn't do it that way because... I'm sorry, I
don't have a compelling reason, but that's what I did.

When the start() function is called, it indirectly calls the
startWSServer() function. This takes the HTTP server as an
argument. We'll make a new WebSocket server (again from the
websocket
package), pass it the HTTP server to attach to, and stick some listeners
on request-connection and new-connection events:
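Here's a minimal sketch of what startWSServer() could look like. It assumes the WebSocketServer class exported by the websocket npm package as require('websocket').server. The constructor is an optional second parameter only so the sketch can run without the package installed. The default is evaluated only when nothing is passed. onServerRequest() and onServerConnect() are empty stubs in this sketch.

```javascript
// Empty stubs; the real handlers vet requests and track connections.
function onServerRequest(request) {
  // A real handler calls request.accept(...) or request.reject(...).
}

function onServerConnect(connection) {
  // A real handler stores the connection and adds per-connection listeners.
}

// Attach a WebSocket server to an already-running HTTP server.
// The default constructor is only required if no class is passed in.
function startWSServer(httpServer, WebSocketServer = require("websocket").server) {
  const wsServer = new WebSocketServer({
    httpServer: httpServer,
    // Don't auto-accept: route every handshake through the 'request' event.
    autoAcceptConnections: false,
  });

  wsServer.on("request", onServerRequest); // accept or reject here
  wsServer.on("connect", onServerConnect); // fires after a successful accept()

  return wsServer;
}
```

Passing the constructor in is a testing convenience, not something the package requires. In normal use you'd just call startWSServer(server).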

The connection is a two-step process. First of all, a
request event comes in; its handler can accept or reject the
connection. If it's accepted and set up successfully, a
connect event is soon to follow, possibly delivered by Santa, depending on
the season. He's in the RFC somewhere, isn't he?

Chat Server Logic Overview

Not much to it.

When the server gets a new connection, it adds the connection to its
connection list. When a connection closes, it removes the connection
from its connection list. When data arrives from a connection, it sends
that data to all other connections.

This actually led to an interesting diversion. I wanted to be able to
find the connection in the connection list if the connection was closed
(so I could remove it). I wanted to do this quickly and not have to do a
linear search through the list. So the best way was to store the
connection list as an Object, and use the connections as keys
(properties) to look them up in that object.

But you can't use the connection itself as a property name, because
object property names are converted to strings, and every connection
object converts to the same string: "[object Object]". They're all the
same when converted to strings.
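A quick way to see the collision for yourself is to use two plain objects as stand-ins for two connections. For contrast, the sketch also shows a Map, which keys on object identity rather than on a string. That's an alternative the tutorial doesn't use.

```javascript
// Two distinct objects standing in for two connections.
const connA = { name: "A" };
const connB = { name: "B" };

const connectionList = {};
connectionList[connA] = "first";
connectionList[connB] = "second"; // overwrites: same key as connA

// Both objects were coerced to the same string key.
console.log(Object.keys(connectionList)); // [ '[object Object]' ]
console.log(connectionList[connA]);       // second

// A Map compares object keys by identity, so both entries survive.
const connectionMap = new Map();
connectionMap.set(connA, "first");
connectionMap.set(connB, "second");
console.log(connectionMap.get(connA));    // first
```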

So what to do? I needed a way to uniquely identify the connection as
a string so I could refer to it in the connection list. I could just
make up an ID, and then add it to the connection as some property:
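That made-up-ID approach might look something like the following sketch. The nextId counter, the assignId() helper, and the chatId property name are all hypothetical: just a field we tack onto the connection object ourselves.

```javascript
// Counter for handing out IDs; each new connection gets the next value.
let nextId = 0;

// Tag a connection with our own (hypothetical) chatId property, once,
// and return it as a string suitable for use as an object key.
function assignId(connection) {
  if (connection.chatId === undefined) {
    connection.chatId = String(nextId++);
  }
  return connection.chatId;
}

const connectionList = {};
const conn1 = {};
const conn2 = {};
connectionList[assignId(conn1)] = { connection: conn1 };
connectionList[assignId(conn2)] = { connection: conn2 };
console.log(Object.keys(connectionList)); // [ '0', '1' ]
```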

But is there a better way? Does the connection object already expose
some uniquely identifying information we can use? Let's look
at the documentation.

There's a remoteAddress property... but is that really
unique? Maybe. Maybe not. The docs don't say.

But that gives me an idea. Turns out every TCP connection on the
entire Internet can be uniquely identified by the quad (local port,
local IP address, remote port, remote IP address). Can we get that
information?

The connection object has a socket property that gives
access to the underlying net.Socket
NodeJS object that is the actual connection.

And look there, that object has properties for localAddress,
localPort, remoteAddress, and remotePort!
Just what we need!

Now since all WebSockets are connecting to this server on a
particular port, we know that the local port and address (that is, the
server port and address) will always be the same for all
connections. Only the remote port and address will differ.

With all this in mind, we can generate a unique key for any particular
connection and use that to refer to it in the connection list. We'll
just make the key the string concatenation of the address and port:
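A minimal sketch of that key function follows. It assumes the connection exposes its underlying net.Socket as connection.socket, as described above. The name getConnectionKey() and the ":" separator are my own choices. The separator matters. Without it, two different address/port pairs can concatenate to the same string. Because the port never contains a colon, the last ":" always marks where the port starts, even for IPv6 addresses like "::1".

```javascript
// Build a string key from the remote end of the underlying TCP socket.
// The local address/port are the same for every connection to this server,
// so only the remote half is needed.
function getConnectionKey(connection) {
  const socket = connection.socket; // the underlying net.Socket
  return socket.remoteAddress + ":" + socket.remotePort;
}

// Fake connections shaped like the real thing, for illustration.
const a = { socket: { remoteAddress: "::1", remotePort: 52100 } };
const b = { socket: { remoteAddress: "::1", remotePort: 52101 } };

console.log(getConnectionKey(a)); // ::1:52100
console.log(getConnectionKey(b)); // ::1:52101
```

Without the separator, 1.2.3.4 port 56 and 1.2.3.45 port 6 would both come out as "1.2.3.456".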

Handling Incoming Connections

The server has a lot of choices about what it can do when it gets an
incoming connection. For instance, it can look at the remote host and
make sure it wants to allow that host to connect. (Often it will only
want web pages it has served to connect back, not just anyone anywhere.)

Or it can reject a connection because the client wants to use an
unsupported protocol.

But if it likes everything it sees, it can reply to the client, "Yes,
let's chat, and let's use this particular protocol."

Because the server has this opportunity to reject an incoming
connection, setting up the connection is actually a two-event process on
the server: first a request event fires, at which point the
server can accept or reject, and then a connect event fires
which signifies the connection has been established.

(There's also a general error event for when something goes
hideously awry.)

So you can fire up the WebSocket server like this, passing in a
reference to the HTTP server that we should attach to:

That just makes the new server, attaches it to the HTTP server, and
sets up the event handlers. onServerRequest() will be called
when a brand new incoming connection arrives to be accepted (or
rejected), and onServerConnect() will be called when the
connection is accepted and fully established.

You might see we're ignoring the close event. This fires
whenever a WebSocket connection closes, but each individual connection
also fires its own close event, so we'll handle our closing
duties there, instead.

onServerRequest()—New Incoming Connections

Like I mentioned, we're going to accept or reject connections based
on two things: (1) the URL of the page the connection is coming from, and
(2) whether the client supports one of our protocols. You can actually
reject based on anything you'd like: time of day, phase of moon, whether
or not your kid is playing hooky, whatever.

But you should reject unknown protocols simply because you won't know
how to speak them.

And you (most probably) should reject connections that were served
from other websites for server load and security reasons.

Firstly, let's see how we whitelist connections. The host will come
to us in the form "hostname" or "hostname:port". So
let's put together a quick function to whitelist every domain you will
be serving this from on your webserver. Since I'm testing on
localhost, which is also a machine called goat, which
is 192.168.1.2 on my LAN, and I'm running the server on port
3490, I whitelist all that. If you have a production machine
that will be serving the files, e.g. example.com, whitelist it
here, too.
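One way to write that whitelist check is sketched below. The function name isWhitelisted() and the exact entries are assumptions based on the test setup described above: localhost, goat, 192.168.1.2, and port 3490. Swap in your own hosts. Listing both the "hostname" and "hostname:port" forms keeps the check a plain exact match.

```javascript
// Every "hostname" or "hostname:port" form we expect to serve pages from.
// Add production hosts (e.g. example.com) here too.
const WHITELIST = new Set([
  "localhost",
  "localhost:3490",
  "goat",
  "goat:3490",
  "192.168.1.2",
  "192.168.1.2:3490",
]);

// Exact, case-insensitive match against the whitelist.
function isWhitelisted(host) {
  return typeof host === "string" && WHITELIST.has(host.toLowerCase());
}

console.log(isWhitelisted("localhost:3490"));   // true
console.log(isWhitelisted("evil.example.org")); // false
```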

There's an argument to be made that you shouldn't ship with
localhost and all your test entries in the whitelist. One could
say that since every computer in the world is localhost, a
spoofer could serve the HTML from their own
computer, and then when it connected to the WebSocket server, it would
pass the whitelist test.

We avoid that in our client code by declaring the WebSocket server to
be on the same host that the HTML came from. But, of course, that client
code can be modified.

But a counter-argument is that the whitelist test is inherently
spoofable anyway, since an attacker can forge the origin in the
WebSockets HTTP request with custom-written code, and that preventing
this kind of spoofing isn't what the whitelist test is for.

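It's actually there to protect innocent third parties from cross-site
attacks, where a malicious page the victim visits opens a WebSocket to
your server and uses it as if it were the victim. The victim's browser
reports that page's real origin, which script on the page can't change,
so the origin check stops the attack. This is known as Cross-Site
WebSocket Hijacking.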
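Putting the two checks together, onServerRequest() might look like the following minimal sketch. It assumes the request object from the websocket package, with its origin and requestedProtocols properties and its accept(protocol, origin) and reject(status, reason) methods. The whitelist is inlined so the block stands alone. Note one choice here: the sketch checks the host part of the Origin header (e.g. "localhost:3490" out of "http://localhost:3490"), because that header, not Host, reflects the page the connection came from.

```javascript
const ALLOWED = new Set([
  "localhost", "localhost:3490", "goat", "goat:3490",
  "192.168.1.2", "192.168.1.2:3490",
]);
const PROTOCOL = "beej-chat-protocol";

// Pull "hostname" or "hostname:port" out of an origin URL.
// Returns null if the origin is missing or unparseable (e.g. "null").
function originHost(origin) {
  try {
    return new URL(origin).host.toLowerCase();
  } catch (err) {
    return null;
  }
}

function onServerRequest(request) {
  // (1) Only accept connections from pages we served.
  const host = originHost(request.origin);
  if (host === null || !ALLOWED.has(host)) {
    request.reject(403, "Forbidden origin");
    return;
  }

  // (2) Only accept our protocol. Our client requests exactly one,
  // so checking the first entry is enough here.
  if (request.requestedProtocols[0] !== PROTOCOL) {
    request.reject(400, "Unsupported protocol");
    return;
  }

  // All good: agree to the protocol and echo back the origin.
  request.accept(PROTOCOL, request.origin);
}
```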

You can see we're only looking at requestedProtocols[0] and
not looping through them all. Technically, we should loop through them
all to see if there's a match with our desired protocol
(beej-chat-protocol). But in this case, we wrote the client,
and we know the client is only going to request a single protocol, so we
only check the first one.

Secondly, the 403 (Forbidden) and 400 (Bad Request) codes in the
reject() calls are HTTP client error codes. Use whichever ones are
appropriate for your error handling.

If the handler ends up calling accept(), then the
connect event handler, onServerConnect(), will be
called.

onServerConnect()—We Finally Got One!

Once the connection is accept()ed and all set up properly,
we get the connect event, and its handler is called. In our
case, we want to add the new connection to our connection list, and set
up event handlers on the connection for message,
error, and close:
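Here's a sketch of onServerConnect(). It assumes a connectionList object keyed by the remote address:port string described earlier. getConnectionKey() is a hypothetical name for that key builder. It also assumes the connection emits message, error, and close events, with close passing (reasonCode, description). The three per-connection handlers are minimal placeholders. bind() passes the connection along so each handler knows which one fired.

```javascript
// All live connections, keyed by "remoteAddress:remotePort".
const connectionList = {};

function getConnectionKey(connection) {
  return connection.socket.remoteAddress + ":" + connection.socket.remotePort;
}

// Placeholder per-connection handlers.
function onMessage(connection, message) { /* parse and dispatch */ }
function onError(connection, error) { console.error("Connection error:", error); }
function onClose(connection, reasonCode, description) {
  delete connectionList[getConnectionKey(connection)];
}

function onServerConnect(connection) {
  // Store a wrapper object so we can hang more data (like a username) on it.
  connectionList[getConnectionKey(connection)] = { connection: connection };

  connection.on("message", onMessage.bind(null, connection));
  connection.on("error", onError.bind(null, connection));
  connection.on("close", onClose.bind(null, connection));
}
```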

As you can see, we're actually storing another object in the
connection list with a property connection (that holds the
actual connection object). We've done this since later we're going to use
the same object to also hold the username associated with the
connection.

Handling Events on the Connections

There's a new connection object for each incoming connection to the
WebSockets server, and, as you saw, we store them all in the
connectionList.

And each one gets event handlers attached to it for message,
error, and close events.

So let's tackle those connection handlers!

Connection onError() Handler

All this guy does is print out the error that has occurred. It
doesn't close the connection. (If the error is lethal, the connection
will close, and we'll get a close event later.)

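Connection onClose() Handler

When a connection closes, this handler removes it from the
connectionList and builds a chat-leave response packet so everyone else
knows the user left. If you look at where we're building that response
packet, you'll see we do it so that it's compliant with the
beej-chat-protocol. (See the discussion in Part 1 and Part 2.)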

And how about that broadcast() call? We'll get to it later,
but for now, just understand that it sends the response to all
connections in the connectionList.
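Here's a sketch of that close handler. It assumes a few things. Each connectionList entry may carry a username, stored along the way by storeUsername(). The chat-leave payload shape is a hypothetical { username }. The broadcast() here is a stand-in that just records what it would send to every remaining connection.

```javascript
const connectionList = {};
const sent = []; // what the stand-in broadcast() would have sent

// Stand-in for the real broadcast(), which sends to every connection.
function broadcast(packet) {
  sent.push(JSON.stringify(packet));
}

function getConnectionKey(connection) {
  return connection.socket.remoteAddress + ":" + connection.socket.remotePort;
}

function onClose(connection, reasonCode, description) {
  const key = getConnectionKey(connection);
  const entry = connectionList[key];

  // Forget this connection first so we don't broadcast to a closed socket.
  delete connectionList[key];

  // Tell everyone else who left, if we ever learned their name.
  if (entry && entry.username !== undefined) {
    broadcast({
      type: "chat-leave",
      payload: { username: entry.username }, // hypothetical payload shape
    });
  }
}
```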

Connection onMessage() Handler: Where All the Action Happens

This is it! This is the handler that gets called for the
message event, when new messages arrive to the server. The
server has to decide what to do with them, and then respond (or
whatever) in an appropriate manner.

In this case, it's a chat server, so it'd better behave like one!

The beej-chat-protocol defines what kinds of messages will
arrive. We have them organized in JSON like so:

JSON

{
"type": [type],
"payload": [payload]
}

The type is a string that tells us what kind of packet this
is. The type of packet then defines what will be contained in the
payload. Defined packet types are:

chat-join for when a user joins

chat-leave for when a user leaves

chat-message for when a user sends a chat message

These can be sent from the client to the server, or server to the
client.

So what happens when we get one on the server?

The first thing to do is check the type on the packet, because
that's going to dictate how we process the payload.

You can do that with a switch or if-else, but in this case I used
the type as a property name into an object that held the handlers for
each packet type. Do whatever makes the most sense and is clearest, of
course. :)
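Here's a minimal sketch of that handler-table idea. The handler bodies are placeholders that only describe what they'd do. The username payload field is a hypothetical name. The point is the lookup by type, plus a hasOwnProperty guard so a packet type like "toString" can't reach inherited object methods.

```javascript
// Handlers indexed by packet type. Each receives the packet's payload.
const messageHandler = {
  "chat-join": function (payload) {
    return "join from " + payload.username;
  },
  "chat-message": function (payload) {
    return "message from " + payload.username;
  },
};

// Look up the handler for this packet's type; unknown types return null.
function dispatch(packet) {
  if (!Object.prototype.hasOwnProperty.call(messageHandler, packet.type)) {
    return null;
  }
  return messageHandler[packet.type](packet.payload);
}

console.log(dispatch({ type: "chat-join", payload: { username: "Beej" } })); // join from Beej
console.log(dispatch({ type: "bogus", payload: {} }));                       // null
```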

And then we call that magical stuff from our connection's
message event handler. This handler will parse the JSON
message, then look in the messageHandler object and find the
property for this message type. Then call it.
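A sketch of that message event handler follows. It assumes the websocket package's message objects, where text frames arrive as { type: "utf8", utf8Data: "..." }. It also assumes a hypothetical username field in the payload. storeUsername() and the handler table are cut-down versions so the block stands alone.

```javascript
// All live connections, keyed by "remoteAddress:remotePort".
const connectionList = {};

function getConnectionKey(connection) {
  return connection.socket.remoteAddress + ":" + connection.socket.remotePort;
}

// Cut-down handler table; the real handlers broadcast to every client.
const messageHandler = {
  "chat-join": (payload) => ({ type: "chat-join", payload: payload }),
  "chat-message": (payload) => ({ type: "chat-message", payload: payload }),
};

// Remember the sender's username on its connectionList entry, if one was sent.
function storeUsername(connection, payload) {
  const entry = connectionList[getConnectionKey(connection)];
  if (entry && payload && typeof payload.username === "string") {
    entry.username = payload.username;
  }
}

// 'message' event handler for one connection.
function onMessage(connection, message) {
  if (message.type !== "utf8") return; // we only speak JSON text frames

  let packet;
  try {
    packet = JSON.parse(message.utf8Data);
  } catch (err) {
    console.error("Ignoring malformed JSON:", err.message);
    return;
  }
  if (packet === null || typeof packet !== "object") return;

  storeUsername(connection, packet.payload);

  // Find the handler for this packet type and call it.
  if (!Object.prototype.hasOwnProperty.call(messageHandler, packet.type)) return;
  return messageHandler[packet.type](packet.payload);
}
```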

And yes, you caught me sneaking a call to storeUsername() in
there. This function looks for a username in the payload and then stores
it in the connection list entry for this connection. This is useful
later when a client quits, and we want to broadcast to everyone else
that "Beej left the chat".

This is actually a totally hackish way of tracking the username, but
I wanted to keep the demo simple.

A much more correct way would be to not let the user even join the
chat until they'd logged in, sending a username packet at that time. Or,
if they changed their username on the fly, a new username packet
would be sent then, too. And the server could broadcast this to all the
other clients, and we'd all know the user's new name.

But right now, the client doesn't send a username packet at all. It
just bundles the username along with every chat-join and
chat-message packet it sends.

Fixing this would mean more complexity in the server, client,
and UI. Which you should totally do as an exercise. :)

chat-join Handler

Let's take a look at the chat-join handler in the
messageHandler code, above. This is what will do the right
thing when a chat-join message comes in from a client.

What we're going to do is simply rebroadcast the same message to all
the connected clients so they can print "So-and-so joined the chat" on
the screen.

We do take a slight liberty and clean up the username by trimming
whitespace off either end before sending it back.

Then we call broadcast() (which we haven't talked about yet)
to send the result to all connected clients:
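Here's a sketch of the chat-join handler. It assumes the payload carries a username field (a hypothetical name) and that broadcast() sends a packet to every client. In this sketch broadcast() is a stand-in that collects the JSON so you can see the result. Any other payload fields are passed through unchanged, so it really is the same message going back out.

```javascript
const outbox = []; // stand-in: packets that would go to every client

// Stand-in for the real broadcast(); it only records the JSON string.
function broadcast(packet) {
  outbox.push(JSON.stringify(packet));
}

const messageHandler = {
  "chat-join": function (payload) {
    if (!payload || typeof payload.username !== "string") return; // nothing to announce

    // Same message back out, but with the username trimmed at both ends.
    const cleaned = Object.assign({}, payload, { username: payload.username.trim() });
    broadcast({ type: "chat-join", payload: cleaned });
  },
};

messageHandler["chat-join"]({ username: "  Beej\n" });
console.log(outbox[0]); // {"type":"chat-join","payload":{"username":"Beej"}}
```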

If you're keenly observant, you'll see that we're not handling
incoming chat-leave messages. But only because the client never
sends them. :) The server is the only one who generates these
when the connection closes.

broadcast()—Sending data to all the clients

Lastly, but not leastly, we have to talk about how we're going to
send these messages out to all the clients.

First a quick primer on how to send data over a connection with the
send() method:

JavaScript

connection.send("Hello, World!");

Pretty self-explanatory, actually, I guess.

So we want to loop through our connection list, and send our data
over to each connection.

Our data should be a JSON string (as agreed upon in
beej-chat-protocol), so we need to convert it from a JS Object
with the built-in JSON.stringify() call.
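Here's a minimal broadcast() sketch. It assumes connectionList holds { connection } entries, as stored in onServerConnect(), and that each connection has the send() method shown above. The fake connections just record what they were sent. The packet is serialized once, outside the loop, since every client gets the same string.

```javascript
const connectionList = {};

// Send one packet to every connection we know about.
function broadcast(packet) {
  const data = JSON.stringify(packet); // serialize once, send many times

  for (const key of Object.keys(connectionList)) {
    connectionList[key].connection.send(data);
  }
}

// Fake connections that record what they were sent.
function fakeConnection() {
  const conn = { received: [] };
  conn.send = (data) => conn.received.push(data);
  return conn;
}
connectionList["::1:5001"] = { connection: fakeConnection() };
connectionList["::1:5002"] = { connection: fakeConnection() };

broadcast({ type: "chat-message", payload: { username: "Beej", message: "Hi!" } });
console.log(connectionList["::1:5002"].connection.received[0]);
// {"type":"chat-message","payload":{"username":"Beej","message":"Hi!"}}
```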