Game Server Protocols part 2: Handshake

by Christoffer Lernö

Just to remind everyone, this is the example protocol we’re writing:

Protocol handshake

Login

Receive table lists

Join game table

Participate in play

Leave table

Why a protocol handshake?

There are plenty of servers that roll the login packet and the protocol handshake into one. The disadvantage of combining them is that it becomes harder to change the login packet between protocol versions.

It might feel like over-engineering to build for advanced protocol versioning – until you realise that one of the most important considerations in protocol design is how easily the protocol can evolve.

Of course, this is for a persistent connection. For a stateless server we’d have to include protocol + login + request in each request anyway. However, for stateless servers this is not as big an issue, since the requests tend to be layered anyway – typically by using a protocol like HTTP (or HTTPS). That is an interesting topic in itself but beyond the scope of this series.

Our initial handshake

Let’s assume we won’t need more than 65535 protocol versions (and unless the protocol version is horribly misused, this will be true).

The client sends its protocol version as an unsigned short, and the server returns a single byte containing the result code. Since this forms the bootstrap part of the protocol, we try to keep it as simple and as unlikely to change as possible.

At this point the server will know the client protocol, and may respond with any code that it knows the client can accept.
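
As a minimal sketch, the two handshake messages could be packed like this. The byte order (big-endian) and the concrete result code values are assumptions made for illustration; the text above only fixes the sizes, an unsigned short going one way and a single byte coming back.

```python
import struct

# Assumed wire layout: big-endian ("network order"). The article does not
# specify a byte order, so this is an illustrative choice.
HANDSHAKE_REQUEST = struct.Struct("!H")  # unsigned short: client protocol version
HANDSHAKE_REPLY = struct.Struct("!B")    # single byte: result code

# Hypothetical result codes, chosen for this example only.
RESULT_OK = 0
RESULT_UNSUPPORTED_VERSION = 1


def encode_client_hello(version: int) -> bytes:
    """Pack the client's protocol version into the 2-byte handshake request."""
    if not 0 <= version <= 0xFFFF:
        raise ValueError("protocol version must fit in an unsigned short")
    return HANDSHAKE_REQUEST.pack(version)


def decode_client_hello(data: bytes) -> int:
    """Unpack the 2-byte handshake request; raises struct.error on a bad length."""
    (version,) = HANDSHAKE_REQUEST.unpack(data)
    return version


def encode_reply(code: int) -> bytes:
    """Pack the server's result code into the 1-byte handshake reply."""
    return HANDSHAKE_REPLY.pack(code)


def decode_reply(data: bytes) -> int:
    """Unpack the 1-byte handshake reply."""
    (code,) = HANDSHAKE_REPLY.unpack(data)
    return code
```

Keeping both messages at a fixed size means neither side needs any framing logic before it knows which protocol version is in use.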

Supporting multiple protocols

The usefulness is revealed when you deploy a new server with an updated protocol. The server can then easily support earlier protocol versions by restricting its replies to what it knows the client understands.

After some development you decide that you want to reject clients when the server is getting full. You add the error message SERVER_FULL.

You write your server to be compatible with both v.1 and v.2, so when a client with v.2 comes to the full server, it gets SERVER_FULL and can show a nice error message. If an early v.1 client shows up, the server falls back to closing the connection.
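
The fallback described here could be sketched as a single decision function. The numeric value of SERVER_FULL, the set of supported versions, and the choice to drop unknown versions outright are all assumptions for the sake of the example.

```python
from typing import Optional

SUPPORTED_VERSIONS = {1, 2}

RESULT_OK = 0           # hypothetical value
RESULT_SERVER_FULL = 2  # hypothetical value; only v.2 clients know this code


def handshake_reply(client_version: int, server_full: bool) -> Optional[int]:
    """Return the result code to send, or None to drop the connection."""
    if client_version not in SUPPORTED_VERSIONS:
        # Assumption: clients with unknown versions are simply disconnected.
        return None
    if server_full:
        if client_version >= 2:
            return RESULT_SERVER_FULL
        # A v.1 client would not understand SERVER_FULL, so close instead.
        return None
    return RESULT_OK
```

The point of the sketch is that the server never sends a code that the claimed client version predates.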

Behaviour on server full:

CLIENT v.1 SERVER
[v.1] ->
<connection dropped>

CLIENT v.2 SERVER
[v.2] ->
<- SERVER_FULL

You can go even further: the response to a v.2 client could even use a different serialization format entirely. As long as the initial protocol message stays the same, we can allow arbitrary changes to the protocol depending on what version the client claims to be.
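
One way to picture this is a table of encoders keyed by protocol version. The sketch below is purely illustrative: the table list message, its binary layout for v.1 and the JSON layout for v.2 are invented for the example and are not part of the protocol defined in this series.

```python
import json
import struct
from typing import Callable, Dict, List


def encode_tables_v1(table_ids: List[int]) -> bytes:
    """Hypothetical v.1 layout: a count byte followed by unsigned shorts."""
    return struct.pack(f"!B{len(table_ids)}H", len(table_ids), *table_ids)


def encode_tables_v2(table_ids: List[int]) -> bytes:
    """Hypothetical v.2 layout: the same data as UTF-8 encoded JSON."""
    return json.dumps({"tables": table_ids}).encode("utf-8")


# The version from the handshake selects the serializer for the whole session.
ENCODERS: Dict[int, Callable[[List[int]], bytes]] = {
    1: encode_tables_v1,
    2: encode_tables_v2,
}


def encode_table_list(client_version: int, table_ids: List[int]) -> bytes:
    """Serialize a table list in the format the client's version expects."""
    return ENCODERS[client_version](table_ids)
```

Only the handshake itself has to stay fixed; everything after it is looked up from the version the client announced.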

For professional grade servers, this is a requirement if you want to be able to upgrade servers in a server cluster without downtime.

Other considerations

We need to handle a couple of errors already, the obvious first one being timeout. The client might for some reason hang and not send its handshake message, the reply might never leave the server, or someone might log into the server using TELNET. Whatever the reason, we can’t sit and wait.

The server may also fail to respond with a valid return code, either because of a bug or because the client’s settings pointed it at some other service.

For all of these errors it’s generally enough to just log the problem and drop the connection, but you may eventually want to add additional measures to protect the server from things like accidental DoS attacks from broken clients that open a lot of connections but never complete the login.
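
On the server side, the timeout case might look roughly like the following sketch. The five second limit and the big-endian unsigned short are assumptions; the caller is expected to log the failure and close the connection whenever None comes back.

```python
import socket
import struct
from typing import Optional

HANDSHAKE_TIMEOUT_SECONDS = 5.0  # assumed value, tune for your deployment


def read_client_version(
    conn: socket.socket, timeout: float = HANDSHAKE_TIMEOUT_SECONDS
) -> Optional[int]:
    """Read the 2-byte version, or return None if the peer stalls or disconnects.

    Note: the timeout applies to each recv() call, so a client that trickles
    one byte at a time can take up to twice the timeout in this simple version.
    """
    conn.settimeout(timeout)
    data = b""
    try:
        while len(data) < 2:
            chunk = conn.recv(2 - len(data))
            if not chunk:
                # Peer closed the connection before completing the handshake.
                return None
            data += chunk
    except (socket.timeout, ConnectionError):
        return None
    (version,) = struct.unpack("!H", data)
    return version
```

A stricter server would track an overall deadline and count failed handshakes per address, which is one way to approach the broken-client problem mentioned above.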

It’s really worth looking at Semantic Versioning (http://semver.org/) for the version number. This implicitly builds in the concept of Major, Minor and Micro version changes (the specification calls the last one Patch), such that you always know:
* If the Major version number has changed then it’s not compatible. You might maintain a backwards compatibility layer though, using the client version to decide when to apply it.
* If the Minor version number has changed then there are brand new features in the API, but all existing features work as before.
* If the Micro version number has changed then the only changes are non-breaking fixes.

This means that if the client is version 1.2.3 and the server is version 1.2.9 then everything between the two is guaranteed to work regardless, whereas if the client is version 1.2.3 and the server is 2.0.0 then there are breaking changes – unless the server supports a 1.2.x compatibility layer.
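
The rule above can be sketched as a small compatibility check. The function names and the compat_majors parameter are hypothetical, and the requirement that the server's Minor version is at least the client's is an added assumption: a newer client may rely on features an older server does not have yet.

```python
from typing import FrozenSet, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a 'major.minor.micro' string into a tuple of integers."""
    major, minor, micro = (int(part) for part in text.split("."))
    return (major, minor, micro)


def server_can_serve(
    client: str, server: str, compat_majors: FrozenSet[int] = frozenset()
) -> bool:
    """Decide whether a server version can talk to a client version.

    compat_majors lists older Major versions for which the server keeps a
    compatibility layer (hypothetical parameter for this sketch).
    """
    client_v = parse_version(client)
    server_v = parse_version(server)
    if client_v[0] == server_v[0]:
        # Same Major: Micro differences never matter. Assumption: the server
        # must be at least as new as the client in its Minor version.
        return server_v[1] >= client_v[1]
    return client_v[0] in compat_majors
```

With these rules, a 1.2.3 client is accepted by a 1.2.9 server, and by a 2.0.0 server only if that server lists Major version 1 in its compatibility set.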