The advantage of virtual connections and game-ids (virtual connections are actually the generalized version of using game-ids) is that we don’t need to bother with multiple logins.

For each method there are advantages and disadvantages. For example, game-ids are simple, but force you to push that id with every packet, and while managing multiple connections adds complexity, it also means fewer packets to handle unsubscribing or leaving tables.

There are many nuances that depend on the requirements of your particular game, but we don’t really have the time to explore them here. For now we’re just going to pretend that you can only join a single game (even though in practice we usually don’t want that restriction).

Creating our packets

The client packet is simple: it only needs to hold the game id, since we’ve already logged in and the server knows our player id.

The response though, will need to return the entire state of the table in order to render it on the client:

Here the client needs to keep a lot of logic in order to make sense of the incoming data. For example, when the PLAYER_BET is received, the client needs to understand that the current player should no longer be active, and that the next active player should be the next player that is still in the game. The same goes for PLAYER_FOLD.

This is extremely error prone. What if the protocol had looked like this instead:

Working like this is especially useful if you have a rich state with many types of updates.

Handling updates

To automate the update packets we can use something similar to the object views discussed in this Gamasutra article. The most straightforward thing is to keep an object with a data structure identical to the full GAME_STATE sent at connect.

Whenever we update the server model, we make sure we send an update to this object, which then can broadcast the delta changes to everyone viewing the game.

One problem we encounter though, is that we get a lot of different update packet types this way. However, nothing prevents us from generalizing them (this is slightly less pleasant with static packets):

There are obviously more details to this implementation, but the main idea is to defer update deltas and notifications until the packet has been processed, and then send them as a single update. Interestingly, this allows us to suppress the queued notifications and updates if we want to. This can be useful when errors occur during processing.
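A minimal sketch of this deferred-update idea, assuming a flat state dictionary. The class name `GameView`, its method names and the shape of the update dictionary are all hypothetical choices made for illustration, not part of the protocol described here.

```python
from typing import Any, Callable, Dict, List

Update = Dict[str, Any]


class GameView:
    """Mirror of the full game state that queues changes until flush()."""

    def __init__(self, state: Dict[str, Any]) -> None:
        self.state: Dict[str, Any] = dict(state)       # what viewers have seen
        self.viewers: List[Callable[[Update], None]] = []
        self._pending: Dict[str, Any] = {}              # queued field changes
        self._notifications: List[str] = []             # queued notifications

    def set(self, key: str, value: Any) -> None:
        """Queue a field change; nothing is sent yet."""
        self._pending[key] = value

    def notify(self, name: str) -> None:
        """Queue a notification; nothing is sent yet."""
        self._notifications.append(name)

    def flush(self) -> None:
        """Call once the incoming packet is processed: send one update."""
        delta = {k: v for k, v in self._pending.items()
                 if self.state.get(k) != v}
        notifications = self._notifications
        self._pending, self._notifications = {}, []
        if not delta and not notifications:
            return
        self.state.update(delta)
        update = {"delta": delta, "notifications": notifications}
        for send in self.viewers:
            send(update)

    def discard(self) -> None:
        """Call when processing failed: drop everything that was queued."""
        self._pending = {}
        self._notifications = []
```

The server model would call `set` and `notify` while handling a packet, then `flush` on success or `discard` on error, so viewers never see a half-applied change.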

When is the ad-hoc solution better?

Removing the need to duplicate game logic in the client is the greatest advantage of pure update deltas. This makes updates very simple in the client, and it is appropriate when it is important and expected that the game model is always in sync with the server.

If you have a fairly rich, but well defined, state – like in poker – updates of a player view make good sense, but that doesn’t mean it’s universally applicable. Starting out with an ad-hoc solution can often be a good idea while prototyping.

The player game state

For our poker table we’ll want to know what players are participating, the cards on the table, the pot, active players etc. (It’s likely this object is missing a few fields, but you get the general idea)

The list of tables on a poker server is an example of a very common problem when doing client-server programming. There are various aspects to consider, so let’s first review those.

Refresh or subscription?

There are basically two ways to update lists – either using push (the server sends data when it detects updates) or using pull (the client requests updated data). Pull is the only thing that works on stateless servers, but even over persistent connections this method can be useful.

As an example of this, imagine that you want to show a list of all the players on the server:

For long lists, having push updates would make the list jump about quite a bit as people join or leave the server, making it hard to read. Here it’s better that the player makes a request for update using a refresh button or similar.

On the other hand, if we have a table list like this, we likely want it to update automatically:

This is a trade-off in terms of flexibility. For simple lists that the client is guaranteed to need, strategy (a) is useful as the client does not need to handle the state where login is complete but it still hasn’t got a table list. (c) has maximum flexibility, allowing you to skip subscription if it isn’t necessary. (b) is somewhere in between, having some of the advantages and disadvantages of both.

Handling updates

The initial table list will give you all the relevant tables, but how do we handle updates? Either the update will give you the full list again, or it will just give you the updated tables.

Again, this is a question of how long your lists are. If you typically have a list with around 2-10 tables that are very rarely updated, then you might want to send the entire list again in the update packet. It’s simple and guaranteed not to be wrong.

The more complicated way is to let the update packet look something like this:

This vastly cuts down on update packet sizes, but it requires the server to keep track of the currently viewed data for every player – which can add significantly to the complexity of the solution on both client and server. That said, it’s necessary if the server has large lists.
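As a rough sketch of what the server-side bookkeeping could look like, assuming tables are keyed by an integer id. The `updated` / `removed` field names and both function names are hypothetical, since the exact update packet layout is up to your protocol.

```python
from typing import Any, Dict

Table = Dict[str, Any]
TableList = Dict[int, Table]   # table id -> table data


def table_list_delta(seen: TableList, current: TableList) -> Dict[str, Any]:
    """Compare what this player last saw with the current list."""
    # New tables and changed tables are both sent as "updated".
    updated = {tid: table for tid, table in current.items()
               if seen.get(tid) != table}
    removed = sorted(tid for tid in seen if tid not in current)
    return {"updated": updated, "removed": removed}


def apply_delta(seen: TableList, delta: Dict[str, Any]) -> TableList:
    """What the client does with the update (and what the server records)."""
    result = {tid: table for tid, table in seen.items()
              if tid not in delta["removed"]}
    result.update(delta["updated"])
    return result
```

The server has to store the `seen` list per player, which is exactly the extra state and complexity mentioned above.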

Long lists

The table list packet is also the first time where we might run into the packet size limit. If we already have the update packets which send the delta rather than the full list, we can use those:
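One way to do this is to send the initial list as several update packets, each small enough to fit under the size limit. The sketch below assumes a 16-bit size header and uses JSON purely as a stand-in serializer to measure sizes; the `updated` field name and the function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterator, List

MAX_PAYLOAD = 65535  # largest payload a 16-bit size header can describe


def encode(update: Dict[str, Any]) -> bytes:
    # Stand-in serializer (an assumption); swap in your real packet format.
    return json.dumps(update, separators=(",", ":")).encode("utf-8")


def split_into_updates(tables: List[Dict[str, Any]],
                       limit: int = MAX_PAYLOAD) -> Iterator[bytes]:
    """Yield update payloads that together carry every table."""
    batch: List[Dict[str, Any]] = []
    for table in tables:
        if len(encode({"updated": [table]})) > limit:
            raise ValueError("a single table does not fit in one packet")
        # Re-encoding the batch each time is wasteful but keeps the sketch short.
        if batch and len(encode({"updated": batch + [table]})) > limit:
            yield encode({"updated": batch})
            batch = []
        batch.append(table)
    if batch:
        yield encode({"updated": batch})
```

The client can treat each of these exactly like a regular delta update, so no extra packet type is needed for long lists.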

Some suggestions

It’s easy to get the subscription deltas wrong, and trying to track down the bugs using the client to reproduce it is very time-consuming. If nothing else, make sure you have very good unit tests on the code that creates the delta updates.

When the lists get large you might find yourself splitting up subscriptions, so that instead of subscribing to all tables, you subscribe to a subset of them. It’s at this point you’re likely to start needing the “end subscription” packets.

It’s very easy to allow subscriptions to get complicated. Find the simplest solution that works for you; don’t try to be clever.

Summary

In this part we’ve added a few packets for sending the list of poker tables, but the procedure is the same for virtually any list of data that may need to be updated.

In the next entry we look at joining a game and observing the game state.

Login considerations

Since we have a poker server handling money, sending your login credentials in plain view over the wire is DEFINITELY NOT OK. To be complete, this section should really have information on how to secure the connection and send login credentials safely over that channel. There are quite a few such schemes (many with their fair share of security issues!). This is a deep and complicated subject in itself and frankly I personally do not have sufficient expertise to give solid recommendations. That said, anything not intended for public viewing should always be encrypted and tamper-proofed – including login credentials.

Despite this, we’re going to simply send username/password in the clear for this protocol – but note that this is not something you should do for any server where you actually use real money – we’re doing it here because this is a tutorial on how to build a protocol and not the definitive guide to implementing secure server protocols.

This is admittedly a large omission but can be acceptable for many types of games. Plus, the first time you write a networking solution you’ll have your hands full anyway. Creating a secure solution is something you can try once you’ve gotten comfortable with building networking protocols.

Packet design

For this protocol we’ll simply treat the login as any other packet. It’s possible to bundle the login packet with the handshake, but that would mean that you cannot use the standard packet processing for the login packet. This is certainly feasible, but often ends up requiring additional complexity in packet handling.

In order to separate one packet from another, we need an identifier of some sort. This identifier could be a string or a number, but unless you’re writing a string based protocol, you’ll want to use numbers. An unsigned 8-bit or 16-bit number is sufficient. Eight bits are often enough, but you can use 16-bit to get a bit more freedom in assigning ids.

For our protocol this packet id will be sent first, followed by optional packet data. This is convenient since we can handle packets differently without having to inspect the entire packet data.
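A minimal sketch of dispatching on a leading packet id, assuming a 16-bit unsigned id in big-endian byte order (the byte order is an assumption, as is every name in the block).

```python
import struct
from typing import Any, Callable, Dict, Tuple

PACKET_ID = struct.Struct(">H")  # unsigned 16-bit id, big-endian (assumed)

Handler = Callable[[bytes], Any]


def split_packet(packet: bytes) -> Tuple[int, bytes]:
    """Separate the leading packet id from the (possibly empty) data."""
    if len(packet) < PACKET_ID.size:
        raise ValueError("packet too short to contain an id")
    (packet_id,) = PACKET_ID.unpack_from(packet)
    return packet_id, packet[PACKET_ID.size:]


def dispatch(packet: bytes, handlers: Dict[int, Handler]) -> Any:
    """Pick a handler from the id alone; only the handler parses the data."""
    packet_id, data = split_packet(packet)
    handler = handlers.get(packet_id)
    if handler is None:
        raise ValueError("unknown packet id %d" % packet_id)
    return handler(data)
```

With the id up front, the dispatcher never needs to understand the payload, which is what makes it easy to handle packet types differently.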

The client-to-server packets may look something like this (assuming 16-bit packet ids)

In this case we’re reusing the same ids, since they effectively form two separate channels.

In these simple packet examples, a client request will have a single corresponding server response, but in practice that’s not always true.

Before going into details about how to design the login and other requests we’ll need to discuss packet payload serialization, since that has a profound effect on how many packet types you need.

Serialization

There are two main approaches to serialization. The first technique is to hard code the data for each packet. This means that each byte in the packet payload is statically specified. The extreme version of hard coded packets may employ bit-packing (squeezing values for multiple fields into a single byte) to minimize packet size.

The second technique is to use a dynamic format, much like a serialized hash map, e.g.

Key        Value
position   4
bet        100
complete   true

At first glance, the latter method might seem very wasteful on space, but there are several libraries that offer fairly tight space usage.

Hard coded payloads

The nice thing about hard coded payloads is that the data size is straightforward, the technique is easy to understand, and the deserialized data can typically be carried around in a struct or object, which communicates (and documents) the data much better than pulling data out of a hash map.
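To make that concrete, here is a sketch of a hard coded version of the position / bet / complete example from earlier. The layout is an assumption chosen for illustration: one byte holding the `complete` flag in the top bit and the position in the lower seven bits (a small example of bit-packing), followed by a 32-bit unsigned bet in big-endian order.

```python
import struct
from typing import NamedTuple

BET_LAYOUT = struct.Struct(">BI")  # 1 byte flag+position, 4 bytes bet


class Bet(NamedTuple):
    position: int    # seat number, 0..127 so it fits in 7 bits
    bet: int         # amount, must fit in an unsigned 32-bit int
    complete: bool


def pack_bet(bet: Bet) -> bytes:
    if not 0 <= bet.position < 128:
        raise ValueError("position must fit in 7 bits")
    # Bit-packing: top bit is the flag, low 7 bits are the position.
    first = (0x80 if bet.complete else 0x00) | bet.position
    return BET_LAYOUT.pack(first, bet.bet)


def unpack_bet(data: bytes) -> Bet:
    if len(data) != BET_LAYOUT.size:
        raise ValueError("bet payload must be exactly 5 bytes")
    first, amount = BET_LAYOUT.unpack(data)
    return Bet(position=first & 0x7F, bet=amount, complete=bool(first & 0x80))
```

The payload is always five bytes, and the receiving code works with a named `Bet` rather than a loose map.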

However, there are also downsides to hard coded payloads:

Each packet must have its own serialization / deserialization routine.

Messy to change when requirements change.

The more complex the payload data structure, the more effort it takes to add the packet.

Usually requires more packet types compared to a dynamic format.

Many of these drawbacks can be addressed by using code generation (e.g. protocol buffers) and macros, but that is also something which in itself adds complexity.

Hard coded payloads are great if your protocol only has a handful of packets that don’t have overly complex data structures, but the more flexible the data, and the more packet types there are, the more effort they take.

Dynamic payloads

Outside of game servers, you typically see this in JSON-based APIs (or XML APIs without validation).

For games, there are two important advantages:

A single packet may cover a wider range of responses.

Complex data structures are easy to transfer.

The effect is that the number of different packet types is typically smaller than in the corresponding hard coded payload version, and you don’t really have to worry too much when you send complex data, like tree structures.

The drawback with dynamic payloads is that they need to provide their own definition. Unlike hard coded payloads, where the deserialization is inferred by looking at the packet type, the dynamic data needs to provide both type and data for each field.

Typically you can expect slightly larger memory usage with dynamic payloads, but it can still be very competitive.

I strongly suggest using dynamic payloads when you start out, as they are faster to develop with and can still be shifted to hard coded data at a later stage.

Data checking

Regardless of whether you use a hard coded approach or dynamic, you will need to check the incoming data. For the dynamic payloads you also need to verify that the expected parameters actually exist.

In any case, never make any assumption on the correctness of data sent from client to server. There are many sources for incorrect data, including but not limited to: client bugs, hacking attempts and server bugs.
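As an illustration, here is a sketch of checking a dynamic payload shaped like the position / bet example above. The seat limit, the `max_bet` parameter and the function name are hypothetical; the real rules depend on your game.

```python
from typing import Any, Dict

MAX_SEATS = 10  # hypothetical table size


def validate_bet(payload: Any, max_bet: int) -> Dict[str, int]:
    """Return a cleaned copy of the payload or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a map")
    cleaned: Dict[str, int] = {}
    for name in ("position", "bet"):
        if name not in payload:                      # expected key missing
            raise ValueError("missing field: " + name)
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError("field must be an integer: " + name)
        cleaned[name] = value
    if not 0 <= cleaned["position"] < MAX_SEATS:
        raise ValueError("position out of range")
    if not 0 < cleaned["bet"] <= max_bet:
        raise ValueError("bet out of range")
    return cleaned
```

Returning a cleaned copy means the rest of the server only ever sees fields that passed the checks; unexpected extra keys are simply dropped.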

Our login packet

I’m going to use MessagePack for our protocol. It’s fairly straightforward to use and there are implementations for many languages.

Since we’re not using any login security, we’re simply going to send our login name and password in the clear (again, remember that this isn’t secure!). Our login payload will look like this in JSON notation: { "user": "Foo", "password": "Bar" }.

With MessagePack that payload becomes:

82 A4 75 73 65 72 A3 46
6F 6F A8 70 61 73 73 77
6F 72 64 A3 42 61 72

The server will respond with { "result": 0 } if the login was successful. Again, with MessagePack that is:
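For { "result": 0 } the bytes work out to 81 A6 72 65 73 75 6C 74 00: a one-entry map, the six-character string "result", then the integer 0. The sketch below is a hand-written encoder for a tiny subset of MessagePack (small maps, short strings, small non-negative integers and booleans), only meant to show where those bytes and the login bytes above come from. In a real project you would use an existing MessagePack implementation; the function name `pack` here is just a local helper.

```python
from typing import Any


def pack(value: Any) -> bytes:
    """Encode a very small subset of MessagePack."""
    if isinstance(value, bool):                 # must be checked before int
        return b"\xc3" if value else b"\xc2"
    if isinstance(value, int) and 0 <= value <= 0x7F:
        return bytes([value])                   # positive fixint: the value itself
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw   # fixstr: 0xA0 + length
    if isinstance(value, dict) and len(value) <= 15:
        out = bytes([0x80 | len(value)])            # fixmap: 0x80 + entry count
        for key, item in value.items():
            out += pack(key) + pack(item)
        return out
    raise ValueError("value is outside the subset handled by this sketch")


login = pack({"user": "Foo", "password": "Bar"})
reply = pack({"result": 0})
```

Note how every string and value carries its own type byte; that is the self-describing overhead of a dynamic format.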

Even in our simple example we see that we quickly find a lot of different error scenarios. Not all of these will need to be treated differently. For example, if the client SHOULD have checked that the username is legal, but still received the “illegal username” error, the client might treat it the same as if the username is in use. The player doesn’t really care why the username can’t be used, just that it needs to be changed.

The data in our login packet has been very simple, but theoretically the client could send all sorts of additional information, such as currently used language, version of the operating system etc.

(For privacy reasons, do not send unnecessary information and try to anonymize the data where applicable)

This has been a quick overview. In the next part, we look at requesting the list of tables.

Why a protocol handshake?

There are plenty of servers that roll the login packet and the protocol handshake in one. The disadvantage in combining them is that this makes it harder to update the login packet between protocol versions.

It might feel like over-engineering to build for advanced protocol versioning – until you realise that one of the most important considerations in protocol design is actually in how easily it can evolve.

Of course, this is for a persistent connection. For a stateless server we’d have to include protocol + login + request in each request anyway. However, for stateless servers this is not as big an issue, since the requests tend to be layered anyway – typically by using a protocol like HTTP (or HTTPS). That is an interesting topic in itself but beyond the scope of this series.

Our initial handshake

Let’s assume we won’t need more than 65535 protocol versions (and unless the protocol version is horribly misused, this will be true).

The client will send its protocol version as an unsigned short, and the server will return a byte with the result code. Since this forms the bootstrap part of the protocol, we try to keep it as simple and unlikely to change as possible.

At this point the server will know the client protocol, and may respond with any code that it knows the client can accept.
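A sketch of that exchange, assuming big-endian byte order. The result code values and the set of supported versions are hypothetical placeholders.

```python
import struct

HANDSHAKE = struct.Struct(">H")  # client -> server: protocol version (uint16)
RESULT = struct.Struct(">B")     # server -> client: result code (uint8)

OK = 0                   # hypothetical result codes
UNSUPPORTED_VERSION = 1

SUPPORTED_VERSIONS = {1, 2}  # versions this example server understands


def client_hello(version: int) -> bytes:
    """The two bytes the client sends right after connecting."""
    return HANDSHAKE.pack(version)


def server_reply(hello: bytes) -> bytes:
    """The single byte the server sends back."""
    (version,) = HANDSHAKE.unpack(hello)
    code = OK if version in SUPPORTED_VERSIONS else UNSUPPORTED_VERSION
    return RESULT.pack(code)
```

Because both messages have a fixed size and no packet framing, this bootstrap step can stay unchanged while everything after it evolves.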

Supporting multiple protocols

The usefulness is revealed when you deploy a new server with an updated protocol. The server can then easily support earlier protocol versions by restricting its replies to what it knows the client understands.

After some development you decide that you want to reject clients when the server is getting full. You add the error message SERVER_FULL.

You write your server to be compatible with both v.1 and v.2, so when a client with v.2 comes to the full server, they get SERVER_FULL and can show a nice error message. If an early v.1 client shows up, the server falls back to closing the connection.

Behaviour on server full:

CLIENT v.1            SERVER
[v.1]        ->
             <connection dropped>

CLIENT v.2            SERVER
[v.2]        ->
             <-       SERVER_FULL
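The behaviour above can be sketched as a version check on the server side. The numeric value of SERVER_FULL and the function name are hypothetical.

```python
from typing import Optional

SERVER_FULL = 2  # hypothetical result code, only understood by v.2 clients


def reply_when_full(client_version: int) -> Optional[bytes]:
    """Return the reply byte to send, or None to just close the connection."""
    if client_version >= 2:
        return bytes([SERVER_FULL])
    # v.1 clients have never heard of SERVER_FULL, so fall back to dropping.
    return None
```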

You can go even further: the response to the v.2 version could even use a different serialization format entirely. As long as the initial version message stays the same, we can allow arbitrary changes to the protocol depending on what version the client claims to be.

For professional grade servers, this is a requirement if you want to be able to upgrade servers in a server cluster without downtime.

Other considerations

We need to handle a couple of errors already – the obvious first one being timeout. The client might for some reason hang and not send its handshake message, or the reply never leaves the server, or someone logs into the server using TELNET. Whatever the reason, we can’t sit and wait.

The server may also – due to some bug, or because the client’s settings made it connect to some other service – not respond with a valid return code.

For all of these errors it’s generally enough to just log the problem and drop the connection, but you may eventually want to add additional measures to protect the server from things like accidental DoS attacks from broken clients that open a lot of connections but never complete the login.
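A sketch of the server side of the timeout, assuming a blocking socket and a two-byte big-endian version as the handshake. The five second default is an arbitrary illustrative value, and the function name is hypothetical.

```python
import socket
import struct
import time
from typing import Optional


def read_handshake(conn: socket.socket, timeout: float = 5.0) -> Optional[int]:
    """Return the client's protocol version, or None if we should drop it."""
    deadline = time.monotonic() + timeout   # one deadline for the whole read
    data = b""
    try:
        while len(data) < 2:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None                  # too slow: give up
            conn.settimeout(remaining)
            chunk = conn.recv(2 - len(data))
            if not chunk:
                return None                  # peer closed the connection
            data += chunk
    except OSError:
        # socket.timeout is a subclass of OSError, so timeouts and
        # broken connections both end up here.
        return None
    (version,) = struct.unpack(">H", data)
    return version
```

The caller would log the failure and close the connection whenever this returns None. Using a single deadline, rather than a per-read timeout, stops a client from keeping the connection open by trickling in one byte at a time.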

Years ago, when I first started out with network programming, I didn’t expect that setting up a protocol would be such a problem.

It’s interesting that there’s a lot written about socket programming but next to nothing about protocol design, which is a very important aspect of network programming.

Many networking tutorials show how to set up a multi-user chat, which is fairly worthless in showing how a real protocol should be constructed.

There are different aspects of protocol design, but what I want to focus on are the following areas:

How to serialize packets.

How to split the protocol into packet types.

How to synchronize the client and server’s data.

Very nice things to have in your protocol.

Narrowing it down a little

There are a lot of different types of networking in games. In some cases messages might be sent over stateless HTTP/HTTPS in some text based format like XML or JSON, while many times you need a persistent connection to let the server push information to the client when necessary.

If you do very little over the wire – no more than a handful of simple messages – you’re unlikely to run into much trouble regardless of how you design things.

This is not intended as the definitive guide to game protocols, but rather as a starting point for people who need a fairly complex protocol but don’t know where to start.

I’m going to limit myself to talking about writing a game protocol for persistent connections over a plain TCP socket, although most of this may be applicable in other situations with relevant modifications.

Building an example protocol

In this series we’ll be developing a protocol for a poker server over TCP. It’s not going to be anywhere near a complete protocol – it’s just there to provide some context for the presented concepts.

For a simple poker game, the flow is roughly the following:

Protocol handshake

Login

Receive table lists

Join game table

Participate in play

Leave table

Since we use TCP, the very first thing we need to do is delimit messages, since TCP is essentially a stream of bytes with no message boundaries.

Delimiting packets

There are fundamentally two ways to delimit packets with TCP. Either you use a delimiter character (or byte sequence) or use a header to specify the packet size.

If you have a text based protocol, this delimiter would typically be \n or 0 (a zero byte), the former of which would allow simple use of TELNET to issue commands. Even if your game protocol is binary, you might have an admin console which uses TELNET. Delimiters are easy: just keep filling the input buffer until you reach the delimiter. Just remember to guard against buffer overflows.
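A minimal sketch of delimiter-based framing. The class name and the 1024 byte limit are hypothetical. In Python the buffer cannot overflow in the C sense, so the guard here protects against a client that never sends the delimiter and makes the buffer grow without bound.

```python
from typing import List


class LineFramer:
    """Collects incoming bytes and splits them into delimited messages."""

    def __init__(self, delimiter: bytes = b"\n", max_length: int = 1024) -> None:
        self.delimiter = delimiter
        self.max_length = max_length
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add received bytes; return every message completed so far."""
        self._buffer += data
        messages: List[bytes] = []
        while True:
            index = self._buffer.find(self.delimiter)
            if index < 0:
                break
            if index > self.max_length:
                raise ValueError("message exceeds maximum length")
            messages.append(bytes(self._buffer[:index]))
            del self._buffer[:index + len(self.delimiter)]
        # Whatever is left is an incomplete message; cap its size.
        if len(self._buffer) > self.max_length:
            raise ValueError("message exceeds maximum length")
        return messages
```

On ValueError the caller would typically log the problem and drop the connection.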

An essentially binary protocol might also use a text-based login handshake, then go to binary afterwards. Often this is overkill though – better run the entire protocol with a packet size header.

Packet header

One of the simplest packet designs is to have an 8-bit or 16-bit unsigned int holding the payload size, followed by the serialized data payload. It might be tempting to increase this to 24 or 32 bits, but this is a bad idea, since queuing packets that might be 16 MB (24 bits) or even 4 GB (32 bits) is a good way to allow bugs or a malicious client to exhaust the server memory.

Besides, delivering graphics or similar heavy duty data should always be carefully handled, so a protocol that allows such large packets in a single send is asking for trouble. Typically the 255 byte payload of an unsigned 8-bit header should be sufficient for lightweight protocols, but if you find yourself sending longer lists of players or similar, the 64 KB maximum of a 16-bit header is better.

Reading a packet now looks like this:

1. Read 1 (for 8-bit) or 2 (for 16-bit) bytes into a header buffer.

2. From the header bytes, determine the size of the payload.

3. Allocate a buffer to hold the packet payload.

4. Read into this buffer until it is full or the connection is broken.

5. Send the contents of the buffer for processing.

6. Go to 1.
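The steps above can be sketched as follows, assuming a 16-bit big-endian size header (the byte order is an assumption) and any file-like object with a blocking `read`. The function names are hypothetical.

```python
import struct
from typing import BinaryIO, Optional

HEADER = struct.Struct(">H")  # 16-bit unsigned payload size, big-endian


def read_exact(stream: BinaryIO, size: int) -> Optional[bytes]:
    """Read exactly `size` bytes, or return None if the stream ends first."""
    buffer = bytearray()
    while len(buffer) < size:
        chunk = stream.read(size - len(buffer))
        if not chunk:
            return None          # connection broken mid-read
        buffer += chunk
    return bytes(buffer)


def read_packet(stream: BinaryIO) -> Optional[bytes]:
    """Read one size header and then the payload it announces."""
    header = read_exact(stream, HEADER.size)
    if header is None:
        return None
    (size,) = HEADER.unpack(header)
    return read_exact(stream, size)


def frame(payload: bytes) -> bytes:
    """Sending side: prefix the payload with its size."""
    return HEADER.pack(len(payload)) + payload
```

For a real connection, the stream could be obtained with `sock.makefile("rb")`, and the server would call `read_packet` in a loop until it returns None. `frame` raises `struct.error` for payloads over 65535 bytes, which enforces the size limit on the sending side.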

Next up

Now that we have our stream parsed into tidy packets, we can start the login process. The next entry is about the initial handshake.

“You make it sound like we should always use UDP”

UDP isn’t a panacea. TCP is fine if latency isn’t critical – so if your game doesn’t need to keep latency consistently below 500 ms then by all means, try TCP first. The problem is with the commonly repeated advice claiming TCP is sufficient for everything short of FPS games. That advice is clearly wrong.

“Why aren’t you mentioning head-of-line blocking?”

Head-of-line blocking is fundamental to any reliable, ordered protocol. I wanted to focus on the fact that TCP always assumes packet loss is due to bandwidth limits, which is why it is so ineffectual for packet loss on WiFi or similar unreliable connections.

“Better that people cut their teeth on TCP and finish their projects”

UDP is definitely more difficult to get right, and even if you use a library like ENet it’s important to know that each has its own reliability algorithm, which may or may not be suitable. On the other hand, it’s important to clearly state that TCP has very fluctuating latency, and can render a low latency game unplayable. It’s important to know that such issues may in many cases be mitigated by converting to UDP.

When writing networked games, the question of UDP vs TCP will eventually come up.

Typically you will hear people say things like: “Unless you’re doing action games, you can use TCP” or “You can use TCP for your MMO, because look at WoW – it uses TCP!”

Unfortunately, these opinions don’t properly reflect the complexity of the TCP/UDP question.

Background

First off, let me state that my background is mainly TCP programming. I worked for years on a leading poker network’s game servers and we’d typically run 4,000 – 10,000 connections on each server instance during peak (with multiple instances running on a single machine) without any problems. From my point of view, TCP is the safe and well-known alternative.

Despite that, our current project is using UDP, and there is no way we could have it work well with TCP. In fact, it started out with TCP, but when it became obvious that we couldn’t get the connection quality we wanted, we switched to UDP.

Despite the up-front ease of use, a good TCP solution isn’t easy to code.

However, the most damning property of TCP is the congestion control. Basically TCP interprets packet loss as a result of limited bandwidth, and throttles packet sends.

On 3G or WiFi, when a packet is lost you want the replacement packet to be sent as soon as possible, but the TCP congestion control actually does the reverse!

There is no way to get around this; it is just the way TCP works on a very fundamental level. This is what can push a ping up to the 1000+ ms range on 3G or WiFi due to the loss of a single packet.

Why UDP is “hard”

UDP is both easier and more difficult than TCP.

For example, UDP is packet based – which is something you’ll actually have to roll yourself for TCP. You also use a single socket for communication – unlike TCP, which requires a socket for each connected client. These things are mostly good stuff.

However, for most situations you actually need some concept of a connection, some rudimentary ordering and often also reliability. None of those is offered by UDP “out of the box”, while you get them for free with TCP.

This is why people often recommend TCP. With TCP you can get started and not worry too much about those things – not until you start having 500+ simultaneous connections anyway.

So yes, UDP doesn’t offer the whole kit, but as we’ll see, that’s exactly why it’s so great. In a way, TCP is to UDP what something like Hibernate is to writing your queries by hand in SQL.

The flawed case for TCP

People often give the advice to go with TCP on the idea that “TCP is just as fast as UDP” or “successful game X is using it, so it works”, not really understanding why it works in that particular game, and why UDP isn’t about regular packet delivery speed.

So why does World of Warcraft work with TCP? First of all we need to rephrase that question. The question should be “why does World of Warcraft work despite the occasional 1000+ms delay?”. Because that is the reality of TCP – on dropped packets you’ll get huge lags as TCP first needs to detect the missing packet, then resend the packet, all while cutting down throughput.

Reliable UDP will also have a delay, but since it’s a property of whatever protocol you write on top of UDP, it’s possible to reduce delays in many ways – unlike TCP, where it’s rolled into the TCP protocol itself and can’t be changed.

[At this point, some people will start talking about Nagle’s algorithm, which is pretty much the first thing you disable in any TCP implementation where latency is important.]

So why does World of Warcraft (and other games) work with these delays?

It’s simply because they’re able to hide the latency.

In the case of World of Warcraft, there are no player-to-player collisions: such collisions can’t be reliably predicted – but player-to-environment collisions can, so the latter work fine with TCP.

Looking at combat in WoW, it’s easy to realize that commands sent to the servers are really something along the lines of attack_entity(entity_id) or cast_spell(entity_id, spell_id) – in other words, targeting is position independent. Furthermore, things like starting the attack motion or spell effect can be allowed to start without first getting confirmation from the server by showing a “fizzle” effect if the server response differs from the client prediction.

Starting an action before confirmation is a typical latency/lag hiding technique.

A few years back I wrote the client for a card game called Five Card Jazz. It was HTTP-based – which latency-wise is a lot worse than a plain persistent TCP connection.

We used the simple card draw and flip up animation to hide latency so that delays were only apparent in the case of very poor connections. The method was typical: send the request and start the animation drawing cards from the deck, but wait with the final flip up to reveal the cards until the server response arrived. WoW’s battle effects work in a similar manner.
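The pattern can be sketched with `asyncio`: fire off the request, play the animation while it is in flight, and only hold the final reveal until the response is there. The function names and the event log are hypothetical stand-ins for real client code.

```python
import asyncio
from typing import Awaitable, Callable, List


async def draw_cards(events: List[str],
                     request_cards: Callable[[], Awaitable[List[str]]],
                     animation_time: float) -> List[str]:
    # Send the request immediately; it runs while the animation plays.
    request = asyncio.create_task(request_cards())
    events.append("animation started")
    await asyncio.sleep(animation_time)      # cards slide out of the deck
    events.append("animation finished")
    # Only the final flip waits for the server. If the response already
    # arrived during the animation, this returns without any extra delay.
    cards = await request
    events.append("cards revealed")
    return cards
```

The player only notices the network when the response takes longer than the animation, which is why delays were only apparent on very poor connections.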

This means that the choice of TCP vs UDP should basically be: “Can we hide latency or not?”

When TCP doesn’t work

A game running TCP either needs to be able to work well with occasional lags (poker clients typically do – an occasional one second lag isn’t something people will get annoyed about), or have good latency mitigation techniques.

But what if you’re running a game where you can’t really apply any latency mitigation? Player vs player action games often fall into this category, but it’s not confined to action games.

During typical play, you quickly move your character over a world map initially covered with a fog of war, but which is progressively revealed as you explore.

Due to certain game rules and to prevent cheating, the server can only reveal information about the character’s immediate surroundings. This means that unlike WoW, it’s not possible to fully complete the movement until the server response arrives. What makes this a hard problem, compared to the card reveal of Five Card Jazz, is that we’re allowed a latency of max 500 ms before movement feels sluggish.

When prototyping this, everything worked fine as long as everything was on the same LAN, but as soon as we went to WiFi, the movement would randomly stutter and lag. Writing a few test programs showed the WiFi occasionally dropping packets, and every time that happened, server response time shot up from 100-150 ms to 1000-2000 ms.

No amount of tweaking of TCP settings could get around this issue.

We replaced the TCP code with a custom reliable UDP implementation which cut the penalty of a lost packet down to an additional 50 ms(!) – less than the time of a complete roundtrip. And that was only possible due to having complete control of the reliability layer on top of UDP.

Myth: Reliable UDP is TCP implemented poorly

Have you heard this said: “Reliable UDP is just like TCP, so use TCP instead”?

The problem here is that this statement is false. Reliable UDP is unlikely to implement TCP’s particular brand of congestion control. In fact, this is exactly the biggest reason why you use reliable UDP instead of TCP – to get rid of its congestion control.
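To illustrate the difference, here is a deliberately simplified sketch of the bookkeeping behind one possible reliability layer: unacknowledged packets are resent after a short, fixed interval, with no throttling or back-off when loss is detected. This is not the implementation described in this article or that of any particular library; the class name, the 16-bit sequence numbers and the interval are assumptions made for the example, and the caller supplies the current time.

```python
from typing import Dict, List, Tuple


class ResendQueue:
    """Tracks packets that were sent but not yet acknowledged."""

    def __init__(self, resend_after: float) -> None:
        self.resend_after = resend_after           # seconds, fixed: no back-off
        self._next_seq = 0
        self._unacked: Dict[int, Tuple[bytes, float]] = {}

    def send(self, payload: bytes, now: float) -> Tuple[int, bytes]:
        """Register a new packet; returns (sequence number, payload) to send."""
        seq = self._next_seq
        self._next_seq = (seq + 1) & 0xFFFF        # wrap at 16 bits
        self._unacked[seq] = (payload, now)
        return seq, payload

    def ack(self, seq: int) -> None:
        """The other side confirmed this packet; stop tracking it."""
        self._unacked.pop(seq, None)

    def due(self, now: float) -> List[Tuple[int, bytes]]:
        """Packets that should be sent again right now."""
        resend = [(seq, payload)
                  for seq, (payload, sent_at) in self._unacked.items()
                  if now - sent_at >= self.resend_after]
        for seq, payload in resend:
            self._unacked[seq] = (payload, now)    # restart its timer
        return resend
```

Because the resend policy lives in your own code, you are free to pick the interval, resend eagerly, or skip resending data that is already stale, none of which TCP lets you control.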

Another important point is how the “reliable” part of “Reliable UDP” works. There are many possible variants. I really like many of the ideas of the Quake 3 networking code which inspired the War Arcana UDP protocol.

You can also use one of the many UDP libraries that support reliable UDP, although the reliability layer might be more general and as such a bit less optimized than a hand-rolled implementation could be.

The bottom line

So UDP or TCP?

Use HTTP/HTTPS over TCP if you are making occasional, client-initiated stateless queries and an occasional delay is ok.

Use persistent plain TCP sockets if both client and server independently send packets but an occasional delay is ok (e.g. online poker, many MMOs).

Use UDP if both client and server may independently send packets and occasional lag is not ok (e.g. most multiplayer action games, some MMOs).

These are mixable too: Your MMO client might first use HTTP to get the latest updates, then connect to the game servers using UDP.