Sunday, 21 September 2014

Last week I discussed the core network structures for games. There is one really important topic that I left out then: relay servers. Relay servers are especially important to understand since I have recently heard them confused with dedicated servers quite often. Today I would like to explain what relay servers are, and what they are not.

A relay server is essentially just a computer that sends and receives packets. It does not process the data and does not run any gameplay logic. All it does is forward packets: if player A wants to send a packet to player B, then instead of sending it directly, player A sends it to the relay server, which then sends it on to player B. The relay server is really just a glorified router.
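To make the "glorified router" idea concrete, here is a minimal Python sketch of what a relay does with each packet. Packet, RelayServer, receive and deliver are hypothetical names, and the in-memory lists stand in for the UDP sockets a real relay would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    sender: str
    recipient: str
    payload: bytes


class RelayServer:
    """Hypothetical relay: forwards packets without looking at the payload."""

    def __init__(self) -> None:
        # One outgoing queue per player; a real relay would write to a socket instead.
        self.outboxes: dict[str, list[Packet]] = defaultdict(list)

    def receive(self, packet: Packet) -> None:
        # No gameplay logic at all: just queue the packet for its recipient.
        self.outboxes[packet.recipient].append(packet)

    def deliver(self, player: str) -> list[Packet]:
        # Hand over everything queued for this player and clear the queue.
        return self.outboxes.pop(player, [])
```

Player A never needs a direct connection to player B here: both only ever talk to the relay.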

So why is this useful? Relay servers have two big advantages. The first is that players can practically always connect to them. Security measures in routers are a big problem for internet connections, preventing many users from connecting to each other directly. Usually this can be solved in the router settings by enabling UPnP or setting up port forwarding, but many users don't know how to do this. Techniques like NAT punch-through help, but still fail in a lot of cases.

An important aspect of connectivity issues is that if one of the two computers that are trying to connect to each other is set up entirely correctly, then it is almost always possible to connect them, no matter how badly the other computer is set up. This is where relay servers come in: the developer manages them and can thus make sure they are set up optimally. So even if two players cannot connect to each other directly, it is extremely likely that they can both connect to the relay server and send packets to each other through it.

The other big benefit of relay servers is that they can massively reduce packet count, especially in peer to peer situations. As I explained in a previous blogpost, packet count is an important factor in connection quality.

Multicasting is not available on the public internet, so if you want to send the same message to several other players, you have to send a separate copy to each of them. A relay server can work around this. Whenever a player wants to send to all other players, she sends only one packet to the relay server. The relay server then copies the packet and sends it to each client. If several players are all sending to the same player, the relay server can also combine their packets into one bigger packet. These features greatly reduce packet count and bandwidth in a peer to peer situation, or for the host when one of the players acts as the host. This way relay servers theoretically make it possible to have a peer to peer game without dedicated servers.
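Here is a rough Python sketch of the copying and combining described above, for one network tick. The function relay_tick and its message format are hypothetical; a real relay would also need some framing so that receivers can split the combined packets again.

```python
from collections import defaultdict


def relay_tick(messages: list[tuple[str, list[str], bytes]]) -> dict[str, list[tuple[str, bytes]]]:
    """Hypothetical fan-out and combining step of a relay for one tick.

    Each incoming message is (sender, recipients, payload) and arrived at the
    relay as a single packet. The result maps each recipient to the
    (sender, payload) pairs the relay bundles into ONE outgoing packet for them.
    """
    combined: dict[str, list[tuple[str, bytes]]] = defaultdict(list)
    for sender, recipients, payload in messages:
        for recipient in recipients:  # the relay, not the sender, makes the copies
            combined[recipient].append((sender, payload))
    return dict(combined)


players = ["A", "B", "C", "D"]
# Every player broadcasts one message to all others, uploading a single packet.
incoming = [(p, [q for q in players if q != p], p.encode()) for p in players]
outgoing = relay_tick(incoming)
```

With four players who each broadcast one message, the relay receives four packets (one per player instead of three) and sends out four packets that each contain three messages (one per player instead of three).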

Note that dedicated servers have these exact same benefits. Players can practically always connect to them and players only have to send packets to the dedicated server instead of to all other players. For this reason there is normally no point in having relay servers if you already have dedicated servers.

A big downside to relay servers is that sending all data through the relay server adds a little bit of latency to the connection. This is especially problematic if players from several continents are playing together in one match. You might think matchmaking should never automatically create intercontinental matches, but you need thousands of simultaneous players to always avoid this. Even then, international friends might send each other invites.

Let's say we have a match with four European players and two Australian players. The relay server for this match is in Europe. The connection between the Europeans will likely be a little bit slower but still fine because the relay server is close. The connection between a European player and an Australian player will also not be affected too much, because the data needs to be sent that far anyway. The problem happens between the two Australian players: since everything goes through the relay server, their traffic now goes through Europe instead of directly, massively increasing their ping!
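The effect of routing through a fixed relay can be shown with a bit of arithmetic. The latencies below are invented numbers chosen only to show the shape of the problem, not measurements; ping_ms and one_way are hypothetical helpers.

```python
from typing import Optional

# Hypothetical one-way latencies in milliseconds, invented purely for illustration.
ONE_WAY_MS = {
    frozenset({"europe"}): 10.0,                # two machines within Europe
    frozenset({"australia"}): 10.0,             # two machines within Australia
    frozenset({"europe", "australia"}): 150.0,  # across the world
}


def one_way(a: str, b: str) -> float:
    return ONE_WAY_MS[frozenset({a, b})]


def ping_ms(a: str, b: str, relay: Optional[str] = None) -> float:
    """Round-trip time between players in regions a and b, optionally via a relay region."""
    if relay is None:
        return 2 * one_way(a, b)
    return 2 * (one_way(a, relay) + one_way(relay, b))
```

With these made-up numbers, a European relay takes the Europeans from 20 to 40 ms and the European-Australian pairs from 300 to 320 ms, but the two Australians go from 20 to 600 ms.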

For this reason I would never want to use rigid relay servers for a game where latency matters. If players can connect directly and have a fast enough connection to handle the packet count, then it is probably faster to let them communicate directly. This also saves on the cost of running expensive relay servers. I think relay servers are mostly useful as a last resort for players who otherwise cannot connect at all, and for players whose internet is too slow for the number of players they need to send to.
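The policy described above, direct when possible and relay only as a fallback, could look roughly like this. choose_route and max_packets_per_second are hypothetical; the right packet limit per connection is an assumption you would need to tune for your own game.

```python
from enum import Enum


class Route(Enum):
    DIRECT = "direct"
    RELAY = "relay"


def choose_route(can_connect_directly: bool, other_players: int,
                 packets_per_second_per_peer: int, max_packets_per_second: int) -> Route:
    """Hypothetical policy: prefer direct connections, relay only as a last resort."""
    if not can_connect_directly:
        return Route.RELAY  # otherwise these players cannot talk at all
    # Sending directly means one stream per other player; via a relay it is just one.
    direct_load = other_players * packets_per_second_per_peer
    if direct_load > max_packets_per_second:
        return Route.RELAY  # connection too weak for the packet count
    return Route.DIRECT  # usually lowest latency, and no relay running costs
```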

Awesomenauts currently does not have relay servers. Instead it solves the problem of two players not being able to connect directly by sending through one of the other players. This often works fine but it is an imperfect solution: it increases the burden on that player's connection. Also, in extremely rare cases a player cannot connect to anyone in the match.
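The general idea of sending through another player can be sketched as picking a third player who has a direct link to both, preferring whoever is currently forwarding the least so the extra burden is spread out. This is a hypothetical illustration, not how Awesomenauts actually chooses; find_intermediary, links and forwarding_load are made-up names.

```python
from typing import Optional


def find_intermediary(a: str, b: str, links: set[frozenset[str]],
                      forwarding_load: dict[str, int]) -> Optional[str]:
    """Hypothetical: pick a third player with direct links to both a and b.

    links holds one frozenset per working direct connection. Among the
    candidates, prefer the player already forwarding the fewest connections.
    Returns None if nobody can bridge the two players.
    """
    players = {p for link in links for p in link}
    candidates = [
        c for c in players - {a, b}
        if frozenset({a, c}) in links and frozenset({c, b}) in links
    ]
    if not candidates:
        return None  # the rare case where a player cannot reach this peer at all
    # Sort key includes the name so ties are broken the same way every time.
    return min(candidates, key=lambda c: (forwarding_load.get(c, 0), c))
```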

We are currently putting a lot of effort into improving the connection quality for players in Awesomenauts. Relay servers are a feature we are considering for the future, but right now we think we can gain more by improving matchmaking first. We are writing a completely new matchmaking system that will allow us to match players better based on their connection and location. Better matchmaking also brings many other benefits that are unrelated to connection quality. A big recent improvement is that we managed to halve (!) the average bandwidth and packet count used by Awesomenauts. In the long run we will need to research relay servers further to know how beneficial their trade-off of connection quality versus latency would really be.

To summarize, I would like to stress that relay servers are not the same as dedicated servers. Relay servers are a tool for reducing bandwidth and packet count and for improving connection quality, potentially at the cost of latency.

Note: I have edited this post on 27-9-2014 to remove references to the Photon network library. It turned out they had added new features that I was not aware of and that made my analysis of what Photon offers incorrect and irrelevant to this post.

Sunday, 14 September 2014

When starting to develop an online multiplayer game you need to choose how to structure the netcode. Especially important is the question of which computer decides on which parts of the gameplay. There are roughly four models in common use in games these days. Today I would like to explain what those are and what their benefits and downsides are.

Here are those four basic structures (of course all kinds of hybrids and variants are possible):

Client-server

In the two versions of client-server there is one computer that is solely responsible for the entire game simulation: the server. The clients cannot make real gameplay decisions. This means that if a player presses a button, it goes to the server, the server executes it and then sends the results back to the client.

This adds significant lag to all input, which is of course totally unacceptable and kills the gameplay feel. To make a game playable with this model all kinds of tricks are needed. The best trick I am aware of is described in this must-read article by Valve. The basic idea is this:

When the player presses a button, the client immediately processes it as if it has the authority to do so, starting animations and such. A message is also sent to the server.

The server receives the button press a little bit later, so the server rewinds to the time of the button press, executes it, and then re-simulates to the current time.

The server then sends the current state to the client.

The client receives the latest state, but in the meanwhile more time has passed. So the client rewinds to the time at which the server sent the message, corrects its own state with what the authoritative server had decided, and then re-simulates locally to the current time.

In other words: both the client and the server rewind and then re-simulate whenever a packet is received. Implementing rewinding mechanisms is a complex task and very difficult to add to an existing game. As far as I know this is nevertheless the best and most used approach.
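Here is a heavily simplified Python sketch of the client side of those steps (client-side prediction plus reconciliation) for one-dimensional movement. It assumes a fixed time step dt and that the server echoes back the sequence number of the last input it processed; PredictedClient and its methods are hypothetical names, and the server-side rewind is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class PredictedClient:
    """Hypothetical client that predicts locally and reconciles with the server."""

    dt: float = 1 / 30       # fixed simulation step (assumption)
    position: float = 0.0
    next_sequence: int = 0
    # Inputs the server has not acknowledged yet: (sequence number, velocity).
    pending: list[tuple[int, float]] = field(default_factory=list)

    def apply_input(self, velocity: float) -> tuple[int, float]:
        """Act on the input immediately and return the message to send to the server."""
        seq = self.next_sequence
        self.next_sequence += 1
        self.position += velocity * self.dt  # no waiting for the server: no input lag
        self.pending.append((seq, velocity))
        return seq, velocity

    def on_server_state(self, server_position: float, last_processed_seq: int) -> None:
        """Rewind to the authoritative state, then re-simulate unacknowledged inputs."""
        self.position = server_position
        self.pending = [(s, v) for s, v in self.pending if s > last_processed_seq]
        for _, velocity in self.pending:
            self.position += velocity * self.dt
```

If the server decides, for example, that the first step was blocked halfway, the client snaps to the server's position and replays the two inputs the server has not seen yet.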

The difference between the two client-server architectures is who the server is. Either it is one of the players, or it is a computer that the game's developer or publisher manages. A dedicated server is usually better, but much more complex and expensive, as the developer needs to manage a scalable number of servers. The fiascos at the launches of Diablo III and SimCity showed how difficult this is to do. The more successful the game, the more difficult dedicated servers are to pull off. They are also simply expensive.

Peer to peer

The third architecture is pure peer to peer. Here no single computer is responsible for the entire game simulation. Instead the simulation is spread out over all of the players. The challenge then is how to divide responsibilities over the players. Awesomenauts uses this model and our distribution of the simulation is simple: each player simulates his own characters and bullets. This has a big benefit: player input can always be handled immediately. No rewinding structures are needed and there is never any input lag for the player. This also makes it much easier to add to an existing game.
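A minimal sketch of this ownership split in Python: each peer only changes the objects it owns, and only accepts remote updates from an object's owner. NetObject and PeerSimulation are hypothetical names, and real code would also need to handle conflicts when objects owned by different players interact.

```python
from dataclasses import dataclass


@dataclass
class NetObject:
    object_id: int
    owner: str      # the player whose machine simulates this object
    x: float = 0.0


class PeerSimulation:
    """Hypothetical peer: simulates its own objects and mirrors everyone else's."""

    def __init__(self, local_player: str) -> None:
        self.local_player = local_player
        self.objects: dict[int, NetObject] = {}

    def spawn(self, obj: NetObject) -> None:
        self.objects[obj.object_id] = obj

    def local_move(self, object_id: int, dx: float) -> bool:
        """Apply local input at once, but only to objects this peer owns."""
        obj = self.objects[object_id]
        if obj.owner != self.local_player:
            return False  # another player is authoritative for this object
        obj.x += dx
        return True

    def on_remote_update(self, sender: str, object_id: int, x: float) -> None:
        """Accept a state update only if it comes from the object's owner."""
        obj = self.objects[object_id]
        if obj.owner == sender:
            obj.x = x
```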

Peer to peer has some heavy drawbacks though. The biggest one is that lag becomes much more unpredictable. While in a client-server architecture only the lagging player suffers from his own lag, in a peer to peer game the other players will also notice if one player has a bad internet connection.

Peer to peer usually introduces complex synchronisation situations when the simulations of two players are not compatible. A good example of this can be found in my previous blogpost on Awesomenauts' infamous sliding bug. Care needs to be taken to recognise and handle such situations. In most game concepts few of these problems will pop up though: in Awesomenauts pushing other players is the only really complex part regarding conflicting synchronisation.

Another major downside of peer to peer is in the amount of network traffic needed. Since all players need to talk to all other players it requires many more network packets. In client-server only the server needs to talk to everyone, so only one player is affected instead of all of them. Even better for packet count is using dedicated servers: the entire burden falls on servers that the game developer provides.
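A back-of-the-envelope count makes the difference clear. This sketch assumes every machine sends one packet per tick to each machine it talks to, which real games only approximate; upload_packets_per_tick is a hypothetical helper.

```python
def upload_packets_per_tick(num_players: int, model: str) -> tuple[int, int]:
    """Hypothetical count of (packets sent by the busiest player, total packets) per tick.

    Assumes each machine sends exactly one packet per tick to every machine it talks to.
    """
    n = num_players
    if model == "peer_to_peer":
        return n - 1, n * (n - 1)        # everyone talks to everyone
    if model == "player_host":
        return n - 1, (n - 1) + (n - 1)  # the host sends n-1, each client sends 1
    if model == "dedicated":
        return 1, n + n                  # each player sends 1; the server's n are not on a player's line
    raise ValueError(f"unknown model: {model}")
```

For six players that gives 5 packets per tick for every player in peer to peer (30 in total), 5 for the hosting player only in client-server with a player host, and just 1 per player with a dedicated server.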

Deterministic peer to peer lockstep

The fourth and final basic structure is deterministic peer to peer lockstep. This model is mostly used for RTS games. This is also a peer to peer model but here we don't need to worry about which player manages which objects. Instead every client simulates everything in the exact same way. The only thing that needs to be sent over the network is each player's actions. The game runs as lots of really short turns: every step the game collects the commands from all players over the network and then simulates the next step. This is not limited to turn-based games: by doing lots of really short steps it can feel like a real-time game.
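The core of a lockstep loop can be sketched like this in Python. LockstepPeer and its toy "move" command are hypothetical; the important parts are that a turn only advances once commands from every player have arrived, and that commands are applied in the same fixed order on every machine.

```python
from collections import defaultdict


class LockstepPeer:
    """Hypothetical lockstep peer: advances only when every player's commands are known."""

    def __init__(self, players: list[str]) -> None:
        self.players = sorted(players)  # fixed order, so every peer applies commands identically
        self.turn = 0
        self.commands: dict[int, dict[str, list[str]]] = defaultdict(dict)
        self.unit_positions: dict[str, int] = {p: 0 for p in self.players}

    def receive(self, turn: int, player: str, cmds: list[str]) -> None:
        # Every player sends a (possibly empty) command list for every turn.
        self.commands[turn][player] = cmds

    def try_advance(self) -> bool:
        received = self.commands[self.turn]
        if set(received) != set(self.players):
            return False  # still waiting for someone: this wait is the input lag
        for player in self.players:
            for cmd in received[player]:
                if cmd == "move":
                    self.unit_positions[player] += 1  # toy deterministic gameplay
        del self.commands[self.turn]
        self.turn += 1
        return True
```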

Deterministic peer to peer has the enormous benefit that you hardly need to send anything. Only player actions need to be sent. If everyone starts the game in the same situation and runs the exact same steps, then the game will remain in synch without ever sending updates over the network. Therefore this model is highly suitable for RTS games, since they have so many units that synchronising everything is often infeasible. An old but still great article on implementing full determinism is this one: 1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond.

A downside to this model is that it usually adds quite a lot of lag to controls, since actions cannot be executed until all players know about them. Such input lag can be hidden by playing sounds and visual effects immediately when the user clicks. This way the player won't notice that his units don't react immediately.

Note that deterministic lockstep can also be combined with a client/server connection model where the data always flows through the server instead of directly between all players.

Implementing full determinism is incredibly difficult. If any differences exist between the simulations on the clients, then these differences will grow over time and result in the desynchronisation of the game. Lots of tricks need to be used to achieve determinism. For example, floats cannot safely be used, because their rounding can differ between processors, compilers and build settings: all gameplay logic needs to be built on integers. Random number generators can only be used if their seeds are synched and they are used in the exact same way. This might for example go wrong if one player runs on a higher graphics quality and thus has extra particles on his screen. Those particles might also use the random number generator and thus desynch it. A simple solution is to use a separate random generator for non-gameplay objects, but this is easy to forget, breaking the entire game.
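Below is a small Python sketch of two of these tricks, assuming positions are stored as integers in a fixed-point scale and that all peers agree on the seed at the start of the match. GameplayRandom, Peer and FIXED_ONE are hypothetical names; the generator is a plain linear congruential generator, chosen for simplicity rather than quality.

```python
import random

FIXED_ONE = 1000  # hypothetical fixed-point scale: 1 world unit = 1000 integer steps


class GameplayRandom:
    """Deterministic 32-bit LCG: the same seed gives the same sequence on every peer."""

    def __init__(self, seed: int) -> None:
        self.state = seed & 0xFFFFFFFF

    def next_int(self, bound: int) -> int:
        # Integer-only update with the common 1664525 / 1013904223 constants,
        # so there is no float rounding that could differ between machines.
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return (self.state >> 16) % bound  # high bits of an LCG are better than low bits


class Peer:
    def __init__(self, seed: int, high_quality_graphics: bool) -> None:
        self.gameplay_rng = GameplayRandom(seed)  # synched seed, used identically everywhere
        self.cosmetic_rng = random.Random()       # local only: particles and other visuals
        self.high_quality_graphics = high_quality_graphics
        self.unit_x = 0                           # fixed-point position, always an int

    def tick(self) -> None:
        if self.high_quality_graphics:
            for _ in range(10):
                self.cosmetic_rng.random()  # extra particles never touch the gameplay RNG
        # Move between 0.5 and 1.499 world units using integer math only.
        self.unit_x += FIXED_ONE // 2 + self.gameplay_rng.next_int(FIXED_ONE)
```

Two peers with different graphics settings stay in synch here, because only gameplay code consumes the gameplay generator.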

Getting determinism right is such a challenge that many games that use it add a mechanism to check the correctness of the simulation. They regularly send a checksum of the entire gamestate over the network. Checksums are small so this uses hardly any bandwidth. If the checksums are not the same then the game has desynched. To fix a desynch we could pause the game, send the entire simulation over the network and then continue from there. In older games you might recognise this problem when you got kicked out of a game because of a "synchronisation error".
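A gamestate checksum can be as simple as a CRC32 over a canonical byte encoding of the state, which is only 4 bytes to send. This sketch assumes the state is a list of (unit_id, x, y) integer tuples; state_checksum and is_desynched are hypothetical helpers.

```python
import struct
import zlib


def state_checksum(units: list[tuple[int, int, int]]) -> int:
    """Hypothetical 32-bit CRC over a canonical encoding of the gamestate.

    Each unit is (unit_id, x, y) with integer fixed-point coordinates. Sorting
    by unit id means peers with the same state always produce the same bytes.
    """
    data = b"".join(struct.pack("<iqq", uid, x, y) for uid, x, y in sorted(units))
    return zlib.crc32(data)


def is_desynched(local_checksum: int, remote_checksum: int) -> bool:
    # Equal checksums do not strictly prove equal states, but a mismatch proves a desynch.
    return local_checksum != remote_checksum
```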

There are of course many more subtleties to network architecture than I have explained here. All kinds of hybrids are possible and there are many details that I have not mentioned, like vulnerability to cheaters and host migration. I cannot discuss them all today, but I hope this blogpost has given a good summary of the basics. One important topic that really needs to be explained in combination with the above information is relay servers so I will cover that next week.

Saturday, 6 September 2014

When learning about online multiplayer programming I always read that it is important to keep the bandwidth usage low. There are tons of articles that discuss bandwidth optimisations and limitations. However, while developing Awesomenauts we learned that packet count can in some cases be equally important. Somehow I have rarely seen this mentioned in articles or books, so I figured it was about time to write a blogpost to tell the world: packet count is also important!

When we started development of Awesomenauts I thought that packet count was only relevant because of the size of the packet headers. Every UDP packet has a UDP header (8 bytes) and an IP header (at least 20 bytes). This means that no matter how little data you send per packet, it always gets an added 28 bytes. This makes packet count relevant for bandwidth: if you send 200 packets per second, then you are sending 5600 bytes per second only in headers.

I thought this was where the importance of packet count ended. If I could somehow optimise my game to send only 30 bytes per packet, then it would be okay to send 200 packets per second, because the total bandwidth would still only be 200 * (20+8+30) = 11600 bytes per second, which is fine for a modern game.
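The header arithmetic from the last two paragraphs fits in a tiny Python helper. It assumes IPv4 with no header options; bytes_per_second is a hypothetical name.

```python
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20  # the minimum; IP options would make it larger


def bytes_per_second(packets_per_second: int, payload_bytes: int) -> int:
    """Hypothetical helper: bandwidth including the UDP and minimum IPv4 headers."""
    return packets_per_second * (IPV4_HEADER_BYTES + UDP_HEADER_BYTES + payload_bytes)
```

bytes_per_second(200, 0) gives the 5600 bytes per second spent on headers alone, and bytes_per_second(200, 30) gives the 11600 bytes per second total.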

It turns out that this is not true on the real internet. During development of Awesomenauts we found out that high packet count by itself is a serious problem. Quite a few internet connections that would happily send as much as 40 packets per second of 1200 bytes each (totalling 48kB/s) become problematic when they need to send 200 packets per second of 50 bytes each (totalling only 10kB/s).

In our experience sending more than 100 packets per second is problematic for some connections, resulting in packet loss, lag spikes and ultimately losing the connection altogether. We recently decreased the average send rate in a full Awesomenauts match from 150 packets per second to 75 and this seems to have decreased the number of connection errors by around 25%. In that same patch we also decreased the average bandwidth by 50%, so I cannot say for sure whether decreasing just the packet count would have had the same effect. However, based on earlier experiments with this I think the packet count decrease was more important than the bandwidth decrease. Our impression is that packet counts above 100 per second are a problem, while below 100 packets per second it is not very relevant to optimise further.
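To keep an eye on that rough threshold, a game could count its own send rate over a sliding one-second window. This is a hypothetical sketch; PacketRateMonitor and its default budget of 100 simply mirror our impression above and are not a hard rule.

```python
from collections import deque


class PacketRateMonitor:
    """Hypothetical counter of packets sent during the last second."""

    def __init__(self, budget_per_second: int = 100) -> None:
        self.budget = budget_per_second  # rough threshold from experience, not a hard limit
        self.send_times: deque = deque()  # timestamps in seconds, oldest first

    def record_send(self, now: float) -> None:
        self.send_times.append(now)

    def rate(self, now: float) -> int:
        # Drop timestamps that are a second or more old, then count what remains.
        while self.send_times and self.send_times[0] <= now - 1.0:
            self.send_times.popleft()
        return len(self.send_times)

    def over_budget(self, now: float) -> bool:
        return self.rate(now) > self.budget
```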

The internet in general is a weird topic because it behaves so unpredictably on some connections. For example, we have seen that some routers always bunch our packets: they constantly arrive two at a time even though we send them 33ms apart.

Note that there is more to packet headers and overhead than the UDP and IP headers I mentioned above. For example, I recently learned that on a DSL line an extra DSL header is added. This is all hidden from the game code, but it means that on some connections having lots of packets can also mean using more bandwidth than you realise.

So there you have it: be mindful of your packet counts! Sending lots of small packets is not a good idea and should be avoided if possible. This is especially relevant for peer-to-peer games, since everyone in the game talks to everyone else and packet counts thus rise quickly.
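A common way to send fewer packets is to bundle all the small messages for one recipient into a single packet per tick. Here is a minimal sketch, assuming a 2-byte length prefix per message and a 1200-byte payload limit; both values and the function names are hypothetical choices.

```python
MAX_PAYLOAD_BYTES = 1200  # hypothetical payload limit, kept below common MTUs


def bundle_messages(messages: list[bytes], max_payload: int = MAX_PAYLOAD_BYTES) -> list[bytes]:
    """Pack many small messages into as few packets as possible, keeping their order.

    Each message gets a 2-byte big-endian length prefix so the receiver can split it again.
    """
    packets: list[bytes] = []
    current = bytearray()
    for msg in messages:
        if len(msg) + 2 > max_payload:
            raise ValueError("message too large for one packet")
        framed = len(msg).to_bytes(2, "big") + msg
        if len(current) + len(framed) > max_payload:
            packets.append(bytes(current))  # this packet is full, start a new one
            current = bytearray()
        current += framed
    if current:
        packets.append(bytes(current))
    return packets


def unbundle(packet: bytes) -> list[bytes]:
    """Split a bundled packet back into its messages."""
    messages = []
    i = 0
    while i < len(packet):
        length = int.from_bytes(packet[i:i + 2], "big")
        messages.append(packet[i + 2:i + 2 + length])
        i += 2 + length
    return messages
```

With these settings, 100 messages of 50 bytes fit in 5 packets instead of 100.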