Friday Facts #76 - MP inside out

Today's edition of the Friday Facts has been written by Blue Cube, enjoy!

Hello fellow Factorians!

I'm breaking away from our magnificent testing / team building session
here at our office to bring you more babbling about the development
of your favourite game.

This time there will be less of the regular "fixing bugs, fixing
multiplayer, designing spaceships" theme from the past weeks and the
post will be a little more technical, focusing on the workings of our
magical multiplayer code.

Lock step

As you have probably noticed, since the last major release (0.11.0)
the game can be played over the network.
There have been a lot of discussions on the forums concerning the lock
step architecture, so let's start with that.

In the lock step architecture each of the networked peers runs the full
simulation of everything that happens in the world, so there doesn't
need to be any central server. When a player performs an action, only
that action is sent to all the other players.
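To make the idea concrete, here is a minimal sketch of lock step in
Python. It is not Factorio's code: the Peer class, the one-number-per-
player "world" and the run_lockstep helper are all made up for
illustration. The point is only that every peer receives the same
actions and computes the same state on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """One networked peer that runs the full simulation locally."""
    name: str
    tick: int = 0
    # Hypothetical world state: just one position per player.
    positions: dict = field(default_factory=lambda: {"alice": 0, "bob": 0})

    def step(self, actions):
        """Advance one tick by applying every player's action for that tick."""
        # Apply in a fixed order (sorted by player name) so all peers agree.
        for player in sorted(actions):
            self.positions[player] += actions[player]
        self.tick += 1


def run_lockstep(peers, actions_per_tick):
    """Hand only the actions to each peer; each peer computes the state itself."""
    for actions in actions_per_tick:
        for peer in peers:
            peer.step(actions)


alice, bob = Peer("alice"), Peer("bob")
run_lockstep([alice, bob], [{"alice": 1, "bob": -1}, {"alice": 2, "bob": 0}])
print(alice.positions == bob.positions)  # True: both peers computed the same world
```

Note that no game state ever crosses the "network" here, only the small
action dictionaries.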

The biggest advantage of lock step is the low amount of data sent over
the network. Because people with keyboards can only generate a few
hundred bytes per second, this approach scales really well for large
maps. You can play the game just the same whether it has a hundred
objects or a million, which makes this method very attractive for
strategy games (AoE, Starcraft and others have used this approach).

And because nothing is perfect, there is obviously a price to pay for
the low traffic. In regular games you don't care that much if an enemy's
health is 0.0001% off, or that a rocket exploded a tiny bit more to the
left than it should have. With lock step, you do. Computers generally
don't do things at random, but if the programmer is not careful enough,
unpredictable events can leak into the game and cause exactly these
small differences. And because with the lock step architecture you
never directly see the other player's game state, there is no way to
correct for them; eventually they might accumulate and cause both
players to see a completely different game. When this happens, you get
what Factorio players have come to know as Desync.
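A tiny illustration of how such a difference can sneak in: floating
point addition depends on the order of operations, so two peers that
apply the same events in a different order can end up with results
that differ in the last bits. The sketch below is an assumption-laden
toy (the damage values, the ordering mistake and the checksum helper are
invented for the example, and comparing checksums is just one possible
way to notice the problem, not necessarily what Factorio does).

```python
import hashlib
import struct


def checksum(values):
    """Hash the exact bit pattern of the state so tiny differences show up."""
    data = b"".join(struct.pack("<d", v) for v in values)
    return hashlib.sha256(data).hexdigest()


damage_events = [0.1, 0.2, 0.3]

# Peer A applies the events in the order they were issued.
health_a = 0.0
for d in damage_events:
    health_a += d

# Peer B (by mistake) applies the same events in reverse order.
health_b = 0.0
for d in reversed(damage_events):
    health_b += d

print(health_a == health_b)                          # False: the sums differ slightly
print(checksum([health_a]) == checksum([health_b]))  # False: the peers no longer agree
```

The difference is far below anything a player would notice, yet the two
peers are no longer running the same game.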

There are obviously many other ways to make a game work over the
network, one of the most widely used being client server.

Client server

In the simplest form of the client server architecture the game runs
only on the server, which periodically sends a snapshot of the game
state to every client; the clients serve as something like a remote
control. The main problem here is that for every action there must be a
message sent to the server and back to the client before any results
become visible.

To work around this, most modern FPS games since Duke Nukem 3D use
something called client side prediction.
Client side prediction basically moves the game processing back onto
every client: every time an action is made, the client both sends the
action to the server and applies it locally, without knowing what the
other players did. When the server later sends a new game state, the
client modifies the local state to smoothly merge it with the received
one. Rinse and repeat.
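Here is a rough sketch of that loop in Python. Everything in it is
hypothetical (the PredictingClient class, the sequence numbers, the
one-dimensional position), and instead of a smooth merge it uses the
simplest reconciliation I can think of: take the server's state and
replay the inputs the server has not confirmed yet. Real games blend
the correction in over several frames.

```python
class PredictingClient:
    """Client that applies its own inputs immediately and reconciles later."""

    def __init__(self):
        self.position = 0   # locally predicted state
        self.pending = []   # (sequence number, move) not yet confirmed by the server
        self.next_seq = 0

    def apply_input(self, move):
        """Predict: apply the move right away and remember it."""
        seq = self.next_seq
        self.next_seq += 1
        self.position += move
        self.pending.append((seq, move))
        return seq, move    # this pair is what would be sent to the server

    def on_server_state(self, server_position, last_seq_processed):
        """Reconcile: adopt the server's state, then replay unconfirmed moves."""
        self.pending = [(s, m) for s, m in self.pending if s > last_seq_processed]
        self.position = server_position
        for _, move in self.pending:
            self.position += move


client = PredictingClient()
client.apply_input(+1)   # seq 0
client.apply_input(+1)   # seq 1
# The server has processed seq 0 only, and another player pushed us back by 3.
client.on_server_state(server_position=-2, last_seq_processed=0)
print(client.position)   # -1: server state (-2) plus the still-pending move (+1)
```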

Implications for us

As I said before, Factorio uses lock step simulation.
This allowed us to make the game playable over the internet with
hundreds of thousands of active entities without resorting to any major
hacks / optimizations. We also decided to make the game completely peer
to peer, which has some interesting consequences.

One of the negative sides is that every player needs to have an open
connection to every other player and send the data to each of them.
This becomes a problem when playing over the internet if not all of the
players have a public IP address (although we also have NAT punching,
which allows you to play even in this case and works almost every time).
The biggest issue with pure P2P is when a group of players wants to play
over LAN and another group wants to connect to them over NAT.
In these cases Factorio gets confused and completely refuses to connect.

Most of these problems, however, can be limited by partially moving
away from pure P2P later.
For example if two peers cannot connect directly, one of the others can
serve as a proxy for them.
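As a sketch of what such a proxy could look like (purely hypothetical,
this is an idea for later and not something the game does right now):
given a table of who can reach whom, a message goes directly when it
can and through one common neighbour when it can't. The find_route
function and the links table are invented for this example.

```python
def find_route(sender, receiver, links):
    """Return the path a message takes: direct if possible, else via one relay."""
    if receiver in links[sender]:
        return [sender, receiver]
    # Look for any peer that can reach both ends (sorted for a stable choice).
    for relay in sorted(links[sender]):
        if receiver in links[relay]:
            return [sender, relay, receiver]
    return None  # no direct link and no common neighbour


# Hypothetical connectivity: "b" and "c" cannot reach each other directly.
links = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
}
print(find_route("a", "b", links))  # ['a', 'b']
print(find_route("b", "c", links))  # ['b', 'a', 'c']
```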

The most fundamental limit of the lock step architecture is that the
game speed is limited by the slowest player.
Because input from all other peers needs to be processed to finish a
frame, a peer who can't run the game fast enough will slow the game
down for everyone. In client server the server can just choose to ignore
the slow client; in Factorio ignoring them would cause the game to
break for everyone.

To help with this, Factorio implements a sort of buffer time interval
(called "latency" when starting the game). This determines the amount
of time that a peer can wait for anyone's messages without lagging the
game. Unfortunately this also causes the game to delay all local
actions by this time.
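The trade-off is easier to see in code. The sketch below is a toy model
under my own assumptions, not the real implementation: latency is
counted in ticks instead of time, and the LatencyBuffer class and its
methods are made up. An action issued at tick T only runs at tick
T + latency, and a tick can only run once every peer's input for it
has arrived.

```python
from collections import defaultdict


class LatencyBuffer:
    """Schedules actions `latency` ticks ahead and gates each tick on all peers."""

    def __init__(self, peers, latency):
        self.peers = set(peers)
        self.latency = latency
        self.tick = 0
        # tick -> {peer: action}; an action of None means "did nothing".
        self.inbox = defaultdict(dict)
        # Nothing can be scheduled for the first `latency` ticks, so treat
        # them as already answered with empty input by everyone.
        for t in range(latency):
            for p in self.peers:
                self.inbox[t][p] = None

    def submit(self, peer, action, issued_at):
        """Record an action issued at tick `issued_at`; it runs `latency` later."""
        self.inbox[issued_at + self.latency][peer] = action

    def try_advance(self):
        """Run the current tick only if every peer's input for it has arrived."""
        if set(self.inbox[self.tick]) != self.peers:
            return None  # still waiting on someone: the game stalls here
        actions = self.inbox.pop(self.tick)
        self.tick += 1
        return actions


buf = LatencyBuffer(["alice", "bob"], latency=2)
buf.submit("alice", "build", issued_at=0)  # will only execute at tick 2
print(buf.try_advance())  # tick 0 runs with empty input from both peers
print(buf.try_advance())  # tick 1 runs with empty input from both peers
print(buf.try_advance())  # None: bob's input for tick 2 has not arrived yet
buf.submit("bob", None, issued_at=0)
print(buf.try_advance())  # tick 2 runs, including alice's delayed "build"
```

A bigger latency gives slow messages more time to arrive before anyone
has to stall, but alice's "build" also shows up that many ticks later
on her own screen.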

That is it

So I hope this post did not bore you to death (it was both shorter than
expected and longer than expected at the same time). There might be more
technical posts coming in the future if there is demand for them. Next
week you can look forward to some of Kovarex's or Slpwnd's wisdom.

... and of course, we are still fixing bugs, fixing multiplayer and
designing spaceships, don't worry.