Monday, February 4, 2013

A Game Engine

So, it's been a while since I made a post; Christmas and the like interfered a bit. But I have done quite a lot of work.

For now the shadows and rendering engine are 'good enough', so I've moved on to the meat of the game engine, as it were.

I'd been mulling the engine design over in my head while working on the graphics engine. I wanted, above all, to make it easily extensible and cleanly separated from the rendering engine. After all, Rogue Moon is to be an online client-server game.

Edit: this turned into a post about networking and game engines, which is not where I initially meant to go. But I'm going with it...

One of the primary difficulties of online games (well, other than the inherent difficulties of synchronization, which is a whopper!) is of course latency. It occurred to me that the Theory of Relativity is a good analogy: simultaneity goes out the window. An explosion occurs on the server at time X. There is some latency L, the time it takes the server to transmit this information to your (the player's) machine. So you see this explosion at time X+L. Meanwhile some other player might have seen it at X+L/2, as he is on a fast connection. The poor guy in Australia playing on a US-based server might see it at X+4L.

Now there are basically two ways of dealing with this. The first is to transmit some sort of timestamp with all commands. Basically, everyone gets a 'turn'. The server processes everyone's input and returns results. Then it waits for everyone's next input (you get skipped if you are missing for too long). Rinse and repeat. There's a fine article on Gamasutra about this, or more directly about how Age of Empires handled these sorts of problems: "1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond". (In Age of Empires the 'turns' appear to have been 200ms maximum each.)

The downside of this, of course, is that you will be held back by the slowest machine. Even if you and Phred have zippy fast powerhouses, your friend Bob the Slow with his Pentium II is simply going to drag the game down (rather than processing the turns quickly, you'd be stuck waiting 200ms each time for Bob to either respond or be timed out and skipped for the turn).

Or, as Glenn Fiedler put it at the excellent Gaffer on Games (if you are interested in game network programming or physics read his blog!):

The next limitation is that in order to ensure that the game plays out identically on all machines it is necessary to wait until all player’s commands for that turn are received before simulating that turn. This means that each player in the game has latency equal to the most lagged player. RTS games typically hide this by providing audio feedback immediately and/or playing cosmetic animation, but ultimately any truly game affecting action may occur only after this delay has passed.

Still, this model is suited to RTS style games. Historically they've simply had too many units doing too many things to synchronize all of them, so they proceed in lockstep and only synchronize commands.
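As a rough sketch of the turn model described above (not Age of Empires' actual code), the wait for one turn is set by the slowest player who still makes the deadline. The function name `resolve_turn` and the player timings are made up for illustration, and it assumes the skip timeout equals the 200ms turn length mentioned earlier.

```python
def resolve_turn(arrival_ms, timeout_ms=200):
    """Return (time in ms at which the turn can be simulated, players skipped).

    arrival_ms maps a player name to the time, measured from the start of
    the turn, at which that player's commands reach the server.
    """
    # Players whose commands arrive before the deadline take part in the turn.
    on_time = {p: t for p, t in arrival_ms.items() if t <= timeout_ms}
    # Everyone else is skipped for this turn.
    skipped = sorted(p for p in arrival_ms if p not in on_time)
    # If anyone was skipped, the server waited out the full timeout;
    # otherwise it waited for the slowest player.
    ready_at = timeout_ms if skipped else max(on_time.values(), default=0)
    return ready_at, skipped


# Bob is slow but makes the deadline: everyone waits for him.
print(resolve_turn({"you": 20, "Phred": 30, "Bob": 180}))
# Bob misses the deadline: everyone waits the full timeout, then he is skipped.
print(resolve_turn({"you": 20, "Phred": 30, "Bob": 450}))
```

Either way, the fast machines never get to move on before Bob's commands arrive or his time runs out.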

But for any kind of game that moves quickly (shooters/racing/newer MMOs), this clearly won't work. And while RTS style games only have a few players at once, newer games can easily have hundreds of players in a given area. I should think that you really don't want to put your gameplay at the mercy of the slowest person among 100+...

So then what?

Early in the days of internet gaming, John Carmack came up against this problem with Doom:

Although it is possible to connect two DOOM machines together across the Internet using a modem link, the resulting game will be slow, ranging from the unplayable (e.g. a 14.4Kbps PPP connection) to the marginally playable (e.g. a 28.8Kbps modem running a Compressed SLIP driver). Since these sorts of connections are of only marginal utility, this document will focus only on direct net connections.

As Glenn Fiedler rather dryly observed:

In other words, before you could turn, move or shoot you had to wait for the inputs from the most lagged modem player. Just imagine the wailing and gnashing of teeth that this would have resulted in for the sort of folks who wrote above that “these sorts of connections are of only marginal utility”

In response to this, Carmack developed Quake. Instead of communicating directly with other players' machines, the game became a client/server relationship, so you were no longer hung waiting on the slowest player's response. What mattered was your connection to the server. Previously each machine had been running the game simulation in lockstep. But now that no longer mattered; the server was authoritative. The clients became clean and simple, just dumb programs for sending commands and rendering what they were told to.

I recall reading somewhere that Carmack liked this model as it was very clean. And in an ideal world, that's how everything would work. Would that it were so simple. Carmack wrote:

Unfortunately, 99% of the world gets on with a slip or ppp connection over a modem, often through a crappy overcrowded ISP. This gives 300+ ms latencies, minimum. Client. User's modem. ISP's modem. Server. ISP's modem. User's modem. Client. God, that sucks.

Ok, I made a bad call. I have a T1 to my house, so I just wasn't familliar with PPP life. I'm addressing it now.

I ran into this problem in the first real game code I worked on (an unfinished and unreleased project; however, bits of code from it became the nucleus of the current network engine). It worked really well locally. Over the internet it was terrible; I was rather naive and unprepared for the effects of latency.

Ah, latency. I hit a key, but nothing can happen until that command is sent to the server, processed, then sent back. Lovely. I'll note that latency is always present. What is amazing is how well most modern games hide this.

But, back to John Carmack's problems back at the dawn of internet gaming: whenever you pressed a key you had to wait a third of a second on a good day. But as Glenn wrote: What John did next when he released QuakeWorld would change the industry forever.

What Carmack hit on was to return the simulation element to the client, though not in lockstep with the server:

I am now allowing the client to guess at the results of the users movement until the authoritative response from the server comes through. This is a biiiig architectural change. The client now needs to know about solidity of objects, friction, gravity, etc. I am sad to see the elegant client-as-terminal setup go away, but I am practical above idealistic.

Ah, there it was. So, unfortunately, to deal with reality the simple client had to give way. Now when you hit a key you saw your character begin to move immediately; the client basically handled a subset of the game and sent your commands off to the server saying 'I hope this works'. The client acts on your character's response to the world immediately (i.e. it won't let you run through walls), and more importantly it predicts other players' behavior.
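To make the idea concrete, here is a minimal sketch of predicting your own character on the client. It is not QuakeWorld's code: the one-dimensional world, the names `step` and `PredictingClient`, and the trick of numbering inputs so that unconfirmed ones can be replayed on top of the server's answer are all assumptions made for illustration.

```python
def step(x, dx, walls):
    """Shared movement rule: move by dx unless the target cell is a wall."""
    target = x + dx
    return x if target in walls else target


class PredictingClient:
    def __init__(self, x, walls):
        self.x = x              # position shown to the player
        self.walls = walls      # set of integer cells that block movement
        self.next_seq = 0
        self.pending = []       # (seq, dx) inputs the server has not confirmed

    def press_key(self, dx):
        """Apply the input locally right away and return the packet to send."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, dx))
        self.x = step(self.x, dx, self.walls)
        return {"seq": seq, "dx": dx}

    def on_server_state(self, acked_seq, server_x):
        """Accept the authoritative position, then replay unconfirmed inputs."""
        self.pending = [(s, dx) for s, dx in self.pending if s > acked_seq]
        self.x = server_x
        for _, dx in self.pending:
            self.x = step(self.x, dx, self.walls)


client = PredictingClient(x=0, walls={3})
for _ in range(3):
    client.press_key(1)         # the third step is blocked by the wall at 3
client.on_server_state(acked_seq=0, server_x=1)   # server agrees so far
print(client.x)
```

The player sees movement the moment the key goes down; the server's reply only matters when it disagrees with the guess.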

Imagine a bunch of players running around a Quake/Doom/Call of Duty map. Updates from the server on the other players' positions arrive only infrequently, but to you they seem to move quite smoothly. What is happening is that the client is extrapolating their positions. Basically, if they are moving forward, it assumes they will continue to move forward; if stopped, they stay stopped; if falling, they continue falling until they hit the ground.
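This kind of guess (keep doing whatever the player was last seen doing) is often called dead reckoning. A minimal sketch, assuming a 2D position, a constant velocity from the last server update, and a flat floor; the function name and all numbers are hypothetical:

```python
def extrapolate(last_pos, last_vel, seconds_since_update,
                gravity=0.0, ground_y=0.0):
    """Guess where another player is now from the last server update.

    last_pos and last_vel are (x, y) tuples. gravity is a downward
    acceleration that should only be non-zero while the player is falling.
    """
    x, y = last_pos
    vx, vy = last_vel
    t = seconds_since_update
    new_x = x + vx * t
    new_y = y + vy * t - 0.5 * gravity * t * t
    # A falling player keeps falling only until they hit the ground.
    return (new_x, max(new_y, ground_y))


# Moving forward at 5 units/s, last heard from 0.1 s ago.
print(extrapolate((0.0, 0.0), (5.0, 0.0), 0.1))
# Standing still: stays put however long the silence lasts.
print(extrapolate((2.0, 0.0), (0.0, 0.0), 1.0))
# Falling from a height of 1 for 2 s: clamped to the ground.
print(extrapolate((0.0, 1.0), (0.0, 0.0), 2.0, gravity=10.0))
```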

Of course the simulation of other players will never be exactly right: Bob is moving along in a straight line when a rocket from Phred blows him fifty feet into the air, etc.

What then happens is that the client must accept the new position from the server, when it arrives, and interpolate between the position the client thinks the player is at, and where the server says he is.

You've probably seen this at work in online games, most likely when latency was bad. A player or somesuch will be moving around, but will slowly slide to a different position. This is because your program predicted he would be at point A, while the server insists he is at B. Thus your client slooowly slides him from A to B (slowly and smoothly so you hopefully won't notice the correction). Of course this is actually happening all the time, but generally at such a small scale that you don't notice it. And the faster the updates from the server, the smaller each correction is and the less likely it is to draw notice.
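One simple way to get that slide, as a sketch rather than what any particular game does, is to move the displayed position a fixed fraction of the way toward the server's position every frame. The function name `slide_toward` and the fraction of 0.1 are arbitrary assumptions.

```python
def slide_toward(shown, server, fraction=0.1):
    """Move the displayed position part of the way toward the server's."""
    return tuple(s + (v - s) * fraction for s, v in zip(shown, server))


shown = (0.0, 0.0)     # point A: where the client predicted the player to be
server = (10.0, 0.0)   # point B: where the server says the player is
for frame in range(30):
    shown = slide_toward(shown, server)
# After 30 frames the player is drawn most of the way to B,
# without ever having jumped there in a single frame.
print(shown)
```

A smaller fraction hides the correction better but leaves the player drawn in the wrong place for longer.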

As a side note this is often confused with or called 'lag hacking'. Really there is no such thing. What happens in these occurrences is that a player is moving very quickly, probably faster than the norm for the game, and is changing direction often. In particular I remember World of Warcraft players insisting that rogues were 'lag hacking' when they would activate speed boosts and zip around a target, trying to stay behind it. Fast erratic motion is the worst case for client-side prediction: it will almost always get it wrong, and thus present false images to the viewer.

In reality of course the rogue wasn't all over the place, no matter where you saw him. This is because the server is authoritative.

Why?

Glenn says it better than I can:

The difficulty of this approach is not in the prediction, for the prediction works just as normal game code does – evolving the state of the game character forward in time according to the player’s input. The difficulty is in applying the correction back from the server to resolve cases when the client and server disagree about where the player character should be and what it is doing.
Now at this point you might wonder. Hey, if you are running code on the client – why not just make the client authoritative over their player character? The client could run the simulation code for their own character and simply tell the server where they are each time they send a packet. The problem with this is that if each player were able to simply tell the server “here is my current position” it would be trivially easy to hack the client such that a cheater could instantly dodge the RPG about to hit them, or teleport instantly behind you to shoot you in the back.

This has happened in some poorly-coded online games. Allowing the client to be authoritative is tempting: it is simple to code and even better takes quite a lot of processing work off the servers. However, in just about every case I've heard of it simply destroys the game due to hacking. (Note that this doesn't even get into other problems like aimbots or wall-hacking).

Thus, when the server sends a position update to the client, the server always wins and the client must correct. Now, this too could be hacked: you could tell your client to ignore these updates. The problem, though, is that while you can do whatever you want to your machine, your actual position is what the server dictates; that is where entities in the world and other players see you. Hacking your own client cannot change that (there are some exceptions, but they can be dealt with).
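A minimal sketch of what "the server is authoritative" can look like in code: the client sends only its intent (a direction), never a position, and the server applies its own speed limit and its own clock. The class name, `MAX_SPEED`, and `TICK` are hypothetical values chosen for illustration.

```python
MAX_SPEED = 5.0   # units per second (hypothetical game rule)
TICK = 0.1        # seconds simulated per command; chosen by the server


class AuthoritativeServer:
    def __init__(self):
        self.positions = {}

    def join(self, player, x=0.0):
        self.positions[player] = x

    def handle_move(self, player, direction):
        """Apply one movement command and return the authoritative position.

        direction should be -1, 0 or +1. Anything else is clamped, so a
        hacked client cannot ask to move faster than the rules allow.
        """
        direction = max(-1, min(1, direction))
        self.positions[player] += direction * MAX_SPEED * TICK
        return self.positions[player]


server = AuthoritativeServer()
server.join("Bob")
print(server.handle_move("Bob", 1))      # an honest step forward
print(server.handle_move("Bob", 1000))   # a 'teleport' attempt moves no further
```

Whatever a modified client draws on its own screen, the position everyone else sees is the one stored in `positions`.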

So, what does all this mean for the game engine I'm working on? Well, it turns out I've gone on rather a bit. I will continue in the next post.