Although the answers below provide a lot of valuable insight, I just wanted to stress the importance of developing your game/engine to be highly deterministic (en.wikipedia.org/wiki/Deterministic_algorithm), as it is essential to achieving your goal.
– Ari Patrick, Nov 30 '10 at 17:55

Also note that physics engines are not deterministic (Havok claims it is...), so the solution of just storing the inputs and timestamps will produce different results every time if your game uses physics.
– Samaursa, Nov 30 '10 at 20:10

Most physics engines are deterministic as long as you use a fixed timestep, which you should be doing anyway. I would be very surprised if Havok is not. Non-determinism is fairly hard to come by on computers...
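A fixed timestep means the simulation always advances in identical steps, no matter how long each rendered frame took, so a replay only has to reproduce the per-step inputs. A minimal accumulator-style sketch, assuming a hypothetical `step(state, dt)` function; keeping the accumulator in integer milliseconds is my own choice so the step count itself can't drift from float rounding:

```python
STEP_MS = 10  # fixed simulation step in integer milliseconds (assumed: 100 Hz)

def step(state, dt):
    """Hypothetical simulation step: move at constant velocity for dt seconds."""
    x, v = state
    return (x + v * dt, v)

def run(frame_times_ms, state):
    """Advance the simulation in fixed steps, whatever the frame timing was.

    frame_times_ms: integer milliseconds elapsed per rendered frame. These vary
    from run to run, but only the number of whole fixed steps matters.
    """
    accumulator = 0
    steps = 0
    for elapsed in frame_times_ms:
        accumulator += elapsed
        # Consume whole fixed steps; leftover time carries over to the next frame.
        while accumulator >= STEP_MS:
            state = step(state, STEP_MS / 1000.0)
            accumulator -= STEP_MS
            steps += 1
    return state, steps
```

For example, `run([10] * 6, (0.0, 2.0))` and `run([25, 25, 10], (0.0, 2.0))` both take six steps of exactly the same size and end in bit-identical states, even though the frame timings differ.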
– user744, Nov 30 '10 at 21:00

Deterministic means same inputs = same outputs. If you've got floats on one platform and doubles on another (for example), or have willfully disabled your IEEE floating-point standard implementation, that means you're not running with the same inputs, not that it's not deterministic.
– user744, Dec 1 '10 at 8:25

Is it me, or does this question get a bounty every other week?
– The Communist Duck, Mar 1 '11 at 17:45

- It records the initial state of the game systems on the first frame, and only the player input during gameplay.
- It quantizes inputs to a lower number of bits, i.e. it represents floats within known ranges (e.g. [0, 1] or [-1, 1]) using fewer bits. Quantized inputs have to be used during actual gameplay too.
- It uses a single bit to indicate whether an input stream has new data. Since some streams won't change frequently, this exploits temporal coherence in the inputs.
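The quantization and per-stream change bit described above can be sketched like this. The function names are hypothetical, the default range is the [-1, 1] case from the list, and for readability the change bit and payload are kept as Python tuples rather than packed into a real bit stream:

```python
def quantize(value, bits, lo=-1.0, hi=1.0):
    """Map a float in [lo, hi] to an unsigned integer of the given bit width."""
    levels = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)

def dequantize(q, bits, lo=-1.0, hi=1.0):
    """Inverse of quantize(), up to the quantization error."""
    levels = (1 << bits) - 1
    return lo + q / levels * (hi - lo)

def encode_stream(samples):
    """Emit (changed_bit, payload) per frame; payload is None when unchanged."""
    out = []
    previous = None
    for q in samples:
        if q == previous:
            out.append((0, None))  # 1 bit: nothing new this frame
        else:
            out.append((1, q))     # 1 bit flag + quantized value
            previous = q
    return out

def decode_stream(encoded):
    """Rebuild the per-frame values by holding the last value that was sent."""
    values = []
    current = None
    for changed, payload in encoded:
        if changed:
            current = payload
        values.append(current)
    return values
```

Because the replay only ever sees quantized values, the live game has to feed `dequantize(quantize(x, bits), bits)` into its own logic as well; otherwise the recorded run and the replay diverge.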

One way to further improve the compression ratio for the majority of cases would be to decouple all your input streams and fully run-length encode them independently. This will be a win over the delta encoding technique if you encode the run length in 8 bits and the run itself exceeds 8 frames (very likely unless your game is a real button masher). I've used this technique in a racing game to compress 8 minutes of inputs from 2 players racing around a track down to just a few hundred bytes.
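Run-length encoding a single input stream with 8-bit run lengths might look like the sketch below (hypothetical helper names). Runs longer than 255 frames are split, so every length still fits in one byte:

```python
def rle_encode(stream, max_run=255):
    """Run-length encode one input stream as (value, run_length) pairs.

    Run lengths are capped at max_run so each fits in 8 bits; longer runs
    are split into several pairs.
    """
    runs = []
    for value in stream:
        if runs and runs[-1][0] == value and runs[-1][1] < max_run:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into one value per frame."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out
```

Each input stream (say, a throttle axis or a fire button) would be encoded separately, which is what lets long runs form even while other streams are changing.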

In terms of making such a system reusable, I've made the replay system deal with generic input streams, while also providing hooks that let game-specific logic marshal keyboard/gamepad/mouse input into these streams.

If you want fast rewinding or random seeks, you can save a checkpoint (your full game state) every N frames. N should be chosen to balance the replay file size against how long the player has to wait while the state is replayed up to the chosen point. One way to avoid that wait is to allow random seeks only to these exact checkpoint locations. Rewinding is a matter of setting the game state to the checkpoint immediately before the frame in question, then replaying the inputs until you reach that frame. However, if N is too large, you could get hitching every few frames. One way to smooth these hitches is to asynchronously pre-cache the frames between the previous two checkpoints while you're playing back a cached frame from the current checkpoint region.
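A minimal sketch of checkpointed seeking, assuming a hypothetical deterministic `simulate(state, input)` update and a dict-based game state. The state at frame k is the result of applying inputs 0..k-1, and a full copy is kept every `CHECKPOINT_EVERY` (N) frames:

```python
import copy

CHECKPOINT_EVERY = 4  # N: frames between full-state checkpoints (assumed value)

def simulate(state, frame_input):
    """Hypothetical deterministic update: accumulate the input into a counter."""
    return {"frame": state["frame"] + 1, "total": state["total"] + frame_input}

def record(initial_state, inputs):
    """Play the game once, keeping a full checkpoint every N frames."""
    checkpoints = {0: copy.deepcopy(initial_state)}
    state = initial_state
    for frame, frame_input in enumerate(inputs):
        state = simulate(state, frame_input)
        if (frame + 1) % CHECKPOINT_EVERY == 0:
            checkpoints[frame + 1] = copy.deepcopy(state)
    return checkpoints

def seek(checkpoints, inputs, target_frame):
    """Restore the nearest checkpoint at or before target_frame, then re-simulate."""
    start = (target_frame // CHECKPOINT_EVERY) * CHECKPOINT_EVERY
    state = copy.deepcopy(checkpoints[start])
    for frame in range(start, target_frame):
        state = simulate(state, inputs[frame])
    return state
```

Stepping backwards from frame t is then `seek(checkpoints, inputs, t - 1)`, which costs at most N - 1 re-simulated frames; that per-step cost is the source of the hitching mentioned above.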

Besides the "make sure the keystrokes are replayable" solution, which can be surprisingly difficult, you could just record the entire game state on every frame. With a little clever compression you can squeeze it down significantly. This is how Braid handles its time-rewinding code and it works pretty well.
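One simple form of "a little clever compression" for whole-state recording, combining the comment below about skipping unchanged objects with a general-purpose compressor: store only the objects whose state changed since they were last stored, then zlib the result. This is a sketch under my own assumptions (JSON-serializable per-object state, string object ids, no object deletion); it is not a description of how Braid does it:

```python
import copy
import json
import zlib

def snapshot_frame(objects, last_stored):
    """Keep only objects whose state differs from the last stored copy.

    objects: dict of object id -> JSON-serializable state for this frame.
    last_stored: dict updated in place with the latest stored state per object.
    """
    changed = {}
    for obj_id, state in objects.items():
        if last_stored.get(obj_id) != state:
            changed[obj_id] = state
            last_stored[obj_id] = copy.deepcopy(state)
    return changed

def compress_recording(frames):
    """Serialize the per-frame change sets and compress them with zlib."""
    last_stored = {}
    deltas = [snapshot_frame(f, last_stored) for f in frames]
    raw = json.dumps(deltas, sort_keys=True).encode("utf-8")
    return zlib.compress(raw, 9)

def decompress_recording(blob):
    """Rebuild the full state of every frame from the compressed change sets."""
    deltas = json.loads(zlib.decompress(blob).decode("utf-8"))
    current, frames = {}, []
    for delta in deltas:
        current.update(delta)
        frames.append(dict(current))
    return frames
```

Destroyed objects would need an explicit "removed" marker, which this sketch leaves out.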

Since you'll need checkpointing anyway for rewinding, you might want to just try implementing it the simple way before complicating things.

+1 With some clever compression you can really bring down the amount of data that you need to store (for example, don't store an object's state if it hasn't changed since the last state you stored for that object). I have already tried this with physics and it works really well. If you don't have physics and don't need to rewind the complete game, I would go with Joe's solution, simply because it produces the smallest possible files; if you want rewind as well, you can store just the last n seconds of the game.
– Samaursa, Nov 30 '10 at 20:07

@Samaursa - If you use a standard compression library (e.g. gzip), you will get the same (probably better) compression without needing to manually check whether the state has changed.
– Justin, Dec 1 '10 at 1:43

@Kragen: Not really true. Standard compression libraries are certainly good but often won't be able to take advantage of domain-specific knowledge. If you can help them out a little bit, by putting similar data adjacent and stripping out stuff that really didn't change, you can crunch things down substantially.
– ZorbaTHut, Dec 1 '10 at 11:22

@ZorbaTHut In theory yes, but in practice is it really worth the effort?
– Justin, Dec 1 '10 at 22:28

Whether it's worth the effort depends entirely on how much data you have. If you've got an RTS with hundreds or thousands of units, it probably matters. If you need to store the replays in memory like Braid, it probably matters.
– user744, Dec 2 '10 at 16:28

You can view your system as if it were composed of a series of states and functions, where a function f[j] with input x[j] changes the system state s[j] into state s[j+1], like so:

s[j+1] = f[j](s[j], x[j])

A state is the description of your entire world: the location of the player, the location of the enemy, the score, the remaining ammo, etc. Everything you require to draw a frame of your game.

A function is anything that may affect the world: a frame change, a keypress, a network packet.

The input is the data the function takes. A frame change may take the time elapsed since the last frame; a keypress may include the actual key pressed, as well as whether or not the Shift key was held.

For the sake of this explanation, I will make the following assumptions:

Assumption 1:

The number of states for a given run of the game is much larger than the number of functions. You probably have hundreds of thousands of states, but only several dozen functions (frame change, keypress, network packet, etc.). Of course, the number of inputs must be equal to the number of states minus one.

Assumption 2:

The spatial cost (memory, disk) of storing a single state is much greater than that of storing a function and its input.

Assumption 3:

The temporal cost (time) of presenting a state is similar, or just one or two orders of magnitude longer than that of calculating a function over a state.

Depending on the requirements of your replay system, there are several ways to implement it, so we can start with the simplest one. I'll also give a small example for each, using the game of chess recorded on pieces of paper.

Method 1:

Store s[0]...s[n]. This is very simple and very straightforward. Because of assumption 2, the spatial cost of this is quite high.

For chess, this would be accomplished by drawing the entire board for each move.

Method 2:

If you only need forward replay, you can simply store s[0], and then store f[0]...f[n-1] (remember, this is only the name or id of the function) and x[0]...x[n-1] (the input for each of these functions). To replay, you simply start with s[0], and calculate

s[1] = f[0](s[0], x[0])
s[2] = f[1](s[1], x[1])

and so on...

I want to make a small annotation here. Several other commenters said that the game "must be deterministic". Anyone who says that needs to take Computer Science 101 again, because unless your game is meant to be run on quantum computers, ALL COMPUTER PROGRAMS ARE DETERMINISTIC¹. That's what makes computers so awesome.

However, since your program most likely depends on external programs, ranging from libraries to the actual implementation of the CPU, making sure that your functions behave the same between platforms may be quite difficult.

If you use pseudo-random numbers, you can either store the generated numbers as part of your input x, or store the state of the prng function as part of your state s, and its implementation as part of function f.
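Method 2, including the option of keeping the PRNG state inside s, might look like the Python sketch below. The two functions, the dictionary layout of s, and the tiny linear congruential generator (using the classic 1103515245/12345 example constants) are illustrative assumptions, not part of the answer:

```python
def lcg_next(seed):
    """Tiny linear congruential generator; its state is stored inside s."""
    return (seed * 1103515245 + 12345) % (2 ** 31)

def frame_change(s, x):
    """f = 'frame': advance time by x ticks and roll a random spawn using s['rng']."""
    rng = lcg_next(s["rng"])
    spawned = s["spawned"] + (1 if rng % 4 == 0 else 0)
    return {**s, "time": s["time"] + x, "rng": rng, "spawned": spawned}

def keypress(s, x):
    """f = 'key': x is the key that was pressed."""
    return {**s, "keys": s["keys"] + [x]}

# The function "names or ids" that get stored in the replay, mapped to code.
FUNCTIONS = {"frame": frame_change, "key": keypress}

def replay(s0, log):
    """Method 2: rebuild s[n] from s[0] and the stored (f[j] id, x[j]) pairs."""
    s = s0
    for f_id, x in log:
        s = FUNCTIONS[f_id](s, x)
    return s
```

Because the generator's state travels inside s and its implementation lives inside f, replaying the same log from the same s[0] reproduces every "random" spawn without storing the generated numbers.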

For chess, this would be accomplished by drawing the initial board (which is known) and then describing each move by saying which piece went where. This is how they actually do it, by the way.

Method 3:

Now, you most likely want to be able to seek into your replay. That is, calculate s[n] for an arbitrary n. By using method 2, you need to calculate s[0]...s[n-1] before you can calculate s[n], which, according to assumption 2, may be quite slow.

To implement this, method 3 is a generalization of methods 1 and 2: store f[0]...f[n-1] and x[0]...x[n-1] just like method 2, but also store s[j] for all j % Q == 0, for a given constant Q. In simpler terms, this means that you store a bookmark at one out of every Q states. For example, for Q == 100, you store s[0], s[100], s[200]...

In order to calculate s[n] for an arbitrary n, you first load the previously stored bookmark s[Q*floor(n/Q)], and then calculate all the functions from f[Q*floor(n/Q)] to f[n-1]. At most, you will be calculating Q-1 functions. Smaller values of Q are faster to calculate but consume much more space, while larger values of Q consume less space, but take longer to calculate.

Method 3 with Q==1 is the same as method 1, while method 3 with Q==inf is the same as method 2.
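The method 3 trade-off can be written out explicitly. The symbols are my own shorthand, not from the answer: S_state is the size of one stored state, S_fx the size of one stored function id plus its input, and T_f the time to evaluate one function. For a run with states s[0]...s[n], and a seek to s[m]:

```latex
\text{storage}(Q) = \left(\left\lfloor \frac{n}{Q} \right\rfloor + 1\right) S_{\mathrm{state}} + n \, S_{fx}
\qquad
\text{seek}(m) = \left(m - Q \left\lfloor \frac{m}{Q} \right\rfloor\right) T_f \le (Q - 1) \, T_f
```

With Q = 1 every state is stored and seeking costs nothing; as Q grows, storage approaches a single state plus the function log, and the worst-case seek grows linearly with Q.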

For chess, this would be accomplished by drawing every move, as well as one in every 10 boards (for Q==10).

Method 4:

If you want to reverse replay, you can make a small variation of method 3. Suppose Q==100, and you want to calculate s[150] through s[90] in reverse. With the unmodified method 3, you will need to make 50 calculations to get s[150] and then 49 more calculations to get s[149] and so on. But since you already calculated s[149] to get s[150], you can create a cache with s[100]...s[150] when you calculate s[150] for the first time, and then you will already have s[149] in the cache when you need to display it.

You only need to regenerate the cache each time you need to calculate s[j], for j==(k*Q)-1 for any given k. This time, increasing Q results in fewer stored bookmarks, but a larger cache and longer pauses each time the cache is recreated. An optimal value for Q can be calculated if you know the sizes and times required to calculate states and functions.
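A sketch of method 4 with Q = 100, as in the example above. The transition function `f` is an arbitrary stand-in (any deterministic function works), and `ReversePlayer` is a hypothetical name. Each time playback crosses into an earlier segment, the whole segment from its bookmark is recomputed once and cached:

```python
Q = 100  # bookmark interval, matching the example in the text

def f(s, x):
    """Hypothetical transition function: s[j+1] = f(s[j], x[j])."""
    return (s * 31 + x) % 1_000_003

def record_bookmarks(s0, xs):
    """Method 3: keep s[j] for every j % Q == 0."""
    bookmarks = {0: s0}
    s = s0
    for j, x in enumerate(xs):
        s = f(s, x)
        if (j + 1) % Q == 0:
            bookmarks[j + 1] = s
    return bookmarks

class ReversePlayer:
    """Method 4: cache the segment s[kQ]..s[j] instead of recomputing each state."""

    def __init__(self, bookmarks, xs):
        self.bookmarks, self.xs = bookmarks, xs
        self.cache_start, self.cache = None, []
        self.rebuilds = 0  # how many times a segment had to be recomputed

    def state(self, j):
        start = (j // Q) * Q
        if self.cache_start != start or j - start >= len(self.cache):
            # Rebuild the cache for this segment, keeping every intermediate state.
            s = self.bookmarks[start]
            self.cache_start, self.cache = start, [s]
            for k in range(start, j):
                s = f(s, self.xs[k])
                self.cache.append(s)
            self.rebuilds += 1
        return self.cache[j - start]
```

Playing s[150] down to s[90] with this class recomputes only twice: once for the 100..150 segment and once when it first needs s[99].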

For chess, this would be accomplished by drawing every move, as well as one in every 10 boards (for Q==10), but it would also require drawing the last 10 boards you have calculated on a separate piece of paper.

Method 5:

If states simply consume too much space, or functions consume too much time, you can create a solution that actually implements (not fakes) reverse replaying. To do this, you must create reverse functions for each of the functions you have. However, this requires that each of your functions is an injection. If this is doable, then for f' denoting the inverse of function f, calculating s[j-1] is as simple as

s[j-1] = f'[j-1](s[j], x[j-1])

Note that here, the function and input are both j-1, not j. These are the same function and input you would have used if you were calculating

s[j] = f[j-1](s[j-1], x[j-1])

Creating the inverse of these functions is the tricky part. In practice you usually can't, since some state data is usually lost after each function in a game.

This method, as is, can reverse calculate s[j-1], but only if you have s[j]. This means that you can only watch the replay backwards, starting from the point at which you decided to replay backwards. If you want to replay backwards from an arbitrary point, you must mix this with method 4.

For chess, this cannot be implemented, since with a given board and the previous move, you can know which piece was moved, but not where it moved from.

Method 6:

Finally, if you can't guarantee that all your functions are injections, you can use a small trick to make them so. Instead of having each function return only a new state, you can also have it return the data it discarded, like so:

s[j+1], r[j] = f[j](s[j], x[j])

Where r[j] is the discarded data. And then create your inverse functions so they take the discarded data, like so:

s[j] = f'[j](s[j+1], x[j], r[j])

In addition to f[j] and x[j], you must also store r[j] for each function. Once again, if you want to be able to seek, you must store bookmarks, as with method 4.
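Methods 5 and 6 side by side, as a sketch with two hypothetical functions. `move` is injective, so its inverse needs nothing extra (method 5). `damage` clamps health at zero and so loses information; it returns the discarded data r[j], here simply the previous health, which the inverse consumes (method 6):

```python
def move(s, dx):
    """Injective: the previous position is recoverable from s and dx (method 5)."""
    return {**s, "x": s["x"] + dx}, None  # nothing discarded

def move_inverse(s_next, dx, r):
    return {**s_next, "x": s_next["x"] - dx}

def damage(s, amount):
    """Not injective: health is clamped at 0, so return what was discarded (method 6)."""
    new_health = max(s["health"] - amount, 0)
    r = s["health"]  # discarded data: the health before this hit
    return {**s, "health": new_health}, r

def damage_inverse(s_next, amount, r):
    return {**s_next, "health": r}

FORWARD = {"move": move, "damage": damage}
INVERSE = {"move": move_inverse, "damage": damage_inverse}

def play(s0, log):
    """Run forward, storing r[j] alongside f[j] and x[j]."""
    s, history = s0, []
    for f_id, x in log:
        s, r = FORWARD[f_id](s, x)
        history.append((f_id, x, r))
    return s, history

def rewind(s_n, history):
    """Walk backwards with s[j] = f'[j](s[j+1], x[j], r[j]); returns s[n] down to s[0]."""
    s, states = s_n, [s_n]
    for f_id, x, r in reversed(history):
        s = INVERSE[f_id](s, x, r)
        states.append(s)
    return states
```

Storing the whole previous health is the simplest choice of r[j]; a tighter encoding would store only the amount lost to clamping.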

For chess, this would be the same as method 2, but unlike method 2, which only says which piece goes where, you also need to store where each piece came from.

Implementation:

Since this works for all kinds of states and all kinds of functions, for a specific game you can make several assumptions that will make it easier to implement. Actually, if you implement method 6 with the entire game state, not only will you be able to replay the data, but you will also be able to go back in time and resume playing from any given moment. That would be pretty awesome.

Instead of storing all the game state, you can simply store the bare minimum that you require to draw a given state, and serialize this data at a fixed interval. Your states will be these serializations, and your input will now be the difference between two serializations. The key for this to work is that the serialization should change little if the world state changes little. This difference is completely invertible, so implementing method 5 with bookmarks is very possible.
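One concrete way to get a "difference between two serializations" that is trivially invertible is a byte-wise XOR. The sketch assumes a hypothetical fixed-layout serialization (three little-endian int32 fields), so consecutive serializations always have the same length:

```python
import struct

def serialize(state):
    """Assumed fixed layout: player x, y and score as little-endian int32."""
    return struct.pack("<iii", state["x"], state["y"], state["score"])

def deserialize(blob):
    x, y, score = struct.unpack("<iii", blob)
    return {"x": x, "y": y, "score": score}

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(a, b))

def diff(prev_blob, next_blob):
    """The 'input' between two serializations; mostly zero bytes if little changed."""
    return xor_bytes(prev_blob, next_blob)

# XOR is its own inverse, so one stored diff steps both ways:
#   next = xor_bytes(prev, d)   and   prev = xor_bytes(next, d)
```

The mostly-zero diffs are also what make a general-purpose compressor effective on them, which ties back to the requirement that the serialization change little when the world changes little.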

I've seen this implemented in some major games, mostly for instant replays of recent data when an event occurs (a frag in an FPS, or a score in a sports game).

I hope this explanation wasn't too boring.

¹ This doesn't mean that some programs don't act as if they were non-deterministic (such as MS Windows ^^). Now seriously, if you can make a non-deterministic program on a deterministic computer, you can be pretty sure you will simultaneously win the Fields Medal, the Turing Award and probably even an Oscar and a Grammy, for what it's worth.

On "ALL COMPUTER PROGRAMS ARE DETERMINISTIC," you are neglecting to consider programs that rely on threading. While threading is mostly used for loading resources or to separate the render loop, there are exceptions to that, and at that point you may not be able to claim true determinism any more, unless you are properly strict about enforcing determinism. Locking mechanisms alone won't be enough. You wouldn't be able to share ANY mutable data without extra work. In many scenarios, a game doesn't need that level of strictness for its own sake, but could for things like replays.
– krdluzni, Mar 3 '11 at 15:43

@krdluzni Threading, parallelism and random numbers from true random sources do not make programs non-deterministic. Thread timings, deadlocks, uninitialized memory and even race conditions are just additional inputs your program takes. Your choice to discard these inputs or not even consider them at all (for whatever reason) will not affect the fact that your program will execute exactly the same given the exact same inputs. "non-deterministic" is a very precise Computer Science term, so please avoid using it if you don't know what it means.
– slcpfmmm, Mar 4 '11 at 3:04

@oscar (May be somewhat terse, busy, might edit later): Although in some strict, theoretical sense you could claim thread timings etc. as inputs, this is not useful in any practical sense, since they cannot generally be observed by the program itself or fully controlled by the developer. Further, a program not being deterministic is significantly different from its being non-deterministic (in the state machine sense). I do understand the meaning of the term. I wish they'd chosen something else, rather than overloading a pre-existing term.
– krdluzni, Mar 4 '11 at 19:09

@krdluzni My point in designing replay systems with unpredictable elements such as thread timings (if they affect your ability to accurately calculate a replay) is to treat them like any other input source, just like user input. I don't see anybody complaining that a program is "non-deterministic" because it takes completely unpredictable user input. As for the term, it's inaccurate and confusing. I'd rather have them use something like "practically unpredictable". And no, it's not impossible: check VMware's replay debugging.
– slcpfmmm, Mar 5 '11 at 5:24

Save the initial state of your random number generators. Then save, timestamped, each input (mouse, keyboard, network, whatever). If you have a networked game you probably already have this all in place.

Re-set the RNGs and play back the input. That's it.

This doesn't solve rewinding, for which there is no general solution other than playing back from the start as fast as you can. You can improve performance by checkpointing the entire game state every X seconds; then you'll only ever need to replay at most X seconds, but the entire game state might also be prohibitively expensive to grab.

The particulars of the file format don't matter, but most engines have a way to serialize commands and state already - for networking, saving, or whatever. Just use that.

One thing that other answers haven't covered yet is the danger of floats: it is very hard to make a fully deterministic application using them.

Using floats, you can have a completely deterministic system, but only if you are:
- using exactly the same binary
- using exactly the same CPU

This is because the internal representation of floats varies from one CPU to another, most dramatically between AMD and Intel CPUs. As long as the values are in FPU registers, they're more accurate than they look from the C side, so any intermediate calculations are done in higher precision.

It's quite obvious how this will affect the AMD vs. Intel bit (let's say one uses 80-bit floats and the other 64, for example), but why the same-binary requirement?

As I said, the higher precision is in use as long as the values are in FPU registers. This means that whenever you recompile, your compiler optimization may swap values in and out of FPU registers, resulting in subtly different results.

You may be able to mitigate this by setting _control87()/_controlfp() flags to use the lowest possible precision. However, some libraries may also touch these flags (at least some versions of D3D did).
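Python floats are 64-bit doubles, so the sketch below can't reproduce 80-bit x87 registers directly, but it shows the same effect one level down: identical inputs and identical operations, with intermediates either kept at full width or rounded to 32 bits (as a spill to memory would do), give different results. `to_float32` and `accumulate` are helpers written for this example:

```python
import struct

def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def accumulate(step, count, round_each_step):
    """Sum `step` count times, optionally rounding every intermediate to 32 bits."""
    total = 0.0
    for _ in range(count):
        total += step
        if round_each_step:
            total = to_float32(total)  # emulates the value leaving a wider register
    return total

wide = accumulate(0.1, 10, round_each_step=False)   # intermediates kept at 64 bits
narrow = accumulate(0.1, 10, round_each_step=True)  # intermediates stored at 32 bits
```

`wide` and `narrow` differ, so a replay recorded by a build that keeps intermediates in registers and played back by one that spills them can drift apart, even though the source code and inputs are the same.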

With GCC you can use -ffloat-store to force the values out of registers and truncate to 32/64 bits of precision, without needing to worry about other libraries messing with your control flags. Obviously, this will negatively impact your speed (but so will any other quantizing).
– user744, Feb 9 '11 at 12:49

I'd vote against deterministic replaying. It's FAR simpler and FAR less error-prone to save the state of every entity every 1/Nth of a second.

Save just what you want to show on playback - if it's just position and heading, fine, if you also want to show stats save that, too, but in general save as little as possible.

Tweak the encoding. Use as few bits as possible for everything. The replay doesn't have to be perfect as long as it looks good enough. Even if you use a float for, say, heading, you can save it in a byte and get 256 possible values (1.4° precision). That may be enough or even too much for your particular problem.

Use delta encoding. Unless your entities teleport (and if they do, treat the case separately), encode positions as the difference between the new position and the old position; for short movements, you can get away with far fewer bits than you'd need for full positions.

If you want easy rewind, add keyframes (full data, no deltas) every N frames. This way you can get away with lower precision for deltas and other values, rounding errors won't be so problematic if you reset to "true" values periodically.
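The byte-sized heading, delta encoding and periodic keyframes can be combined as below. This is a sketch under stated assumptions: positions are integers (for example centimetres), `KEYFRAME_EVERY` and `MAX_DELTA` are illustrative values, and a delta too large for a signed byte (the teleport case) is simply written as an extra keyframe:

```python
KEYFRAME_EVERY = 30  # N: a full position every N frames (assumed value)
MAX_DELTA = 127      # deltas must fit in a signed byte; bigger jumps become keyframes

def encode_heading(degrees):
    """Store a heading in one byte: 256 steps of 360/256 = 1.40625 degrees."""
    return round((degrees % 360.0) / 360.0 * 256) % 256

def decode_heading(byte):
    return byte * 360.0 / 256

def encode_positions(positions):
    """Keyframes hold full integer positions; other frames hold small deltas."""
    records, prev = [], None
    for i, pos in enumerate(positions):
        delta = None if prev is None else pos - prev
        if i % KEYFRAME_EVERY == 0 or abs(delta) > MAX_DELTA:
            records.append(("key", pos))
        else:
            records.append(("delta", delta))
        prev = pos
    return records

def decode_positions(records):
    """Apply each delta to the previous position; keyframes reset it outright."""
    out = []
    for kind, value in records:
        out.append(value if kind == "key" else out[-1] + value)
    return out
```

With exact integer deltas nothing drifts; if the deltas themselves are quantized more aggressively, the keyframes are what periodically snap positions back to true values, as the answer describes.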

I'm somewhat surprised that nobody's mentioned this option, but if your game has a multiplayer component, you may have already done a lot of the hard work necessary for this feature. After all, what's multiplayer but an attempt to re-play the movements of someone else at a (slightly) different time on your own computer?

This also gets you the benefits of a smaller file size as a side effect, again assuming you've been working on bandwidth-friendly network code.

In many ways, it combines both the "be extremely deterministic" and "keep a record of everything" options. You'll still need determinism - if your re-play is essentially bots playing the game again in exactly the way you originally played it, whatever actions they take that can have random outcomes need to have the same outcome.

The data format could be as simple as a dump of the network traffic, though I imagine it wouldn't hurt to clean it up a bit (you don't have to worry about lag on a re-play, after all). You could re-play only a portion of the game by using the checkpoint mechanism other people have mentioned; a multiplayer game typically sends out a full game-state update every so often anyway, so again you may have already done this work.

To get the smallest possible replay file you'll need to make sure your game is deterministic. Usually this involves looking at your random number generator and seeing where it is used in the game logic.

You'll most likely need to have a game-logic RNG and an everything-else RNG for things like GUI, particle effects, and sounds. Once you have this done, you need to record the initial state of the game-logic RNG, then the game commands of all the players every frame.

For many games there is a level of abstraction between the input and the game logic where the input is turned into commands. For example pressing the A button on the controller results in a "jump" digital command being set to true and the game logic reacts to commands without checking the controller directly. By doing this, you'll only need to record the commands which impact the game logic (no need to record the "Pause" command) and most likely this data will be smaller than recording the controller data. You also don't have to worry about recording the state of the control scheme in case the player decided to remap buttons.
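A sketch of the two ideas above: a seeded game-logic RNG kept apart from a cosmetic RNG, and recording commands rather than raw buttons. The button map, command names and `Game` class are hypothetical, and Python's `random.Random` stands in for whatever RNG your engine uses (it must produce the same sequence on every target platform):

```python
import random

# Hypothetical mapping from raw controller buttons to game commands.
BUTTON_TO_COMMAND = {"A": "jump", "B": "fire", "START": "pause"}
LOGIC_COMMANDS = {"jump", "fire"}  # "pause" doesn't affect game logic, so it isn't recorded

class Game:
    def __init__(self, logic_seed):
        self.logic_rng = random.Random(logic_seed)  # drives game logic only
        self.fx_rng = random.Random()               # particles, sounds, UI: never replayed
        self.score = 0

    def apply(self, commands):
        """Game logic reacts to commands, never to raw buttons."""
        if "fire" in commands:
            self.score += self.logic_rng.randint(1, 6)  # random outcome from the logic RNG
        self.fx_rng.random()  # cosmetic randomness may differ freely between runs

def to_commands(buttons):
    return {BUTTON_TO_COMMAND[b] for b in buttons if b in BUTTON_TO_COMMAND}

def record(seed, frames_of_buttons):
    """Play live, recording the logic seed plus each frame's logic commands."""
    game, log = Game(seed), []
    for buttons in frames_of_buttons:
        commands = to_commands(buttons) & LOGIC_COMMANDS
        log.append(sorted(commands))
        game.apply(commands)
    return {"seed": seed, "commands": log}, game.score

def replay(recording):
    game = Game(recording["seed"])
    for commands in recording["commands"]:
        game.apply(set(commands))
    return game.score
```

Because only commands are stored, a player who remapped "fire" to another button produces exactly the same recording.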

Rewinding is a difficult problem with the deterministic method: other than taking snapshots of the game state and fast-forwarding to the point in time you want to look at, there is not much you can do besides recording the entire game state each frame.

On the other hand, fast-forwarding is certainly doable. As long as your game logic isn't reliant on your rendering, you can run the game logic as many times as you want before rendering a new frame of the game. The speed of fast-forwarding will just be bound by your machine. If you want to skip ahead in large increments, you'll need to use the same snapshot method as you would need for rewinding.

Possibly the single most important part of writing a replay system that relies on determinism is recording a debug stream of data. This debug stream should contain a snapshot of as much information as possible each frame (RNG seeds, entity transforms, animations, etc.), and you should be able to test that recorded stream against the state of the game during replays. This lets you quickly detect mismatches at the end of any given frame, and will save you countless hours of pulling your hair out over unknown non-deterministic bugs. Something as simple as an uninitialized variable can mess everything up at the 11th hour.
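A minimal version of such a debug stream, assuming the state can be serialized to JSON and `update(state, input)` is your deterministic step (both hypothetical here). For brevity it stores only a hash per frame; storing the full snapshot, as the answer recommends, makes it easier to see *what* diverged, not just *when*:

```python
import hashlib
import json

def state_digest(state):
    """Hash a canonical serialization of whatever state you choose to check."""
    canonical = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha1(canonical).hexdigest()

def record_debug_stream(s0, inputs, update):
    """While recording, store one digest per frame alongside the inputs."""
    s, digests = s0, []
    for x in inputs:
        s = update(s, x)
        digests.append(state_digest(s))
    return digests

def first_desync(s0, inputs, update, digests):
    """During replay, return the first frame whose state no longer matches, or None."""
    s = s0
    for frame, (x, expected) in enumerate(zip(inputs, digests)):
        s = update(s, x)
        if state_digest(s) != expected:
            return frame
    return None
```

Pointing at the exact frame where the replay first diverged narrows a "the replay ends up wrong" report down to a single update to inspect.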

NOTE: If your game involves dynamic streaming of content or you have game-logic on multiple threads or on different cores... good luck.

A replay made on my computer may not work on your computer because the float result is SLIGHTLY different. It's a big deal.

Beyond that, if you have random numbers, store the seed value in the replay. Then load all default states and seed the random number generator with that value. From there you can simply record the current key/mouse state and how long it has stayed that way, then run all events using that as input.

To jump around within replay files (which is much harder) you'll need to dump THE MEMORY: where every unit is, money, elapsed time, all of the game state. Then fast-forward by replaying everything while skipping rendering, sound, etc. until you reach the time you want. You could take such a dump every minute or every 5 minutes, depending on how fast fast-forwarding is.

The main points are:
- Dealing with random numbers
- Copying input (local player(s) and remote player(s))
- Dumping state for jumping around files
and...
- HAVING FLOATS NOT BREAK THINGS (yes, I had to yell)

To enable faster rewinding/fast-forwarding, or recording only certain time ranges, key frames are necessary: if you're recording all the time, save the entire game state every now and then; if you're recording only certain time ranges, save the initial state at the beginning of each range.

If you need ideas on how to implement your replay system, search google for how to implement undo/redo in an application. It may be obvious to some, but maybe not to all, that undo/redo is conceptually the same as replaying for games. It is just a special case in which you can rewind, and depending on the application, seek to a specific point in time.

Undo/redo happens in applications that are themselves fundamentally deterministic, event-driven, and state-light (e.g. the state of a word processor document is solely the text and selection, not the entire layout, which can be recomputed).
– user744, Mar 4 '11 at 10:50

Then it's obvious you have never used CAD/CAM applications, circuit design software, motion tracking software or any application with undo/redo more sophisticated than a word processor. I'm not saying the code for undo/redo can be copied for replay on a game, just that it's conceptually the same (save states and replay them later). However, the main data structure is not a queue but a stack.
– slcpfmmm, Mar 4 '11 at 16:02