This paper explains the design architecture, implementation, and some of
the lessons learned in creating the multiplayer (networking) code for the
Age of Empires 1 & 2 games, and discusses the current and future
networking approaches used by Ensemble Studios in its game engines.

Age of Empires Multiplayer: Design Goals

When the multiplayer code for Age of Empires was started in early 1996,
there were some very specific goals that had to be met to deliver the
kind of game experience we had in mind:

Sweeping epic historical battles with a great variety of units

Support for 8 players in multiplayer

Ensure a smooth simulation over LAN, modem-to-modem, and the Internet

Support a target platform of a 16MB Pentium 90 with a 28.8 modem

The communications system had to work with the existing (Genie) engine

Target a consistent frame rate of 15fps on the minimum machine config

The Genie
Engine was already running and the game simulation was shaping up into
a compelling experience in single player. The Genie Engine is a 2D single-threaded
(game loop) engine. Sprites are rendered in 256 colors in a tile-based
world. Randomly-generated maps were filled with thousands of objects,
from trees that could be chopped down to leaping gazelles. The rough
breakdown (post optimization) of processing tasks for the engine was:
30% graphic rendering, 30% AI and Pathing, and 30% running the simulation
& maintenance.

At a fairly
early stage, the engine was reasonably stable -- and multiplayer communications
needed to work with the existing code without substantial recoding of
the existing (working) architecture.

To complicate matters further, the time to complete each simulation step
varied greatly: the rendering time changed depending on whether the user
was watching units, scrolling, or sitting over unexplored terrain, and
large paths or strategic planning by the AI made the length of a game
turn fluctuate by as much as 200 msec.

A few quick calculations showed that passing even a small set of data
about each unit, and attempting to update it in real time, would severely
limit the number of units and objects we could have interacting with
the player. Just passing X and Y coordinates, status, action, facing,
and damage would have limited us to 250 moving units in the game at
the most.
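The shape of that calculation can be sketched as follows. Only the 28.8 modem comes from the design goals above; the 9-byte unit record and the update rates are illustrative assumptions, not figures from the Age of Empires code, so the output shows how quickly the budget shrinks rather than reproducing the 250-unit figure.

```python
def max_units(link_bits_per_sec: int, bytes_per_unit: int, updates_per_sec: float) -> int:
    """Upper bound on units whose full state fits through the link each second."""
    bytes_per_sec = link_bits_per_sec / 8
    return int(bytes_per_sec // (bytes_per_unit * updates_per_sec))


# Hypothetical state record: x (2), y (2), status (1), action (1),
# facing (1), damage (2) = 9 bytes, ignoring all packet overhead.
BYTES_PER_UNIT = 9

for updates_per_sec in (1, 2, 5, 15):
    limit = max_units(28_800, BYTES_PER_UNIT, updates_per_sec)
    print(f"{updates_per_sec:>2} updates/sec -> at most {limit} units")
```

Even this is optimistic: it ignores packet headers and resends, and it treats the whole link as available to one stream of unit updates.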

We wanted
to devastate a Greek city with catapults, archers, and warriors on one
side while it was being besieged from the sea with triremes. Clearly,
another approach was needed.

Simultaneous Simulations

Rather than passing the status of each unit in the game, the plan was
to run the exact same simulation on each machine, passing each an identical
set of commands that were issued by the users at the same time. The
PCs would basically synchronize their game watches in best war-movie
tradition, allow players to issue commands, and then execute them in
exactly the same way at the same time, producing identical games.
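A minimal sketch of the idea, using a toy world that is nothing like the Genie engine: if every machine applies the same commands in the same order to a deterministic simulation, only the commands need to cross the wire. The `Simulation` class and the command tuples are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Simulation:
    """Toy deterministic world: unit id -> (x, y)."""
    units: dict = field(default_factory=dict)

    def apply(self, command: tuple) -> None:
        kind, unit_id, x, y = command
        if kind == "spawn":
            self.units[unit_id] = (x, y)
        elif kind == "move" and unit_id in self.units:
            self.units[unit_id] = (x, y)

    def state(self) -> tuple:
        # Sorted so the result does not depend on insertion order.
        return tuple(sorted(self.units.items()))


def run(commands_by_turn: list) -> tuple:
    sim = Simulation()
    for commands in commands_by_turn:
        for command in commands:
            sim.apply(command)
    return sim.state()


# The only data shared between machines: the commands for each turn.
commands_by_turn = [
    [("spawn", 1, 0, 0), ("spawn", 2, 5, 5)],
    [("move", 1, 3, 4)],
    [],
    [("move", 2, 9, 9)],
]

machine_a = run(commands_by_turn)
machine_b = run(commands_by_turn)
print(machine_a == machine_b)
```

The traffic in this model scales with the number of commands players issue, not with the number of units in the world.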

This tricky
synchronization was difficult to get running initially, but did yield
some surprising benefits in other areas.

Improving on the Basic Model

At the
easiest conceptual level, achieving a simultaneous simulation seems
fairly straightforward. For some games, using lock-step simulations
and fixed game timings might even be feasible.

Although the problem of moving hundreds or thousands of objects
simultaneously was taken care of by this approach, the solution still had
to be viable on the Internet with latency swings of 20 to 1,000
milliseconds, and handle changes in frame processing time.

Sending
out the player commands, acknowledging all messages, and then processing
them before going on to the next turn was going to be a gameplay nightmare
of stop-start or slow command turnover. A scheme to continue processing
the game while waiting for communications to happen in the background
was needed.

Mark used
a system of tagging commands to be executed two "communications
turns" in the future (Comm. turns were separated in AoE
from actual rendering frames).

So commands
issued during turn 1000 would be scheduled for execution during turn
1002 (see Figure 1). On turn 1001 commands that were issued on turn
0999 would be executed. This allowed messages to be received, acknowledged,
and ready to process while the game was still animating and running
the simulation.

Figure 1. Tagging commands to be executed two "communications
turns" in the future.

Turns
were typically 200 msec in length, with commands being sent out during
the turn. After 200 msec, the turn was cut off and the next turn was
started. At any point during the game, commands were being processed
for one turn, received and stored for the next turn, and sent out for
execution two turns in the future.
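The turn pipeline described above can be sketched as a small scheduler. The class and method names are hypothetical, and dropping commands whose turn has already started is a simplification made for this sketch; only the two-turn delay and the turn numbers come from the text.

```python
from collections import defaultdict

COMMAND_DELAY_TURNS = 2  # from the text: execute two communications turns after issue


class TurnScheduler:
    def __init__(self, start_turn: int = 0):
        self.current_turn = start_turn
        self.pending = defaultdict(list)  # execution turn -> commands

    def issue(self, command: str) -> int:
        """Tag a locally issued command with the turn it will execute on."""
        execute_on = self.current_turn + COMMAND_DELAY_TURNS
        self.pending[execute_on].append(command)
        return execute_on

    def receive(self, execute_on: int, command: str) -> bool:
        """Store a remote command; refuse it if its turn has already started."""
        if execute_on <= self.current_turn:
            return False
        self.pending[execute_on].append(command)
        return True

    def advance(self) -> list:
        """End the current turn and return the commands due on the next one."""
        self.current_turn += 1
        return self.pending.pop(self.current_turn, [])


sched = TurnScheduler(start_turn=1000)
print(sched.issue("villager: chop wood"))  # scheduled for turn 1002
print(sched.advance())                     # turn 1001: nothing from this player yet
print(sched.advance())                     # turn 1002: the command issued on turn 1000
```

While turn 1001 is being simulated, the command issued on turn 1000 has a full turn to be delivered and acknowledged before it is needed.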

"Speed Control"

Figure 2. Speed Control.

Since
the simulations must always have the exact same input, the game can
really only run as fast as the slowest machine can process the communications,
render the turn, and send out new commands. Speed Control is what we
called the system to change the length of the turn to keep the animation
and gameplay smooth over changing conditions in communications lag and
processing speed.

There are two factors that make the gameplay feel "laggy". First, if one
machine's frame rate drops (or is lower than the rest), the other machines
will process their commands, render for all of the allocated time, and end
up waiting for the next turn -- even tiny stops are immediately noticeable.
Second, communications lag, due to Internet latency and lost data packets,
would also stop the game as the players waited around for enough data
to complete the turn.

Each client calculated a frame rate that it thought could be consistently
maintained by averaging the processing time over a number of frames.
Because this varied over the course of the game with the visible
line-of-sight, number of units, map size, and other factors, it was sent
with each "Turn Done" message.

Each client would also periodically measure a round-trip "ping time"
from itself to the other clients, and would send the longest average ping
time it was seeing to any of the clients with the "Turn Done" message.
(A total of 2 bytes was used for speed control.)

Each turn
the designated host would analyze the "done" messages, figure
out a target frame rate and adjustment for Internet latency. The host
would then send out a new frame rate and communications turn length
to be used. Figures 3 through 5 show how the communications turn was
broken up for the different conditions.

Figure 3. A single communication turn.

Figure 4. High Internet latency with normal machine performance.

Figure 5. Poor machine performance with normal latency.

The "communications turn", which was roughly the round-trip ping time for
a message, was divided up into the number of simulation frames that on
average could be done by the slowest machine in that period.
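As a rough sketch of the host's calculation, assuming each "Turn Done" message carries one frame rate and one ping value: the turn is stretched to cover the worst reported ping, and is then divided into as many frames as the slowest machine can manage. The `TurnDone` record, the function name, and the 200 msec floor (taken from the typical turn length mentioned earlier) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TurnDone:
    sustainable_fps: int  # frame rate the client believes it can hold
    worst_ping_ms: int    # longest average round-trip ping it has measured


def plan_next_turn(reports: list, min_turn_ms: int = 200) -> tuple:
    """Return (turn_length_ms, frames_per_turn, target_fps) for the next turn."""
    target_fps = min(r.sustainable_fps for r in reports)
    turn_ms = max(min_turn_ms, max(r.worst_ping_ms for r in reports))
    frames = max(1, turn_ms * target_fps // 1000)
    return turn_ms, frames, target_fps


# Normal conditions, high Internet latency, and one slow machine.
print(plan_next_turn([TurnDone(20, 120), TurnDone(15, 180)]))
print(plan_next_turn([TurnDone(20, 1000), TurnDone(20, 150)]))
print(plan_next_turn([TurnDone(10, 100), TurnDone(20, 100)]))
```

High latency lengthens the turn while keeping the frames short, whereas a slow machine keeps the turn length and reduces the number of frames in it.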

The communications turn length was weighted so it would quickly rise to
handle Internet latency changes, and slowly settle back down to the best
average speed that could be consistently maintained. The game would tend
to pause or slow only at the very worst spikes. Command latency would go
up, but play would stay smooth (adjusting only a few milliseconds per turn)
as the game adjusted back down to the best possible speed. This gave the
smoothest play experience possible while still adjusting to changing
conditions.
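One way to get this "rise fast, settle slowly" behaviour is an asymmetric update rule, sketched below. The text does not give the actual weighting, so the function name, the 0.8 rise weight, and the 5 msec settle step are all assumptions.

```python
def adjust_turn_length(current_ms: float, measured_ms: float,
                       rise_weight: float = 0.8,
                       settle_step_ms: float = 5.0) -> float:
    """Move the turn length toward the measured need, faster up than down."""
    if measured_ms > current_ms:
        # Latency spike: cover most of the gap immediately.
        return current_ms + rise_weight * (measured_ms - current_ms)
    # Conditions improved: creep back down a few milliseconds per turn.
    return max(measured_ms, current_ms - settle_step_ms)


turn_ms = 200.0
for measured in (200, 1000, 200, 200, 200):
    turn_ms = adjust_turn_length(turn_ms, measured)
    print(round(turn_ms))
```

A single spike pushes the turn length most of the way up in one step, after which it eases back by a small fixed amount each turn, so players see slightly higher command latency rather than a stutter.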

Guaranteed Delivery

At the network layer UDP was used, with command ordering, drop detection,
and resending being handled by each client. Each message used a couple of
bytes to identify the turn on which execution was scheduled and the
sequence number for the message. If a message was received for a past
turn, it was discarded; otherwise, incoming messages were stored for
execution. Because of the nature of UDP, Mark's assumption for message
receipt was, "When in doubt, assume it dropped." If messages were received
out of order, the receiver immediately sent out re-send requests for
the dropped messages. If an acknowledgement was later than predicted,
the sender would just resend without being asked, anticipating that the
message had been dropped.
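The receiver and sender rules above can be sketched as follows, for a single sender and without any real sockets. The class, function, and field names are hypothetical, and the wire format is not shown because the text does not describe it beyond the turn and sequence numbers.

```python
class Receiver:
    """Tracks messages from one sender and reports gaps to re-request."""

    def __init__(self, current_turn: int):
        self.current_turn = current_turn
        self.next_seq = 0    # lowest sequence number not yet seen
        self.seen = set()    # every sequence number that has arrived
        self.stored = {}     # seq -> (execute_turn, payload), kept for execution

    def on_message(self, seq: int, execute_turn: int, payload: str) -> list:
        """Handle one datagram; return the sequence numbers to request again."""
        self.seen.add(seq)
        if execute_turn >= self.current_turn:
            self.stored[seq] = (execute_turn, payload)  # past turns are discarded
        # "When in doubt, assume it dropped": any gap below seq is requested at once.
        missing = [s for s in range(self.next_seq, seq) if s not in self.seen]
        while self.next_seq in self.seen:
            self.next_seq += 1
        return missing


def should_resend(sent_at_ms: int, now_ms: int, predicted_ack_ms: int, acked: bool) -> bool:
    """Sender side: resend unprompted once the acknowledgement is overdue."""
    return not acked and (now_ms - sent_at_ms) > predicted_ack_ms


receiver = Receiver(current_turn=1000)
print(receiver.on_message(0, 1002, "move"))    # in order, nothing to request
print(receiver.on_message(3, 1002, "attack"))  # 1 and 2 skipped, request both
print(should_resend(sent_at_ms=0, now_ms=300, predicted_ack_ms=250, acked=False))
```

A late arrival that was merely reordered costs one redundant resend, which is the price of assuming a drop whenever there is doubt.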

Hidden Benefits

Because
the game's outcome depended on all of the users executing exactly the
same simulation, it was extremely difficult to hack a client (or client
communication stream) and cheat. Any simulation that ran differently
was tagged as "out of sync" and the game stopped. Cheating
to reveal information locally was still possible, but these few leaks
were relatively easy to secure in subsequent patches and revisions.
Security was a huge win.

Hidden Problems

At first take it might seem that getting two pieces of identical code to
run the same should be fairly easy and straightforward -- not so. The
Microsoft product manager, Tim Znamenacek, told Mark early on, "In every
project, there is one stubborn bug that goes all the way to the wire
-- I think out-of-sync is going to be it." He was right. The difficulty
with finding out-of-sync errors is that very subtle differences would
multiply over time. A deer slightly out of alignment when the random
map was created would forage slightly differently -- and minutes later
a villager would path a tiny bit off, or miss with his spear and take
home no meat. So what showed up as a checksum difference in food amounts
was sometimes puzzling to trace back to its original cause.
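A toy version of the deer example shows why a per-turn checksum history helps: the visible symptom (different food) appears later than the turn on which the worlds first diverged. CRC32 over a sorted dump of the state is an assumption made for this sketch; the text does not say which checksum the game used.

```python
import zlib


def world_checksum(world: dict) -> int:
    """CRC32 over a canonical (sorted) rendering of a toy world state."""
    canonical = repr(sorted(world.items())).encode("utf-8")
    return zlib.crc32(canonical)


def first_out_of_sync_turn(history_a: list, history_b: list):
    """Return the first turn where two checksum histories differ, else None."""
    for turn, (a, b) in enumerate(zip(history_a, history_b)):
        if a != b:
            return turn
    return None


# Machine B's deer is one tile off on turn 1; the food only differs on turn 2.
states_a = [{"deer": (10, 10), "food": 0},
            {"deer": (11, 10), "food": 0},
            {"deer": (12, 10), "food": 35}]
states_b = [{"deer": (10, 10), "food": 0},
            {"deer": (11, 11), "food": 0},
            {"deer": (12, 11), "food": 0}]

history_a = [world_checksum(s) for s in states_a]
history_b = [world_checksum(s) for s in states_b]
print(first_out_of_sync_turn(history_a, history_b))
```

Checksumming only the food total would have flagged turn 2, one turn after the real cause, which is the kind of gap that makes these bugs hard to trace.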

No matter how much we check-summed the world, the objects, the
pathfinding, targeting, and every other system, it seemed that there was
always one more thing that slipped just under the radar. Having to sift
through giant (50MB) message traces and world object dumps made the
problem even more difficult. Part of the difficulty was conceptual --
programmers were not used to having to write code that used the same
number of calls to random within the simulation (yes, the random numbers
were seeded and synchronized as well).
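The random-number pitfall is easy to reproduce in miniature. In this sketch, which uses Python's `random.Random` as a stand-in for the game's generator, two runs share a seed, but one of them makes a single extra call on turn 5, for instance for an effect that only one machine happens to draw. The function name and the dice-roll "gameplay" are illustrative assumptions.

```python
import random


def simulate(seed: int, turns: int, extra_call_on_turn=None) -> list:
    rng = random.Random(seed)  # seeded identically on every machine
    rolls = []
    for turn in range(turns):
        if turn == extra_call_on_turn:
            rng.random()       # one extra call made on this machine only
        rolls.append(rng.randint(1, 100))  # a gameplay roll, such as a spear hit
    return rolls


in_sync = simulate(seed=1996, turns=20)
same = simulate(seed=1996, turns=20)
drifted = simulate(seed=1996, turns=20, extra_call_on_turn=5)

print(in_sync == same)              # identical call sequences stay identical
print(in_sync[:5] == drifted[:5])   # rolls before the extra call still agree
print(in_sync == drifted)           # rolls after it no longer line up
```

Every roll after the extra call is drawn from a shifted position in the sequence, so the two machines disagree from that turn onward even though the seed and the commands were the same.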