Last month I talked about the sorts of low-production-value profilers we often
need to build into our games. I described these as "interactive profilers",
distinguishing them from the available commercial products, which I called
"batch profilers".

I described several ways that an interactive profiler can be helpful when a
batch profiler cannot. A big problem with batch profilers is that they average
measurements across the profiling session. So if the program changes its overall
behavior several times during the session, with a batch profiler you see some
kind of average behavior which may not be much like the specific behaviors you
want to debug.

To illustrate this, Table 1 shows an example of three different behaviors a
program might exhibit, and the resulting analysis you'd get from a program like
VTune. During any given frame, the readout of an interactive profiler will
clearly show one of these three behaviors, with a single dominant function
eating most of the CPU time. The result given by the batch profiler shows no
such dominance.

Table 1:

Code Module     Behavior A   Behavior B   Behavior C   Average
rendering          75%          25%          25%          42%
physics_sim        15%          65%          15%          32%
ai_pathfind        10%          10%          60%          27%
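The loss of dominance is easy to verify with a quick calculation. Here's a sketch using the numbers from Table 1 (the module and behavior names are just the ones from the table):

```python
# Per-behavior CPU percentages from Table 1.
behaviors = {
    "A": {"rendering": 75, "physics_sim": 15, "ai_pathfind": 10},
    "B": {"rendering": 25, "physics_sim": 65, "ai_pathfind": 10},
    "C": {"rendering": 25, "physics_sim": 15, "ai_pathfind": 60},
}

modules = ["rendering", "physics_sim", "ai_pathfind"]

# What a batch profiler reports: the mean across the whole session,
# assuming equal time was spent in each behavior.
average = {m: sum(b[m] for b in behaviors.values()) / len(behaviors)
           for m in modules}

# Within each behavior there is a clearly dominant module...
for b in behaviors.values():
    assert max(b.values()) >= 60
# ...but in the averaged result no module dominates.
assert max(average.values()) < 50
```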

Now suppose that behavior C is actually an error, and it should not be happening
at all. Given a clear picture of these numbers, we see that there is a dominant
function in behavior C and we have a hope of understanding why that function is
called. The result of the batch profiler shows three functions of roughly equal
importance, and it's unclear how to proceed in optimizing the system. Of course,
actual modern games are even less clear than this, since they involve longer
lists of profiling numbers (much longer!) and a more varied gamut of behavior.

So like I said last month, you can use an interactive profiler to help clarify
problems like this. But historically, interactive profilers have provided
minimal functionality -- a list of program regions and CPU times, and not much
else. The numbers they present to us are ephemeral: either they exist for a
frame and then disappear, or they're averaged over several frames, hiding spikes
that are important to detect. This is an interesting conundrum in interactive
profiling: we want to add hysteresis to the reported data, so that it is easier
to read (as with the IIR filters last month); but that hysteresis can hide
important events. It makes the profiler less interactive, and more
batch-oriented, but without the stored-data advantages of batch profiling.
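The tradeoff is visible in a toy calculation. Below is a sketch of a one-pole IIR filter (an exponential moving average, like the filters from last month) applied to frame times containing a single spike; the 0.1 blend factor is an arbitrary illustrative choice:

```python
def smooth(samples, alpha=0.1):
    """One-pole IIR filter: each output blends the newest sample
    into the previous output, smoothing the displayed value."""
    out = []
    value = samples[0]
    for s in samples:
        value = (1.0 - alpha) * value + alpha * s
        out.append(value)
    return out

# Steady 16 ms frames with one 50 ms spike in the middle.
frame_times = [16.0] * 10 + [50.0] + [16.0] * 10

smoothed = smooth(frame_times)

# The raw data shows the spike plainly; the filtered data barely moves.
# The display is easier to read, but the event is nearly invisible.
assert max(frame_times) == 50.0
assert max(smoothed) < 20.0
```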

So with the common sort of interactive profiler, we lose one of the strengths of
batch profiling: the capability for precise and unhurried analysis of the
available numbers. We can see changing behaviors there in the live profiling
data, but they're difficult to quantify. How much time is being spent in
behavior A, and how much in behavior B? How frequently is some particular
performance spike occurring, and how long does the spike last?

In a sense, we want to quantify the behavior of the profiling numbers at a meta
level. The raw list of numbers for any particular frame tells us how long the
program spent drawing the world, running AI routines, performing collision
detection, and such. As we saw with Table 1, those numbers can change
significantly during a run. So we really want to step up a level, and not limit
ourselves to speaking only about elements of the raw timing data. We want to
look at the array of data from any given frame and treat it as a coherent set of
numbers, as an abstract object in its own right. That coherent set of numbers is
a way to quantify what I have been calling a "behavior". We want to talk about
how consistently the game produces that behavior; and when it produces some
other behavior, what precisely that means. We want solid quantitative
information about when and how often the program exhibits each behavior. Thus we
will have elevated ourselves to a level where we are taking concrete
measurements about formerly abstract ideas like "the fill-rate-limited behavior"
and "the physics-heavy behavior", and no longer trying to make inferences by
looking only at timing data for individual program zones.
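One concrete way to treat a frame's numbers as an object in its own right is as a vector: normalize the per-zone times so they sum to 1, and compare frames by distance. This is only a sketch of the idea (the zone names and sample numbers are invented), not a full design:

```python
import math

def as_behavior(frame_times):
    """Normalize a frame's per-zone times into a unit-sum vector, so
    frames with different total cost but the same shape compare equal."""
    total = sum(frame_times.values())
    return {zone: t / total for zone, t in frame_times.items()}

def distance(a, b):
    """Euclidean distance between two behavior vectors."""
    zones = set(a) | set(b)
    return math.sqrt(sum((a.get(z, 0.0) - b.get(z, 0.0)) ** 2
                         for z in zones))

# Two frames with the same shape but different absolute cost...
heavy = {"rendering": 30.0, "physics": 6.0, "ai": 4.0}
light = {"rendering": 15.0, "physics": 3.0, "ai": 2.0}
# ...and one frame with a genuinely different shape.
odd   = {"rendering": 5.0, "physics": 26.0, "ai": 4.0}

# Same behavior, despite different frame times.
assert distance(as_behavior(heavy), as_behavior(light)) < 1e-9
# A clearly different behavior.
assert distance(as_behavior(heavy), as_behavior(odd)) > 0.5
```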

If this elevated data is presented in an easy-to-perceive way, it will become an
important tool for understanding just what our program does. (Justification of
the importance of this is coming up in the next section!)

This month I want to explore the kinds of analysis capabilities we might want in
an interactive profiler; next month I will follow up with a concrete
implementation. Our goal will be to introduce as much analysis as possible
without slowing down the running game. (Maintaining a high frame rate during
profiling is important for reasons discussed last month.)

What do we want?

To design appropriate analysis tools, we first need to clearly define what we
want the tools to do for us. This may take some thinking; as games have evolved,
our profiling needs have changed also.

One important observation is that instruction-level profiling is no longer as
important as it once was. One reason for this is that modern CPUs are
unpredictable; it's harder than ever to make a program faster by applying minor
code tweaks. But there's also a deeper reason.

It's a piece of ancient programmer wisdom that we should not resort to peephole
optimization (what instruction-level profiling is good for) until the options
for algorithmic optimization have been carefully explored; high-level changes to
an algorithm can functionally invalidate low-level optimizations, while having
an overall impact on the execution time that dwarfs the gain of those low-level
tweaks.

As time goes on, algorithms used in games are getting more complicated; thus,
the number of degrees of freedom for possible algorithmic changes is constantly
increasing. By that I mean, we have more choices of different ways to alter an
algorithm while still accomplishing the same goal. With so many degrees of
freedom, and with limited manpower to spend on optimization, we can easily
exhaust our work budget while exploring algorithmic changes, never reaching the
low-level optimization stage for most of our game's subsystems. So we want a
framework that encourages and supports profiling at an algorithmic level, not an
instruction level.

Also, traditional profilers assume that a program's behavior is somewhat
orderly, and that a static listing of CPU times is sufficient to understand its
behavior. But now games are so complicated that we often don't have a good idea
of what they're doing internally, even when they seem to work. One example: it's
easy to accidentally draw all the entities in the world twice per frame. This
might happen for simple reasons, like big blocks of code got moved around but a
function call got duplicated. In this case the entity rendering routines get
called twice, but it's hard to see because the calls are in different files, a
few function calls' depth apart from each other; if multipass rendering is being
used, there will already be multiple calls to some entity drawing routines that
you really do want to be there, further confusing the situation. A subtler way
to end up rendering entities twice is if you collect them by portal navigation
through the world, but the ones you have already collected don't get correctly
marked, so odd combinations of portals cause entities to be added to the render
list multiple times.

Another example: it's easy to perform collision detection against an excessive
number of not-so-nearby objects. Either bounding volumes for the objects are
unintentionally large, or the data structure they're being stored in is overly
loose. In both this and the duplicate-drawing case, the rendered scene appears
to be correct, but the game has taken a lot of extra time to generate it.
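Both of these errors are invisible on screen but show up in simple per-frame counters. Here's a sketch of the idea; the zone names and the expected-call baselines are illustrative, and in practice the profiler would already have these counts from its zone instrumentation:

```python
from collections import Counter

class FrameCounters:
    """Count how many times each profiled zone is entered during a frame,
    and flag zones that exceed an expected baseline."""
    def __init__(self, expected):
        self.expected = expected  # zone -> call count we believe is correct
        self.counts = Counter()

    def enter_zone(self, zone):
        self.counts[zone] += 1

    def anomalies(self):
        """Zones called more often than expected this frame."""
        return {z: n for z, n in self.counts.items()
                if n > self.expected.get(z, n)}

counters = FrameCounters(expected={"draw_entity": 100, "collide_pair": 400})

# Simulate a frame where a duplicated call path draws every entity twice,
# while collision counts stay at their expected level.
for _ in range(200):
    counters.enter_zone("draw_entity")
for _ in range(400):
    counters.enter_zone("collide_pair")

assert counters.anomalies() == {"draw_entity": 200}
```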

So I will say that one of the main purposes of a modern profiler should be to
act as a tool to understand the behavior of a program; to verify that the
program is running correctly or to understand how it is wrong. Even when
performance problems are not due to bugs, they often exist because we don't
quite understand the performance ramifications of our algorithms when given
complex input data. So again, the chief job of the profiler is to help provide
understanding.

Effective utilization of such profiling data requires an approach that is a
little less direct than what we had in the past. With instruction-level
optimization, the profiler can point you at exactly the piece of code you should
be concerned with. But when optimizing algorithmically, the code where the time
is being spent is not usually what you care about; you want to know _why_ time
is being spent there, so that you can change the program's overall behavior, to
call that code less or eliminate it entirely.

An example of this is the rendering of stencil shadow volumes. Suppose the
profiler tells us we're spending a lot of time computing the silhouettes of the
objects in our game world, so that the shadow planes can be extruded from them.
Once we are convinced that the silhouette computation is not grossly
incompetent, we look for algorithmic solutions to the problem, which will likely
yield much greater speed-ups than trying to optimize the silhouette computation
itself. One such speed-up is to more aggressively cull light sources whose
shadow volumes cannot be seen. With fewer light-to-object
interactions, we call the silhouette routine less often, and the game goes
faster.
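A much-simplified sketch of that kind of culling, assuming each light has a bounded radius of effect and that a light whose sphere of influence lies entirely outside the view frustum can be skipped (real shadow volumes need a more careful test, since the volumes are extruded away from the light):

```python
def sphere_outside_plane(center, radius, plane):
    """Plane is (nx, ny, nz, d) with the normal pointing into the frustum;
    the sphere is fully outside if its center is more than `radius`
    behind the plane."""
    nx, ny, nz, d = plane
    dist = nx * center[0] + ny * center[1] + nz * center[2] + d
    return dist < -radius

def cull_lights(lights, frustum_planes):
    """Keep only lights whose sphere of influence touches the frustum."""
    return [l for l in lights
            if not any(sphere_outside_plane(l["pos"], l["radius"], p)
                       for p in frustum_planes)]

# A single plane x = 0, keeping the half-space x > 0.
planes = [(1.0, 0.0, 0.0, 0.0)]
lights = [
    {"pos": (5.0, 0.0, 0.0),   "radius": 1.0},  # inside: kept
    {"pos": (-0.5, 0.0, 0.0),  "radius": 1.0},  # straddling: kept
    {"pos": (-10.0, 0.0, 0.0), "radius": 1.0},  # far outside: culled
]

assert len(cull_lights(lights, planes)) == 2
```

With the far-off light culled, its silhouette and extrusion work never happens at all.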

While understanding the behavior of our program is a very important goal, right
now we have no tools for that purpose. That seems like a bad thing.

How we might use such a profiler

So what specifically do we want the tools to do for us when we sit
down in front of the computer to profile?

Firstly, I'd like to be able to casually run the game, without needing to
prepare a special build, and without a large framerate hit, and get profiling
data -- this allows me to just walk around in the game world and do things that
test performance characteristics.

I'd like the profiler to detect the various ways that the program behaves, and
to provide for me an abstract list of those behaviors with quantitative results
attached. After I run around the game world doing a bunch of stuff, I want the
profiler to give me something like Table 1, and say "you were in behavior A 50%
of the time, B 35% of the time, and C 15% of the time." Each of A, B and C
consists of a separate set of profiling data, like a batch profiler would give
us for an entire run. Now we are viewing things from a higher level, and would
like to know how important each behavior is, and how they interact. It would be
nice to see a graph of which behavior was happening when, so I can correlate the
changes with things that happened in-game.
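One plausible way the profiler could produce that kind of report is to treat each frame's normalized timings as a point and assign it to the nearest of a set of representative behaviors. A sketch, with hand-picked centroids standing in for whatever clustering the real profiler would do:

```python
def classify(frame, centroids):
    """Assign a frame's timing vector to the nearest behavior centroid
    (squared Euclidean distance; keys are zone names)."""
    def dist2(a, b):
        zones = set(a) | set(b)
        return sum((a.get(z, 0.0) - b.get(z, 0.0)) ** 2 for z in zones)
    return min(centroids, key=lambda name: dist2(frame, centroids[name]))

centroids = {
    "A": {"rendering": 0.75, "physics": 0.15, "ai": 0.10},
    "B": {"rendering": 0.25, "physics": 0.65, "ai": 0.10},
    "C": {"rendering": 0.25, "physics": 0.15, "ai": 0.60},
}

# A recorded run: one normalized timing vector per frame.
frames = (
    [{"rendering": 0.70, "physics": 0.20, "ai": 0.10}] * 10 +  # A-ish
    [{"rendering": 0.30, "physics": 0.60, "ai": 0.10}] * 7 +   # B-ish
    [{"rendering": 0.20, "physics": 0.20, "ai": 0.60}] * 3     # C-ish
)

labels = [classify(f, centroids) for f in frames]
report = {b: labels.count(b) / len(labels) for b in centroids}

# "You were in behavior A 50% of the time, B 35%, and C 15%."
assert report == {"A": 0.5, "B": 0.35, "C": 0.15}
```

The `labels` list is also exactly the data needed for the behavior-over-time graph: which behavior was happening on which frame.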

Once a well-defined set of behaviors has been measured, I'd like the facility to
mark some of them as expected/acceptable behaviors, and some of them as
anomalous/unacceptable behaviors. The profiler can provide statistics about how
much time has been spent in unacceptable behaviors. Suppose the game has a
journaling system; I am running a recorded session several times to benchmark
performance as I attempt several optimizations, to see how effective those
optimizations are. I can watch the percentage of time being spent in
unacceptable behaviors, and the CPU measurements for those behaviors in
isolation; this may provide much more concrete and useful information than a
batch-profile-style average.

Again if the game has a journaling system, I'd like to mark certain behaviors as
things I want to debug. Then I can start playing back the journaled game events,
and when the game reaches one of the target behaviors, the playback pauses. This
allows me to quickly access the problematic game states, and then look at the
game world to infer what may be causing the problem. In the stencil shadow
example, perhaps during a behavior when the silhouette-finding takes
unreasonable amounts of the CPU, there don't appear to be an unusual number of
light sources or objects in the scene. I then turn on wireframe rendering, and
see that far in the distance, in areas that are occluded from view, are many
light sources and objects. I can then deduce that there is an error in the
occlusion culling code. You could also find this kind of problem without a
journaling system or analytical set of behaviors, by walking around the world
and watching the live numbers. But the idea is that the extra analytical
capability helps us find problems more quickly, and in a more reproducible way,
and also helps detect problems that are subtler.

I'd like to have some querying ability about the behaviors. I want to be able to
ask things like, "which behaviors contain a lot of silhouette computation?",
then look at those behaviors as a group to understand them better. Perhaps I
want to perform queries with boolean operations, like "which behaviors contain a
lot of 'collision_detect' and NOT a lot of 'physics_simulate'?" This would
indicate unusual situations where objects collided a lot without moving far
between collisions, which is a behavior case we might want to spend a lot of
effort to alleviate.
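A query facility like that could be as simple as predicates evaluated over each behavior's timing vector. A sketch, with invented behavior names and an illustrative threshold for "a lot":

```python
behaviors = {
    "crowd_fight":  {"collision_detect": 0.40, "physics_simulate": 0.35},
    "pile_of_junk": {"collision_detect": 0.45, "physics_simulate": 0.05},
    "quiet_walk":   {"collision_detect": 0.05, "physics_simulate": 0.05},
}

def a_lot(behavior, zone, threshold=0.30):
    """Illustrative definition of 'a lot': the zone takes more than
    30% of the frame in this behavior."""
    return behavior.get(zone, 0.0) > threshold

# "Which behaviors contain a lot of 'collision_detect' and NOT a lot
#  of 'physics_simulate'?"
matches = [name for name, b in behaviors.items()
           if a_lot(b, "collision_detect")
           and not a_lot(b, "physics_simulate")]

assert matches == ["pile_of_junk"]
```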

If I am investigating a certain behavior, I may wish to ask "which behaviors are
like this one?", to find other cases that may shed light on the current
situation; or I may wish to ask "which behaviors are very much unlike this
one?", to see how far the game's behavior can deviate from what I am studying.

To discover performance patterns, I want the profiler to detect correlations
between timing values. Suppose time spent in the dynamic texture generator is
roughly proportional to time spent drawing particle systems, and both of those
correlate significantly with increased frame time. That would indicate that the
art guys are using too many procedurally generated textures in the particle
systems, and they need to cut down on those; but perhaps the other dynamically
generated textures do not represent a performance problem (say, the ones used
for rippling water).
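Detecting that kind of relationship requires nothing more exotic than a standard correlation measure over the per-frame samples. A sketch using the Pearson coefficient on fabricated per-frame timings:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sample lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Per-frame times (ms): texture generation tracks particle drawing,
# while the water-ripple texture time is steady and unrelated.
particles = [1.0, 2.0, 4.0, 8.0, 3.0, 6.0]
tex_gen   = [0.5, 1.1, 2.0, 3.9, 1.6, 3.1]
ripple    = [0.7, 0.7, 0.8, 0.7, 0.8, 0.7]

assert pearson(particles, tex_gen) > 0.95     # strong correlation
assert abs(pearson(particles, ripple)) < 0.2  # no real relationship
```

The profiler would run this over every pair of channels and surface only the pairs whose coefficient exceeds some threshold.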

Since the profiler is built into the game, we can expose game variables to the
profiler. So instead of just classifying behaviors based on timing data, it can
also look at "the number of monsters currently running around" or "the number of
nodes in the potentially visible set". It can find correlations between these
variables and the timing data, or perhaps just among the variables themselves.
With that capability, we have a powerful tool tremendously different from
anything we've had yet.

Next Month

This month's article has consisted basically of a big wish list. Next month, we
will look very concretely at the implementation of an interactive profiler that
attempts to satisfy these wishes.