Inside the second: A new look at game benchmarking

I suppose it all started with a brief conversation. Last fall, I was having dinner with Ramsom Koay, the PR rep from Thermaltake. He's an inquisitive guy, and he wanted to know the answer to what seemed like a simple question: why does anyone need a faster video card, so long as a relatively cheap one will produce 30 frames per second? And what's the deal with more FPS, anyway? Who needs it?

I'm ostensibly the expert in such things, but honestly, I wasn't prepared for such a question right at that moment. Caught off guard, I took a second to think it through and gave my best answer. I think it was a good one, as these things go, with some talk about avoiding slowdowns and maintaining a consistent illusion of motion. But I realized something jarring as I was giving it—that the results we provide our readers in our video card reviews don't really address the issues I'd just identified very well.

That thought stuck with me and began, slowly, to grow. I was too busy to do much about it as the review season cranked up, but I did make one simple adjustment to my testing procedures: ticking the checkbox in Fraps—the utility we use to record in-game frame rates—that tells it to log individual frame times to disk. In every video card review that followed, I quietly collected data on how long each frame took to render.

Finally, last week, at the end of a quiet summer, I was able to take some time to slice and dice all of the data I'd collected. What the data showed proved to be really quite enlightening—and perhaps a bit scary, since it threatens to upend some of our conclusions in past reviews. Still, I think the results are very much worth sharing. In fact, they may change the way you think about video game benchmarking.

Why FPS fails
As you no doubt know, nearly all video game benchmarks are based on a single unit of measure, the ubiquitous FPS, or frames per second. FPS is a nice instant summary of performance, expressed in terms that are relatively easy to understand. After all, your average geek tends to know that movies happen at 24 FPS and television at 30 FPS, and any PC gamer who has done any tuning probably has a sense of how different frame rates "feel" in action.

Of course, there are always debates over benchmarking methods, and the usual average FPS score has come under fire repeatedly over the years for being too broad a measure. We've been persuaded by those arguments, so for quite a while now, we have provided average and low FPS rates from our benchmarking runs and, when possible, graphs of frame rates over time. We think that information gives folks a better sense of gaming performance than just an average FPS number.

Still, even that approach has some obvious weaknesses. We've noticed them at times when results from our FRAPS-based testing didn't seem to square with our seat-of-the-pants experience. The fundamental problem is that, in terms of both computer time and human visual perception, one second is a very long time. Averaging results over a single second can obscure some big and important performance differences between systems.

To illustrate, let's look at an example. It's contrived, but it's based on some real experiences we've had in game testing over the years. The charts below show the times required, in milliseconds, to produce a series of frames over a span of one second on two different video cards.

GPU 1 is obviously the faster solution in most respects. Generally, its frame times are in the teens, and that would usually add up to an average of about 60 FPS. GPU 2 is slower, with frame times consistently around 30 milliseconds.

However, GPU 1 has a problem running this game. Let's say it's a texture upload problem caused by poor memory management in the video drivers, although it could be just about anything, including a hardware issue. The result of the problem is that GPU 1 gets stuck when attempting to render one of the frames—really stuck, to the tune of a nearly half-second delay. If you were playing a game on this card and ran into this issue, it would be a huge show-stopper. If it happened often, the game would be essentially unplayable.

The end result is that GPU 2 does a much better job of providing a consistent illusion of motion during the period of time in question. Yet look at how these two cards fare when we report these results in FPS:

Whoops. In traditional FPS terms, the performance of these two solutions during our span of time is nearly identical. The numbers tell us there's virtually no difference between them. Averaging our results over the span of a second has caused us to absorb and obscure a pretty major flaw in GPU 1's performance.

Let's say GPU 1 had similar but slightly smaller delays in other places during the full test run, but this one second was still its worst overall. If so, GPU 1's average frame rate for the whole run could be upwards of 50 FPS, and its minimum frame rate would be 35 FPS—quite decent numbers, according to the conventional wisdom. Yet playing the game on this card might be almost completely unworkable.

If we saw these sorts of delays during our testing for a review, we'd likely have noted the occasional hitches in GPU 1's performance, but some folks probably would have simply looked at the numbers and flipped to the next game without paying attention to our commentary. (Ahem.)

Frame time
in milliseconds

FPS
rate

8.3

120

10

100

16.7

60

20

50

25

40

33

30

42

24

50

20

60

17

70

14

By now, I suspect you see where we're headed. FPS isn't always bad as a summary of performance, but it has some obvious shortcomings due to the span of time involved. One way to overcome this weakness is to look inside the second, as we have just done, at the time it takes to produce individual frames. Doing so isn't all that difficult. Heck, game developers have done it for years, tuning against individual frame times and also delving into how much GPU time each API call occupies when producing a single frame.

We will need to orient ourselves to a new way of thinking, though. The table on the right should help. It shows a range of frame times in milliseconds and their corresponding FPS rates, assuming those frame times were to remain constant over the course of a full second. Notice that in the world of individual frame times, lower is better, so a time of 30 ms is more desirable than a time of 60 ms.

We've included several obvious thresholds on the table, among them the 16.7 ms frame time that corresponds to a steady rate of 60 frames per second. Most LCD monitors these days require input at 60 cycles per second, or 60Hz, so going below the 16.7-ms threshold may be of limited use for some folks.

With that said, I am not a believer in the popular myth that speeds above 60 FPS are pointless. Somehow, folks seem to have conflated the limits of current display technologies (which are fairly low) with the limits of the human visual system (which are much higher). If you don't believe me, you need only to try this simple test. Put two computers side by side, one with a 60Hz display and the other with a 120Hz display. Go to the Windows desktop and drag a window around the screen on each. Wonder in amazement as the 120Hz display produces an easily observable higher fluidity in the animation. In twitch games, steady frame rates of 90Hz or higher are probably helpful to the quickest (and perhaps the youngest) among us.

At the other end of the scale, we have the intriguing question of what sorts of frame times are acceptable before the illusion of motion begins to break down. Movies in the theater are one of the slower examples we have these days, with a steady frame rate of just 24 FPS—or 42 ms per frame. For graphical applications like games that involve interaction, I don't think we'd want frame times to go much higher than that. I'm mostly just winging it here, but my sense is that a frame time over 50 ms is probably worthy of note as a mark against a gaming system's performance. Stay above that for long, and your frame rate will drop to 20 FPS or lower—and most folks will probably start questioning whether they need to upgrade their systems.

With those considerations in mind, let's have a look at some frame time data from an actual game, to see what we can learn.