Friday, May 26, 2006

Logging and Divide and Conquer for Performance

There are only two true debugging techniques: log intermediate results and divide and conquer the buggy code. They work well for performance tuning too.

For performance-tuning X-Plane our primary tool is profiling. Some of this is done by manually logging timing points (read off high-fidelity counters). But more useful is Shark, Apple’s adptive sampling profiler.

One of the trickier things about performance-tuning an OpenGL application is that its speed is affected by both CPU and GPU. The nice thing about Shark is that since it samples over time (and not per function call), our framerate doesn’t decrease when we use it. If our framerate decreased, the ratio of CPU to GPU work would change and the profile would be invalid. Also, Shark can sample within a function, which is crucial since we inline very heavily in our tight loops.

Good profiling is critical to performance; we can make almost anything fast but we don’t have time to make everything fast. And what’s slow is rarely what you would think is slow. For example, I just did a profile of 8000 cars to determine whether we can render the full 3-d headlights and taillights at a distance. Surprisingly, the cost of setting up the lights in 3-d is almost nil; assuring that the car is not unnecessarily drawn turns out toe be the performance-critical factor. (Given how many more lights there are than cars, since cars themselves are culled out when far away, the fact that the 3-d math isn’t a hot loop is surprising!)

In this picture you can see a Shark profile of X-Plane where we’re pushing back a lot of items onto a vector that hasn’t been pre-alloated. Thus OS vm functions and reserve() take up 59% of CPU usage and represent all dominating function calls. In a healthy X-Plane, plot_group_layers (our scene-graph iterator) and the various GLEngine calls would dominate.

One problem with profiling is that if you can’t duplicate the exact rendering settings, you can’t safely compare techniques. To determine the cost of a feature, we divide and conquer. For example, to understand what really costs us - the car or the headlight, I can set X-Plane to only draw the cars when the mouse is in the to half of the screen and only draw the headlights when the mouse is on the right side of the screen. This kind of technique lets us see the instantanious performance change for a feature, giving us a differential under the exact same conditions (same number of cars, same number of cars on screen, same distance away…). This is the ultimate confirmation that a feature costs or doesn’t cost us.