Listen. Reflect. Explore. Solve.

Post navigation

Devel::DProf – broken by the passage of time

To measure the performance of your Perl code many developers instinctively reach for Devel::DProf. The venerable perl profiler dates back to 1995.

When profiling you have two broad choices: cpu-time vs wallclock time, and subroutine-level profiling vs line-level profiling. DProf is a subroutine-level profiler. (I’ll talk more about subroutine-level profiling vs line-level profiling in an upcoming post about the soon-to-be-releasedNYTProf 2.0.)

Measuring cpu-time should, in theory, give you a clear picture of the time spent by perl executing your code, independent of the load on the machine or delays waiting for i/o. Sadly life isn’t so simple.

The big problem is that on most systems cpu-time measurement has a resolution of 0.01 seconds (a HZ value of 100). That may not sound like much time, but you can execute a lot of perl code, and a lot of subroutine calls, in that time.

So any profiler using cpu-time with such a course granularity is going to give ‘noisy’ results. Some subroutines may appear to be slow because the cpu-time ‘tick’ happened to occur more while those subs were running. Other subroutines may appear to be fast because the cpu-time ‘tick’ doesn’t happen much while they’re running.

Clearly nonsense. Out of 1000 subroutines that would all take about the same length of time, a cpu-time ‘tick’ happened to occur in those three. The rest of the subroutines appear to be ‘free’. (If you play with this yourself you’ll probably need to adjust the 1..100 to see the effect on your system.)

Using dprofpp -r (to “display elapsed real times” instead of cpu-times) gives a very similar granular result. That’s because Devel::DProf artificially reduces the resolution of real time to match the granular cpu-time! It’s also odd that dprofpp doesn’t provide an option to sort by elapsed time, but there’d be little value anyway given the low resolution.

Many people have reported strange results from Devel::DProf over the years. Design decisions that made sense back in 1995 are now causing problems as CPUs have got so much faster. Several have implemented alternative profilers.

Using cpu-time is effectively a kind of statistical sampling. That’s fine as long as you remember it and interpret the results accordingly. The work-around for the course granularity is to execute the code for long enough for the effects to be averaged out. But that can be inconvenient, and how do you know how much is enough?

Personally I’d recommend using wallclock time for profiling. I want to know how much time was spent waiting for network, waiting for disk, waiting for cpu. Those are often significant factors in performance issues. Too many network round trips, or too many individual disk requests, for example, can easily kill performance. If you only profile cpu time you may not spot these kinds of problems. You may just be chasing the random ghosts of occasional cpu clock ticks.

So which profiler would I recommend? Stay tuned for an announcement in a day or so. Easy! NYTProf 2.0