How to run existing code under the profiler
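Devel::DProf runs as a debugger module, so you don't need to change your code at all; you only change how you invoke perl. Assuming your script is called yourscript.pl (the name is just a placeholder):

```
perl -d:DProf yourscript.pl
```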

It's worth noting that the profiler will slow down the code you're running, so expect a slow program to get slower while you're trying to figure out why it's so slow. Also, the raw output files can get very large, especially if your program runs a long time.

How to get the results of profiling

With profiling turned on, a file called "tmon.out" will be created in the current directory. That file contains all the raw timing information for your run. To get a condensed report, run dprofpp from that same directory.
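dprofpp reads tmon.out from the current directory by default, so the simplest invocation is just:

```
dprofpp
```

The columns in its report are explained below.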

Elapsed Time

This is wall-clock time from the program's start to finish, no matter how that time was spent.

User+System Time

This is how much CPU time the program took. This does not include time spent waiting on the disk, the network, or other tasks. ("User time" is time spent in your code and "system time" is time the operating system spent serving your code.)

%Time

This is the percent of "User+System Time" spent exclusively in the specified function. It does not include time spent in functions that this function called.

ExclSec (exclusive seconds)

Time spent just in this function, not in functions it called.

CumulS (cumulative seconds)

Time spent in this function, including time spent in functions that it called.

#Calls (number of calls)

Number of times the function was called.

sec/call (seconds per call)

The average number of seconds (in user+system time) spent in each call to this function.

Csec/c (cumulative seconds per call)

The average number of seconds spent in each call to this function, including time spent in the functions it called.

Notice that in the ExclSec column, all_the_work takes most of the time, while main takes so little time that it isn't even listed. In the CumulS column, however, both account for most of the time: the time spent in all_the_work is included in the cumulative time for main, because main called it.

Notice that the elapsed time is a lot higher than the user+system time. Notice also that the elapsed time doesn't show up in any of the other statistics. The numbers for do_nothing do not show that five seconds passed while it was "running."
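As a rough sketch, the script behind numbers like these could look something like the following. The function names come from the text above, but their bodies are assumptions; in particular, do_nothing() is assumed to sleep for the five seconds mentioned:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed body: sleeps, so it accumulates elapsed time but almost no CPU time.
sub do_nothing {
    sleep 5;
}

# Assumed body: a busy loop, so it accumulates CPU (user) time.
sub all_the_work {
    my $total = 0;
    $total += $_ for 1 .. 5_000_000;
    return $total;
}

sub main {
    all_the_work();
    do_nothing();
}

main();
```

Profiled with perl -d:DProf, all_the_work() would dominate ExclSec, main() would show a large CumulS (it includes its callees), and do_nothing()'s five seconds would show up only in the elapsed time.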

When there's a large gap between elapsed and user+system time like that, it is often useful to run dprofpp with the -r option. Here's the same raw data used above, but displayed with dprofpp -r:

With the -r option, dprofpp reports elapsed times for all functions instead of just the CPU time. With that, it's easy to see that most of the real time was spent in do_nothing().

What causes differences between elapsed time and user+system time? This can be anything that your program waits on when it's not actively running. Typically this is one of:

Network connections

Disk access

Other processes running on the same machine

Database access (which is often a combination of the above)

Other results available via dprofpp

The dprofpp tool has options to make its output more useful in different situations. By default it will show you the top 15 functions by user+system time. With the -g option, you can have dprofpp focus on one function. With the -O option, you can have it show any number of functions.
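For instance (some_sub is a placeholder for one of your own subroutine names):

```
dprofpp              # default: top 15 functions by user+system time
dprofpp -r           # report elapsed (real) time instead
dprofpp -O 30        # show the top 30 functions
dprofpp -g some_sub  # focus on some_sub and everything it calls
```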

When profiling anything that spends a lot of time on I/O, like DBI or disk access, you should look at the real time with the -r flag. CPU time will point you in the wrong direction when CPU is not the bottleneck.

You don't mention when to use one profiler over another. My rule of thumb is to try Devel::DProf first. If it core-dumps a few times in a row (it can, depending on your code), then I try Devel::Profiler.

Another module worth pointing out is Devel::FastProf. The two main profilers profile by sub/method call; if you've inherited a nasty monolithic script, FastProf will give you profiling by file line instead.
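Running it looks much the same as Devel::DProf. Assuming a recent Devel::FastProf, something like (the output filename and the fprofpp reader come from that distribution; check its docs for your version):

```
perl -d:FastProf yourscript.pl   # writes fastprof.out
fprofpp                          # per-line report from fastprof.out
```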

As far as CGI goes, I tried it out on a personal web project, and all I had to do was give write permission on the directory where Apache ran my scripts (chmod 777 /path/to/my/scripts) so the profiler could create tmon.out there. Only on a dev machine, of course; I wouldn't want to do that on a production box.