Profilers identify bottlenecks in Perl programs

Run-Time Speed

Instead of souped-up CPUs, simple programming tricks are often all it takes to speed up a program. Profilers can discover the bottlenecks that need more TLC from the developer.

If you don't spend all day playing video games, you will be quite happy without the latest and greatest CPU, and you can save money on hardware while saving power and reducing your personal CO₂ footprint.

But what happens if a Perl program does not perform as well as you would like? As a first step, you might be able to accelerate the program with very little effort by optimizing the code at a few critical points.

In most cases, just one or two targeted improvements will help you achieve 90 percent of the total optimization potential. The remaining 10 percent are more likely to involve wide-ranging, architectural changes that take 10 times as long to implement and make the resulting program so difficult to maintain that more experienced developers are likely to turn down the option.

Profilers are programs that discover bottlenecks, or hotspots. They show you where the program spends most of its time and help to optimize these crucial sections in a targeted way.

Perl has many tools of this kind. For example, if a module uses the Log4perl logging framework, it is simple to display the number of milliseconds (msec) that elapse along with the log messages, thus creating a timescale.

Finding Bottlenecks

The simple test script amzntest (Listing 1) uses the CPAN Net::Amazon module to retrieve book details from the Amazon Web API (assuming you enter a valid token, which you can get from Amazon Web Services [2]).

A request takes about a second, but what does the script use this time for – retrieving the data or analyzing the XML response? Adding

use Log::Log4perl;
Log::Log4perl->init("timer.l4p");

to the script header enables the Log4perl statements embedded in Net::Amazon by loading the timer.l4p configuration file. This outputs the date (%d) and the number of milliseconds since the program started (%r) for each message it logs. The logging level is set to DEBUG and is thus fairly verbose.

To cut down on lengthy log messages, the configuration file in Listing 2 uses the abbreviation %S to define a so-called Custom Cspec. This appears in the layout, instead of the more typical %m (for the log message).

The associated Perl subroutine returns a shorter, 25-character message to the screen appender. The layout closes with a platform-dependent newline character defined by %n.
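Listing 2 itself is not reproduced in this excerpt, but a timer.l4p along those lines might look like the following sketch (the cspec subroutine and the exact truncation are assumptions pieced together from the description above):

```
# timer.l4p (sketch): date, msec since start, shortened message
log4perl.rootLogger = DEBUG, Screen

log4perl.appender.Screen        = Log::Log4perl::Appender::Screen
log4perl.appender.Screen.layout = Log::Log4perl::Layout::PatternLayout
log4perl.appender.Screen.layout.ConversionPattern = %d %r %S%n

# %S: custom cspec, truncates the log message to 25 characters
log4perl.PatternLayout.cspec.S  = sub { substr( $_[1], 0, 25 ) }
```

With this configuration loaded, every message Net::Amazon logs arrives on screen prefixed with the date (%d) and the millisecond counter (%r).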

Quick Fix

Figure 1 shows that Net::Amazon sends a web request to Amazon 79msec after the program launches; however, the XML parser does not start working until 739msec have elapsed. Thus, the web request takes up some 660msec – around 80 percent – of the 800msec required by the script.

Figure 1: Log4perl outputs the time of day and the number of milliseconds (79msec) that have elapsed since the program was launched.

As you will agree, it makes sense to use a web cache for frequently retrieved ASINs, and Net::Amazon enables this if we ask it to (see Listing 3). This really does reduce the script run time, which drops to 180msec for repeat requests (see Figure 2) with File::Cache as a persistent cache.

Figure 2: The use of a web cache removes the need to issue a web request in case of repetitions, thus reducing the run time by 75 percent.

Quick Results

Now you might argue that tricks of this kind aren't always going to help, but the decisive issue here is that five lines of code and a minute spent thinking about options achieved a speed boost of 400 percent.

SmallProf Line Profiler

Of course, some modules do not have very sophisticated logging mechanisms. Thankfully, Perl has a number of profilers that measure the amount of time each line of source code uses.

Table 1 shows the major CPAN profilers for various tasks in ascending release order.

The CPAN Devel::SmallProf line profiler measures run times line by line and formats the data for analysis. If you launch the script you want to test via perl -d:SmallProf ./amzntest, the profiler creates a smallprof.out file that lists the time used by each line in each module used.

In the case of a fairly complex action like a web request and ensuing XML analysis of the returned data, the resultant file can be enormous – in this case, it weighed in at no fewer than 25,792 lines.

Discovering the worst time hogs is not an easy task, but the shell command

sort -n -k 2,2 smallprof.out | less

sorts the file numerically on its second field (-n forces a numeric rather than lexical sort). This field contains the wall-clock time in seconds used by a line of source code – the time that actually elapsed, whether the line kept the CPU busy or left it idle, waiting for external events such as incoming network packets. The third field contains the CPU seconds, that is, the computational time actually used.
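The difference between the two columns is easy to reproduce: a process that sleeps racks up wall-clock time but hardly any CPU time. A quick core-Perl illustration (the script and its numbers are just for demonstration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $wall_start = time();     # wall clock, in seconds
my @cpu_start  = times();    # (user, system, cuser, csystem) CPU seconds

sleep 2;                     # idle wait: the CPU has nothing to do

my $wall = time() - $wall_start;
my $cpu  = ( times() )[0] - $cpu_start[0];

printf "wall: %d secs, cpu: %.2f secs\n", $wall, $cpu;
```

The wall-clock figure comes out at roughly two seconds, while the CPU figure stays close to zero – exactly the pattern of the select() wait that dominates smallprof.out.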

If you scroll down to line 17,104 in smallprof.out, as shown in Figure 3, you will see that the line responsible for the wait issues a select command. The can_read function in the LWP::Protocol::http::SocketMethods module is responsible for this.

Figure 3: The 25,000 lines of output from SmallProf reveals that waiting for the Amazon server takes longest.

NYTProf

The CPAN module Devel::NYTProf is a relatively new development [3]. The strange-sounding name came about because the module was developed for the New York Times, whose IT department decided to release the source code; it is based on the Devel::FastProf codebase.

This excellent profiler is maintained by DBI inventor Tim Bunce [4] today. Bunce presented his work to the Perl community at OSCON 2008 [5]. Although I was in the audience, I couldn't concentrate on his talk because I was up next with my Log4perl talk.

After the CPAN shell-assisted installation (perl -MCPAN -e'install Devel::NYTProf'), you type perl -d:NYTProf amzntest to launch the profiler on the script. To convert its binary logfile nytprof.out to professionally formatted HTML, use the nytprofhtml script included with the package.

Firing up a browser and pointing it to the index.html file in the newly created nytprof directory (URL file:///...nytprof/index.html) reveals what has been going on. The table in Figure 4 lists the hotspots. It shows the number of calls, the number of places (P) they have been issued, the files (F) they are located in, and the time spent executing each function. The tool differentiates between "exclusive time" and "inclusive time," the former referring to the time spent exclusively in the function's code and the latter referring to the total time, including the subfunctions called by the function.
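The distinction is easiest to see with a toy script (hypothetical, purely for illustration): profiled with perl -d:NYTProf, outer()'s inclusive time covers its loop plus everything inner() does, whereas its exclusive time covers only the loop itself.

```perl
#!/usr/bin/perl
# toy.pl -- run as: perl -d:NYTProf toy.pl && nytprofhtml
use strict;
use warnings;

sub inner {
    my $sum = 0;
    $sum += $_ for 1 .. 100_000;    # busy work: inner's exclusive time
    return $sum;
}

sub outer {
    my $total = 0;
    $total += inner() for 1 .. 10;  # inner's share counts toward
    return $total;                  # outer's *inclusive* time only
}

print outer(), "\n";
```

In the resulting report, inner() shows nearly identical exclusive and inclusive times, while outer()'s inclusive time dwarfs its exclusive time.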

Figure 4: The 15 functions that waste the most time.

Single-Click Analysis

The report links each function name to an HTML page with details on the function's code – a very practical idea that lets the user flip back and forth and greatly facilitates the analysis. Figure 5, for example, shows details that were very difficult to obtain with SmallProf; that is, the select() command in the can_read() function in an LWP module is causing the delays. On an open network socket, it waits for the first signs of the response returned by Amazon.

Figure 5: The select command in the LWP code waits for a response from Amazon and is thus responsible for a major part of the program run time.

NYTProf supports three different reporting modes for displaying the time spent by the source code: line mode (one time value per line), block mode (one time value per block of Perl), and sub mode (one time value per function). Clicking toggles the performance detective through the different display variants.

The module-level view of the profiler in Figure 6 also reveals some interesting facts. The XML analysis of the Amazon web response executed 9,963 lines of code and made 400 calls to the collapse() function in XML::Simple.

Figure 6: The NYTProf shows how much time the code spends in each module and how many statements were processed.

Because it ran at full speed, the whole kit and caboodle took just 35msec, but this just goes to show how complex it is to parse XML.

Not Without Side Effects

Just like any other profiler, Devel::NYTProf involves some overhead, which can completely falsify the measured results in some cases. The profiler's activities are particularly invasive if a program is not waiting for external events, such as network traffic or disk access, which are orders of magnitude slower.

For a program running at full CPU speed, the run time can be 10 times longer if you enable the profiler.

Figure 7 shows the effect the NYT profiler has on the short l4ptest test program (Listing 4). The program configures Log4perl for the $INFO logging priority and then issues $DEBUG messages, which should be suppressed because of their lower priority. Log4perl has optimized this case; after all, a disabled logging system shouldn't have any measurable effect.
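Listing 4 is not shown here, but the pattern it measures can be sketched with core Perl alone (Log4perl's guarded logging is stood in for by a hypothetical is_debug() level check, so the sketch runs without any CPAN modules):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

my $level = 2;    # 1 = DEBUG, 2 = INFO: DEBUG messages get suppressed

sub is_debug { $level <= 1 }
sub debug    { return unless is_debug(); print STDERR "@_\n"; }

my $start = time();
debug("suppressed message") for 1 .. 100_000;
printf "100,000 suppressed calls: %.0f msec\n", ( time() - $start ) * 1000;
```

Run the same loop under perl -d:NYTProf and it slows down noticeably, because the profiler instruments every single statement – including the 100,000 calls that do nothing.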

Figure 7: NYTProf can slow down fast programs – considerably in some cases.

The script running without the profiler achieves around 100,000 calls in about 100msec; the script run time is about 10 times longer with NYTProf enabled.

This by no means detracts from the quality of the profiler, but it is important to take this into consideration when measuring script run time and interpreting the results.

The module is also quite at home in the Apache server's mod_perl. Adding PerlModule Devel::NYTProf::Apache to the configuration loads the module and tells it to append profile data for incoming requests to the /tmp/nytprof.$$.out file, where $$ represents the PID of the Apache process handling the request.
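A matching httpd.conf excerpt might look like this sketch (the NYTPROF environment variable and its addpid option follow the Devel::NYTProf documentation; the path is an assumption):

```
# httpd.conf excerpt (sketch)
PerlModule Devel::NYTProf::Apache

# optional: pick the output file explicitly and append the Apache
# worker's PID, so parallel processes don't clobber each other's data
PerlSetEnv NYTPROF file=/tmp/nytprof.out:addpid=1
```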

Calling nytprofhtml again creates a collection of web pages for in-depth analysis of the web application's performance that could very well point you to critical code sections that need improvement.

Michael Schilli works as a software engineer with Yahoo! in Sunnyvale, California. He is the author of Goto Perl 5 (German) and Perl Power (English), both published by Addison-Wesley, and he can be contacted at mailto:mschilli@perlmeister.com. Michael's homepage is at http://perlmeister.com.