5/20/2009

I'll be honest: I love Perl and I use it everywhere. I've written Web sites, administration scripts, and games using Perl. I frequently save time by getting Perl to do and check things automatically for me, everything from my lottery numbers to the stock markets, and I even use it to automatically file my e-mail. Because Perl makes it so easy to do all of these things, there's a tendency to forget about optimization. In many cases this isn't the end of the world. So what if it takes an extra few milliseconds to look up your stock reports or parse those log files?

However, those same lazy habits that cost milliseconds in a small application are multiplied when dealing with larger scale development projects. It's the one area where the Perl mantra of TMTOWTDI (There's More Than One Way To Do It) starts to look like a bad plan. If you need speed, there may be only one or two ways to achieve the fastest results, whereas there are many slower alternatives. Ultimately, sloppy programming -- even if you achieve the desired result -- is going to result in sloppy performance. So, in this article I'm going to look at some of the key techniques you can use to squeeze those extra cycles out of your Perl application.

First of all, it's worth remembering that Perl is a compiled language. The source code you write is compiled on the fly into the bytecode that is executed. The bytecode is itself based on a range of instructions, all of which are written in a highly optimized form of C. However, even within these instructions, some operations that can achieve similar results are more highly optimized than others. Overall, this means that it's the combination of the logic sequence you use and the bytecode that is generated from this that ultimately affects performance. The differences between certain similar operations can be drastic. Consider the code in Listings 1 and 2. Both create a concatenated string, one through ordinary concatenation and the other through generating an array and concatenating it with join.

Running Listing 1, I get a time of 1.765 seconds, whereas Listing 2 requires 5.244 seconds. Both generate a string, so what's taking up the time? Conventional wisdom (including that of the Perl team) would say that concatenating a string is a time-expensive process, because we have to extend the memory allocation for the variable and then copy the string and its addition into the new variable. Conversely, adding a string to an array should be relatively easy. We also have the added problem of duplicating the string concatenation using join(), which adds an extra second.

The problem, in this instance, is that push()-ing strings onto an array is time-intensive; first of all, we have a function call (which means pushing items onto a stack, and then taking them off), and we also have the additional array management overhead. In contrast, concatenating a string is pretty much just a case of running a single opcode to append a string variable to an existing string variable. Even if we set the array size to alleviate the overhead (using $#concat = 999999), we still only save another second.
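The comparison can be reproduced with a short script. This is only a sketch, not the article's original listings: the loop count and the variable names ($count, @parts, and so on) are assumptions, and the timings will vary with your Perl version and hardware.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my $count    = 100_000;                       # assumed; smaller than 999,999 to keep the run short
my $alphabet = 'abcdefghijklmnopqrstuvwxyz';

# Approach 1: append to a scalar on every pass through the loop.
my $start  = time;
my $concat = '';
$concat .= $alphabet for 1 .. $count;
my $concat_time = time - $start;

# Approach 2: push each piece onto an array, then join once at the end.
$start = time;
my @parts;
push @parts, $alphabet for 1 .. $count;
my $joined = join '', @parts;
my $join_time = time - $start;

printf "concatenation: %.3fs  push/join: %.3fs\n", $concat_time, $join_time;
```

Both approaches build the same string, so any difference in the two timings comes from how the string is assembled rather than from what is produced.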

The above is an extreme example, and there are times when using an array will be much quicker than using strings; a good example here is if you need to reuse a particular sequence but with an alternate order or different interstitial character. Arrays are also useful, of course, if you want to rearrange or reorder the contents. By the way, in this example, an even quicker way of producing a string that repeats the alphabet 999,999 times would be to use:

$concat = 'abcdefghijklmnopqrstuvwxyz' x 999999;

Individually, many of the techniques covered here won't make a huge difference, but combined in one application, you could shave a few hundred milliseconds, or even seconds, off of your Perl applications.

If you work with large arrays or hashes and use them as arguments to functions, use a reference instead of the variable directly. By using a reference, you tell the function to point to the information. Without a reference, you copy the entire array or hash onto the function call stack, and then copy it again in the function. References also save memory (which reduces footprint and management overheads) and simplify your programming.
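As an illustration, here is a minimal sketch of the two calling styles. The function names total_by_copy and total_by_ref are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Copies every element of the caller's list into a new array.
sub total_by_copy {
    my @values = @_;
    my $sum = 0;
    $sum += $_ for @values;
    return $sum;
}

# Receives a single scalar (the reference) and reads the caller's array in place.
sub total_by_ref {
    my ($values) = @_;
    my $sum = 0;
    $sum += $_ for @$values;
    return $sum;
}

my @numbers = (1 .. 100_000);
my $by_copy = total_by_copy(@numbers);    # 100,000 values go onto the stack
my $by_ref  = total_by_ref(\@numbers);    # one value goes onto the stack
print "copy: $by_copy, reference: $by_ref\n";
```

The results are identical; only the amount of data moved for each call changes.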

If you are using static strings in your application a lot -- for example, in a Web application -- remember to use single quotes rather than doubles. Double quotes force Perl to look for a potential interpolation of information, which adds to the overhead of printing out the string:

print 'A string','another string',"\n";

I've also used commas to separate arguments rather than using a period to concatenate the string first. This simplifies the process; print simply sends each argument to the output file. Concatenation would concatenate the string and print it as one argument.

As you've already seen, function calls with arguments are expensive, because for the function call to work, Perl has to put the arguments onto the call stack, call the function, and then receive the responses back through the stack again. All of this requires overhead and processing that we could probably do without. For this reason, excessive function calls in a loop are generally a bad idea. Again, it comes down to a comparison of numbers. Looping through 1,000 items and passing information to a function will trigger the function call 1,000 times. To get around this, I just switch the sequence around. Instead of using the format in Listing 3, I use the approach in Listing 4.
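The switch might look something like the following sketch. It is not the original listing; double_one and double_all are hypothetical names, and the doubling is a stand-in for whatever work the function really does.

```perl
use strict;
use warnings;

my @items = (1 .. 1000);

# Loop outside the function: one call per item, 1,000 calls in total.
sub double_one {
    my ($n) = @_;
    return $n * 2;
}
my @per_item;
push @per_item, double_one($_) for @items;

# Loop inside the function: a single call that receives a reference to the list.
sub double_all {
    my ($list) = @_;
    return [ map { $_ * 2 } @$list ];
}
my $batched = double_all(\@items);

print "last values: $per_item[-1] and $batched->[-1]\n";
```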

Another common operation related to loops is sorting information, particularly keys in a hash. It's tempting in this instance to embed some processing of list elements into the sort operation, such as the one shown here in Listing 5.

This is a fairly typical sort of complex data, in this case ordering something by date, time, and ID number by concatenating the numbers into a single number that we can then sort numerically. The problem is that the sort works through the list of items and moves them up or down through the list based on the comparison operation. In effect, this is a type of loop, but unlike the loop examples we've already seen, a sprintf call has to be made for each comparison. That's at least twice for each iteration, and the exact number of iterations through the list will depend on how ordered it was to begin with. For example, with a 10,000-item list you could expect to call sprintf over 240,000 times.

The solution is to create a list that contains the sort information, and generate the sort field information just once. Taking the sample in Listing 5 as a guide, I'd rewrite that fragment into something like the code in Listing 6.

Instead of calling sprintf all those times, we call it just once for each item in the hash in order to generate a sort field in the hash, and then use that sort field directly during the sort. The sorting process only has to access the sort field's value. That cuts the calls on that 10,000-item hash from 240,000 to just 10,000. How much you gain depends on what the sort block was doing originally, but the method shown in Listing 6 can save as much as half the time.
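A small sketch makes the difference in call counts visible. The data, the sort_key helper, and the sortfield key are all hypothetical. The sketch also compares the zero-padded keys as strings with cmp rather than numerically, which gives the same order for fixed-width fields and avoids depending on 64-bit integers.

```perl
use strict;
use warnings;

my %events = (
    a => { date => 20040315, time => 1030, id => 7 },
    b => { date => 20040314, time => 2359, id => 2 },
    c => { date => 20040315, time =>  900, id => 9 },
    d => { date => 20040314, time => 2359, id => 1 },
);

# Build a fixed-width sort key and count how often we are asked to.
my $calls = 0;
sub sort_key {
    my ($event) = @_;
    $calls++;
    return sprintf '%08d%04d%06d', $event->{date}, $event->{time}, $event->{id};
}

# Slow: sprintf runs twice for every comparison the sort makes.
my @slow = sort { sort_key($events{$a}) cmp sort_key($events{$b}) } keys %events;
my $slow_calls = $calls;

# Faster: generate the sort field once per item, then compare the stored values.
$calls = 0;
$events{$_}{sortfield} = sort_key($events{$_}) for keys %events;
my @fast = sort { $events{$a}{sortfield} cmp $events{$b}{sortfield} } keys %events;
my $fast_calls = $calls;

print "order: @fast\n";
print "sprintf calls: $slow_calls inside the sort, $fast_calls precomputed\n";
```

With only four items the gap is small, but the precomputed version always makes exactly one sprintf call per item, while the in-sort version grows with the number of comparisons.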

If you produce these hashes from the results of a database query -- through MySQL or similar -- sort within the query and record the order as you build the hash; you then won't need to iterate over the information again.
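Recording the order can be as simple as pushing each key onto an array while the hash is built. In this sketch the @rows array is a stand-in for rows already returned in sorted order by the database; a real script would fetch them through a database handle, and the column layout here is an assumption.

```perl
use strict;
use warnings;

# Stand-in for the result of a query that ends in "ORDER BY date, time, id".
my @rows = (
    [ 1, 20040314, 2359 ],
    [ 2, 20040314, 2359 ],
    [ 9, 20040315,  900 ],
    [ 7, 20040315, 1030 ],
);

my (%events, @order);
for my $row (@rows) {
    my ($id, $date, $time) = @$row;
    $events{$id} = { date => $date, time => $time };
    push @order, $id;    # remember the order the database gave us
}

# Walk the hash in query order without sorting it again.
printf "%d: %d %04d\n", $_, $events{$_}{date}, $events{$_}{time} for @order;
```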

Aside from the waste of space in terms of sheer content, there are a couple of problems with this structure. From a programming perspective, it has the issue that it never checks if any of the variables have a valid value, a fact that would be highlighted if warnings were switched on. Second, it has to check each option until it gets to the one it wants, which is wasteful, as comparison operations (particularly on strings) are time consuming. Both problems can be solved by using short circuit logic.

If you use the logical || operator, Perl will use the first true value it comes across, in order, from left to right. The moment it finds a valid value, it doesn't bother processing any of the other values. In addition, because Perl is looking for a true value, it also ignores undefined values without complaining about them. So we can rewrite the above into a single line:

$realchoice = $userchoice || $systemchoice || $defaultchoice;

If $userchoice is a true value, Perl doesn't even look at the other variables. If $userchoice is false (see Table 1), then Perl checks the value of $systemchoice and so on until it gets to the last value, which is always used, whether it's true or not.
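The sketch below puts the two forms side by side. The if/elsif chain is only a guess at the kind of structure being replaced, and the sample values are hypothetical.

```perl
use strict;
use warnings;

my ($userchoice, $systemchoice, $defaultchoice) = (undef, '', 'blue');

# Long form: test each candidate in turn until one is usable.
my $longchoice;
if ($userchoice) {
    $longchoice = $userchoice;
}
elsif ($systemchoice) {
    $longchoice = $systemchoice;
}
else {
    $longchoice = $defaultchoice;
}

# Short circuit form: the first true value wins, and evaluation stops there.
my $realchoice = $userchoice || $systemchoice || $defaultchoice;

print "long: $longchoice, short: $realchoice\n";

# Caveat: || skips every false value, including 0 and the empty string.
# From Perl 5.10 onward, // skips only undefined values.
my $retries = 0;
my $with_or      = $retries || 3;    # 3, because 0 is false
my $with_defined = $retries // 3;    # 0, because 0 is defined
print "||: $with_or, //: $with_defined\n";
```

If a legitimate value can be 0 or an empty string, the defined-or operator is usually the safer choice.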

One of the most expensive portions of the execution of a Perl script is the compilation of source code into the bytecode that is actually executed. On a small script with no external modules, the process takes milliseconds. But start to include a few of your own external modules and the time increases. The reason is that Perl does little more with a module than importing the text and running it through the same compilation stage. That can turn your 200-line script into a 10,000- or 20,000-line script very quickly. The result is that you lengthen the initial compilation stage before the script even starts to do any work.

During the normal execution of your script, it may be that you only use 10 percent, or even 5 percent, of all the functions defined in those modules. So why load them all when you start the script? The solution is to use AutoLoader, which acts a bit like a dynamic loader for Perl modules. This uses files generated by the AutoSplit system, which divides up a module into the individual functions. When you load the module through use, all you do is load the stub code for the module. It's only when you call a function contained within the module that the AutoLoader steps in and then loads and compiles the code only for that function. The result is that you convert that 20,000-line script with modules back into a 200-line script, speeding up the initial loading and compilation stages.

I've saved as much as two seconds just by converting one of my applications to use the AutoLoader system in place of preloading. It's easy to use: just change your modules from the format shown in Listing 8 to that shown in Listing 9, and then make sure to use AutoSplit to create the loading functions you need. Note that you don't need to use Exporter any more; AutoLoader handles the loading of individual functions automatically without you having to explicitly list them.

The main difference here is that functions you want to autoload are no longer defined within the module's package space but in the data section at the end of the module (after the __END__ token). AutoSplit will place any functions defined here into the special AutoLoader files. To split up the module, use the following command line:
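The split can also be driven from Perl itself by calling the autosplit() function that the AutoSplit module exports. The following is a minimal, self-contained sketch of the whole cycle rather than the article's own listings: the module name MyModule, the function rarely_used, and the use of a temporary directory are all assumptions made for the example.

```perl
use strict;
use warnings;
use AutoSplit;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Write a tiny, hypothetical module to a scratch directory.
my $lib    = tempdir(CLEANUP => 1);
my $module = File::Spec->catfile($lib, 'MyModule.pm');
open my $fh, '>', $module or die "Cannot write $module: $!";
print {$fh} <<'END_MODULE';
package MyModule;
use strict;
use AutoLoader 'AUTOLOAD';   # import the stock AUTOLOAD handler
1;
__END__

sub rarely_used {
    return 'loaded on demand';
}
END_MODULE
close $fh or die "Cannot close $module: $!";

# Split everything after __END__ into auto/MyModule/*.al files.
# Arguments: module file, output directory, keep stale files (no),
# check that the module uses AutoLoader (yes), split only if changed (yes).
my $autodir = File::Spec->catdir($lib, 'auto');
make_path($autodir);
autosplit($module, $autodir, 0, 1, 1);

# Loading the module compiles only the stub; the first call to
# rarely_used() makes AutoLoader find and compile its .al file.
unshift @INC, $lib;
require MyModule;
my $result = MyModule::rarely_used();
print "$result\n";
```

In a real project the module would live in your normal library directory and the split would be run once, at install time, rather than on every execution.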

There are three ways to use the compiler: bytecode production, full compilation, or simply as a debugging/optimizing tool. The first two methods rely on converting your original Perl source into its compiled bytecode form and storing this precompiled version for execution. This is best used through the perlcc command. These two modes follow the same basic model but produce the final result differently. In bytecode mode, the resulting compiled bytecode is written out to another Perl script. The script consists of the ByteLoader preamble, with the compiled code stored as a byte string. To create this bytecode version, use the -B option to the perlcc command. For example:

$ perlcc -B script.pl

This will create a file, a.out. The output, however, is not very Web friendly. The resulting file can be executed with any Perl executable on any platform (Perl bytecode is platform independent):

$ perl a.out

What this does is save Perl from having to compile the script from its source code into the bytecode each time. Instead, it just runs the bytecode that was generated. This is similar to the process behind Java compilation and is in fact the same one step away from being a truly compiled form of the language. On short scripts, especially those that use a number of external modules, you probably won't notice a huge speed increase. On larger scripts that "stand alone" without a lot of external module use, you should see a noticeable improvement.

The full compilation mode is almost identical, except that instead of producing a Perl script with the compiled bytecode embedded in it, perlcc produces a version embedded into C source that is then compiled into a full-blown, standalone executable. This is not cross-platform compatible, but it does allow you to distribute an executable version of a Perl script without giving out the source. Note, however, that this doesn't convert the Perl into C, it just embeds Perl bytecode into a C-based application. This is actually the default mode of perlcc, so a simple:

$ perlcc script.pl

will create, and compile, a standalone application called a.out.

One of the lesser-known solutions for both debugging and optimizing your code is to use the Perl compiler with one of the many "back ends."

The back ends are actually what drive the perlcc command, and it's possible to use a back-end module directly to create a C source file that you can examine. The Perl compiler works by taking the generated bytecode and then outputting the results in a variety of different ways. Because you're looking at the opcodes generated during the compilation stage, you get to see the code after Perl's own internal optimizations have been applied. Providing you know the Perl opcodes, you can begin to identify where the potential bottlenecks might be. From a debugging perspective, go with back ends such as Terse (which is itself a wrapper on Concise) and Showlex. You can see in Listing 10 what the original Listing 1 looks like through the Terse back end.
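The same terse view can be produced from inside a script through the B::Concise module's compile() and walk_output() functions. This is a sketch only: build_string is a hypothetical function that mimics the concatenation loop discussed earlier, and the exact opcodes printed will differ between Perl versions.

```perl
use strict;
use warnings;
use B::Concise ();

# A small function whose compiled form we want to inspect.
sub build_string {
    my $concat = '';
    $concat .= 'abcdefghijklmnopqrstuvwxyz' for 1 .. 10;
    return $concat;
}

# Ask B::Concise for a terse rendering of the function's opcode tree
# and capture it in a string instead of printing straight to STDOUT.
my $walker = B::Concise::compile('-terse', \&build_string);
B::Concise::walk_output(\my $optree);
$walker->();

print $optree;
```

Reading down the output, you should be able to pick out the loop opcodes and the concatenation opcode that does the real work on each pass.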

What I've covered here looks entirely at the code that makes up your applications. While that's where most of the problems will be, there are tools and systems you can use that can help identify and locate problems in your code that might ultimately help with performance.

It's a common recommendation, but it really can make a difference. Use the warnings and strict pragmas to ensure nothing funny is going on with variable use, typos, and other inconsistencies. Using them in all your scripts will help you eliminate all sorts of problems, many of which can be the source of performance bottlenecks. Common faults picked up by these pragmas are ambiguous references and de-references and the use of undefined values, and they give some help in identifying typos that leave functions unused or undefined.
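To see the kind of fault that strict catches, the sketch below compiles a snippet containing a misspelled variable name inside a string eval, so that the error can be captured and printed instead of stopping the script. The variable names are hypothetical.

```perl
use strict;
use warnings;

# The snippet assigns to $totl, a typo for $total. Because strict is in
# force, the undeclared name is rejected when the snippet is compiled.
my $ok = eval q{
    my $total = 10;
    $totl = $total + 1;
    1;
};
my $error = $ok ? '' : $@;

print "strict caught: $error";
```

Without strict, the same snippet would quietly create a new global variable and leave $total unchanged, a bug that can take far longer to find than the compile-time error.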

All of this help, though, comes at a slight performance cost. I keep warnings and strict on while programming and debugging, and I switch them off once the script is ready to be used in the real world. It won't save much, but every millisecond counts.

Profiling is a useful tool for optimizing code, but all it does is identify the potential location of the problem; it doesn't actually point out what the potential issue is or how to resolve it. Also, because profiling relies on monitoring the number of executions of different parts of your application it can, on occasion, give misleading advice about where a problem lies and the best approach for resolving it.

However, profiling is still a useful, and often vital, part of the optimization process. Just don't rely on it to tell you everything you need to know.

To me, a badly optimized program means that it has a bug. The reverse is also true: bugs often lead to performance problems. Classic examples are badly de-referenced variables or reading and/or filtering the wrong information. It doesn't matter whether your debugging technique involves using print statements or the full-blown debugger provided by Perl. The sooner you eliminate the bugs, the sooner you will be able to start optimizing your application.

Now that you know the techniques, here is the way to go about using them together to produce optimized applications. I generally follow this sequence when optimizing:

1. Write the program as optimized as possible using the techniques above. Once you start to use them regularly, they become the only way you program.

2. Once the program is finished or at least in a releasable state, read through the code by hand and double check that you are using the most efficient solution. You'll be able to spot a number of issues just by re-reading, and you might pick up a few potential bugs, too.

3. Debug your program. Bugs can cause performance problems, so you should always eliminate the bugs first before doing a more intense optimization.

4. Run the profiler. I always do this once on any serious application, just to see if there's something -- often obvious -- that I might have missed.

5. Go back to step 1 and repeat. I've lost track of the number of times I've completely missed a potential optimization the first time around. Either I'll go back and repeat the process two or three times in one session, or I'll leave, do another project, and return a few days, weeks, or months later. Weeks and months after, you'll often have found an alternative way of doing something that saves time.

At the end of the day, there is no magic wand that will optimize your software for you. Even with the debugger and profiler, all you get is information about what might be causing a performance problem, not necessarily any helpful advice on what you should do to fix it. Be aware as well that there is a limit to what you can optimize. Some operations will simply take a lot of time to complete. If you have to work through a 10,000-item hash, there's no way of simplifying that process. But as you've seen, there might be ways of reducing the overhead in each case.