Perl: Debunking the Speed Myth

Interpreted and scripting languages are often assumed to be slow, especially when compared to compiled languages, assembly, and machine code. But Perl may not be an interpreted language in the way you think it is. Perl actually compiles your script into an optimized internal form (a tree of opcodes) before it executes anything. Compared to many scripting languages, this lets Perl's execution come surprisingly close to compiled C code for many tasks. Perl's built-in functions, such as sort and print, are nearly as fast as their C counterparts.

Of course, Perl has difficulty competing when it comes to things like threads or massive computation. However, Perl is also speedy in completely different ways. Perl can be written and edited quickly, it can be easy and quick to learn, and it can also be easy and quick to read (especially if you follow the Perl style guidelines).

But when we talk about speed and languages, we are usually talking about the speed of program execution. Add memory usage to execution speed and now we are gauging performance. Often the two pull against each other: you can increase speed by increasing the RAM footprint, or decrease RAM use and slow execution down. Luckily, Perl comes with a few good tools for determining how quickly your script executes and how much memory it uses.

Finding the Bottleneck in Your Perl Script

First, you need to figure out where your program is slow; otherwise, you won't be sure where to put your investment for increasing performance. You need to isolate what needs to be fixed, whether it's program flow, expressions, the structure, or how variables are used.

Perl's built-in Benchmark module can time single statements, routines, or even entire scripts. Using the Benchmark module is pretty easy: you tell Perl you want to use Benchmark, call its timethis function, and tell it how many times to run the code.
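As a minimal sketch of that recipe, assuming a made-up routine called build_string and an arbitrary iteration count (only Benchmark and timethis come from the text above):

```perl
use strict;
use warnings;
use Benchmark;   # exports timethis by default

# Hypothetical routine to measure: join the numbers 1..100 into one string.
sub build_string {
    return join ',', 1 .. 100;
}

# Run the routine 100,000 times. timethis prints a one-line timing report
# and returns a Benchmark object holding the measurements.
my $timing = timethis(100_000, \&build_string);
```

The first argument is the iteration count and the second is the code to time, given here as a reference to the subroutine.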

The more times Benchmark runs the test, the more accurate the result usually is. You'll likely want to run it several hundred thousand times to notice slight differences between functions.

Also included with Perl 5.6 through 5.14 (and available from CPAN if you are running something else) is a Perl code profiler, the Devel::DProf package. DProf collects information on the execution time of a Perl script and its subroutines. When you run your program under DProf (perl -d:DProf yourscript.pl), it records the execution timing and stores the data in a file named tmon.out. You can then run dprofpp to process the tmon.out file. This helps you determine which parts of your script use how much CPU time.

For Web servers and folks using Perl with CGI, there is an Apache::DProf module that runs Devel::DProf inside each child server. There is also the Apache::Status module, which can show you how much memory each subroutine uses. You won't want to run either of these on your production server, though, because they add a lot of overhead to each call or request.

Finally, don't forget your OS. In a POSIX environment, you can use shell tools such as top and ps to figure out which processes are the most active at any moment. Windows Task Manager is also capable of tracking all sorts of useful information where CPU use is concerned.

Reducing Program Execution Time

The beauty of Perl is that there is always another way to do something (TMTOWTDI). One simple way to speed up your scripts is to find the bloat using DProf, isolate the slow parts of the code, and then use Benchmark to measure their speed. Then, you can try benchmarking alternate Perl code until you find a quicker solution.
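One way to benchmark alternatives side by side is Benchmark's cmpthese function. The sketch below is only an illustration: the two candidate routines (with_concat and with_join), the test data, and the iteration count are all invented for the example.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);   # cmpthese must be imported explicitly

my @words = ('perl') x 50;    # made-up test data

# Candidate 1: build the string with repeated concatenation.
sub with_concat {
    my $out = '';
    $out .= $_ for @words;
    return $out;
}

# Candidate 2: let the built-in join do the work.
sub with_join {
    return join '', @words;
}

# Run each candidate 100,000 times and print a comparison table.
my $rows = cmpthese(100_000, {
    concat => \&with_concat,
    join   => \&with_join,
});
```

The table ranks the candidates by rate, so you can swap in another version of the routine and rerun until one clearly wins.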

The problem with this technique is that it may be difficult to optimize for speed after the fact. In C, you can often wait until the end of a project and then optimize bits of code in chunks. In Perl, you might have to focus on speed in the design, because after-the-fact optimization often leads to a lot of re-writing of major sections of code. Here are a few commonly used tips for optimizing Perl for speed:

Tip 1: Hashes are fastest

Use arrays instead of lists of individual variables when you can, and use hashes instead of arrays when you need to look items up by name or test whether a value is present. Finding an item in an array means scanning it element by element, while a hash goes straight to the key, and Perl's hash algorithms have already been heavily optimized.
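A small sketch of the difference for a membership test. The names (@allowed, in_array, in_hash) and the data are hypothetical:

```perl
use strict;
use warnings;

my @allowed = qw(alice bob carol dave);

# Array approach: grep scans every element on each membership test.
sub in_array {
    my ($name) = @_;
    return scalar grep { $_ eq $name } @allowed;
}

# Hash approach: build the lookup table once; each test is then a
# single key lookup, however long the list grows.
my %is_allowed = map { $_ => 1 } @allowed;

sub in_hash {
    my ($name) = @_;
    return exists $is_allowed{$name} ? 1 : 0;
}

print in_array('carol'), ' ', in_hash('carol'), "\n";   # prints "1 1"
print in_array('erin'),  ' ', in_hash('erin'),  "\n";   # prints "0 0"
```

With four names the difference is negligible; it starts to matter when the list is long or the test runs inside a loop.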

Tip 2: Be careful with your loops

There are a few things you'll want to avoid inside loops. A general programming speed trick is to avoid system calls when looping. A string eval has to compile its code again on every iteration, so you'll want to avoid placing evals in loops. Calling complex subroutines within loops can also slow things down, because Perl has to keep copying the arguments onto the stack.
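If the code really does arrive as a string, one common workaround is to compile it once, outside the loop, into a subroutine reference. This is a sketch under that assumption; the formula and variable names are made up:

```perl
use strict;
use warnings;

my $formula = '$x * 2 + 1';   # hypothetical expression supplied as text
my @inputs  = (1 .. 5);

# Slow pattern: the string is compiled again on every pass.
my @slow;
for my $x (@inputs) {
    push @slow, eval $formula;
}

# Faster pattern: compile once into a code reference, then just call it.
my $code = eval "sub { my \$x = shift; $formula }" or die $@;
my @fast = map { $code->($_) } @inputs;

print "@slow\n";   # prints "3 5 7 9 11"
print "@fast\n";   # prints "3 5 7 9 11"
```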

When inside a loop, place your control statements early on. If a loop can exit or skip an iteration early, tell Perl at the top of the loop body rather than at the very end, after it has already worked through all of the code.
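For instance, a loop that parses configuration lines might look something like this (the input lines and the __END__ marker are invented for the illustration):

```perl
use strict;
use warnings;

my @lines = ('# comment', 'alpha=1', '', 'beta=2', '__END__', 'gamma=3');
my %config;

for my $line (@lines) {
    # Control statements first: leave or skip before doing any real work.
    last if $line eq '__END__';
    next if $line eq '' or $line =~ /^#/;

    # Only lines that survive the checks reach the more expensive part.
    my ($key, $value) = split /=/, $line, 2;
    $config{$key} = $value;
}

print join(',', sort keys %config), "\n";   # prints "alpha,beta"
```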

Finally, don't use loop warnings as an excuse to pull out the goto command. Perl keeps no internal table of labels, so when it needs to jump somewhere in the code with a goto, it actually has to stop and search for the target. This can make a loop much more efficient than a goto in Perl.

Tip 3: Use built-in functions

Another general speed trick: most built-in functions in a language have already been heavily optimized, so you want to use a language's built-in functions when you can. In Perl, pack, unpack, and sprintf are extremely fast, so use them when possible. All three are implemented in C. For example, if you are extracting several fixed-width fields from a string, a single unpack is often faster than a series of substr calls.
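Here is a rough sketch of the fixed-width case. The record layout (two 10-character names and a 3-digit ID) is an assumption made up for the example:

```perl
use strict;
use warnings;

# Build a sample fixed-width record: 10 chars, 10 chars, 3 digits.
my $record = sprintf '%-10s%-10s%03d', 'SMITH', 'JOHN', 42;

# One unpack call pulls out every field; the A format also strips
# the trailing padding.
my ($last, $first, $id) = unpack 'A10 A10 A3', $record;

# The substr version needs one call per field and keeps the padding.
my $last_padded = substr $record, 0, 10;

print "$first $last ($id)\n";          # prints "JOHN SMITH (042)"
print length($last_padded), "\n";      # prints "10"
```

Whether unpack actually wins for your data is worth confirming with Benchmark.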

Tip 4: Be careful when accessing outside information

Reducing the size of outside files and the number of calls to outside files can help speed. Some methods are better than others for outside calls. For reading data in blocks, sysread is better than getc. Using opendir with grep to get a directory listing is often better than handling the large list returned by a glob. Using the system command can slow things down because it creates a subprocess, and perhaps a shell as well, which involves more instructions and processor time.
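A sketch of the opendir and grep approach follows. To keep it self-contained it creates a temporary directory with three made-up files first; in a real script you would point opendir at an existing directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Set up a throwaway directory with a few sample files (illustration only).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt c.log)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

# Read the directory once and keep only the names we care about.
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @txt = sort grep { /\.txt\z/ } readdir $dh;
closedir $dh;

print "@txt\n";   # prints "a.txt b.txt"
```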

Tip 5: Use your Modules

There is no reason to rewrite code that already exists. There are many CPAN modules that already solve problems with Perl speed. For instance, there are fork and thread modules that are great for working with multiple file handles, especially network sockets.

If you are running a production Web server and you are concerned about speed, you may want to use mod_perl. Most of us have heard rumors that mod_perl clocks at speeds faster than PHP or compiled C for CGI calls (at least when not involved in database calls). The mod_perl module runs the Perl interpreter within the Apache process, and was made to marry Perl and Apache.

In a C-like compiled program, tokenization and parsing happen only once, during compilation. In Perl, tokenization and parsing have to happen each time the script runs and the Perl interpreter is started. This is why the SpeedyCGI/PersistentPerl module has such a great following. This module runs Perl scripts persistently, keeping the Perl interpreter open so that it isn't constantly started and restarted.

Another helpful module is Memoize, written by Mark-Jason Dominus. Memoize focuses on making it easy to write caching functions. Caching can be a godsend in a situation where the same functions are called with the same arguments over and over. You tell Memoize which functions you want it to speed up; Memoize intercepts their calls and takes a look at their parameters. If they are parameters it has seen before, it sends back the cached response. Otherwise, it calls the function and stores the result for those parameters, basically trading memory for speed.
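A minimal sketch of Memoize in use. The slow_square function and the call counter are hypothetical stand-ins for a genuinely expensive routine:

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;   # counts how often the real function body runs

# Hypothetical stand-in for an expensive computation.
sub slow_square {
    my ($n) = @_;
    $calls++;
    return $n * $n;
}

# Ask Memoize to cache results for this function by name.
memoize('slow_square');

my $result;
$result = slow_square(12) for 1 .. 3;   # same argument three times

print "result: $result\n";     # prints "result: 144"
print "calls made: $calls\n";  # prints "calls made: 1"
```

Only the first call runs the function body; the other two are answered from the cache. Memoizing is only safe for functions whose result depends on nothing but their arguments.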

Finally, AutoSplit and AutoLoader are modules that tell Perl to load only parts of your other modules, the parts that are actually being used by the specific script. By using these, you can eliminate CPU waste at start up, and you also won't waste RAM holding the entire module structure that Perl generates when it compiles.

Tip 6: Watch memory carefully

Speed and memory go hand in hand when optimizing. Part of Perl's practicality comes from its ability to automatically handle things such as variable allocation and garbage collection, so it can be difficult to make suggestions for optimizing Perl when it comes to memory. Care must also be taken, because some memory tricks may decrease your RAM footprint but slow down the program.

For variables and data, you can release RAM by storing arrays and hashes in files instead of in memory. You can use pack and unpack to store information in external files fairly efficiently. (Of course, this can cause slowdown and conflicts directly with Tip 4.) You may want to avoid creating large temporary lists in Perl. Creating an array and using it frequently creates a lot of temporary storage space in RAM, so avoid any lists or list operations that you do not absolutely need. Again, you can use temporary files to store large arrays, but be wary of slowdown due to accessing outside information.

The vec function can also pack many small numbers into a single variable, rather than storing each small number in a scalar of its own. The same technique can be applied to strings by using substr to store small fixed-length strings in one longer string.
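As an illustration, ten small numbers can be packed four bits apiece into one string. The data and the 4-bit width are assumptions chosen for the example; each value must fit in the width you pick (0 to 15 here).

```perl
use strict;
use warnings;

my @small = (3, 15, 0, 7, 9, 12, 1, 4, 8, 2);   # each fits in 4 bits

# Store element $_ of @small in the $_-th 4-bit slot of $packed.
my $packed = '';
vec($packed, $_, 4) = $small[$_] for 0 .. $#small;

printf "%d values stored in %d bytes\n", scalar @small, length $packed;
# prints "10 values stored in 5 bytes"

# Read the values back out the same way.
my @restored = map { vec($packed, $_, 4) } 0 .. $#small;
print "@restored\n";   # prints "3 15 0 7 9 12 1 4 8 2"
```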

Some Perl statements carry significant memory overhead. If the order of your hash isn't important, you can use each to iterate through the hash instead of keys. Using each avoids building the temporary list of keys that would otherwise be passed to the loop-control statement.
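A short sketch of the each idiom, using a made-up %stock hash:

```perl
use strict;
use warnings;

my %stock = (apples => 3, pears => 5, plums => 7);
my $total = 0;

# each hands back one key/value pair per call, so no full list of
# keys is ever built. The order of the pairs is not predictable.
while (my ($item, $count) = each %stock) {
    $total += $count;
}

print "total: $total\n";   # prints "total: 15"
```

Avoid adding or deleting other keys of the hash while iterating over it with each.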

Finally, you can use undef and delete to remove variables or hash elements after you are done with them, preempting the garbage collector.

A few other tips for increasing the run speed of your Perl code:

When using variables, use my instead of local.

Use references rather than static lists. Pass references to variables, especially lists and hashes, instead of passing the whole list, so that Perl does not have to copy every element.

You can pre-extend an array or a string to save time, because the memory is allocated once up front instead of on the fly.

If a string is large and the regular expressions are complex, you can use the study function to improve performance on older versions of Perl. (Since Perl 5.16, study does nothing.)

Each operation (op) in Perl tends to slow the script down. Perl will show you the operations it generates if you use the Terse backend to the compiler (perl -MO=Terse yourscript.pl). Look there for things to slash out.

Regular expressions in Perl can be a drain on resources, because the regex engine may go over the same text several times while trying to match groups.

Keep your argument lists short.
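Two of the tips above, pre-extending an array and passing a reference instead of a list, can be sketched together. The array size, the @readings data, and the total routine are hypothetical:

```perl
use strict;
use warnings;

# Pre-extend: setting the last index allocates all 10,000 slots at once
# instead of growing the array piece by piece.
my @readings;
$#readings = 9_999;
$readings[$_] = $_ * 2 for 0 .. 9_999;

# Take a reference so the caller does not push 10,000 values onto the
# argument stack; the routine reads the original array through it.
sub total {
    my ($list) = @_;
    my $sum = 0;
    $sum += $_ for @$list;
    return $sum;
}

my $sum = total(\@readings);
print "sum: $sum\n";   # prints "sum: 99990000"
```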

Finally, keep in mind that optimizing code can take a lot of time and effort, and isn't always worth it. Optimizing may cause other, bigger problems, such as making code buggier or harder to maintain and extend. Only if the script is running hundreds of times a day, or if the code relies on speed as a requirement, is shaving a few seconds off of it worth the development time.

References

There are a number of interesting benchmarking contests between languages: