Benchmarking basics

In a previous tip we used the Benchmark module to compare different
methods of searching for an item in a list. In this tip we'll provide a
basic introduction to Benchmark.pm. To learn more about the module read
its documentation and look at the resources at the end of this tip.

A warning about premature optimisation

Perl is fast. Most of the time, the code you write will be fast enough.
Further, it is possible to waste more time benchmarking a few approaches
to a problem than you could ever save over the life of that program.

As a result, it is usually best to just write the code rather than worrying
about optimisations. Most of the time you can leave thinking about
optimisations until you've found that speed really is an issue. Once it is
an issue, use Perl's profiling tools to identify which code is worth
optimising before you start benchmarking. We provide some information on
this in our Profiling tip:
http://perltraining.com.au/tips/2005-07-04.html

If you have a lot of time to waste however, Benchmark.pm is a great way
to prove to your colleagues that your approach is faster than theirs.

What is benchmarking?

Benchmarking allows us to measure how long code takes to execute. If we
measure two or more ways of solving the same problem, we can compare their
execution times to determine which one is faster, and therefore the most
efficient in terms of CPU time.

Why is benchmarking important?

Benchmarking gives us information on how well our code will scale. For
example, searching for an element in a small list is fast, no matter how
we do it. However, if we repeat that search hundreds or thousands of
times, we may discover small differences between the execution times of
each approach. These small differences add up to large differences when
there are enough executions.

Benchmarking in Perl

The Benchmark.pm module comes standard with Perl and provides a convenient
interface for a number of timing and comparison options. This tip does not
cover everything that Benchmark can do.

Benchmark provides two kinds of subroutines. You can use timethis,
timeit and countit to examine the speed of a single code
snippet. Alternatively, you can use timethese and cmpthese to run
a number of code snippets; cmpthese prints a comparison chart at
the end.
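
For example, a single snippet can be timed with timethis. The snippet
below is our own illustration, not code from a specific program:

use Benchmark qw(timethis);

# Run the snippet 100,000 times and report the timing.  A negative
# first argument would instead mean "run for at least that many CPU
# seconds".
timethis( 100_000, sub { my @sorted = sort { $a <=> $b } reverse 1 .. 100 } );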

An example

This example compares using grep and foreach to find a random item in
a list. If the first argument to cmpthese is zero, Benchmark will run
each test for at least 3 CPU seconds; this is a good default. Alternatively,
we can specify the number of times to run each test with a positive integer,
or a different minimum number of CPU seconds with a negative integer. The
second argument in our example is a hash reference of label =>
subroutine reference pairs.
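
A minimal sketch of such a comparison might look like this (the list
contents and labels here are illustrative):

use strict;
use Benchmark qw(cmpthese);

my @list   = (1 .. 1000);
my $target = $list[ rand @list ];   # a random item known to be in the list

cmpthese( 0, {
    grep    => sub {
        # grep must examine every element: it returns a count of matches.
        my $count = grep { $_ eq $target } @list;
    },
    foreach => sub {
        # The loop stops as soon as the item is found.
        foreach my $item (@list) {
            last if $item eq $target;
        }
    },
} );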

The results from one run of this comparison can be interpreted as follows.
The grep snippet ran 374 times a second, while the foreach loop ran 641
times a second: foreach was 72% faster than grep for this kind of data.

If this seems strange, look at the code - you'll see that we're cheating.
grep has to search the whole of @list for each call, because it
returns a count of all the occurrences for which the search block is
true. Our foreach loop, on the other hand, stops after the item is
found; on average that means searching through only half of the list.

If we change our foreach loop to be less efficient by removing the
call to last, we get the following results:

             Rate  foreach    grep
foreach     305/s       --    -19%
grep        378/s      24%      --

which looks more fair.

Using strings instead of code references

Calling a subroutine in Perl is quite costly when compared to more basic
operations such as addition and subtraction. This cost can dwarf the time
required for tests which don't do much, so Benchmark also allows us to pass
in strings for our code snippets. In the code below we change our subroutine
references to single quoted strings using q{}.

# WARNING: this snippet is buggy
use strict;
use Benchmark qw(cmpthese);
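
# (What follows is a sketch of the rest of this buggy program; the
#  list contents and labels are our own illustration.)

my @list   = (1 .. 1000);            # a lexical variable, declared with my
my $target = $list[ rand @list ];

cmpthese( 0, {
    # Note the single quoted q{} strings rather than sub { ... }.
    grep    => q{ my $count = grep { $_ eq $target } @list; },
    foreach => q{
        foreach my $item (@list) {
            last if $item eq $target;
        }
    },
} );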

Whoa! Does this mean that subroutine references were slowing our code down
so dramatically? Actually, no. What has happened here is quite subtle. When
we declared @list with my we made it a lexical variable. Due to the way
Benchmark evaluates the code contained in strings, our lexical variables
are inaccessible to these snippets; Benchmark instead assumes the code
snippets refer to package variables such as @main::list. Unfortunately we
don't have a @main::list, so the code snippets search an empty list many
times per second. Declaring @list with our instead solves the problem:
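
A corrected version might look like this sketch (again, the data and
labels are our own illustration):

use strict;
use Benchmark qw(cmpthese);

# Package variables declared with our are visible to the string
# snippets, which Benchmark compiles in the main package.
our @list   = (1 .. 1000);
our $target = $list[ rand @list ];

cmpthese( 0, {
    grep    => q{ my $count = grep { $_ eq $target } @list; },
    foreach => q{
        foreach my $item (@list) {
            last if $item eq $target;
        }
    },
} );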

Interpretation

In all of these examples, we've found that a short-circuiting foreach
loop is faster than grep. Is this meaningful? Can we say that
foreach is definitely faster than grep? The answer is no.
The tests above do not consider the behaviour when the element is not in
the list. We also don't test other comparison operators: would there
be a difference if we were matching with a regular expression rather than
eq?

Generalising from the results of a single set of benchmark comparisons is
not always possible. The only thing we can assert from these results is
that when searching for existing numbers and short strings in a
medium-length list using eq, a short-circuiting foreach is faster. Thus,
when you're working on benchmarks for your own code, make sure you use
data and comparisons that are appropriate for your domain.

Hints

Check that your code is working the way you expect, particularly if you're
providing code snippets in strings. use strict will not save you from
string snippets which refer to undefined package variables, because those
strings are compiled inside Benchmark rather than in your lexical scope.
Running cmpthese with 1 for its first argument will help you test this
faster.
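
For example, a quick sanity check might look like this sketch; the
temporary print is just for checking, and should be removed before the
real run:

use strict;
use Benchmark qw(cmpthese);

our @list   = (1 .. 1000);
our $target = 500;

# Run the snippet exactly once so we can see whether it really finds
# anything before letting it loop for several CPU seconds.
cmpthese( 1, {
    grep => q{
        my $count = grep { $_ eq $target } @list;
        print "grep found $count match(es)\n";
    },
} );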

If the numbers Benchmark returns seem too big, something's wrong. Ask
yourself whether you think it's likely that Perl really can run foreach
over such a big list 821,975 times every second.

If you need to open files, create big lists or hashes, initialise variables
and so on, do that work before you start the comparison. This will simplify
your tests and make sure that the set-up doesn't get counted. Of course,
don't do this if the set-up is part of your comparison.

Start simple. Identify the issue you're trying to benchmark and remove as
much extraneous code as possible. Add more code to each comparison as you
need to once you're comfortable with the results you're getting.

Differences of less than 5% can usually be ignored. Such differences can
change between different runs of the benchmark code, as well as between
different machine set-ups.