Sorting with expensive comparisons

Last week we investigated how we can use Perl's sort function to sort lists
using different methods of comparison. This week we'll see two special
techniques to make sorting faster when our comparisons are expensive.

Let's pretend that we have a list of files, and we wish to sort them based upon
the number of lines containing the string "Perl". Performing such a count is
easy:
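A minimal sketch of such a subroutine follows. It assumes each file can be opened for reading and simply dies if it cannot; the bareword filehandle FH is used to match the discussion below.

```perl
use strict;
use warnings;

# Count the lines in a file that contain the string "Perl".
sub count_perls {
    my ($filename) = @_;
    my $count = 0;

    local $_;     # keep our changes to $_ from leaking to the caller
    local *FH;    # likewise keep the FH filehandle private to this call

    open(FH, '<', $filename) or die "Cannot open $filename: $!";
    while (<FH>) {
        $count++ if /Perl/;
    }
    close(FH);

    return $count;
}
```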

You'll notice this code uses 'local $_' to say that changes to $_ (such as those
made by the 'while (<FH>)' construct) will be localised to this subroutine. This is considered good manners -- anyone calling our code probably doesn't want $_ to mysteriously change.

Now we can sort our files based upon the number of lines containing the string
"Perl":

my @sorted = sort { count_perls($a) <=> count_perls($b) } @filenames;

This works perfectly well, but it's doing a lot of needless work. The
count_perls subroutine is called for each filename every time we do a
comparison, and since each filename will be compared to a number of other
filenames during the sort, we'll needlessly search and re-search each file in
our list.

One solution to this is to do the counts once before we sort, and place those
results into a cache. We can then simply sort on the cached values.
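The caching approach might look something like the sketch below. The subroutine name sort_with_cache is our own invention for illustration, and it assumes that count_perls is the line-counting subroutine described above.

```perl
use strict;
use warnings;

# Sort filenames by their "Perl" line count, calling count_perls()
# (the subroutine described above) once per filename in the list.
sub sort_with_cache {
    my @filenames = @_;

    # Pre-compute the expensive sort key for every file.
    my %count;
    foreach my $file (@filenames) {
        $count{$file} = count_perls($file);
    }

    # The comparisons are now cheap hash lookups.
    return sort { $count{$a} <=> $count{$b} } @filenames;
}
```

It would be called as: my @sorted = sort_with_cache(@filenames);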

By using just a little bit of memory for our cache, we've saved ourselves a lot
of time in needless file processing.

Alternatively, we can massage our list into a list of pairs: the filename and a
precomputed count. This technique is known as the Schwartzian Transform. It can
be faster than creating a cache, but the cache might be easier to understand.
We can do this by using map:
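One way to write this is sketched below. The subroutine name sort_with_transform is hypothetical, and count_perls is again assumed to be the line-counting subroutine described earlier. The pipeline is easiest to read from the bottom up.

```perl
use strict;
use warnings;

# Sort filenames by their "Perl" line count using a map/sort/map pipeline.
sub sort_with_transform {
    my @filenames = @_;

    return
        map  { $_->[0] }                    # 3. keep only the filename
        sort { $a->[1] <=> $b->[1] }        # 2. sort the pairs by count
        map  { [ $_, count_perls($_) ] }    # 1. build [filename, count] pairs
        @filenames;
}
```

Each filename is wrapped in an anonymous array together with its count, the pairs are sorted on the count, and the final map throws the counts away again. No named temporary variable is needed, at the cost of a denser expression.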

Both the Schwartzian Transform and the hash caching methods provide significant
improvements over the naive approach. The speed difference between the two
methods is negligible, so we recommend you choose whichever you feel most
comfortable with. Pre-computing your sort keys is useful whenever you're
sorting on an attribute that may take a long time to compute, such as
filesystem attributes or database lookups.

This is the last perl-tip for 2004 as Perl Training Australia's staff break for a well-earned rest. Perl-tips will return in 2005.

NEXT YEAR: Why use objects? A brief introduction to Object Oriented programming, Perl-style.