Copyright Notice

This text is copyright by CMP Media, LLC, and is used with
their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in
SysAdmin/PerformanceComputing/UnixReview magazine.
However, the version you are reading here is as the author
originally submitted the article for publication, not after their
editors applied their creativity.

I spend a fair amount of time (some would say ``too much time'') hanging
out in mailing lists and chat areas for Perl beginners. One of the
problems that comes up frequently is what someone should do when their
program gets too large, or they want to share code between programs.
The question will usually be phrased as ``how do I include a file?'',
because the presumption is that part of the code can simply be
transported into a new file, and glued back in at the proper time.
But beneath this question lurk a number of troubles that a beginning
Perl programmer might not realize. Let's look at why the naive
approach can sometimes be trouble down the road.

The require function (on which the use statement is built) is
rather simple. Given a filename (or a package name that can be turned
into a filename), it locates the file along the @INC path and brings
it in. If the file is successfully loaded, program execution
continues, and a notation is made in %INC to prevent the file from
being loaded twice.
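
To see that bookkeeping in action, here is a minimal sketch. The file name demo_lib.pl and the counter $load_count are made up for the illustration, and the library is written to a temporary directory only so the example runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a throwaway library directory holding one tiny file.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/demo_lib.pl";
open my $fh, '>', $path or die "cannot write $path: $!";
print {$fh} "\$main::load_count++;\n1;\n";   # count each time the file runs
close $fh or die "cannot close $path: $!";

our $load_count = 0;
unshift @INC, $dir;        # make the directory searchable

require 'demo_lib.pl';     # found along @INC, compiled, and run
require 'demo_lib.pl';     # skipped: %INC already has an entry

print "loaded $load_count time(s)\n";
print "recorded in %INC as $INC{'demo_lib.pl'}\n";
```

The second require does nothing, because the key demo_lib.pl is already present in %INC.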

Let's say that we have a number of programs that all want to calculate
a running total, looking for the sum at the end. We could create
calculate_running_total.pl, and put it in one of the directories
listed in @INC:

my $total = 0;

sub add_item {
    $total += shift;
}

sub grand_total {
    return $total;
}

1;

The 1; at the end is mandated by the require interface: the last
expression evaluated in the file has to be a true value, or the
require fails.
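
As a small sketch of what happens when the true value is missing, the following writes a file (the name demo_false.pl is invented for the example) whose last evaluated expression is 0, and traps the resulting failure with eval:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/demo_false.pl";
open my $fh, '>', $path or die "cannot write $path: $!";
print {$fh} "my \$total = 0;\n";   # last expression evaluated is 0: false
close $fh or die "cannot close $path: $!";

unshift @INC, $dir;

# Trap the failure so the script can report it instead of dying.
my $ok    = eval { require 'demo_false.pl'; 1 };
my $error = $ok ? '' : $@;

print $ok ? "loaded\n" : "require failed: $error";
```

Note that the subroutine definitions in a library file do not count here: they are compiled, not evaluated, so a file that ends with a sub still needs the trailing 1;.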

We can pull it into our code like so:

require 'calculate_running_total.pl';

for my $file (glob '*') {
    add_item(-s $file);
}

print "total bytes is ", grand_total(), "\n";

Here, I'm walking through a list of filenames to get their sizes,
and then adding each of the sizes into the hidden total. When I'm done,
I'll grab the grand total and display it.

The $total variable here is hidden within the file. There's
nothing I can say in the main script to access that variable directly.
We'll come back to that in a moment.
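
A quick way to convince yourself of that is the sketch below, which writes the same library to a temporary directory (purely so the example is self-contained), loads it, and then checks the main symbol table for anything named total:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/calculate_running_total.pl";
open my $fh, '>', $path or die "cannot write $path: $!";
print {$fh} <<'END_OF_LIBRARY';
my $total = 0;
sub add_item    { $total += shift; }
sub grand_total { return $total; }
1;
END_OF_LIBRARY
close $fh or die "cannot close $path: $!";

unshift @INC, $dir;
require 'calculate_running_total.pl';

add_item(10);
add_item(32);

# The lexical lives only in the library file's scope, so the main
# package has no symbol named "total" at all.
print "main has ", ( exists $main::{total} ? "a" : "no" ),
    " symbol named total\n";
print "grand_total() returns ", grand_total(), "\n";
```

Only the two subroutines, which closed over $total when the file was compiled, can reach the value.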

One of the problems with including a file like this is that the
namespace is shared between the main program and the included file.
If calculate_running_total.pl needed a few more subroutines to
perform the task, those helper names might collide with subroutines
in my main program, especially if the helpers were undocumented. I
could just get creative with my names:
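
As a rough sketch of the two options, assume the library needs a helper called normalize (its rounding behavior here is invented for the illustration). The first block keeps everything in the main namespace behind long prefixes; the second uses a package, assumed to be named CalculateRunningTotal, to supply the prefix instead:

```perl
use strict;
use warnings;

# Style 1: everything in the main namespace, kept apart by long prefixes.
{
    my $total = 0;
    sub calculate_running_total_add_item  { $total += shift; return }
    sub calculate_running_total_normalize { return int( shift() + 0.5 ) }
    sub calculate_running_total_grand_total {
        return calculate_running_total_normalize($total);
    }
}

# Style 2: a package supplies the prefix, so the names inside stay short.
{
    package CalculateRunningTotal;
    my $total = 0;
    sub add_item    { $total += shift; return }
    sub normalize   { return int( shift() + 0.5 ) }   # hypothetical helper
    sub grand_total { return normalize($total) }
}

calculate_running_total_add_item($_) for 10, 20;
CalculateRunningTotal::add_item($_)  for 10, 20;

print "prefixed names: ", calculate_running_total_grand_total(), "\n";
print "package names:  ", CalculateRunningTotal::grand_total(), "\n";
```

With the package in place, a normalize subroutine in the main program no longer clashes with the library's helper of the same name.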

A little nicer, and a little clearer. Of course, nothing stops us
from still calling CalculateRunningTotal::normalize except perhaps
a gentleman's agreement. OK, call it ``good programming practice''.

The names are still a bit long, and by migrating this thing into a
full-blown module, we can shorten that up even more. First, we'll
bring in the Exporter module to handle the namespace aliasing, and
we'll also have to change the name to end in .pm so that use
knows how to find it. So in CalculateRunningTotal.pm, we now have:
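
A minimal sketch of such a module follows. To keep it runnable as a single file, the module body sits in a BEGIN block and registers itself in %INC by hand; in a real layout those lines would be the contents of CalculateRunningTotal.pm, ending in 1;. The normalize helper and the choice of use Exporter 'import' (rather than inheriting from Exporter) are assumptions made for the example:

```perl
use strict;
use warnings;

# Stand-in for the file CalculateRunningTotal.pm.
BEGIN {
    package CalculateRunningTotal;
    use Exporter 'import';                    # borrow Exporter's import method
    our @EXPORT = qw(add_item grand_total);   # names aliased into the caller

    my $total = 0;
    sub add_item    { $total += shift; return }
    sub normalize   { return int( shift() + 0.5 ) }   # hypothetical, not exported
    sub grand_total { return normalize($total) }

    # Tell "use" the module is already loaded, since no .pm file exists here.
    $INC{'CalculateRunningTotal.pm'} = __FILE__;
}

use CalculateRunningTotal;   # calls CalculateRunningTotal->import

add_item($_) for 10, 20;
print "total is ", grand_total(), "\n";
```

The main program now calls add_item and grand_total by their short names, while normalize stays out of its namespace.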

At this point, we have the basic workings of a trivial but clean
module. Time to get back to the original point. What is the mistake
people often make when designing the interface? It's when people
choose to export the data of the module in addition to (or instead
of) the behavior.

For example, I might have looked at the early version of this module,
staring at the code:

sub grand_total {
    return $total;
}

I might have asked myself why I was writing such a boring, trivial subroutine,
when I could just give access to $total directly. Since the Exporter
works only with package variables, I'd have to change this from a lexical
to a package variable (declared with our):
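
A sketch of what that tempting version might look like is below. As before, the module body is inlined in a BEGIN block and registered in %INC by hand only so the example runs as a single file; the exact export list is an assumption:

```perl
use strict;
use warnings;

# Stand-in for the file CalculateRunningTotal.pm.
BEGIN {
    package CalculateRunningTotal;
    use Exporter 'import';
    our @EXPORT = qw(add_item $total);   # the variable is now part of the interface

    our $total = 0;                      # package variable, visible to Exporter
    sub add_item { $total += shift; return }

    $INC{'CalculateRunningTotal.pm'} = __FILE__;
}

use CalculateRunningTotal;

add_item($_) for 10, 20;
print "total is $total\n";   # main reads the variable directly

$total = 999;                # main can also overwrite it, with no hook to notice
print "module now sees $CalculateRunningTotal::total\n";
```

Both names refer to the same scalar, so any line in the main program can change the running total behind the module's back.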

Here, the $total variable has been exported and aliased from the
CalculateRunningTotal package into the main package. (Let's
ignore for a moment that even add_item would then be nearly
useless.)

The problem with this configuration is that I'm now committing to
providing a specific variable with the name $total that can be
consulted at any time to provide the running total. Before, with
grand_total in place, I had some place to add extra hooks if I
desired later, such as the normalize routine in some of the
previous versions. But where would I put normalize now?

The data interface is very flexible for the caller, but very hard to
upgrade or update. Sure, I could probably replace $total with a tied
variable, but only at some speed expense and increased difficulty of
debugging.
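
For the curious, a tied scalar along those lines might be sketched as follows. The class name NormalizedTotal and the rounding rule are invented for the illustration; the point is only that every read of the variable now runs a hidden subroutine:

```perl
use strict;
use warnings;

package NormalizedTotal;    # hypothetical class name

# Store the raw value in a blessed scalar reference.
sub TIESCALAR { my ($class) = @_; my $value = 0; return bless \$value, $class }
sub STORE     { my ( $self, $new ) = @_; $$self = $new; return }
sub FETCH     { my ($self) = @_; return int( $$self + 0.5 ) }   # runs on every read

package main;

tie my $total, 'NormalizedTotal';
$total = 41.7;
print "total reads as $total\n";
```

The variable looks ordinary at the point of use, which is exactly what makes the extra method call per access, and the gap between what was stored and what is read back, awkward to track down later.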

Also, consider that I've now increased the scope of $total to
include many more lines of code. If in my debugging, I see that the
value of $total is now incorrect, I'll have to look over a much
broader range of code to see what might be altering it. Some would
call this having increased the coupling between the main code and
the external package, and increased coupling almost always comes with
a debugging cost.

So, when you're designing a module, you should definitely think twice
(or three times) when you start to add variables to your @EXPORT
list. It may seem to relieve a problem initially, but it will almost
always lead to bad things later. Until next time, enjoy!