This chapter, more than any other in this book, is about Laziness,
Impatience, and Hubris--because this chapter is about good software
design.

We've all fallen into the trap of using cut-and-paste when we should
have chosen to define a higher-level abstraction, if only just a loop or
subroutine.[1]
To be sure, some folks have gone to the opposite extreme of defining
ever-growing mounds of higher-level abstractions when they should have
used cut-and-paste.[2]
Generally, though, most of us need to think about using more abstraction
rather than less.

[1]
This is a form of False Laziness.

[2]
This is a form of False Hubris.

(Caught somewhere in the middle are the people who have a balanced view
of how much abstraction is good, but who jump the gun on writing their own
abstractions when they should be reusing existing code.)[3]

[3]
You guessed it, this is False Impatience. But if you're determined to
reinvent the wheel, at least try to invent a better one.

Whenever you're tempted to do any of these things, you need to sit back
and think about what will do the most good for you and your neighbor
over the long haul. If you're going to pour your creative energies into
a lump of code, why not make the world a better place while you're at
it? (Even if you're only aiming for the program to succeed, you need
to make sure it fits its ecological niche.)

The first step toward ecologically sustainable programming is simply:
don't litter in the park. When you write a chunk of code, think about
giving the code its own namespace, so that your variables and functions
don't clobber anyone else's, or vice versa. A namespace is a bit like
your home, where you're allowed to be as messy as you like, as long
as you keep your external interface to other citizens moderately civil.
In Perl, a namespace is called a package. Packages provide the
fundamental building block upon which the higher-level concepts of
modules and classes are constructed.

Like the notion of "home", the notion of "package" is a bit nebulous.
Packages are independent of files. You can have many packages in a
single file, or a single package that spans several files, just as your
home could be one part of a larger building, if you live in an apartment, or could comprise several
buildings, if your name happens to be Queen Elizabeth. But the usual
size of a home is one building, and the usual size of a package is one
file. Perl has some special help for people who want to put one package
in one file, as long as you're willing to name the file with the same
name as the package and give your file an extension of ".pm",
which is short for "perl module". The module is the unit of
reusability in Perl. Indeed, the way you use a module is with the
use command, which is a compiler directive that controls the
importation of functions and variables from a module. Every example of
use you've seen until now has been an example of module reuse.

Object classes are another concept built on the package concept.
The concept of classes therefore cuts across the concepts of files and
modules. But the typical class is nevertheless implemented with a
module. (If you're starting to get the feeling that much of Perl culture
is governed by mere convention, then you're starting to get the right
feeling, civilly speaking. The trend over the last 20 years or so has
been to design computer languages that enforce a state of paranoia.
You're expected to program every module as if it were in a state of
siege. Certainly there are some feudal cultures where this is
appropriate, but not all cultures are like this. In Perl culture, by
contrast, you're expected to stay out of someone's home because you
weren't invited in, not because there are bars[4]
on the windows.)

Anyway, back to classes. When you use a module that implements a class,
you're benefiting from the direct reuse of the software that implements
that module. But with object classes you can get the additional
benefits of indirect software reuse when the class you're using turns
around and reuses other classes from which it inherits some of its
characteristics. But this is not primarily a book about object-oriented
methodology, and we're not here to convert you into a raving
object-oriented zealot, even if you want to be converted. There are
already plenty of books out there for that.
Perl's philosophy of object-oriented design fits right in with Perl's
philosophy of everything else: use object-oriented design where it makes
sense, and avoid it where it doesn't. Your call.

As we mentioned in the previous chapter, object-oriented programming in
Perl is accomplished through use of references that happen to refer to
thingies that know which class they're associated with. In fact, now
that you know about references, you know almost everything hard about
objects. The rest of it just "lays under the fingers", as a violinist
would say. You will need to practice a little, though.

In this chapter we will discuss creation and use of packages, modules,
and classes. Then we will review some of the essentials of
object-oriented programming, explain how references become objects, and
illustrate how these objects are manipulated as members
of one or more classes. We'll also tell you how to tie ordinary
variables into object classes to turn them into magical variables.

Perl provides a mechanism to protect different sections of code from
inadvertently tampering with each other's variables. In fact, apart
from certain magical variables, there's really no such thing as a global
variable in Perl. Code is always compiled in the current package.
The initial current package is package main, but at any time you can
switch the current package to another one using the package
declaration. The current package determines which symbol table is used
for name lookups (for names that aren't otherwise package-qualified).
The notion of "current package" is both a compile-time and run-time
concept. Most name lookups happen at compile-time, but run-time lookups
happen when symbolic references are dereferenced, and also when new bits
of code are parsed under eval. In particular, eval operations
know which package they were invoked in, and propagate that package
inward as the current package of the evaluated code. (You can always
switch to a different package within the eval string, of course,
since an eval string counts as a block, as does a file loaded in with do,
require, or use.)
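
As a small sketch of that inward propagation (the package names Harbor and Dock are made up for the illustration), an eval string starts out in the package it was invoked from, and a package declaration inside the string stops mattering once the string ends:

```perl
use strict;
use warnings;

package Harbor;

# A string eval inherits the package it was invoked in.
my $inherited = eval q{ __PACKAGE__ };

# A package declaration inside the eval string is scoped to that string.
my $switched = eval q{ package Dock; __PACKAGE__ };

# Out here we are still in Harbor.
my $afterward = __PACKAGE__;

package main;
print "$inherited $switched $afterward\n";   # Harbor Dock Harbor
```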

The scope of a package declaration is from the declaration itself
through the end of the innermost enclosing block (or until another
package declaration at the same level, which hides the earlier one).
All subsequent identifiers (except those declared with my, or those
qualified with a different package name) will be placed in the symbol
table belonging to the package. Typically, you would put a package
declaration as the first declaration in a file to be included by
require or use. But again, that's by convention. You can put a
package declaration anywhere you can put a statement. You could even
put it at the end of a block, in which case it would have no effect
whatsoever. You can switch into a package in more than one place; it
merely influences which symbol table is used by the compiler for the
rest of that block. (This is how a given package can span more than
one file.)
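
Here is a minimal sketch of those scoping rules, assuming a made-up package called Boat. The declaration lasts only to the end of its enclosing block, and nothing stops you from switching back into the same package later:

```perl
use strict;
use warnings;

{
    package Boat;                    # in effect until the closing brace
    our $sail = "jib";               # this is $Boat::sail
    sub hoist { return "hoisting the $sail" }         # compiled as Boat::hoist
}

# The block has ended, so the current package is main again.
my $current = __PACKAGE__;

{
    package Boat;                    # switch into the same package a second time
    sub lower { return "lowering the $Boat::sail" }   # compiled as Boat::lower
}

print "$current\n";                  # main
print Boat::hoist(), "\n";           # hoisting the jib
print Boat::lower(), "\n";           # lowering the jib
```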

You can refer to identifiers[5]
in other packages by prefixing ("qualifying") the identifier with the
package name and a double colon: $Package::Variable. If the
package name is null, the main package is assumed. That is,
$::sail is equivalent to $main::sail.[6]
(The old package delimiter was a single quote, which produced things like
$main'sail and $'sail. But a double colon is now the
preferred delimiter, in part because it's more readable to humans, and
in part because it's more readable to emacs macros. It also gives
C++ programmers a warm feeling.)

[5]
By identifiers, we mean the names used as symbol table keys to access
scalar variables, array variables, hash variables, functions, file or
directory handles, and formats. Syntactically speaking, labels are also
identifiers, but they aren't put into a particular symbol table; rather,
they are attached directly to the statements in your program. Labels
may not be package qualified.

[6]
To clear up another bit of potential confusion, in a variable name like
$main::sail, we use the term "identifier" to talk about main and
sail, but not main::sail. We call that a variable name instead,
because an identifier may not contain a colon. The definition of an
identifier is lexical, in that an identifier is a token that matches
the pattern /^[A-Za-z_][A-Za-z_0-9]*$/.

Packages may be nested inside other packages:
$OUTER::INNER::var. This implies nothing about the order of
name lookups, however. There are no fallback symbol tables. All
undeclared symbols are either local to the current package, or must be
fully qualified from the outer package name down. For instance, there
is nowhere within package OUTER that $INNER::var refers
to $OUTER::INNER::var. Perl would treat package INNER as
a totally separate global package. Similarly, every package declaration
must declare a complete package name. No package name ever assumes any
kind of implied "prefix", even if (seemingly) declared within the scope
of some other package declaration.
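
A short sketch may make the lack of fallback concrete. It uses the same OUTER and INNER names as above; the values are invented:

```perl
use strict;
use warnings;

$OUTER::INNER::var = "nested";      # lives in the OUTER::INNER package
$INNER::var        = "top-level";   # lives in INNER, an unrelated package

package OUTER;

# Even inside OUTER, $INNER::var is resolved from the top down.
# It never means $OUTER::INNER::var.
my $from_outer = $INNER::var;

package main;
print "$from_outer\n";              # top-level
print "$OUTER::INNER::var\n";       # nested
```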

Only identifiers (names starting with letters or underscore) are stored
in the current package's symbol table. All other symbols are kept in
package main, including all the magical punctuation-only variables
like $! and $_. In addition, the identifiers STDIN,
STDOUT, STDERR, ARGV, ARGVOUT, ENV,
INC, and SIG are forced to be in package main even when
used for purposes other than their built-in ones. Furthermore, if you
have a package called m, s, y, or tr,
then you can't use the qualified form of an identifier as a filehandle
because it will be interpreted instead as a pattern match, a
substitution, or a translation. Using uppercase package names avoids
this problem.
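
One way to convince yourself of this is to compare references, as in the following sketch (the package name Galley and the variable $stew are assumptions for the example). Two references compare equal with == only when they point at the same variable:

```perl
use strict;
use warnings;

package Galley;

# Punctuation variables and the special identifiers resolve to package main
# no matter which package is current.
my $same_underscore = \$_   == \$main::_;
my $same_env        = \%ENV == \%main::ENV;
my $same_inc        = \@INC == \@main::INC;

# An ordinary identifier lands in the current package instead.
our $stew = "hot";
my $in_galley = \$stew == \$Galley::stew;

package main;
print "all as described\n"
    if $same_underscore && $same_env && $same_inc && $in_galley;
```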

Assigning a string to an element of %SIG assumes the signal handler named
is in the main package, if the name assigned is unqualified. Qualify
the signal handler name if you want to have a signal handler in a
package, or don't use a string at all: assign a typeglob or a function
reference instead:
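
The three forms might look like the following sketch. The package Watch and the handler on_usr1 are hypothetical, and the example assumes a Unix-like system where a process can send itself SIGUSR1:

```perl
use strict;
use warnings;

package Watch;

my @log;
sub on_usr1 { push @log, "caught SIG$_[0]" }    # handlers receive the signal name

$SIG{USR1} = "Watch::on_usr1";    # a string: qualify it, or main is assumed
kill USR1 => $$;                  # send the signal to ourselves

$SIG{USR1} = *on_usr1;            # a typeglob: resolved in the current package
kill USR1 => $$;

$SIG{USR1} = \&on_usr1;           # a function reference: likewise
kill USR1 => $$;

package main;
print scalar(@log), " signals handled\n";
```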

The symbol table for a package happens to be stored in a hash whose name
is the same as the package name with two colons appended. The
main symbol table's name is thus %main::, or %::
for short, since package main is the default. Likewise, the symbol table
for the nested
package we mentioned earlier is named %OUTER::INNER::. As it
happens, the main symbol table contains all other top-level symbol
tables, including itself, so %OUTER::INNER:: is also
%main::OUTER::INNER::.

When we say that a symbol table "contains" another symbol table, we mean that it contains a reference to the other symbol table. Since
package main is a top-level package, it contains a reference to itself,
with the result that %main:: is the same as
%main::main::, and %main::main::main::, and so on, ad
infinitum. It's important to check for this special case if you write
code to traverse all symbol tables.
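
A traversal along those lines could be sketched as follows. The function name nested_tables is our own invention, and the sketch assumes that any key ending in a double colon names a nested symbol table:

```perl
use strict;
use warnings;

{ package OUTER::INNER; our $var = 1; }    # make sure the nested package exists

# Collect the names of all symbol tables reachable from $table,
# where $table looks like "main::" or "main::OUTER::".
sub nested_tables {
    my ($table, $found) = @_;
    no strict 'refs';                       # we look tables up by name
    for my $key (sort keys %{$table}) {
        next unless $key =~ /^\w+::\z/;     # only entries that name another table
        # main contains itself: skip that entry or we'd recurse forever
        next if $table eq 'main::' && $key eq 'main::';
        push @$found, $table . $key;
        nested_tables($table . $key, $found);
    }
    return $found;
}

my $tables = nested_tables('main::', []);
print "$_\n" for @$tables;
```

The list will also include the tables of whatever modules happen to be loaded, such as strict and warnings.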

The keys in a symbol table hash are the identifiers of the symbols in
the symbol table. The values in a symbol table hash are the
corresponding typeglob values. So when you use the *name typeglob
notation, you're really just accessing a value in the hash that holds
the current package's symbol table. In fact, the following have the
same effect, although the first is potentially more efficient because it does the
symbol table lookup at compile time:
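
The original listing is not reproduced here, so take this as a sketch of the two forms, with $variable and *somesym as stand-in names:

```perl
use strict;
use warnings;

our $variable = "anchor";
our $somesym;                    # declared so that strict lets us read it below
my @seen;

{
    local *somesym = *main::variable;        # typeglob found at compile time
    push @seen, $somesym;
}
{
    local *somesym = $main::{"variable"};    # same typeglob, found by a run-time hash lookup
    push @seen, $somesym;
}

print "@seen\n";                 # anchor anchor
```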

Since all packages are accessible (directly or indirectly) through
package main, you can visit every package variable in the program,
using code written in Perl. The Perl debugger does precisely that when
you ask it to dump all your variables.

Assignment to a typeglob performs an aliasing operation; that is,

*dick = *richard;

causes everything accessible via the identifier richard to also be
accessible via the symbol dick. If you only want to alias a
particular variable or subroutine, assign a reference instead:

*dick = \$richard;

This makes $richard and $dick the same variable, but leaves
@richard and @dick as separate arrays. Tricky, eh?
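
To see the difference between the two kinds of aliasing, here is a small sketch using the same names (the values are invented):

```perl
use strict;
use warnings;

our ($richard, @richard, $dick, @dick);
$richard = "king";
@richard = (1, 2, 3);

*dick = \$richard;               # alias the scalar only
$dick = "Lionheart";             # so this changes $richard too
my $scalar_after   = $richard;
my $array_size_one = @dick;      # @dick is still its own (empty) array

*dick = *richard;                # now alias everything named richard
my $array_size_all = @dick;      # @dick is @richard

print "$scalar_after $array_size_one $array_size_all\n";   # Lionheart 0 3
```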

This mechanism may be used to pass and return cheap references
into or from subroutines if you don't want to copy the whole thing:
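
The listing that belongs here did not survive, so what follows is a sketch of the idea rather than the original code. The names fn, %another_hash, and %hashsym are assumptions; only *some_hash comes from the text:

```perl
use strict;
use warnings;

our (%some_hash, %another_hash, %hashsym);   # package variables, as the trick requires
%another_hash = (port => "Lisbon");

sub fn {
    local *hashsym = shift;          # alias %hashsym to the caller's hash
    $hashsym{visited} = 1;           # changes the caller's %another_hash in place
    my %nhash = (cargo => "salt");   # build a fresh hash to hand back
    return \%nhash;                  # only a reference is returned, no copy
}

*some_hash = fn(\%another_hash);     # the returned reference fills the hash slot

print "$another_hash{visited} $some_hash{cargo}\n";   # 1 salt
```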

On return, the reference will overwrite the hash slot in the
symbol table specified by the *some_hash typeglob. This
is a somewhat sneaky way of passing around references cheaply
when you don't want to have to remember to dereference variables
explicitly. It only works on package variables though, which is why
we had to use local there instead of my.

Another use of symbol tables is for making "constant" scalars:

*PI = \3.14159265358979;

Now you cannot alter $PI, which is probably a good thing, all in all.
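
If you want to watch the protection at work, a sketch like this one traps the error that an attempted assignment raises (the radius of 2 is just for show):

```perl
use strict;
use warnings;

our $PI;
*PI = \3.14159265358979;         # point the scalar slot at a read-only constant

my $area = $PI * 2 ** 2;         # reading works as usual

# Writing raises an exception, which eval traps for us.
my $ok    = eval { $PI = 3; 1 };
my $error = $@;

print "assignment refused\n" unless $ok;
```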

When you do that assignment, you're just replacing one reference within
the typeglob. If you think about it sideways, the typeglob itself can
be viewed as a kind of hash, with entries for the different variable
types in it. In this case, the keys are fixed, since a typeglob can
contain exactly one scalar, one array, one hash, and so on. But you can
pull out the individual references, like this:
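
A sketch of that notation, using a symbol named foo that we define just for the occasion:

```perl
use strict;
use warnings;

our $foo = "scalar";
our @foo = (1, 2);
our %foo = (key => "value");
sub foo { return "code" }

my $scalarref = *foo{SCALAR};    # same as \$foo
my $arrayref  = *foo{ARRAY};     # same as \@foo
my $hashref   = *foo{HASH};      # same as \%foo
my $coderef   = *foo{CODE};      # same as \&foo
my $globref   = *foo{GLOB};      # same as \*foo
my $ioref     = *STDIN{IO};      # the internal filehandle object behind STDIN

print "$$scalarref @$arrayref $hashref->{key} ", $coderef->(), "\n";
```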

This is primarily used to get at the internal filehandle reference,
since the other internal references are already accessible in other
ways. But we thought we'd generalize it because it looks kind of
pretty. Sort of. You probably don't need to remember all this unless
you're planning to write a Perl debugger. So let's get back to the
topic of writing good software.

A BEGIN subroutine is executed as soon as possible, that is, the
moment it is completely defined, even before the rest of the containing
file is parsed. You may have multiple BEGIN blocks within a
file--they will execute in order of definition. Because a BEGIN
block executes immediately, it can pull in definitions of subroutines
and such from other files in time to be visible during compilation of the
rest of the file.
This is important because subroutine declarations change how the rest
of the file will be parsed. At the very least, declaring a subroutine
allows it to be used as a list operator, without parentheses. And if
the subroutine is declared with a prototype, then calls to that
subroutine may be parsed like any of several built-in functions
(depending on which prototype is used).
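
Here is a sketch of the effect on parsing. Rather than load a real file, the BEGIN block installs a hypothetical shout function directly, standing in for definitions pulled in from elsewhere:

```perl
use strict;
use warnings;

BEGIN {
    # Stand-in for loading a file of definitions: install shout() right now,
    # while the rest of this file is still waiting to be compiled.
    no warnings 'once';
    *shout = sub { return uc join " ", @_ };
}

# Because shout already exists by the time this line is compiled,
# it can be called as a list operator, without parentheses.
my $hail = shout "ahoy", "there";
print "$hail\n";                 # AHOY THERE
```

Move the definition out of the BEGIN block and the same call no longer compiles, because the parser would not yet know that shout is a function.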

An END subroutine, by contrast, is executed as late as
possible, that is, when the
interpreter is being exited, even if it is exiting as a result of a
die function, or from an internally generated exception such as you'd
get when you try to call an undefined function. (But not if it's
being blown out of the water by a signal--you have to trap that
yourself (if you can).)[8]
You may have multiple END blocks within a file--they will execute
in reverse order of definition; that is: last in, first out (LIFO).
That is so that related BEGINs and ENDs will nest the way you'd
expect, if you pair them up.
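
A sketch with two of each shows the ordering. The BEGIN blocks and the main program record themselves in @order, while the END blocks can only announce themselves as the interpreter exits:

```perl
use strict;
use warnings;

our @order;

BEGIN { push @order, "first BEGIN" }
END   { print "first END (runs last)\n" }

BEGIN { push @order, "second BEGIN" }
END   { print "second END (runs first)\n" }

push @order, "main program";
print join(", ", @order), "\n";  # first BEGIN, second BEGIN, main program
```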

When you use the -n and -p switches to Perl, BEGIN
and END work just as they do in awk(1), as a degenerate case.
For example, the output order of colors if you run the following
program is red, green, and blue:

die "green\n";
END { print "blue\n" }
BEGIN { print "red\n" }

Just as eval provides a way to get compilation behavior during run-time,
so too BEGIN provides a way to get run-time behavior during compilation.
But note that the compiler must execute BEGIN blocks even if you're
just checking syntax with the -c switch. By symmetry, END blocks
are also executed when syntax checking. Your END blocks should not
assume that any or all of your main code ran. (They shouldn't do this
in any
event, since the interpreter might exit early from an exception.) This
is not a bad problem in general. At worst, it means you should test the
"definedness" of a variable before doing anything rash with it. In
particular, before saying something like:

system "rm -rf '$dir'"

you should always check that $dir contains something meaningful, whether
or not you're doing it in an END block. Caveat destructor.
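
One cautious way to write such an END block is sketched below. The worth_removing guard is a hypothetical helper of our own, and we've swapped the single command string for the list form of system so that no shell ever sees the directory name:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

our $dir;    # stays undefined if the program dies before the assignment below

# Hypothetical guard: true only for a non-empty name of an existing
# directory that is not the root.
sub worth_removing {
    my ($path) = @_;
    return defined $path && length $path && $path ne "/" && -d $path;
}

END {
    local $?;    # keep rm's status from replacing the program's exit status
    system("rm", "-rf", $dir) if worth_removing($dir);
}

$dir = tempdir();    # scratch directory for the main program to use
```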

Normally you can't call a subroutine that isn't defined. However, if
there is a subroutine named AUTOLOAD in the undefined subroutine's
package (or in the case of an object method, in the package of any of
the object's base classes), then the AUTOLOAD subroutine is called
with the same arguments as would have been passed to the original
subroutine. The fully qualified name of the original subroutine
magically appears in the package-global $AUTOLOAD variable, in the
same package as the AUTOLOAD routine.

Most AUTOLOAD routines will load a definition for the undefined
subroutine in question using eval or require, then execute that
subroutine using a special form of goto that erases the stack frame
of the AUTOLOAD routine without a trace.
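
As a rough sketch of that pattern, the following AUTOLOAD compiles a definition on first use and then jumps to it. The package Rigging and the %source table (standing in for code kept in files on disk) are assumptions made for the example:

```perl
use strict;
use warnings;

package Rigging;

our $AUTOLOAD;
my $autoload_calls = 0;

# Hypothetical table of source code, standing in for definitions on disk.
my %source = (
    hoist => 'sub { return "hoisted @_" }',
);

sub AUTOLOAD {
    my $name = $AUTOLOAD;                    # fully qualified, e.g. "Rigging::hoist"
    (my $short = $name) =~ s/.*:://;         # strip the package qualifier
    return if $short eq 'DESTROY';           # never autoload destructors
    $autoload_calls++;
    my $text = $source{$short} or die "No such routine: $name";
    my $sub  = eval $text      or die $@;    # compile the definition
    { no strict 'refs'; *{$name} = $sub; }   # install it under the requested name
    goto &$sub;                              # replace this frame with the real call
}

package main;
print Rigging::hoist("the", "mainsail"), "\n";   # hoisted the mainsail
```

After the first call the subroutine is defined for real, so later calls never reach AUTOLOAD.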

The standard AutoSplit module is a tool used by module writers to
help split their modules into separate files (with filenames ending
in .al), each holding one routine. The files are placed in
the auto/ directory of the Perl library. These files can then be loaded
on demand by the standard AutoLoader module. A similar approach is
taken by the SelfLoader module, except that it autoloads functions from
the file's own DATA area (which is less efficient in some ways and
more efficient in others). Autoloading of Perl functions is analogous
to dynamic loading of compiled C functions, except that autoloading (as
practiced by AutoLoader and SelfLoader) is done at the granularity of
the function call, whereas dynamic loading (as practiced by the
DynaLoader module) is done at the granularity of the complete module,
and will usually link in many C or C++ functions all at once. (See also
the AutoLoader, SelfLoader, and DynaLoader modules in Chapter 7, The Standard Perl Library.)

But an AUTOLOAD routine can also just emulate the routine and never
define it. For example, let's pretend that any function that isn't defined
should just call system with its arguments. All you'd do is this:
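
The original listing is missing here, so this is a sketch of what such a routine might look like. It assumes a Unix-like system with date and echo commands on the path:

```perl
use strict;
use warnings;

our $AUTOLOAD;

sub AUTOLOAD {
    my $program = $AUTOLOAD;         # e.g. "main::date"
    $program =~ s/.*:://;            # strip the package qualifier
    return if $program eq 'DESTROY';
    return system($program, @_);     # run the external command of that name
}

# Neither of these functions is defined anywhere; AUTOLOAD fields them both.
date();
echo("ahoy");
```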