Hidden Treasures of the Perl Core, part II

By Casey West on June 19, 2003 12:00 AM

In the previous hidden
treasures article, we looked at some easy-to-use (but not well-known) modules in
the Perl Core. In this article, we dig deeper to uncover some of the truly precious
and unique gems in the Perl Core.

Wow, that's a lot of work! I've already given up on my program, not to
mention the syntax error in the declaration of TUESDAY. Now let's
try this again using the multiple declaration syntax, new to the
constant pragma for Perl 5.8.0.
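
As a rough sketch of the multiple declaration syntax: a single use constant statement takes a hash reference of name/value pairs. The constant names and values below are illustrative assumptions (only TUESDAY is mentioned above).

```perl
use strict;
use warnings;

# One declaration defines every constant; names and values are illustrative.
use constant {
    MONDAY    => 0,
    TUESDAY   => 1,
    WEDNESDAY => 2,
    THURSDAY  => 3,
    FRIDAY    => 4,
};

# Constants are inlined subroutines, so they take no sigil.
print "Tuesday is day number ", TUESDAY, "\n";
```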

The only warning here is that this syntax is new to Perl 5.8.0. If you
intend to distribute a program using multiple constant declarations,
then remember the limitations of the program. You may want to specify what
version of Perl is required for your program to work.

use 5.8.0;

Perl will throw a fatal error if the version is anything less than
5.8.0.

This module allows us to play with Perl's subroutine attribute syntax
by defining our own attributes. This is a powerful module with a
rich feature set. Here I'll give you an example of writing a minimal
debugger using subroutine attributes.

First, we need to create an attribute. An attribute is any subroutine
that has an attribute of :ATTR. Setting up our debug attribute
is easy.

use Attribute::Handlers;

sub debug :ATTR {
my (@args) = @_;
warn "DEBUG: @args\n";
}

Now we have a simple debug attribute named :debug. Using
our attribute is also easy.

sub table :debug {
# ...
}
table(%data);
table(%other_data);

Now, since attributes are compiled just before runtime, in the CHECK
phase, our debugging output will only be sent to STDERR once. For the
code above, we might get output like this:

That debug string represents some of the information we get in an attribute
subroutine. The first argument is the name of the package the attribute
was declared in. Next is a reference to the symbol table entry for the
subroutine, followed by a reference to the subroutine itself. Next comes
the name of the attribute, followed by any data associated with the attribute
(none in this case). Finally, the name of the phase that invoked the handler
is passed.

At this point, our debugging attribute isn't useful, but the parameters
we are given to work with are promising. We can use them to invoke
debugging output each time the subroutine is called. Put on your hard hat,
this is where things get interesting.

First, let us take a look at how we want to debug our subroutine. I think we'd
like different levels of debugging output. At the lowest level (1), the name
of the subroutine being invoked should be sent to STDERR. At the next
level (2), it would be nice to be notified of entry and exit of the subroutine. Going further (level 3), we might want to see the arguments passed to the
subroutine. Even more detail can be done, but we'll save that for later and
stop at three debug levels.

In order to do this voodoo, we need to replace our subroutine with one doing
the debugging for us. The subroutine doing the debugging must then invoke
our original code with the parameters passed to it, and return the proper output
from it. Here is the implementation for debug level one (1).
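
A minimal sketch of such a level-one handler might look like the following. The table() body and its arguments are hypothetical stand-ins. The check on the shape of the attribute data is a defensive assumption, since it may arrive as a plain scalar or as an array reference depending on the Attribute::Handlers version.

```perl
use strict;
use warnings;
use Attribute::Handlers;

sub debug :ATTR {
    my ($package, $symbol, $code, $attr, $data) = @_;

    # Attribute data may be a scalar or an array reference, depending on
    # the Attribute::Handlers version; default to level 1.
    my $level = ( ref($data) eq 'ARRAY' ? $data->[0] : $data ) || 1;

    my $name = join '::', *{$symbol}{PACKAGE}, *{$symbol}{NAME};

    # Replace the original subroutine with a closure that logs, then
    # hands control (and the arguments) to the original code.
    no warnings 'redefine';
    *{$symbol} = sub {
        warn "DEBUG: entering $name\n";
        return $code->(@_);
    };
}

# Hypothetical subroutine under debug; it just counts its key/value pairs.
sub table :debug {
    my %data = @_;
    return scalar keys %data;
}

my $rows = table( apples => 3, pears => 5 );
```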

There are some sticky bits in the debug subroutine that I need to explain
in more detail.

my $name = join '::', *{$symbol}{PACKAGE}, *{$symbol}{NAME};

This line is used to find the name and package of the subroutine we're debugging.
We do the lookups from the symbol table, using the reference to the symbol that
our attribute is given.

no warnings 'redefine';

Here we turn off warnings about redefining a subroutine, because we're going
to redefine a subroutine on purpose.

*{$symbol} = sub { ... };

This construct simply replaces the code section in the symbol table with this
anonymous subroutine (which is a code reference).

In this example, we set the default log level to one (1), set up some helper variables, and replace our table() subroutine with a debugging closure. I
call the anonymous subroutine a closure because we are reusing some variables
that are defined in the debug() subroutine. Closures are explained in greater
detail in perlref (perldoc perlref from the command line).

To set the debug level for a subroutine, just add a number to the :debug attribute.
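
Here is one way a level-two handler could be sketched, assuming the same hypothetical table() subroutine. Always calling the original code in list context is a simplification of this sketch, not necessarily how a production debugger should behave.

```perl
use strict;
use warnings;
use Attribute::Handlers;

sub debug :ATTR {
    my ($package, $symbol, $code, $attr, $data) = @_;

    # Data may be a scalar or an array reference; default to level 1.
    my $level = ( ref($data) eq 'ARRAY' ? $data->[0] : $data ) || 1;
    my $name  = join '::', *{$symbol}{PACKAGE}, *{$symbol}{NAME};

    no warnings 'redefine';
    *{$symbol} = sub {
        warn sprintf "DEBUG: entering %s\n", $name;

        # Capture the result so the exit can be logged before returning.
        my @output = $code->(@_);

        warn sprintf "DEBUG: leaving %s\n", $name if $level >= 2;
        return wantarray ? @output : $output[-1];
    };
}

# The number in parentheses is handed to the handler as attribute data.
sub table :debug(2) {
    my %data = @_;
    return scalar keys %data;
}

my $rows = table( apples => 3, pears => 5 );
```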

In this example, we use sprintf to make our debugging statements a little
more readable as complexity grows. This time, we cannot return directly from
the original code reference. Instead, we have to capture the output and return
it at the end of the routine. When the table() subroutine defines its debug
level as :debug(2) the output is thus.

Attribute::Handlers can do quite a lot more than what I've shown you already.
If you like what you see, then you may want to add attributes to variables or worse.
Please read the thorough documentation provided with the module.

This module is a well-known Perl debugging module. It generates Perl source
code from Perl source code provided to it. This may seem useless to some,
but to the aspiring obfuscator, it's useful in understanding odd code.

perl -snle'$w=($b="bottles of beer")." on the wall";$i>=0?print:last
LINE for(map "$i $_",$w,$b),"take one down, pass it around",
do{$i--;"$i $w!"}' -- -i=100

That is an example of an obfuscated program. It could be worse, but it's
pretty bad already. Understanding this gem is as simple as adding
-MO=Deparse to the command line. This will use B::Deparse to turn that mess into more readable Perl source code.

To use B::Deparse in an everyday setting, just run your program using
it on the command line.

perl -MO=Deparse prog.pl

But if you want to have some real fun, then dig into the object-oriented interface
for B::Deparse. There you will find an amazing method called coderef2text(). This method turns any code reference to text, just like the command line trick does for an entire program. Here is a short example.
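
A small sketch of coderef2text() in action, using a made-up code reference. The exact text returned varies between Perl versions, so no output is shown.

```perl
use strict;
use warnings;
use B::Deparse;

my $deparse = B::Deparse->new;

# A hypothetical code reference to turn back into source text.
my $double = sub { my ($n) = @_; return $n * 2 };

# coderef2text() returns the body of the subroutine, braces included.
my $text = $deparse->coderef2text($double);
print "sub $text\n";
```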

There are more methods in the B::Deparse class that you can use to muck
around with the results of coderef2text(). This module is powerful
and useful for debugging. I suggest you at least use the simple version if
code becomes ambiguous and incomprehensible.

While B::Deparse is good at what it does, it's not complete. Each
version of Perl has made it better, and it's good in Perl 5.8.0. Don't
trust B::Deparse to get everything right, though. For instance, I
wouldn't trust it to serialize code for later use.

This module, just like the constant pragma, is well-known. The
difference is that Class::Struct is not often used. For many programs,
setting up a class to represent data would be ideal, but overkill.
Class::Struct gives us the opportunity to live in our ideal world
without the pain of setting up any classes by hand. Here
is an example of creating a class with Class::Struct. In this
example, we're going to use compile-time class declarations, a new feature
in Perl 5.8.0.
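
A sketch of a compile-time declaration along these lines follows. The sample names are invented, and the parent is set through the accessor rather than the constructor simply to keep the sketch conservative.

```perl
use strict;
use warnings;

# Declare the Person class at compile time.
use Class::Struct Person => {
    name => '$',
    mom  => 'Person',
    dad  => 'Person',
};

# Hypothetical data; each element gets an accessor of the same name.
my $mom = Person->new( name => 'Alice' );
my $kid = Person->new( name => 'Bob' );
$kid->mom($mom);

print $kid->name, " is the child of ", $kid->mom->name, "\n";
```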

Here we've created a class called Person with three attributes.
name can contain a simple scalar value, represented by the dollar
sign ($). mom and dad are both objects of type Person.
Using our class within the same program is the same as using any
other class.

Class::Struct classes are simple by design, and can get more complex
with further creativity. For instance, to add a method to the Person
class you can simply declare it in the Person package. Here is a
method named birth() which should be called on a Person object. It
takes the name of the baby as an argument, and optionally the father
(a Person object). Returned is a new Person object representing
the baby.
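
One possible shape for such a birth() method is sketched below. The class declaration is repeated so the example stands alone, and the family members are invented purely to exercise the method.

```perl
use strict;
use warnings;

use Class::Struct Person => {
    name => '$',
    mom  => 'Person',
    dad  => 'Person',
};

# Adding a method is just a matter of defining a sub in the Person package.
sub Person::birth {
    my ($self, $name, $dad) = @_;

    my $baby = Person->new( name => $name );
    $baby->mom($self);
    $baby->dad($dad) if defined $dad;    # the father is optional

    return $baby;
}

# Hypothetical family used only to exercise the method.
my $mom  = Person->new( name => 'Alice' );
my $dad  = Person->new( name => 'Carl' );
my $baby = $mom->birth( 'Bob', $dad );

print $baby->name, ": ", $baby->mom->name, " and ", $baby->dad->name, "\n";
```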

Encode is Perl's interface to Unicode. An explanation of Unicode itself
is far beyond the scope of this article. In fact, it's far beyond the scope
of most of us. This module is powerful. I'm going to provide some examples
and lots of pointers to the appropriate documentation.

The first function of the API to learn is encode(). encode() will convert a string from Perl's internal format to a series of octets in the
encoding you choose. Here is an example.

use Encode;
my $octets = encode( "utf8", "Hello, world!" );

Here we have turned the string Hello, world! into a utf8 string, which is now in $octets. We can also decode strings using the decode() function.

my $string = decode( "utf8", $utf8_string );

Now we've decoded a utf8 string into Perl's internal string representation.
Since utf8 is a common encoding to deal with, there are two helper functions:
encode_utf8() and decode_utf8(). Both of these functions take a string
as the argument.
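
A quick round trip through the two helpers, as a sketch. The sample string is an assumption chosen so that the character count and the octet count differ.

```perl
use strict;
use warnings;
use Encode;

# Four characters; the last one is outside ASCII.
my $string = "caf\x{e9}";

my $octets = encode_utf8($string);    # UTF-8 encoded octets
my $copy   = decode_utf8($octets);    # back to Perl's internal format

# Prints "4 characters, 5 octets".
printf "%d characters, %d octets\n", length($string), length($octets);
```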

A list of supported encodings can be found in Encode::Supported, or by
using the encodings() method.

my @encodings = Encode->encodings;

For even more Unicode fun, dive into the documentation in Encode
(perldoc Encode on the command line).

This module gives us an easy way to write source-code filters. These
filters may change the behavior of calling Perl code, or implement new
features of Perl, or do anything else they want. Some of the more
infamous source-filter modules on the CPAN include Acme::Bleach,
Semi::Semicolons, and even Switch.

In this article, I'm going to implement a new comment syntax for Perl.
Using the following source-filter package will allow you to comment
your code using SQL comments. SQL comments begin with two consecutive
dashes (--). For our purposes, these dashes cannot be directly
followed by a semicolon (;) or be preceded by something other than
whitespace or the beginning of a line.

In this example, we create an anonymous subroutine that is passed on
to Filter::Simple. The entire source of the calling program is
in $_, and we use a regular expression to search for our SQL comments
and change them to Perl comments.
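
A filter package along those lines could be sketched as follows. The helper name rewrite_comments() is hypothetical, and the regular expression is only one possible reading of the rules above. This sketch also assumes the dashes must be followed by whitespace, so that a pre-decrement such as --$i is left alone.

```perl
package SQLComments;

use strict;
use warnings;
use Filter::Simple;

# Rewrite SQL comments in $_ as Perl comments. The dashes must sit at the
# start of a line or after a non-newline whitespace character, and (an
# assumption of this sketch) must be followed by whitespace.
sub rewrite_comments {
    s/(?:^|[^\S\n])--(?=\s)/#/mg;
    return;
}

# Filter::Simple places the calling program's source in $_.
FILTER { rewrite_comments() };

1;
```

Saved as SQLComments.pm somewhere in @INC, the filter is applied to any program that says use SQLComments.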

Using B::Deparse on the command line, we can see what the code
looks like after it's filtered. Just remember that B::Deparse
doesn't preserve comments.

use SQLComments;
my $i = 100;
while ($i) {
--$i;
}

The output is exactly as we expect. Filtering source code is a complex art.
If your filters are not perfect, then you can break code in unexpected ways.
Our SQLComments filter will break the following code.

print "This is nice -- I mean really nice!\n";

It will turn into this.

print "This is nice# I mean really nice!\n";

Not exactly the results we want. This particular problem can be avoided,
however, using Filter::Simple in a slightly different way. You can
specify filters for different sections of the source code. Here is how
we can limit our SQLComments filter to just code and not quote-like
constructs.
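
A sketch of the restricted version, using Filter::Simple's FILTER_ONLY with the code selector. As before, the helper name and the exact regular expression are assumptions of this sketch.

```perl
package SQLComments;

use strict;
use warnings;
use Filter::Simple;

# Same hypothetical rewrite as before: "--" at the start of a line or
# after non-newline whitespace, followed by whitespace, becomes "#".
sub rewrite_comments {
    s/(?:^|[^\S\n])--(?=\s)/#/mg;
    return;
}

# Only the "code" parts of the source reach the filter; quote-like
# constructs are set aside and restored afterwards.
FILTER_ONLY code => sub { rewrite_comments() };

1;
```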

There are some functions that are repeated in hundreds (probably thousands)
of programs. Think of all the sorting functions written in C programs. Perl
programs have them, too, and the following utility modules try to clean up our
code, eliminating duplication in simple routines.

There are a number of useful functions in each of these modules. I'm going to
highlight a few, but be sure to read the documentation provided with each
of them for a full list.

blessed() will return the package name that the variable is blessed into,
or undef if the variable isn't blessed.

my $baby = Person->new;
my $class = blessed $baby;

$class will hold the string Person. weaken is a function that
takes a reference and makes it weak. This means that the variable will
not hold a reference count on the thing it references. This is useful
for objects, where you want to keep a copy but you don't want to stop
the object from being DESTROY-ed at the right time.
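
To make the effect of weaken() concrete, here is a small sketch with two invented records that point at each other.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Hypothetical parent/child records that point at each other.
my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;

# Without this, the two hashes would keep each other alive.
weaken( $child->{parent} );

print "back-reference is weak\n" if isweak( $child->{parent} );

# Dropping the last strong reference frees the parent;
# the weak reference then becomes undef.
undef $parent;
print "parent is gone\n" unless defined $child->{parent};
```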

Hash::Util has a slightly different function than the previously
discussed variable utility modules. This module implements restricted
hashes, which are the replacement for the undesirable (and now obsolete)
pseudo-hashes.

lock_keys() is a function that will restrict the allowed keys of a
hash. If a list of keys is given, the hash will be restricted to that
set, otherwise the hash is locked down to the currently existing keys.
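
A short sketch of lock_keys() on an invented %person record. The eval is only there so the sketch can report the failure instead of dying.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

# Hypothetical record; the keys present now become the only legal keys.
my %person = ( name => 'Casey', mom => undef, dad => undef );
lock_keys(%person);

# Existing keys can still be modified.
$person{name} = 'Casey West';

# Adding a new key dies, so trap the error to show it.
my $ok = eval { $person{wife} = 'unknown'; 1 };
print "adding a key failed: $@" unless $ok;
```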

The %person hash is now restricted. Any keys currently in the
hash may be modified, but no keys may be added. The following code
will result in a fatal error.

$person{wife} = $wife;

You can use the unlock_keys() function to release your restricted
hash.

You can also lock (or unlock) a value in the hash.

lock_value( %person, "name" );
$person{name} = "Bozo"; # Fatal error!

Finally, you can lock and unlock an entire hash, making it read only
in the first case.

lock_hash( %person );

Now our %person hash is really restricted. No keys can be added or
deleted, and no values can be changed. I know all those OO folks out there
wishing Perl made it easy to keep class and instance data private are
smiling.

You can specify any type of code, but if it's not the default two character
representation you must supply the extra argument to define what type it is.

my $name = code2country( "120", LOCALE_CODE_NUMERIC ); # Cameroon

Just as before, you can get a full list of codes and countries using the
two query functions: all_country_codes(), and all_country_names().
Both of these functions accept an optional argument specifying the code
set to use for the resulting list.

Memoize is a module that performs code optimization for you. In a
general sense, when you memoize a function, it is replaced by a memoized
version of the same function. OK, that was too general. More specifically,
every time your memoized function is called, the calling arguments are
cached and anything the function returns is cached as well. If the function
is called with a set of arguments that has been seen before, then the cached
return value is sent back and the actual function is never called. This makes
the function faster.

Not all functions can be memoized. For instance, if your function would return
a different value on two calls, even for the exact same set of calling arguments,
then it will be broken. Only the first set of return values will be returned for every
call. Many functions do not act this way, and that's what makes Memoize so
useful.

Here is an example of a memoizeable function.

sub add {
my ($x, $y) = @_;
return $x + $y;
}

For every time this function is called as add( 2, 2 ), the result will be 4.
Rather than compute the value of 4 in every case, we can cache it away the first
time and retrieve it from the cache every other time we need to compute 2 + 2.
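
A sketch of memoizing add(). The $calls counter is an addition of this sketch, there only to show that the real function runs once.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts how often the real add() runs

sub add {
    my ($x, $y) = @_;
    $calls++;
    return $x + $y;
}

# Replace add() with a caching wrapper around the original.
memoize('add');

my $first  = add( 2, 2 );    # computed by add()
my $second = add( 2, 2 );    # served from the cache

print "add() ran $calls time(s)\n";
```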

We've just made add() faster, without any work. Of course, our addition function isn't slow to begin with. The documentation of Memoize gives a much more detailed look into this algorithm. I highly suggest you invest time in learning about Memoize. It can give you wonderful speed increases if you know how and when to use it.

I currently don't have a Microsoft operating system running on any of my networks,
but when perusing the Perl core, I happened upon the Win32 module. I wanted
to bring it up because if I were using a Microsoft OS, then I would find the functions
in this module invaluable. Please, if you are running in that environment, then look
at the documentation for Win32 for dozens of helpful functions (perldoc Win32 on the command line).

Just as before, I've still not covered all of the Perl core. There is much more to
explore and a full list can be found by reading perlmodlib. The benefit of having
these modules in the core is great. Lots of environments require programmers to be
bound to using only code that is distributed with Perl. I hope I've been able to lighten
the load for anyone who has been put in that position (even by choice).