"Linux Gazette...making Linux just a little more fun!"

Learning Perl, part 4

The Internet Revolution was founded on open systems; an open system
is one whose software you can look at, a box you can unwrap and play with.
It's not about secret binaries or crippleware or brother-can-you-spare-a-dime
shareware. If everyone always had hidden software, you wouldn't have 1/100th
the useful software you have right now.

And you wouldn't have Perl. -- Tom Christiansen

Overview

If you have been following this series, you now have a few tools - perhaps
you've even experimented with them - which can be used to build scripts.
So, this month we're going to take a look at actually building some, particularly
by using the "open" function, which allows us to assign filehandles to files,
sockets, and pipes. "open" is a major building block in using Perl, so
we'll give it a good long look.

Exercises

Last time, I mentioned writing a few scripts for practice. Let's take
a look at a few possible ways to do that.

The first one was a script that would take a number as input, and print
"Hello!" that many times. It would also test the input for illegal (non-numeric)
characters. Here is a good example, sent in by David Zhuwao:

First, to point out good coding practices: David has used the "-w"
switch, which makes Perl warn him about questionable code both at compile
time and at run time - an excellent habit. He has also used whitespace (blank lines and tabs)
effectively to make the code easy to read, as well as commenting it liberally.
Also, rather than checking for the presence of a number (which would create
a problem with input like "1A"), he is testing for non-numerical characters
and a length greater than zero - good thinking!

Minor points (note that none of these are problems as such, simply
observations): in using the match operator, "m//", the "m" is unnecessary unless
the delimiter is something other than "/". As well, the Perl "for/foreach"
loop would be more compact than the C-like "for" loop, while still fulfilling
the function:

print "Hello!\n" for 1 .. $input;

It would also render "$i" unnecessary. Other than those minor nits
- well done, David!

Here's another way:

#!/usr/bin/perl -w

print "Please enter a number: ";
chomp ( $a = <> );

print "Hello!\n" x $a if $a =~ /^\d+$/;

Unlike David's version, mine does not print a failure message; it simply
returns you to the command prompt if the input is not numeric. Also, instead
of testing for non-numerical characters, I'm testing the string from its
beginning to its end for only numerical content. Either of these
techniques will work fine. Also, instead of using an explicit loop, I'm
using Perl's "x" (repetition) operator, which builds a string consisting of
"Hello!\n" repeated "$a" times; a single "print" then outputs all of it.
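Here is a minimal sketch of the same check wrapped in a subroutine so it can be tried on several inputs without retyping; the name "hello_lines" is hypothetical. It shows that "x" repeats the string before "print" ever sees it, and that the anchored /^\d+$/ rejects input like "1A" that an unanchored /\d/ would accept.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: returns the text the one-liner above would print,
# or an empty string when the input is not purely numeric.
sub hello_lines {
    my ($input) = @_;
    chomp $input;                         # drop the trailing newline, as chomp(<>) does
    return '' unless $input =~ /^\d+$/;   # digits only, from beginning to end
    return "Hello!\n" x $input;           # 'x' repeats the *string* $input times
}

print hello_lines("3\n");                 # "Hello!" three times
print hello_lines("1A\n");                # nothing: "1A" fails the anchored match
print "an unanchored /\\d/ would have accepted 1A\n" if "1A" =~ /\d/;
```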

...And, One More Time...

Let's break down another one, the second suggestion from last month:
a script that takes an hour (0-23) as input and says "Good morning", "Dobriy
den'", "Guten Abend", or "Buenas noches" as a result (I'll cheat here and
use all English to avoid confusion.)

On the surface, this script seems pretty basic - and, really, it is
- but it contains a few hidden considerations that I'd like to mention.
First, why do we need the "beginning of line" and "end of line" tests for
everything? Obviously, we want to avoid confusing "1" and "12" - but what
could go wrong with /1[3-8]/?

What could go wrong is a mis-type. Not that it matters too much in this
case, but being paranoid about your tests is a good idea in general. :)
What happens if a user, while trying to type "14", typed "114"? Without
those "limits", it would match "11" - and we'd get a wrong answer.

OK - why didn't I use numeric tests instead of matching? I mean, after
all, we're just dealing with numbers... wouldn't it be easier and more
obvious? Yes, but. What happens if we do a numeric test and the
user types in "joe"? We'd get an error along with our "Invalid input!":

Argument "joe\n" isn't numeric in gt at -e line 5, <> chunk 1.

As a matter of good coding practice, we want the user to see only the
output that we generate (or expect); there should not be any errors caused
by the program itself. A regex match isn't going to be "surprised" by non-digit
input; it will simply return false (no match), and control passes on to the next "elsif"
or "else", which is the "catchall" clause. Anything that does not match
one of the first four tests is invalid input - and that's what we want
reported.
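Here is a minimal sketch of that kind of script (not the original listing). The hour boundaries (morning 6-12, afternoon 13-18, evening 19-21, night 22-23 and 0-5) and the subroutine name "greeting" are assumptions, chosen to agree with the /1[3-8]/ example above. Every test is anchored with "^" and "$", and nothing is ever compared numerically, so "joe" and "114" both fall through to the catchall.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper; the hour ranges are assumptions, not the original's.
sub greeting {
    my ($hour) = @_;
    chomp $hour;
    if    ( $hour =~ /^([6-9]|1[0-2])$/ ) { return "Good morning" }
    elsif ( $hour =~ /^1[3-8]$/ )         { return "Good afternoon" }
    elsif ( $hour =~ /^(19|2[01])$/ )     { return "Good evening" }
    elsif ( $hour =~ /^([0-5]|2[23])$/ )  { return "Good night" }
    else                                  { return "Invalid input!" }
}

# Try a few inputs, including a typo and a non-number
for my $input ( "14\n", "114\n", "joe\n", "0\n" ) {
    chomp( my $shown = $input );
    print "$shown: ", greeting($input), "\n";
}
```

Feeding it "joe" produces only "Invalid input!", with no "isn't numeric" warning, because no numeric comparison is ever made.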

Handling Files

An important capability in any language is that of dealing with files.
In Perl, this is relatively easy, but there are a couple of places where
you need to be careful.

# The right way
open FILE, "/etc/passwd" or die "Can't open /etc/passwd: $!\n";

Here are some wrong or questionable ways to do this:

# Doesn't test for the return result
open FILE, "/etc/passwd";

# Ignores the system error message held in the '$!' variable
open FILE, "/etc/passwd" or die "Can't open /etc/passwd\n";

# Uses "||" instead of "or": "||" binds more tightly than the comma, so the
# "die" attaches to the filename (which is always true) and never runs
open FILE, "/etc/passwd" || die "Can't open /etc/passwd: $!\n";
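To make the precedence problem visible, here is a small sketch using a hypothetical path that is assumed not to exist. Each "open" is wrapped in "eval" so we can see whether "die" fired: with "||" it never does, while the low-precedence "or" applies to the whole "open" call and catches the failure.

```perl
#!/usr/bin/perl -w
use strict;

my $path = "/nonexistent/lg-demo-file";   # hypothetical path, assumed not to exist

# Parsed as: open my $fh, ($path || die ...); $path is a true string,
# so die is never evaluated and the failed open goes unnoticed.
my $wrong = eval { open my $fh, $path || die "Can't open $path: $!\n"; 1 };

# Parsed as: (open my $fh, $path) or die ...; the failure is caught.
my $right = eval { open my $fh, $path or die "Can't open $path: $!\n"; 1 };

print "with '||': ", ( $wrong ? "no error raised\n" : "died: $@" );
print "with 'or': ", ( $right ? "no error raised\n" : "died: $@" );
```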

By default, files are opened for reading. Other modes are specified
by adding a rather obvious "modifier" in front of the filename:

# Open for writing - the file is truncated first, so anything written replaces its contents
open FILE, ">/etc/passwd" or die "Can't open /etc/passwd: $!\n";

# Open for appending - data will be added to the end of the file
open FILE, ">>/etc/passwd" or die "Can't open /etc/passwd: $!\n";

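# Open for reading and writing - note that "+>" truncates the file first;
# "+<" reads and writes without destroying the existing contents
open FILE, "+>/etc/passwd" or die "Can't open /etc/passwd: $!\n";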

# Open for reading and appending
open FILE, "+>>/etc/passwd" or die "Can't open /etc/passwd: $!\n";
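To see the modes side by side without risking a real system file like /etc/passwd, here is a sketch that works on a scratch file from the core File::Temp module; the helper "contents" is hypothetical. It also shows the difference between "+<" (read and write, keeping the contents) and "+>" (read and write, but truncating first).

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Scratch file (removed automatically at exit) instead of a real system file
my ( undef, $file ) = tempfile( UNLINK => 1 );

# Hypothetical helper: return the whole file as one string
sub contents {
    open my $in, "<$file" or die "Can't open $file: $!\n";
    local $/;                        # undef record separator: read it all at once
    my $text = <$in>;
    close $in;
    return defined $text ? $text : '';
}

open FILE, ">$file" or die "Can't open $file: $!\n";     # write (truncates)
print FILE "first\n";
close FILE or die "Can't close $file: $!\n";

open FILE, ">>$file" or die "Can't open $file: $!\n";    # append
print FILE "second\n";
close FILE or die "Can't close $file: $!\n";
my $after_append = contents();
print "after '>' and '>>':\n$after_append";

open FILE, "+<$file" or die "Can't open $file: $!\n";    # read/write, keeps contents
my $first = <FILE>;
close FILE or die "Can't close $file: $!\n";
print "'+<' read back: $first";

open FILE, "+>$file" or die "Can't open $file: $!\n";    # read/write, truncates first
close FILE or die "Can't close $file: $!\n";
my $after_clobber = contents();
print "after '+>': ", length $after_clobber, " bytes left\n";
```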

Having created the filehandle ("FILE", in the above case), you can now
use it in the following manner:

while ( <FILE> ) {
    print;    # This will loop through the file and print every line
}

Or you can do it this way, if you just want to print out the
contents in one shot (in list context, <FILE> returns all the remaining lines at once):

print <FILE>;

Writing to the file is just as easy:

print FILE "This line will be written to the file.\n";
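Here is a minimal sketch of writing through a filehandle and then reading back both ways, line at a time and in list context. It uses a scratch file from File::Temp so nothing real is touched; the three sample lines are made up.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Scratch file (removed automatically at exit) so no real file is touched
my ( undef, $file ) = tempfile( UNLINK => 1 );

# Write three made-up lines
open FILE, ">$file" or die "Can't open $file: $!\n";
print FILE "This line will be written to the file.\n" for 1 .. 3;
close FILE or die "Can't close $file: $!\n";

# Line at a time: each pass through the loop puts the next line in $_
open FILE, "<$file" or die "Can't open $file: $!\n";
my $count = 0;
while ( <FILE> ) {
    print;                       # with no argument, print outputs $_
    $count++;
}
close FILE or die "Can't close $file: $!\n";

# List context: <FILE> returns all the remaining lines at once
open FILE, "<$file" or die "Can't open $file: $!\n";
my @lines = <FILE>;
close FILE or die "Can't close $file: $!\n";

print "loop saw $count lines; list context returned ", scalar @lines, "\n";
```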

Remember that the default open method is "read". I usually like to emphasize
this by writing the statement this way:

open FILE, "</etc/passwd" or die "Can't open /etc/passwd: $!\n";

Note the "<" sign in front of the filename: Perl has no problem with
this, and it makes a good visual reminder. The phrase "leaving breadcrumbs"
describes this methodology, and has to do with the idea of making what
you write as obvious as possible to anyone who may follow. Don't forget
that the person "following" might be you, a couple of years after you've
written the code...

Perl automatically closes filehandles when the script exits... or, at
least, is supposed to. From what I've been told, some OSs have a problem
with this - so, it's not a bad idea (though not a necessity) to perform
an explicit "close" operation on open filehandles. An explicit close also
lets you catch errors, such as a full disk, that only show up when
buffered output is finally flushed:

close FILE or die "Can't close FILE: $!\n";

By the way, the effect of the "die" function should be relatively obvious:
it prints the specified string to STDERR and exits the program with a
non-zero status.

Don't do this, unless you're at the last line of your script:

close;

This doesn't close everything: with no argument, "close" closes the
currently selected output filehandle - STDOUT, unless you've changed it
with "select" - which leaves your program unable to print anything.
Also, you cannot specify multiple handles in one close, so you do indeed
have to close them one at a time:

close Fh1 or die "Can't close Fh1: $!\n";
close Fh2 or die "Can't close Fh2: $!\n";
close Fh3 or die "Can't close Fh3: $!\n";
close Fh4 or die "Can't close Fh4: $!\n";
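A quick way to check what a bare "close" actually closes: this sketch makes a scratch file's handle (the hypothetical "LOG") the currently selected one with "select", calls "close" with no argument, and then uses "fileno", which returns undef for a handle that is not open, to see which handles survived.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

my ( undef, $file ) = tempfile( UNLINK => 1 );
open LOG, ">$file" or die "Can't open $file: $!\n";

my $previous = select LOG;   # LOG becomes the currently selected output handle
print "this goes to the file\n";
close;                       # no argument: closes the selected handle, i.e. LOG
select $previous;            # STDOUT is selected again

my $log_open    = defined fileno LOG;
my $stdout_open = defined fileno STDOUT;
print "LOG open: ",    ( $log_open    ? "yes" : "no" ), "\n";
print "STDOUT open: ", ( $stdout_open ? "yes" : "no" ), "\n";
```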

Let's say that you have two files with some financial data - loan rates
in one, the type and amount of your loans in the other - and you want to
calculate how much interest you'll be paying, and write the result out
to a file. Here is the data:

while ( <Loans> ) {
    # Split the line into an array
    @loans = split;
    # Print the loan and the amount of interest to the "Total" handle;
    # calculate by multiplying the total amount by the value returned
    # by the hash key.
    print Total "$loans[0]\t\t\$", $loans[2] * $r{lc $loans[1]}, "\n";
}

# Close the filehandles - not a necessity, but can't hurt
for ( qw/Rates Loans Total/ ) {
    close $_ or die "Can't close $_: $!\n";
}

Rather obviously, Perl is very good at this kind of thing: we've done
the job in a dozen lines of code. The comments took up most of the space.
:)
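For experimenting, here is a self-contained sketch of the whole job under assumed file formats: a Rates file of "type rate" lines (rate as a fraction) and a Loans file of "name type amount" lines. The sample figures, the file formats, and the loop that builds %r are hypothetical, not the original's; the main loop and the closing loop follow the code above.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

# Scratch directory holding made-up data files (hypothetical formats)
my $dir = tempdir( CLEANUP => 1 );
open OUT, ">$dir/rates" or die "Can't open $dir/rates: $!\n";
print OUT "mortgage 0.07\nauto 0.09\n";               # type  rate
close OUT or die "Can't close $dir/rates: $!\n";
open OUT, ">$dir/loans" or die "Can't open $dir/loans: $!\n";
print OUT "House Mortgage 100000\nCar Auto 20000\n";  # name  type  amount
close OUT or die "Can't close $dir/loans: $!\n";

open Rates, "<$dir/rates" or die "Can't open $dir/rates: $!\n";
open Loans, "<$dir/loans" or die "Can't open $dir/loans: $!\n";
open Total, ">$dir/total" or die "Can't open $dir/total: $!\n";

# Build the rate table, keyed by lowercased loan type
my %r;
while ( <Rates> ) {
    my ( $type, $rate ) = split;
    $r{ lc $type } = $rate;
}

# One output line per loan: name, then amount * rate
while ( <Loans> ) {
    my @loans = split;
    print Total "$loans[0]\t\t\$", $loans[2] * $r{ lc $loans[1] }, "\n";
}

for ( qw/Rates Loans Total/ ) {
    no strict 'refs';    # closing handles by name needs symbolic references
    close $_ or die "Can't close $_: $!\n";
}
```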

Here's another example, one that came about as a result of one of my
articles about procmail ("No More Spam!" in LG#62). The original "blacklist" script that was invoked
from Mutt pulled out the spammer's e-mail address via "formail", then parsed
the result down to the actual "user@host" address with a one-line Perl
script. It took the entire spam mail as piped input. Martin Bock, however,
suggested doing the whole thing with Perl; after exchanging a bit of e-mail
with him, I came up with the following script based on his idea:

#!/usr/bin/perl -wln
# The '-n' switch makes the script read the input one line at a time:
# the entire script is executed for each line;
# the '-l' switch chomps each input line and appends a newline
# to every line that is printed out.

# If the line matches the expression, then...
if ( s/^From: .*?(\w\S+@\S+\w).*/$1/ ) {
    # Open the "blacklist" with the "OUT" filehandle in append mode
    open OUT, ">>$ENV{HOME}/.mutt/blacklist" or die "Aargh: $!\n";
    # Print $_ to that filehandle
    print OUT;
    # Close
    close OUT or die "Aargh: $!\n";
    # Exit the loop
    last;
}

The substitution operator in the first line is not perfect - I can
write some rather twisted e-mail addresses which it would not parse correctly
- but it works well with variations like

To "decode" what the regular expression in it says, consult the "perlre"
manpage. It's not that complex.
Hint: look for the word
"greed" to understand that ".*?", and look for the word "capture" to understand
the "(...) / $1" construct. Both of them are very important concepts, and
both have been mentioned in this series.
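To experiment with the regex without piping real mail through Mutt, here is a sketch that runs the same substitution over a list of header lines, imitating what "-n", "-l", and "last" do in the script: chomp each line and stop at the first From: line that yields an address. The helper name "first_sender" and the sample header lines are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: mimics the script on a list of header lines and
# returns the address it would append to the blacklist (undef if none).
sub first_sender {
    for my $line ( @_ ) {
        my $copy = $line;
        chomp $copy;                                   # what '-l' does to each input line
        if ( $copy =~ s/^From: .*?(\w\S+@\S+\w).*/$1/ ) {
            return $copy;                              # the script prints this, then 'last'
        }
    }
    return undef;                                      # no matching From: line
}

print first_sender(
    "Subject: Make money fast\n",
    "From: \"Friendly\" <deals\@spam.example.com>\n",
), "\n";
```

The lazy ".*?" lets the match skip past the display name, and the capture keeps only the "user@host" part.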

Here's a somewhat more compact (and that much less readable) version
of the above; note that the mechanism here is somewhat different:

The BEGIN block on the first line of the script runs only once during
execution, despite the fact that the script loops multiple times; it's
very similar to the same construct in Awk.
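The following is a small illustration of that mechanism, not the compact script itself: a BEGIN block placed inside a loop body still runs exactly once, at compile time, which is what makes it a safe place for one-time setup in a "-n" script whose body runs once per line. The counter names are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

our ( $setup_runs, $body_runs );   # package variables; deliberately not assigned at run time

for my $line ( qw(one two three) ) {
    BEGIN { $setup_runs++ }        # executed once, while the loop is being compiled
    $body_runs++;                  # executed on every pass
}

print "BEGIN block ran $setup_runs time(s); loop body ran $body_runs time(s)\n";
```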

Next Time

Next month, we'll be looking at a few nifty ways to save ourselves work
by using modules: useful code that other people have written, available from
the Comprehensive Perl Archive Network (CPAN).
We'll also take a look at how Perl can be used to implement CGI, the Common
Gateway Interface - the mechanisms that "hew the wood and draw the water"
behind the scenes of the Web. Until then, here are a few things to play
with:

Write a script that opens "/etc/services" and counts how many ports
are listed as supporting UDP operation, and how many support TCP. Write
the service names into files called "udp.txt" and "tcp.txt", and print
the totals to the screen.

Open two files and exchange their contents.

Read "/var/log/messages" and print out any line that contains the word
"fail", "terminated/terminating", or " no " in it. Make it
case-insensitive.

Ben Okopnik

A cyberjack-of-all-trades, Ben wanders the world in his 38' sailboat,
building networks and hacking on hardware and software whenever he runs out of
cruising money. He's been playing and working with computers since the Elder
Days (anybody remember the Elf II?), and isn't about to stop any time soon.