Copyright Notice

This text is copyright by CMP Media, LLC, and is used with
their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in
SysAdmin/PerformanceComputing/UnixReview magazine.
However, the version you are reading here is as the author
originally submitted the article for publication, not after their
editors applied their creativity.

You can save a lot of time by using prewritten modules effectively.
Many modules are included with the Perl distribution, but an enormous
number are available (for free!) in the Comprehensive Perl Archive
Network (called the CPAN). If you're new to the idea of
downloadable modules for Perl, you should browse
http://www.perl.com/CPAN/CPAN.html to get a feel for what's available.

Installing CPAN modules has even been made pretty easy with the
CPAN.pm module (bundled with Perl). For example, if I need
the Foo::Bar module from the CPAN, installing it is as simple as typing:
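
Here's a minimal sketch of that request, assuming the usual CPAN.pm interface. Foo::Bar is only the placeholder name from the text, so substitute a real module before running it.

```perl
# Shell equivalent:  perl -MCPAN -e 'install Foo::Bar'
use strict;
use warnings;
use CPAN;

# Foo::Bar is a placeholder; name the module you actually want.
CPAN::Shell->install("Foo::Bar");
```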

The first time you do this, you might have to answer some questions
about the way to fetch things from the net, or where the nearest CPAN
archive is located. Use http://www.perl.com/CPAN/ if you're not
sure. Also, if you're not the system administrator, you'll need to
add PREFIX=/some/path/you/can/write to the makepl_arg
configuration parameter to install the binaries, modules, and
documentation below that PREFIX, rather than the system
directories. See perldoc CPAN for more information.

So, let's take a look at a task that was made tremendously easier
using the CPAN. The other day, I was thinking about the
rec.humor.funny newsgroup, which gets a mere two postings a day of
some relatively funny jokes. However, I don't read that newsgroup
every day, and I miss some of the jokes, because they
expire off my news server before I get to them.

So I decided to write a program that I could run on a regular basis
(like nightly from a cron job) to connect to the NNTP server, fetch
the jokes, and send them to standard output (which cron will then
mail to me). At first, that sounds like it might be a lot of
work, because talking to an NNTP server would seem to require knowing
about sockets and the NNTP protocol. Not so.

Graham Barr has written a nice module called Net::NNTP that handles
all the greasy stuff behind the scenes to talk to an NNTP server. If
you don't have it installed yet, it's simple to get, because it's in
the CPAN!

Once installed, using the module is pretty easy. First, I'll add the
appropriate use directive to my program:

use Net::NNTP;

Next, I'll define the news server location and the group I want to
read as global scalars with uppercase names, to let me know they're
configuration settings:
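
As a minimal sketch, those settings might look like this, together with the connection and group selection that the rest of the walkthrough relies on. The host name news.example.com and the helper name open_group are assumptions for illustration; new and group are Net::NNTP methods.

```perl
use strict;
use warnings;
use Net::NNTP;

# Configuration: uppercase names mark these as settings.
# news.example.com is a placeholder, not a real server.
our $SERVER = "news.example.com";
our $GROUP  = "rec.humor.funny";

# Hypothetical helper: connect to $server and select $group.
# Returns the client object, the article count, and the lowest
# and highest article numbers reported by the server.
sub open_group {
    my ($server, $group) = @_;
    my $c = Net::NNTP->new($server)
        or die "Cannot connect to $server\n";
    my ($count, $low, $high) = $c->group($group)
        or die "Cannot select group $group\n";
    return ($c, $count, $low, $high);
}

# In the program itself:
#   my ($c, $count, $low, $high) = open_group($SERVER, $GROUP);
```

Nothing here touches the network until open_group is actually called.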

The return value tells how many articles are in the group, along with
a minimum and maximum article number. We can use that to scan through
all possible article numbers and dump them out. Let's do that with a
foreach loop:

for my $artnum ($low..$high) {
my $art = $c->article($artnum) or next;

If an article doesn't exist (perhaps a cancellation or a different
expiration date), we skip over to the next article number. The value
of $art here is either undef, or a listref pointing to the full
text of the article. If it's the listref, we'll just dump it out.

print "=== article $artnum ===\n";
print @$art;
}
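
Pulled together, the loop can be sketched as a small routine. The name dump_articles and the optional output filehandle are assumptions added so the logic can be exercised without a live server; the only thing it asks of $c is an article method that behaves like the one in Net::NNTP.

```perl
use strict;
use warnings;

# Print every article numbered $low..$high that the server still has.
# $c is any object whose article($n) method returns an array reference
# of lines (as Net::NNTP does) or undef for a missing article.
# Returns the number of articles printed.
sub dump_articles {
    my ($c, $low, $high, $out) = @_;
    $out ||= \*STDOUT;
    my $printed = 0;
    for my $artnum ($low .. $high) {
        my $art = $c->article($artnum) or next;   # expired or cancelled
        print {$out} "=== article $artnum ===\n";
        print {$out} @$art;
        $printed++;
    }
    return $printed;
}
```

With a connected client in $c, calling dump_articles($c, $low, $high) reproduces the behavior described above.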

And that would be a good program that successfully dumps out all the
articles in rec.humor.funny. The format's not very pretty
though... it has all the headers and the silly common .signature on
each posting. It also includes all the administrivia messages. Let's
fix that.

We could probably write some quick regular expressions to modify the
article text, but let's steal some additional resources from the CPAN
again. In this case, it's Mail::Internet, also by Graham Barr,
which understands RFC 822 mail messages. That happens to be the same
format as a news article. Now our sample program starts like this:

But here's where we diverge now. We take the listref returned
in $art, and build a mail message object from it:

my $mail = Mail::Internet->new($art);

Now we can treat this article as a mail message, using the methods
defined for such objects. First, let's skip over the administrivia messages:

next if $mail->head->get("From") =~ /netfunny\.com/;

The expression $mail->head returns a Mail::Header object for
the message, which in turn has a get method to extract a particular
field. If that field matches the administrivia address domain, then we
skip over the article. Next, we'll dump the same banner as before:

print "=== article $artnum ===\n";

But now that we have a mail message object, we can do some massage.
Let's remove the signature, and clean up any extra whitespace:

$mail->remove_sig;
$mail->tidy_body;

And print just those headers that we're interested in (the subject,
date, and original submitter):
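
Here's a minimal sketch of the whole cleanup on a made-up article, so it runs without a news server. The helper name format_article, the sample text, and the choice of Subject, Date, and From as the printed fields are assumptions; new, head, get, remove_sig, tidy_body, and body are Mail::Internet and Mail::Header methods.

```perl
use strict;
use warnings;
use Mail::Internet;

# Hypothetical helper: turn one raw article (an array reference of
# lines) into cleaned-up text, or return nothing for administrivia.
sub format_article {
    my ($art) = @_;
    my $mail = Mail::Internet->new([@$art]);   # copy; caller's lines stay intact
    my $head = $mail->head;

    my $from = $head->get("From");
    return if defined $from && $from =~ /netfunny\.com/;

    $mail->remove_sig;    # drop the trailing signature block
    $mail->tidy_body;     # trim leading and trailing blank lines

    my $text = '';
    for my $field (qw(Subject Date From)) {    # assumed field list
        my $value = $head->get($field);
        next unless defined $value;
        $text .= "$field: $value";             # value keeps its newline
    }
    return $text . "\n" . join('', @{ $mail->body });
}

# A made-up article for demonstration.
my @sample = (
    "From: someone\@example.com\n",
    "Subject: A sample joke\n",
    "Date: Mon, 1 Jan 2001 00:00:00 GMT\n",
    "Message-ID: <1\@example.com>\n",
    "\n",
    "Knock, knock.\n",
    "\n",
    "-- \n",
    "A signature line\n",
);
print format_article(\@sample);
```

The copy passed to new matters because header extraction consumes lines from the array it is given.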

There. Nicer, without a lot of hassle. So, now that we have a
complete dump of all rec.humor.funny articles currently on the news
server, cleaned up in a nice way, what next?

Well, we still have a problem here. It's dumping all articles
every time. And when's the last time you wanted to hear the same
joke twice in two days (or as many days as it stays on your system)?

We need some memory to let us know what we've dumped. Fortunately,
there's a standard form of memory, the newsrc file, in a common format
that most newsreaders understand. And, since it's a common format,
there's (once again) a module in the CPAN that can deal with it. In
this case, it's the News::Newsrc module, by Steven McDougall.

Again, nearly the same header, but we've added the use line for the
newest module in the list. Next, we'll fetch the current newsrc file
by creating a newsrc object:

my $newsrc = News::Newsrc->new;
$newsrc->load;

Which brings up a point... we're using the same newsrc file here that
my newsreader also uses, which means that this program will know which
rec.humor.funny articles I've already read, either via this
program, or via my normal newsreader! Nice.

But now we no longer want to cycle through all the articles from $low
to $high. We want to hit only the articles that we've not seen. These
are called unmarked in newsrc jargon, so we'll use the appropriately
named method:

my @unmarked =
  $newsrc->unmarked_articles($GROUP, $low, $high);

Now @unmarked is a list of article numbers that are potentially on the
news server (unless they've been cancelled) and that I haven't already
seen. Let's cycle through them:

for my $artnum (@unmarked) {
my $art = $c->article($artnum) or next;

If the article has been fetched, we'll mark it. That way, I'll see it
only once:
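
Marking is a single mark call on the newsrc object. The sketch below shows how mark and unmarked_articles interact, using an in-memory newsrc so that nothing is read from or written to a real file. The article numbers are invented, and the explicit add_group call is a precaution in case the group is not yet listed.

```perl
use strict;
use warnings;
use News::Newsrc;

my $GROUP = "rec.humor.funny";

# An in-memory newsrc; nothing is loaded from or saved to disk here.
my $newsrc = News::Newsrc->new;
$newsrc->add_group($GROUP);

# Pretend articles 101 and 102 were read on an earlier run.
$newsrc->mark($GROUP, $_) for 101, 102;

# Ask which articles in the server's range are still unread.
my @unmarked = $newsrc->unmarked_articles($GROUP, 100, 104);
print "unread: @unmarked\n";

# Mark each one as it is processed, so a later run skips it.
$newsrc->mark($GROUP, $_) for @unmarked;
```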

Finally, we need to update the newsrc file to reflect the newly
read articles. Again, all the hard work is done... we just need to
invoke a method to do the right thing behind the scenes.

$newsrc->save;
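
For reference, here's one way the pieces could fit together. It's a sketch under stated assumptions rather than the original listing: the host news.example.com, the names show_new_jokes and main, the optional output filehandle, and the printed header fields are all illustrative. The work is split into a routine that accepts the client and newsrc objects, so it can be tried against stand-ins before pointing it at a real server.

```perl
use strict;
use warnings;
use Net::NNTP;
use Mail::Internet;
use News::Newsrc;

our $SERVER = "news.example.com";    # placeholder host
our $GROUP  = "rec.humor.funny";

# Print every unread, non-administrivia article and mark it as read.
# $c needs group() and article(); $newsrc needs unmarked_articles()
# and mark().
sub show_new_jokes {
    my ($c, $newsrc, $group, $out) = @_;
    $out ||= \*STDOUT;
    my ($count, $low, $high) = $c->group($group)
        or die "Cannot select group $group\n";

    for my $artnum ($newsrc->unmarked_articles($group, $low, $high)) {
        my $art = $c->article($artnum) or next;   # gone from the server
        $newsrc->mark($group, $artnum);           # fetched, so seen

        my $mail = Mail::Internet->new([@$art]);
        my $from = $mail->head->get("From");
        next if defined $from && $from =~ /netfunny\.com/;

        print {$out} "=== article $artnum ===\n";
        $mail->remove_sig;
        $mail->tidy_body;
        for my $field (qw(Subject Date From)) {   # assumed field list
            my $value = $mail->head->get($field);
            print {$out} "$field: $value" if defined $value;
        }
        print {$out} "\n", @{ $mail->body };
    }
}

# The real run: connect, load ~/.newsrc, show the jokes, save.
sub main {
    my $c = Net::NNTP->new($SERVER)
        or die "Cannot connect to $SERVER\n";
    my $newsrc = News::Newsrc->new;
    $newsrc->load;
    show_new_jokes($c, $newsrc, $GROUP);
    $newsrc->save;
    $c->quit;
}

# main();   # uncomment once $SERVER names a real news server
```

Administrivia articles are marked before they are skipped, so they don't come back on the next run either.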

So a fairly short program is now a tiny newsreader, updating the
newsrc file, and even rejecting unwanted administrivia articles. And
it took me under a half hour to write and debug. This is, indeed, the
power of the CPAN. Enjoy!