July 2001 Archives

Larry's State of the Onion presentation this year was, as every year,
completely different from previous years. As he noted, previous talks
have been about chemistry, theology, biology, and music; this year,
for once, Larry actually talked about Perl.

And this year the format was rather different. Based on the success
of the lightning talks at previous Perl conferences, Larry decided to
adopt this for his presentation - he gave us thirty-three lightning
talks of fifty-five seconds each, with his daughter Heidi ruthlessly
manning the bell. So ruthless, in fact, that Larry had to encourage
us to laugh quickly to avoid cutting into his talking time.

Perl 6 and Apocalypses

Larry Wall during his State of the Onion speech at TPC 5. Larry's timekeeper for the lightning rounds was Heidi Alayne Wall. For more on TPC 5 and the O'Reilly Open Source Convention, visit O'Reilly Network's conference coverage page.

Photo courtesy D. Story/J. Blanchard/O'Reilly Network

Larry talked about both Perl 5 and Perl 6. He noted that many talented
people were putting dedication and love into Perl 5, and Perl 5 is
doing great. So much for Perl 5.

Perl 6 was obviously the major focus of Larry's talk, with each lightning
talk laying out the major points of an Apocalypse. As before,
Apocalypses mirrored chapters of the Camel book, Programming Perl.

Larry hinted that he'd already recanted part of the second Apocalypse;
the third Apocalypse was due to come out, but Larry got sick. On the
other hand, this gave him a chance to go slowly, and to do it right.

So his lightning talks really began with Apocalypse three, about
operators. Larry noted that there had been a number of very specific
proposals, but he wanted to concentrate on generalities. He also
said that he was wavering on the idea of user-defined operators, and
particularly Unicode operators. The -> arrow will become . - start
accepting that now. This means that the concatenation operator will
have to change; it will probably become ~.

The bell tolled, and so Larry had to move onto talk four - control
structures. To loud applause, he announced that Perl 6 would include a
switch statement; to some bemusement, however, he let on that it would
be called "given" - case statements would be called "when". Another
notable renaming: "local" will become "temp".

Larry reiterated the need for optional type and property declarations;
this isn't the same as typing, but it's a way of specifying metadata
about a variable or subroutine. As you supply Perl with more metadata
about variables, it will gain efficiency in terms of storage and
manipulation. As the bell went, Larry was just explaining how 90% of
code wouldn't get that much faster but...

Regular Expressions, Formats, Packages and Modules

Larry's been thinking a lot about direct assignment
to variables from within a regular expression, but has decided this
isn't the real problem - the real problem is that people want to build
up structures of anonymous hashes or arrays from a regular expression.

There will be set operations on character classes: for instance, you'll
be able to specify a match of all letters without the vowels. Actually,
this will probably be in Perl 5 as well. There was also time to tell us
that he'd like the /x modifier to be on by default.
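
Perl 5 had no such set operations at the time, but the vowel-less match Larry described can be emulated with a negative lookahead; this sketch uses plain Perl 5 regexes, not any proposed Perl 6 syntax:

```perl
# Match a string of letters with the vowels subtracted out:
# (?![aeiou]) vetoes each character before [a-z] consumes it.
my $consonants = qr/^(?:(?![aeiou])[a-z])+$/;

for my $word (qw(rhythm vowels)) {
    printf "%s: %s\n", $word,
        $word =~ $consonants ? "matches" : "no match";
}
```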

The next talk was about subroutines. Prototypes will be extended into a
complete type signature system; the sub keyword will be discarded on
closures - in fact, any curly braces which are not immediately
recognisable will be assumed to be a closure. Parameters to closures
will be "self-identifying", such as $^a and $^b.

Larry said that formats would be a module, so he didn't have
to say anything more about it.

The reference talk opened with the statement that pseudo-hashes must die;
there was loud applause. Larry reiterated that dereferences will be
assumed when a scalar is followed by braces - that is to say, what is
now $foo->[$a] will be reduced to $foo[$a] in Perl 6. This will
require prohibiting whitespace before hash subscripts.

In terms of data types, there will be compact arrays; pseudo-hashes
will be replaced by opaque objects with named parameters. The =>
operator will now create a pair type; the range operators will create
a slightly different type of pair, which gets expanded out to a range on
demand.

There will be a distinction between classes
and modules. Within a class or a module, there will be subpackages:
there will be no more need to type out Myclass::SubclassA::SubclassB;
just like Unix has relative pathnames and directories, Perl will have
modules that can be specified as relative namespaces.

On the theme of modules, module names would be extended to include
metadata on the version and the author's name. The default use
statement would allow the version and author's name to be wildcards.
There will also be virtual interface modules, and better automation
of documentation based on module metadata.

Objects should be easy to declare and have accessible metadata.
When you put an attribute onto a class, it will appear as a variable
inside of the class; outside of the class, it can be accessed as a
method. There will also be optional multimethods, and syntax like
Class.bless($ref) to bless a reference into a class.

Addressing Concerns

Larry mentioned that overloading was a headache; you hate it in
languages that have it, and you hate it when languages don't have it.
Sometimes there are gratuitous abuses of overloading, such as C++'s
left-shift operator to add more arguments to a function. The overloading
system will be a lot cleaner, by specifying operators as special method
names. The vtable system will be leveraged to provide overloading on
objects. There will also be overloading hooks in printf so that new
formats can be defined.

Larry said that many of the proposals about tied variables missed the
point; what is tied is the variable, and not the value. Tying needs to
be naturally scoped to the variable's lifetime, and tying needs to be
declared in advance for efficiency.

He also begged for compassion towards programmers when dealing with
Unicode; we don't want to force people to have to deal with Unicode if
they don't want to, and equally we don't want to leave Unicode people
out in the cold. Strings need to be completely polymorphic, with
internal routines able to specify what type of strings they're able to
cope with. Normalization will normally be done at the filehandle level,
and the type system must remember whether or not some data has been
normalized.

The IPC talk asked for "no pain" installation of new protocols; there will be easy mapping of high-level structured data, such as XML-RPC or SOAP,
from the network onto Perl's internal data structures. Perl 6 will
continue to have safe signals. IPv6 will be supported.

The thread model will basically be ithreads, the new model in 5.6.0;
variables may be shared by declaration, but will not be shared by
default. The Pthread model will mean "share everything". Modules ought
to be thread safe by default - they should declare their thread-safety
or otherwise in their metadata.

Perl will have a parser of its own, written in Perl. This will allow us
to bytecompile the parser, and also have the parser modifiable. It'll
also help us port eval to Perl running on small virtual machines.
Lexical analysis will remain as a one-pass process, and subroutines will
be compiled immediately on parsing.

The command line interface will not dramatically change; only one RFC
concerned it. Larry stressed Perl's role as a glue language, and that it must
cooperate with its environment.

Larry also had a shameful confession; he wrote the Perl debugger, but
hardly ever uses it - he's more a print-statement kinda guy. Hence,
he's happy to delegate the writing of the debugger to other people. He
did, however, point out the heavy dependency a debugger has on the
debugging facilities of the platform it runs on.

He's also punting on the internals, leaving the details of that to
Dan Sugalski, who's talking tomorrow. However, he said that the
internals will be much more modular; they will comprise a software CPU,
and regular expressions will compile down to normal Perl opcodes.
They will implement a variety of garbage collection models, and will use
vtables to despatch operations on values.

What about CPAN?

CPAN got too big to download, and
there's a problem that ISPs never install enough of it for what their
users want. Bundles are a partial solution to this, but of late there
has been more interest in the development of an SDK for Perl.

Tainting will be implemented through the new property mechanism, and sandboxing
will be achieved by using separate interpreter threads.

Perl 6 will attempt to remove some common goofs: Perl 5 has already
stopped taking up huge quantities of memory when you say
foreach $i (1 .. 1_000_000_000) { ... };, but Perl 6 will also apply
the same optimization to @big = 1 .. 1_000_000_000;. There'll also
be support for embedding your tests in POD.

Speaking of POD, =begin and =end will work for commenting because
now =end will go back to the previous state rather than going to POD.
Larry wants multiple POD streams, noting that the DATA filehandle is
essentially just another POD stream.

Perl culture has been mostly self-correcting; the Perl 6 announcement
drove people to work together more strongly and fix up some of the flaws
of the community. However, Larry encouraged everyone to do their part;
newbie friendliness is one thing, but it's important to hold yourself to
higher standards than those to which you hold others.

The cleanup of special variables will be an exercise in balancing
cleanup with convenience. For instance $_ will obviously be staying;
$(, on the other hand, will go. This allows us to free up $( ... )
for interpolating expressions. Bareword filehandles will go, and error
status variables may be merged.

On to built-in functions. For those with terribly long return lists, such
as stat, Perl will return objects; some subset of the proposed array
operators (merge, partition, flatten, etc.) will be included in
the core. The logical return values from functions such as system
will be made more sensible; select will be removed.

The standard Perl library will be almost entirely removed. The point of
this is to force ISPs to believe that Perl is essentially useless on its
own, and hence they will need to install additional modules.
Pragmatic modules will have more freedom to warp Perl's syntax and
semantics, and will also provide real optimization hints.

Larry suggested that the standard module library could be downloaded
basically on demand; there will be a few modules which support basic
built-in functionality and their documentation. Perl 6 should deal with
I18N and L10N issues, and also support the sort of exception handling
that you would expect from languages of its type.

"There are 35 seconds left. Any questions?"

So that's the rudiments of the design of Perl 6. Over the coming months,
we'll see the rest of the Apocalypses, backed up by Exegeses from
Damian, and hopefully, eventually, code from Dan and the rest of the
team. We'll bring you our analysis of how the Perl 6 language is shaping up
when we all get home from the conference!

Let's face it. procmail is horrid. But for most of us, it's the only sensible way to handle mail filtering. I used to tolerate procmail, with its grotesque syntax and its less-than-helpful error messages, because it was the only way I knew to separate out my mail. One day, however, I decided that I'd been told "delivery failed, couldn't get lock" or similar garbage for the very last time, and decided to sit down and write a procmail replacement.

That's when it dawned on me that what I really disliked about procmail was the recipe format. I didn't want to handle my mail with a collection of colons, zeroes, and single-letter commands that made sendmail.cf look like a Shakespearean sonnet; I wanted to program my mail routing in a nice, high-level language. Something like Perl, for instance.

The result is the astonishingly simple Mail::Audit module. In this article, we'll examine what we can do with Mail::Audit and how we can use it to create mail filters. We'll also look at the News::Gateway module for turning mailing lists into newsgroups and back again.

What Is It?

Mail::Audit itself isn't a mail filter - it's a toolkit that makes it very easy for you to build mail filters. You write a program that describes what should happen to your mail, and this replaces your procmail command in your .forward or .qmail file.
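
The program the article showed at this point hasn't survived in this copy; a minimal Mail::Audit filter of the kind being described would look something like this (the folder name matches the text below):

```perl
#!/usr/bin/perl
use Mail::Audit;

# Read the incoming message from standard input.
my $mail = Mail::Audit->new;

# File perl5-porters traffic into its own folder...
$mail->accept("mail/p5p") if $mail->from =~ /perl5-porters/;

# ...and deliver everything else to the inbox as normal.
$mail->accept;
```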

Now any mail with perl5-porters in the From: line will be added to the file mail/p5p under my home directory. Any other mail will be accepted into my inbox as normal.

Two things to note here:

Once the mail has been filed to mail/p5p via accept(), it leaves the program. Game over, end of story. The same goes for the other methods such as reject(), pipe(), and bounce().

The last line in the program should probably be an accept() call; mail that reaches the end of the program without being deposited in a mailbox or rejected will be silently ignored. (This may change to an implicit accept() in a later version, to be more procmail-like.)

If you've got a few mailing lists or people you want to filter, you could do this:
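
The listing is missing from this copy; a sketch of the idea, with illustrative patterns and folder names:

```perl
use Mail::Audit;
my $mail = Mail::Audit->new;

# Map header patterns to mail folders (entries illustrative).
my %filters = (
    'perl5-porters'  => 'mail/p5p',
    'perl6-language' => 'mail/p6l',
    'london\.pm'     => 'mail/london-pm',
);

for my $pattern (keys %filters) {
    $mail->accept($filters{$pattern})
        if $mail->from =~ /$pattern/ or $mail->to =~ /$pattern/;
}
$mail->accept;    # everything else goes to the inbox
```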

This time, we perform a regular expression match to see if either the From: line or the To: line match any of the patterns in our hash keys, and if they do, direct the mail to the corresponding folder. Since we're using ordinary Perl regular expressions, we can do this sort of thing:
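
The example itself is lost here; the check described next might be sketched as follows (the name and folder are placeholders):

```perl
use Mail::Audit;
my $mail = Mail::Audit->new;

# If my name appears in neither the To: nor the CC: headers, the
# mail probably wasn't addressed to me personally; set it aside.
$mail->accept("mail/not-to-me")
    unless $mail->to =~ /simon/i or $mail->cc =~ /simon/i;
```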

We check the To: and CC: headers for my name, and if it's not in either - the mail probably isn't to me. This one only makes sense after we've filtered out mailing list messages, which could validly be sent from a subscriber to a generic list address.

Mail and News

I much prefer reading mailing lists as newsgroups; while a good mail client like mutt can display mail as threaded discussions, I personally prefer navigating in a newsreader. So, how do we gate mailing lists to newsgroups and back? Russ Allbery's News::Gateway module helps do just that - it provides a program called listgate which takes an incoming mailing list message, reformats it as a valid news article, and then posts it to the news server. We can plug this into our mail filter quite easily; assuming we've got the group lists.p5p set up on the local news server and we've configured listgate appropriately, we can just say:

$item->pipe("listgate p5p") if $item->from =~ /perl5-porters/;

Again, if we've got multiple groups, we can use a hash to correlate patterns to groups as we did with mailing lists above.

So much for getting incoming mail to news - what about getting posted articles back into the mailing list? The key to this is in the newsgroup moderation system - when you post to a moderated newsgroup, the article is mailed to a moderator for approval. If we set the moderator of lists.p5p to the list address, we can get our outgoing posts sent to the list. In /usr/news/etc/moderators, you'd say

lists.p5p: perl5-porters@perl.org

Very easy. The only problem is that it doesn't work. Mail messages and news articles have a slightly different format, and some mailing list managers will reject mail messages that look like news articles. So we need to send our message through a clean-up phase first. Instead of sending it to perl5-porters@perl.org, we'll instead send it to news-outgoing@localhost:

lists.*: news-outgoing@localhost

Mail arriving at that account needs to go through another Perl program to clean up and dispatch the outgoing article, and that looks like this:
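
The article's own code is lost from this copy; a rough sketch of such a program, using the Mail::Internet module and an assumed "newsgroup address" config format, might read:

```perl
#!/usr/bin/perl
use Mail::Internet;

# Read the outgoing article from standard input.
my $article = Mail::Internet->new(\*STDIN);
my $group   = $article->head->get('Newsgroups');
chomp $group;

# Drop the news-specific headers that upset mailing list managers.
$article->head->delete($_)
    for qw(Newsgroups Organisation NNTP-Posting-Host);

# Find the list address for this group in the config file; the
# two-column format assumed here is illustrative.
open my $conf, '<', '/home/simon/bin/news2mail.h' or die $!;
my %address = map { split ' ', $_, 2 } grep { /\S/ } <$conf>;
my $to = $address{$group};
chomp $to;

# Readdress the message and send it on its way.
$article->head->replace('To', $to);
$article->smtpsend;
```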

This reads an article from standard input, drops the Newsgroups, Organisation, and NNTP-Posting-Host headers, reformats it as a mail message using the configuration file /home/simon/bin/news2mail.h to find the address, and then sends it. That config file is just a list of newsgroups and the addresses they belong to:
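
The listing didn't survive here; given the description, the file presumably looks something like this (group names and exact format illustrative):

```
lists.p5p        perl5-porters@perl.org
lists.advocacy   perl-advocacy@perl.org
```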

Incoming mailing list messages

will be trapped by a rule in your mail filter, and be piped to listgate via a line like

$item->pipe("listgate p5p")
if $item->from =~ /perl5-porters/;

listgate will then post them to your news server, to the group lists.p5p.

Outgoing articles

will be sent to the moderator address, news-outgoing@localhost for cleanup. The cleanup program will drop unnecessary headers, reformat as a mail message, and then look at the configuration file to determine where to send them on. They'll be sent to the mailing list, and sometime later will be returned to you by mail, to appear in the newsgroup as above.

A Complete Filter

Here, to show off exactly what I do with Mail::Audit, is a suitably anonymized and annotated version of the filter I currently use to process my incoming mail.

#!/usr/bin/perl
use Mail::Audit;
$folder = "/home/simon/mail/";

Anything that actually reaches me is going to be logged so that I can tail -f a summary of incoming mail to one of my terminals.

open (LOG, ">>/home/simon/.audit_log");

Read in the new mail message, and extract the important headers from it:
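
The original snippet is missing; the step described might be sketched as:

```perl
use Mail::Audit;

# Read the incoming message and pull out the interesting headers.
my $item    = Mail::Audit->new;
my $from    = $item->from;
my $to      = $item->to;
my $subject = $item->subject;
```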

If I'm likely to be at the office, I appreciate a copy of all mail I receive, in case there's something I need to deal with immediately. So I need time-controlled filtering. Try doing this with procmail:
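
The code itself is lost from this copy; the idea can be sketched like this (the office address is a placeholder, and $item is the Mail::Audit object for the incoming message):

```perl
# During office hours on a weekday, forward a copy to work, using
# Mail::Audit's resend method.
my ($hour, $wday) = (localtime)[2, 6];
if ($hour >= 9 && $hour < 18 && $wday >= 1 && $wday <= 5) {
    $item->resend('simon@office.example.com');
}
```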

Before we let the article in to the inbox, there's a long list of patterns at the end of the program which match known spam senders. We check the incoming mail against this list, and save it for analysis and reporting:
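
Again the listing is missing; a sketch of the idea, with placeholder patterns ($item and $folder are as set up earlier in the filter):

```perl
# A few stand-ins for the long list of known spam patterns kept
# at the end of the program.
my @spam_patterns = ( qr/make-money-fast/i, qr/lose-weight-now/i );

for my $pattern (@spam_patterns) {
    # File it for analysis and reporting rather than bouncing it.
    $item->accept($folder . "spam") if $item->from =~ $pattern;
}
```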

Caveats

I'm perfectly happy to trust Mail::Audit with all my incoming email. For a while it was running alongside procmail, but now it rules the roost. However, there are some things which you do need to take care about if you want to run it yourself.

Mail::Audit has been tested on qmail and postfix - it should work fine on other MTAs (Message Transfer Agents), so long as they believe that exit 100; means reject. If they don't, you can override the reject method like this:

$item = Mail::Audit->new(
    reject => sub { exit 67; }
);

It also assumes that the default mailbox is /var/spool/mail/name where name is the username of the current user. If this isn't the case (I believe mh doesn't work like this), say accept("Mailbox") or override accept with a subroutine of your own.

Finally, Mail::Audit isn't sophisticated. It's little other than a wrapper around Mail::Internet. While it's probably perfectly fine for most filters you want to write, don't expect it to do everything for you.

Conclusion

Mail::Audit and News::Gateway are both available from CPAN; together they allow you to very easily construct mail filters and newsgroup gateways in Perl. It's a great way to filter your mail with Perl, and an excellent replacement for moldy old procmail.

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org
where YYYYMM is the current year and month. Changes and additions to the
perl5-porters biographies are particularly welcome.

A somewhat abridged summary this week, since I'm out and on the road.
Today Boston, tomorrow Montreal - Wednesday, the world!

Jarkko released
Perl 5.7.2 on Friday the 13th, tempting fate a little.

5.7.2 has an odd-numbered subversion and so it's a development release;
use it at your peril.

Jarkko says:

Perl 5.7.2 can be considered to be the virtual Release Candidate
Zero for the Perl 5.8.0, it is just not called a Release Candidate.
It is in pretty good shape, it is just that it is not *quite* yet
ready to be a major release. No large changes are expected between
now and 5.8.0.

Artur Bergman put in a lot of work to move PMOPs into the pad;
PL_regex_padav will now give you a padlist of regexes. He also made Perl use
re-entrant C library calls where available - workarounds for
localtime and
gmtime are used when Perl is configured with
-Dusereentrant. There's also a per-interpreter memory buffer which helps
out with the re-entrant stuff. (No, I'm not sure how.)

Artur notes that Win32 and Digital Unix are already re-entrant
due to sane C libraries, but other C runtimes may need
-Dusereentrant.

Abhijit also wrote
re_dup, a cloning function for regular expressions, and there have
been discussions of deep cloning functions as well.

The "super-strict"
package; construction - used for turning off the default package - has been
judged to be confusing, buggy as hell and a pain in the neck. As
such, it's being ripped out - Abhijit Menon-Sen is seeing to this.

Oh well, I tried to quickly brush off the discussion of
SUPER:: last week, but it didn't work: 60 or so messages were wasted, uh, spent
on this interminable argument, which essentially boils down to "
SUPER:: is resolved at compiled time, not run time. This is wrong. Oh, no it
isn't. Oh yes it is." Great.

Sadayuki Tomohiro produced a bunch of very useful
Encode fixes, which unfortunately just missed 5.7.2.

Jeff Pinyan spent ages perfecting a patch to warn on
q//o, and then realised this was a bad idea. Roughly the same thing happened
for his idea of
Scalar::Utils::curse, which was more complex than it might first appear. Oh, and Larry
didn't like it the last time someone tried this. Schwern used the
curse patch to take the opportunity to encourage people to use
Test::More when writing new test suites. Jonathan Stowe added an option to
h2xs to produce
Test::More-aware test suites automatically.

Cryptographic algorithms, or ciphers, offer Alice one way to protect
her data. By encrypting the recipe before sending it over the network,
she can render it useless to anyone but Bob, who alone possesses the
secret information required to decrypt it.

Ciphers were once closely guarded secrets, but relying on the
secrecy of an algorithm is a risky proposition. If your security were
somehow compromised, adversaries could read all of your past messages,
and (if you ever discovered the breach) you would have to find an
entirely different algorithm to use in the future.

Modern ciphers, usually publicly known and widely studied, rely on the
secrecy of a key instead. They encrypt the same plaintext differently
for each key; to decrypt a ciphertext, you must know the key used to
produce it. New keys are easy to generate, so the compromise of a single
key is a smaller problem. Although messages encrypted with the stolen
key are rendered readable, the algorithm itself can be reused.

Algorithms that use the same key for both encryption and decryption are
called symmetric ciphers. To use such an algorithm, Alice and Bob
must agree on a key to use before they can exchange messages. Since
decryption depends only on the knowledge of this key, they must ensure
that they share the key by a secure channel that Eve cannot access
(Alice could whisper the key into Bob's ear over dinner, for example).

Most well-known symmetric ciphers are block ciphers. The plaintext to
be encrypted must be split into fixed-length blocks (usually 64 or 128
bits long) and fed to the cipher one at a time. The resulting blocks (of
the same length) are concatenated to form the ciphertext.

The ciphers in widespread use today vary in strength, key length, block
size and their approach to encrypting data. Some of the popular ciphers
(IDEA, Twofish, Rijndael) are implemented by eponymous modules in the
Crypt:: namespace on the CPAN (Crypt::IDEA and so on).

To decide which cipher to use for a particular application, one must
consider the strength and speed required, and the computational resources
available. The decision cannot be made without research, but IDEA is
often considered the best practical choice for a general purpose cipher.

Symmetric ciphers usually use randomly generated keys (typically between
64 and 256 bits in length), and computers are notoriously bad at truly
random number generation. Fortunately, many modern systems have some
support for the generation of cryptographically secure random numbers,
ranging from expensive hardware to device drivers that gather entropy
from the timing delay between interrupts.

Crypt::Random, available from the CPAN, is a convenient interface to the
/dev/random device on many Unix systems. Once installed, it is
simple to use:

use Crypt::Random qw( makerandom );

$key = makerandom( Size => 128, Strength => 1);

For cryptographic key generation, the Strength parameter should
always be 1. The Size in bits of the desired key depends on the
cipher you want to use the key with. Typical symmetric key sizes range
from 128 to 256 bits.

The Crypt modules all support the same simple interface: new($key)
creates a cipher object, and the encrypt() and decrypt() methods
operate on single blocks of data. The responsibility for key generation
and sharing, providing suitable blocks, and the transmission of the
ciphertext, lies with the user. In the examples below, we will use the
Crypt::Twofish module. Twofish is a free, unpatented 128-bit block
cipher with a 128, 192, or 256-bit key.

use Crypt::Twofish;

$cipher = Crypt::Twofish->new($key);        # 16-, 24- or 32-byte key
$ciphertext = $cipher->encrypt($plaintext); # $plaintext: one 16-byte block

# And then decrypt the result.
print unpack "H*", $cipher->decrypt($ciphertext);

The implementation raises an important issue: What does one do with the
second chunk of an 18-byte file? Twofish cannot operate on anything less
than a 16-byte block, so padding must be added to the end of the last
block to make it 16 bytes long. NULs (\000) are usually used to pad the
block, but the value used doesn't matter, because the padding is removed
after the ciphertext is decrypted.

Alice can now use this code to encrypt her recipe:

# Assume that $key contains a previously-generated key, and that
# PLAINTEXT and CIPHERTEXT are filehandles opened for reading and
# writing respectively.

use Crypt::Twofish;
$cipher = Crypt::Twofish->new($key);
$size = 0;
while (read(PLAINTEXT, $block, 16)) {
    $size += length $block;
    $block .= "\000" x (16 - length $block);   # pad the final short block
    $ciphertext .= $cipher->encrypt($block);
}

# Record the size of the plaintext, so that the recipient knows how
# much padding to remove.
print CIPHERTEXT "$size\n";
print CIPHERTEXT $ciphertext;

The output of this program can be safely sent across the network to Bob,
perhaps as an e-mail attachment. Bob, having received the secret key by
some other means, can then use the following code to decrypt the
message:
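
That code is also missing from this copy; the decryption mirrors the encryption loop, and might be sketched as:

```perl
use Crypt::Twofish;

# $key is the shared secret; CIPHERTEXT is opened for reading,
# PLAINTEXT for writing.
$cipher = Crypt::Twofish->new($key);

# The first line of the file records the original plaintext size.
chomp($size = <CIPHERTEXT>);

$plaintext = "";
while (read(CIPHERTEXT, $block, 16)) {
    $plaintext .= $cipher->decrypt($block);
}

# Truncating to the recorded size removes the NUL padding.
print PLAINTEXT substr($plaintext, 0, $size);
```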

This is really all we need for symmetric cryptography in Perl. Using a
different cipher is simply a matter of installing another module and
changing the "Twofish" above. From a cryptographic perspective, however,
there are still some problems we must consider.

The code above uses the Twofish cipher in Electronic Code Book (ECB)
mode, meaning that the nth ciphertext block depends only on the key
and the nth plaintext block. For a particular key, one could build an
exhaustive table (or Code Book) of plaintext blocks and their ciphertext
counterparts. Then, instead of actually encrypting the plaintext, one
could simply look at the relevant entries in the table to find the
ciphertext.

Because of the highly repetitive nature of most texts, plaintext blocks
and their corresponding blocks in the ciphertext tend to be repeated
quite often. Further, it is often possible to make informed guesses
about parts of the plaintext (Eve knows, for example, that Alice's
messages all have a long Tolkien quote in the signature).

Given enough patience and ciphertext, Eve can start to build a code
book that maps ciphertext blocks to plaintext ones. Then, without
knowing either the algorithm or the key, she could simply look up the
relevant blocks in the intercepted ciphertext and write down large
parts of the original plaintext!

Several new cipher modes have been invented to address this problem. One
of the most generally useful ones is Cipher Block Chaining. CBC
starts by generating a random block (called an Initialization Vector, or
IV) and encrypting it. The first plaintext block is XORed with the
encrypted IV before being encrypted. Thereafter, each block is XORed
with the ciphertext of the block preceding it, and then encrypted.
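
The chaining pattern itself is independent of the cipher; this toy sketch demonstrates the XOR-and-encrypt step with a trivial stand-in "cipher" (a byte reversal, for illustration only, never for real use):

```perl
# A stand-in block "cipher": reverses an 8-byte block. Real code
# would call something like $cipher->encrypt($block) instead.
sub toy_encrypt { scalar reverse $_[0] }

sub cbc_encrypt {
    my ($iv, @blocks) = @_;
    my $prev = toy_encrypt($iv);          # encrypt the IV first
    my @out;
    for my $block (@blocks) {
        # XOR with the previous ciphertext block, then encrypt.
        $prev = toy_encrypt($block ^ $prev);
        push @out, $prev;
    }
    return @out;
}

# Two identical plaintext blocks produce different ciphertext blocks.
my @ct = cbc_encrypt("INITVEC!", "AAAABBBB", "AAAABBBB");
print "chained blocks differ\n" if $ct[0] ne $ct[1];
```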

Here, each ciphertext block depends on the preceding ciphertext block,
and the plaintext blocks so far. Thus, the blocks must be decrypted in
order, and none of the patterns displayed by ECB are present. The IV
itself does not need to be kept secret, and is usually transmitted with
the ciphertext like $size above.

Decryption of the ciphertext proceeds in the opposite order. The first
ciphertext block is decrypted and XORed with the IV to form the first
plaintext block, and each ciphertext block thereafter is XORed with the
previous one to form a plaintext block. Other modes are similar in
intent, but vary in detail, including the way errors in transmission
affect the ciphertext, and the amount of feedback or dependency on
previous blocks.

Alice and Bob could alter their code to perform cipher block chaining,
but the handy Crypt::CBC module can save them the trouble. The module,
available from the CPAN, is used in conjunction with a symmetric cipher
module (like Crypt::Twofish). It handles padding, IV generation and all
other details. The user only needs to specify a key, and the data to be
encrypted or decrypted.
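
A sketch of how Alice and Bob might use it (constructor options per Crypt::CBC's documented interface; details may differ between versions):

```perl
use Crypt::CBC;

# Wrap the Twofish block cipher in CBC mode; Crypt::CBC generates
# the IV and handles padding for us.
my $cbc = Crypt::CBC->new(
    -key    => $key,
    -cipher => 'Twofish',
);

my $ciphertext = $cbc->encrypt($plaintext);
my $decrypted  = $cbc->decrypt($ciphertext);
```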

Asymmetric (or public-key) ciphers use a pair of mathematically related
keys, and the algorithms are so designed that data encrypted with one
half of the key pair can only be decrypted by the other. Bob can
generate a key pair and keep one half secret, while publishing the other
half. Alice can then encrypt the recipe with Bob's public key, knowing
that it can only be decrypted with the secret half. Although this
eliminates the need to share keys over a secure channel, it has its
problems, too. For one, most public key encryption schemes
require much longer keys (often 2048 bits or more) and are much slower.

The Crypt namespace contains modules for public key cryptography as
well. Crypt::RSA is a portable implementation of the (now free) RSA
algorithm, one of the most widely studied public-key encryption schemes.
There are interfaces to various versions of PGP (Crypt::PGP2,
Crypt::PGP5, Crypt::GPG), as well as implementations of public-key
based signature algorithms (Crypt::DSA).

Unfortunately, our implicit assumption that the ciphertext is useless to
Eve is not always true. Depending on the information and resources that
are available to her, she can try various means to retrieve the recipe.
The simplest strategy is to try and guess the key Alice used. This is
known as a brute-force attack, and involves repeatedly generating
random keys and trying to decrypt the ciphertext with each one.

The effectiveness of this approach depends on the size of the key: the
longer it is, the more possible keys there are, and the more guesses
will be required, on average, to find the right one. Thus, the only
possible defense is to use a key long enough to make a key search computationally impractical.

How long is a safe key? DES with 56-bit keys was recently cracked in a
little less than a day, but the 128-bit keyspace (range of possible
keys) is 4 * 10**21 times larger still. Although computing power is
becoming cheaper, it seems likely that 128-bit keys will be safe from
brute-force attacks for many years to come.

Of course, there are far more sophisticated attacks to which these
ciphers may be vulnerable. As we saw in the description of ECB, cryptanalysts can
often exploit patterns in the plaintext (long signatures, repeated
phrases) or ciphertext (repeated blocks) to great advantage, or they may
look for weaknesses (or exploit known ones) in the algorithm. Often, a
combination of such techniques reduces the potential keyspace enough
that a brute-force attack becomes practical.

Cryptanalysis and cryptographic techniques advance hand-in-hand; new
ciphers are designed to withstand old attacks, and newer attacks are
attempted all the time. This makes it very important to stay abreast of
current advances in cryptographic technology if you are serious about
protecting your data for long periods of time.

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org
where YYYYMM is the current year and month. Changes and additions to the
perl5-porters biographies are particularly welcome.

This was a reasonably busy week, seeing just over 400 messages. I say
that every week, don't I?

Jarkko sadly announced that 5.8.0 wasn't going to happen before the Perl
Conference, but 5.7.2 is imminent:

I think it's time for me to give up the fantasy that 5.8.0 will happen
before the Perl Conference. The stars just refuse to align properly,
too many loose ends, too many delays, too many annoying mysterious
little problems, too little testing of CPAN modules. Luckily, nothing
is majorly broken, and I think that I've more or less achieved what I
set out to do with Perl, so I still hope to be able wrap something up
before TPC and call it 5.7.2 (it must happen next week or it will not
happen), and soon after the conference put out the Release Candidate 1
for 5.8.0, and then keep cracking the whip till we are happy with what
we've got.

These aren't really very new, but they may have slipped through the
net and you haven't noticed them yet, and since they're interesting,
you might want to have a look...

I18N::LangTags
detects and manipulates RFC3066 language tags;
Locale::Maketext is extremely useful for localizing text;
Unicode::UCD is a neat interface to the Unicode Character Database;
Encode is coming on strong, and can now read IBM ICU character tables;
Mark-Jason Dominus'
Memoize module is now part of the core.
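Memoize in particular is a one-line win for expensive pure functions; a quick example:

```perl
use Memoize;

# Naive recursive Fibonacci: exponential time without caching.
sub fib {
    my $n = shift;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

memoize('fib');            # wrap fib() so results are cached by argument

print fib(30), "\n";       # computed in linear time thanks to the cache
```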

Remember last week's weird Amdahl UTS bug, where
Nicholas Clark was convinced UTS C was doing a decrement statement twice? He found the
problem - the decrement statement shouldn't have been there at all...

This prompted him to find a bug in
grok_number; this surprised me a little, because I didn't know that
grok_number even existed. All of the useful, platform-independent code which deals
with numeric operations - casting between different sizes, converting
binary, hex, and octal numbers, recognising numbers in strings, and so
on, has been moved to
numeric.c. Take a look at it; there's a load of handy stuff in there.

Hal Morris, our UTS wizard, also pointed out some unpleasant casting
assumptions, which needed a patch:

UV_MAX must NOT be defined as
(unsigned long){whatever} for UTS, because then comparisons
with double will not work correctly (there is no problem with
(unsigned) typecasts, only with
(unsigned long))

grok_number again came in handy on QNX, when Norton Allen found that
strtoul wasn't setting
errno correctly on overflow. It's sad when we have to start reimplementing
people's broken C libraries, but this is the price of portability.

Ilya found a bug in PerlIO, then found another bug while attempting
to demonstrate it. The original bug was:

The *actual* problem is that char-by-char input requires DUPLICATE pressing
of ENTER key for this key to be seen by Perl. Debugging this problem
(via Term::ReadKey test suite) shows the following logic:

pp_getc() calls is_eof() which does getc/ungetc
calls getc()

[BTW, I see no logic in this sequence of events.]

The problem is that ungetc() can't unget "\n" if this \n is the first char
in the buffer, and quietly drops "\n" to the floor.

Ilya had a lot of invective set aside for PerlIO, which we need not go
into. Needless to say, he did not provide an alternative implementation
of a multi-layered standard IO system as a patch. Or indeed any patch at
all.

Vadim Konovalov did provide a simple patch to clean something up, but then
Andy and
Nick both showed that it didn't help at all, the generated code
being the same and some compilers not being able to cope with lvalue casts,
which Nick had carefully removed and Vadim's patch reintroduced. Guess Nick
might actually know what he's doing after all.

David Lloyd asked how to safely do asynchronous callbacks from C to Perl.
Benjamin Stuhl suggested hacking the core to introduce some checks during
the inter-opcode
PERL_ASYNC_CHECK, and suggested that Perl 5.8.x should have a public way of registering inter-opcode
callbacks. David Lloyd replied that PHP/Zend already had this, and you
could even implement a signal checking add-on module without any core
hacking. Paul Johnson went one further, and suggested using a pluggable
runops routine. Surprisingly, this has actually been implemented but nobody really
knows about it;
Devel::Cover apparently makes use of it. Of course, the problem is that only one thing
can use a custom op loop at a time, so David suggested writing an XS
module that allowed other modules to add callbacks. I hope that happens.

Rudi Farkas found a weird one on Win32 - on that platform,
executableness (the
-x test) is determined by the filename being examined. For instance,
foo.bat is classed as executable, but
foo.bar is not, even if they contain exactly the same data. This rather curious
design decision leads to the fact that if you call
stat with a filename, the execute bit is set depending on the extension. If,
on the other hand, you call
fstat with a filehandle, Windows can't retrieve the filename and test the
extension, so it silently sets the execute bit to zero, no matter what
it gets fed. This is Bad, and means that
-x _ on Windows is unpredictable. Rudi provided a suggested workaround, but
nobody cared enough about fixing something so obviously braindead to
produce a patch.

Mike Schwern fixed up
MakeMaker to stop producing extraneous
-I...s when building extensions, and also found that the XS version of
Cwd won't do much good as a separate CPAN module, since it relies on the
core function
sv_getcwd, which only appears in 5.7.1. Oops. Oh, and speaking of
Cwd, Ilya patched it up a bit for OS/2, while noting that its results were
untainted on that platform.

Ilya also fixed a glaring debugger bug (oh, the irony) prompting Jarkko
to lament the lack of a test suite. Robin fixed up a couple of weird,
weird bugs in
B::Deparse.

Philip Newton patched a score of typos. Norton Allen updated the QNX
documentation and provided a couple of other fixes.

Piers Cawley found something that looked like a bug in
SUPER:: but was assured that it wasn't; Randal won the day with a reference
to Smalltalk.

Abhijit Menon-Sen (look out for this guy...) made
mkdir warn if it was given a non-octal literal, since that generally doesn't do
what people want, and after prompting from Gisle, did the same for
umask and
chmod. Unfortunately, he forgot about constant folding...
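The pitfall the warning addresses: a literal 777 is decimal, not the octal mode people intend. A quick illustration:

```perl
# mkdir "dir", 777 requests mode 01411, not the rwxrwxrwx people expect.
printf "777  decimal is mode %04o\n", 777;     # 1411
printf "0777 octal   is mode %04o\n", 0777;    # 0777

# oct() is the usual fix when a mode arrives as a plain number or string:
my $mode = oct('777');                         # 511 decimal, i.e. 0777
```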

The lists have been very light recently. During the last two weeks of
June, three of the mailing lists received a mere 142 messages across 20
different threads. 40 different authors contributed. Only 5 threads
generated much traffic. Eventually, I'll come up with a better way of
reporting these meaningless metrics.

Now, at the end of the day, I have no fewer than five JVMs installed,
all completely different implementations of two Java standards. As a
Perl programmer, I find this abhorrent. Installing any Perl release
from the last 7 years is no different from installing any other:
download, extract, ./configure -des, make, make test, make
install. Done.

I don't believe I was saying that. My point was that you had a bad
experience installing Java on FreeBSD and have declared that it sucks to
install it. Unsurprisingly, I have never had a problem installing or
supporting Java on Solaris but there are plenty of things to grumble about
Perl sometimes, especially if you deploy multiple versions and
configurations across multiple platforms and multiple versions of those
platforms.

Michael Schwern
pointed out that Solaris is "Sun's Blessed Platform", and it shouldn't be surprising
that Java should install easily there. The discussion then touched a bit
on distributions, licensing, support roles, and, yes, even George Carlin.

David Whipp
asked if
bless could take, and
ref return, a list, allowing for a cleaner multiple-inheritance model for
objects in Perl. Dan Sugalski
simplified the request to object-based vice class-based inheritance, and then
provided some potential trade-offs.

Damian, of course,
submitted code to fake it in Perl 5. He did muse about an
ISA property, though, which would act like
@ISA, but at the object level.

Michael "Class::Object" Schwern
asked why all this (Class::Object) had to be (Class::Object) in the core
(Class::Object). Dan Sugalski
opined:

Doing it properly in a module is significantly more of a pain than doing it
in the core. Faking it with a module means a fair amount of (reasonably
slow) perl code, doing it in the core requires a few extra lines of C code
in the method dispatch opcode function.

To which, of course, Michael Class::Objected:

I've already done it, it was easy. Adding in an object-based
inheritance system should be just as easy, I just need an interface.
$obj->parents(@other_objs) is a little clunky.

...Look at Class::Object! Its really, really thin. Benchmark it, its no
slower than regular objects.
http://www.pobox.com/~schwern/src/Class-Object-0.01.tar.gz
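The underlying trick is easy to sketch (this is my own minimal illustration, with a hypothetical package name, not Schwern's actual Class::Object code): give each object a private package, and therefore a private @ISA.

```perl
use strict;
use warnings;

package PerObjectISA;      # hypothetical name, purely for illustration

my $counter = 0;

# Manufacture a fresh package for each object, so every object owns a
# private @ISA: inheritance becomes per-object, not per-class.
sub new {
    my ($class, @parents) = @_;
    my $pkg = "PerObjectISA::Anon" . ++$counter;
    no strict 'refs';
    @{"${pkg}::ISA"} = @parents;
    return bless {}, $pkg;
}

package Greeter;
sub hello { "hello from " . ref $_[0] }

package main;
my $obj = PerObjectISA->new('Greeter');
print $obj->hello, "\n";   # dispatches through this object's private @ISA
```

Changing one object's parents never disturbs any other object, since no two objects share a package.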

Since we're going to try and take a shot at being encoding-neutral in the
core, we're going to need some form of string API so the core can actually
manipulate string data. I'm thinking we'll need to be able to at least do
this with strings:

David Nicol
suggested implementing strings as a tree, vice a contiguous memory block. After some
pondering, this seemed to
grow on Dan, and he is awaiting a yea-or-nay from Larry. Copy-On-Write for Strings
will also be
implemented, although there was no mention of a potential key signature.

So you use Perl, and you probably know that it was brought to you by
"Larry Wall and a cast of thousands". But do you know these people that
make up the Perl development team? Every month, we're going to be
featuring an interview with a Perl Porter so you can get to know the
people behind Perl. This time, Simon Cozens, www.perl.com editor, talks
to Nathan Torkington, a long-time Perl developer and a mainstay of the
Perl community.

Who are you and what do you do?

I'm Nathan Torkington. My day job is a book editor for O'Reilly and
Associates. Before that I was a trainer, working for Tom
Christiansen. Perl folks might know me as co-author (with Tom) of The
Perl Cookbook, the schedule planner for The Perl Conference (and this
year the Open Source Convention), or as the project manager for perl6.

How long have you been programming Perl?

Since the early 90s. I forget exactly when I first started
programming in Perl. I think it was toward the end of 1992, when I
was working with gopher and the early web. Back in those days we
didn't talk about "the web", we were just trying to set up a
Campus-Wide Information Service. I took the plunge and pushed for
(and got) the www as the basis for the CWIS, even though Gopher was
more mature and had more stuff on it.

Using the CERN httpd and line mode browser, I worked on interfacing a
bunch of data sources to the web. Perl was, of course, the language
for that. I did work with SGML, comma separated files, and even used
oraperl once or twice (I feared and loathed Oracle, though, it was only
once or twice!).

So I got into Perl with the web. When I moved to the US in 1996, I
worked as a systems administrator at an ISP. There I rapidly became
the Perl guy, writing admin scripts and CGI applications for in-house
and customers. When I left in 1998, we were using mod_perl and had
a bunch of good Perl programmers.

What, if anything, did you program before learning Perl?

I first learned Commodore BASIC when I was 8 or 9, then 6502
assembler. I learned Pascal on the IBM PC, as well as C and 8086
assembler. At university they taught us Pascal again, then Modula-2
or Modula-3 (whichever it was, it was with a MetroWerks Mac IDE that
kept crashing). Through Pascal, and their eventual reteaching of C, I
got the hang of pointers (yes, I wasn't a very GOOD assembly language
programmer if it took me about six years to learn pointers and realize
why my programs would sometimes die).

What got you into Perl?

The web. After I'd written a few programs, I really enjoyed it. The
language was fun, and the culture good. It was so thrilling to be
able to do something in 5 minutes and 20 lines that used to take two
days in C. In some ways I miss that fun--over the years (and
especially writing the Cookbook) I've learned so much about how to do
things in Perl that there's not much discovery of fun new features any
more. And I'm so used to being able to do things in 5 minutes and 20
lines that I'm no longer delighted when I can do so--I expect it!

Why do you still prefer Perl?

It can do everything I want to do, and I already know it. If I wasn't
a Perl programmer, I'd probably be happy in Ruby or Python. But so
long as there's Perl, I see no reason to become as familiar with those
languages as I am with Perl.

That's not to say I think everyone should program exclusively in Perl.
My friend Jules was the one who learned a lot of languages. He was
writing extensions for Microsoft COBOL in 1992, and has done
significant projects in C++ and Java. He works at a contracting
company where the projects don't always lend themselves to Perl. I'm
totally cool with that.

What sort of things do you do with Perl now? What was the last Perl
program you wrote?

When I became a trainer, I was glad to stop being a full-time
programmer. Since then I've begun to miss using Perl. These days I
only write utilities and basic web applications. For instance, the
last few projects I've written have been:

A small web-based database system for keeping track of the books
I'm editing--author addresses, status, number of chapters to go,
etc. That was a CGI program with home-grown templates and a
DBM database.

Some small tools to unmangle Framemaker files for the Perl
Conference proceedings.

A translation of a Python PDF-generating library to Perl. That's
incomplete, because I struck Python code I need to research.

What coaxed you into the development side of Perl?

The feeling that I should know more about it. In some ways it's
probably insecurity--"sure, you know all stuff about the Perl
language, but can you tell an SV from an SUV?" So I started probing
at the fringes of the internals.

Do you remember your first patch?

My first (and probably only ;-) real patch was to comment
toke.c, the tokenizer. The code that works out where
double-quoted strings and regular expressions end was a lot of fun (and
by fun I mean "pain") to work out. Although I know a lot of low level
(data structures) and high level principles about how the internals work
(compile to op tree, interpret), I'm still missing a lot of the middle
ground that would actually enable me to fix bugs.

What do you usually do for Perl development now?

Nothing, I'm a manager :-) I'm slowly herding Larry towards
finishing
the Apocalypses, and from that will spring perl 6. For Perl 5 I'm
more behind the scenes. I'm often asked to nag or poke or otherwise
get results from others.

Talking of Perl 6, how do you think the project's
going so far?

Last year I'd naively hoped that we'd have an alpha for TPC, but
that's not happening. Instead, we have the start of Larry's
pronouncements. And while perl6 has been slower than we expected,
perl5 has received a shot in the arm! Jarkko's patching like a
madman, there are new internals hackers springing up, and there are
cool new modules (SOAP::Lite, Inline,
Attributes::*, Filter::Simple,
etc.) coming out every week.

That's not to say that we're all blocked on Larry, though. We know the
large shape (and a lot of the details) of how the internals will work.
Dan's specing those out in Perl
Design Documents, and your fair self has implemented a few of the
projected perl6 syntactic features with patches to Perl 5. (You can
download the Perl
6 emulator and play about with Perl 6.) If you haven't already seen
Marcel Grunauer's page of Perl6-like modules, check it out!

So there's a lot of activity in perl6 as well as in perl5.

Finally, what's the best thing about Perl, and what's the worst
thing about it?

Best thing? The way that it values programmer fun as much as anything
else. Perl delights in being a language that is supposed to be fun.
Having seen many people burn out, fun is a good thing. Fun is what
keeps you sane, keeps you interested, keeps you going.

What's the worst thing? Probably the internals. They're ugly and
resemble nothing so much as a Lovecraftian horror. We really want
perl6 to have much nicer innards. Ideally it'll be almost as simple
to program in perl6 innards as it is to program in perl6 itself.
That's one of the main reasons to have perl6--a cleaner core.

Please send corrections and additions to perl-thisweek-YYYYMM@simon-cozens.org
where YYYYMM is the current year and month. Changes and additions to the
perl5-porters biographies are particularly welcome.

This was a reasonably normal week, seeing the usual 500 or so messages.

There's a move on to make the modules under
ext/ free-standing: that is, to be able to say

cd ext/File/Glob

make dist

and get a bundle that can be uploaded to CPAN or otherwise distributed.
The only problem with this is tests. Currently the tests are kept under
t/ of the main root of the Perl source tree, and are run when a
make test is done there. To make the modules freestanding, you'd have to move
the tests to
ext/Foo/Bar/t/ and have the main
make test also traverse the extension subdirectories and run the tests there. But
then, of course, there's another problem. Can you spot it?

make test
is run from an uninstalled Perl, which needs explicit hints about where
to find the Perl library. Hence, the tests in
t/ directly wibble
@INC. This wouldn't work if we're making the modules freestanding, or if we
move the tests to
ext/Foo/Bar/t/. So the trick, which nobody's done yet, is to move the tests to the
right place, change the main
make test to recurse extension subdirectories, but also to propagate an
environment variable telling the module where to find the library. (
PERL5LIB is what you want for this.) That would be a nice little task for
someone...

Speaking of testing,
Schwern got
Test::Simple and
Test::More added to the core, bringing the first All Your Base reference into the
Perl source tree.
Oh, and how many different testing mechanisms?
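For those who haven't seen it, Test::More's interface boils down to a plan plus named assertions (a trivial example of my own, not taken from the core tests):

```perl
use Test::More tests => 3;

is( 2 + 2, 4, 'addition still works' );
like( 'All Your Base', qr/Base/, 'regex match' );
is_deeply( [ sort 3, 1, 2 ], [ 1, 2, 3 ], 'structures compare deeply' );
```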

Robin Houston
incremented
B::Deparse's version number and added some change notes. Here's what's
improved since 5.6.1:

Changes between 0.60 and 0.61 (mostly by Robin Houston)
- many bug-fixes
- support for pragmas and 'use'
- support for the little-used $[ variable
- support for __DATA__ sections
- UTF8 support
- BEGIN, CHECK, INIT and END blocks
- scoping of subroutine declarations fixed
- compile-time output from the input program can be suppressed, so that the
output is just the deparsed code. (a change to O.pm in fact)
- our() declarations
- *all* the known bugs are now listed in the BUGS section
- comprehensive test mechanism (TEST -deparse)

The new test mechanism is great: it runs the standard Perl test suite
through
B::Deparse and then back through Perl again to ensure the deparsed code still
passes the tests. And, as a testament to the work that's been done on
B::Deparse, they mostly do pass.

Schwern also bumped up
ExtUtils::Manifest, which caused Jarkko to appeal for a script which checks two Perl
source trees to find updated versions of modules without a version
change. Larry Shatzer provided one, and Jarkko used it to update
the versions in the current tree. Schwern also put
Cwd on CPAN, and found a weird dynamic loading bug with the XS version
of
Cwd. Oh, and noted that after his benchmarking, there's no significant
performance loss between 5.6.1 without PerlIO and bleadperl with it.

Mike Guy partially fixed a problem whereby when a magic variable like
$1 is passed as a subroutine parameter,
carp and the debugger don't see it properly. Tony Bowden took the
opportunity to ask for either more or less documentation for
longmess and
shortmess depending on whether or not they were meant to be internal to
Carp. Tony wrote a documentation patch himself.
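For reference (my own quick illustration, not from the thread), both functions return strings rather than printing anything, which is part of why their status as public interface was unclear:

```perl
use Carp;

# longmess() returns the message plus a full backtrace as a string;
# shortmess() stops at the caller.  Neither prints anything itself.
sub inner { return Carp::longmess("something went wrong") }
sub outer { return inner() }

my $trace = outer();
print $trace;      # message plus "main::inner called at ..." frames
```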

Schwern asked for a better name for those functions, perhaps more in
line with the carp/croak/cluck theme. Jarkko suggested
yodel and
yelp which Schwern implemented.

AnyLoader does anything but interface to any loader.
Ima::DBI has something to do with DBI, but nothing with Ima.
D::oh is the only really funny one. But it's from 1999 and getting
old.
Sex: I still don't know if it is fun or what.
Bone::Easy? Same here.
Semi::Semicolon? Same here.

And now yodle!

Unfair, though, since
yodel and
yelp weren't his...

However, Hugo objected to
yodel/
yelp because the other verbs write to standard error, hence "speaking"
whereas
longmess and
shortmess don't actually "say" anything. Jarkko agreed, and there the matter
rested. (Modulo Rich Lafferty's suggestion of "flame" for objections in
a written medium...)

Jeffrey Friedl (he of the Regexp book) came up with an interesting
patch which adds the new special variable
$^N for the most-recently assigned bracket match. This is different from
$+ which is the highest numbered match; that's to say, given

/(foo(bar))/

then
$+ is equivalent to
$2, whereas
$^N is equivalent to the bracket match for the last closing bracket; that
is,
$1. This essentially allows you to do capture-to-variable, like this:

(?:(\d+)(?{ $phone_number = $^N }))

without having to worry about which number bracket the match was.
(Especially useful if you have to change your regexp around.) Whether
this makes regular expressions cleaner or dirtier, I'll leave up to
you... However, Jeffrey also noted that you can use regexp overloading
(my, that's an obscure feature - look at
re.pm) to make such syntax as

(?\d+)

work. Now that's cool.

Philip Newton added a mnemonic:
$^N is the most recently closed Nested parenthesis.
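A quick demonstration of the difference, using the semantics described above:

```perl
# With nested captures, the inner group closes first and the outer
# group closes last, so $^N ends up holding group 1.
"foobar" =~ /(foo(bar))/ or die "no match";

print "$+\n";     # "bar"    -- the highest-numbered group, $2
print "$^N\n";    # "foobar" -- the most recently closed group, $1
```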

Hal Morris got Perl going on Linux/390, with only one test failing. Good
news for the new generation of mainframe hackers.

There's mixed news for the old-timers, though; Peter Prymmer has got it
down to 10 test failures, but one of the tests completely hangs Perl.
Apparently
study under OS/390 is best avoided. He also started some investigation of a
Bigint bug, under the direction of Tels and John Peacock, but left for his
holidays and the discussion moved to the
perl-mvs mailing list.

Oh, and talking of weird platforms, UTS. Hal (who's actually from UTS
Global, so much kudos there) has been testing out recent Perl builds on
UTS, and turning up some ... take a deep breath ... icky numeric
conversion issues.

Nicholas Clark
was convinced they (well, some of them at least) were due to UTS C
going through a
foo-- statement twice, but Hal pointed out he didn't expect UTS C to be
quite that braindead. On the other hand, Nick's analysis looked
convincing...

Hal also fixed up
hints/uts.sh so that UTS now configures and builds nicely at least.

@a
gets "ho" as you'd expect, but
@b gets "ho","ho". Ronald Kimball told him Not To Do That, Then.

Peter Prymmer noted that Perl on VMS was bailing out during the test
suite, leading to lots of bogus failures. This only happens if none of
the DBM libraries (GDBM, DB_File, NDBM or SDBM) were built. As the first
three require external libraries that VMS doesn't have and the last one
is currently broken, it's no wonder Perl is bailing out. The fix is to
work out why SDBM has stopped building on VMS. Peter also produced a lot
of other VMS and HPUX reports.

Andy
was pleasantly surprised to note that the promised "binary
compatibility with 5.005" actually works even in bleadperl. Perhaps we
need to break more things.

John Peacock asked a portability question for XS bit-twiddling; he's
trying to adapt a math library which depends on casting two numbers to a
long and adding them together to avoid overflow. Jarkko's
fantastic architecture experience was brought to bear as he revealed
that Cray Unicos has
long == short == int == double. Oh, and type casting has issues too, so you have to use a union.
Nicholas Clark suggested the old trick of comparing the operands of an
addition with the result; if the result is smaller than either of the
operands, you've overflowed, so you add a carry and off you go.
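The same trick is easy to show in Perl by masking to 32 bits (a sketch of the idea, not the actual XS code):

```perl
# If an unsigned sum comes out smaller than either operand, it wrapped,
# so the carry out is 1.  Masking simulates 32-bit unsigned arithmetic.
sub add32 {
    my ($x, $y) = @_;
    my $sum   = ($x + $y) & 0xFFFF_FFFF;
    my $carry = $sum < $x ? 1 : 0;
    return ($sum, $carry);
}

my ($sum, $carry) = add32(0xFFFF_FFFF, 2);
printf "sum=%u carry=%u\n", $sum, $carry;   # sum=1 carry=1
```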

The news that GCC 3.0 was out brought a rush of people testing Perl out
with it; I got it through on Linux with all tests successful, as did
H. Merijn Brand on HPUX, but with rather a lot more warnings. This was
because HPUX messed up the test for
__attribute__ due to using a HP linker instead of a GNU one. Merijn and Jarkko got
this fixed up.

Artur continued his iThreads quest; he renamed the "shared" attribute to
"static", (and then again to "unique" after objections) presumably to
free it up for an attribute which actually does share variables between
interpreters, and also added a function which cloned the Perl host.
He said that
threads-0.01, the new threading module, will be released to CPAN when 5.7.2 hits
the road. (Which Jarkko keeps hinting will be very, very, very soon
now.) He also complained bitterly when Marcel Grunauer tried to document
attributes.pm as useful, despite the fact that Marcel has some really very cool
modules on CPAN based on it...

Marcel also found, and Radu Greab fixed, an insidious bug in
split, whereby if the default whitespace pattern was used for one iteration
of a loop, it would be used for all succeeding ones; the
PMf_WHITE flag for the regular expression was being set but never
unset. Urgh.
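For anyone who hasn't met the special case that PMf_WHITE implements: split ' ' (a single-space string, not a pattern) means "split on runs of whitespace, discarding leading whitespace", which is quite different from split / /:

```perl
my @awk = split ' ', "  foo  bar";   # the awk-ish special case
my @lit = split / /, "  foo  bar";   # a literal single-space pattern

print scalar(@awk), "\n";   # 2  -> ("foo", "bar")
print scalar(@lit), "\n";   # 5  -> ("", "", "foo", "", "bar")
```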

Ilya
produced some rough changes documentation for OS/2, as well as some
other little patches. Norton Allen provided some QNX updates.

Finally, there was mention of an idiom for counting the number of return
values from an operation. That was something I hadn't seen before; you
learn something new every day...
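My guess (an assumption on my part; the original message isn't quoted here) is that the idiom was the empty-list assignment trick:

```perl
# Assigning to an empty list forces list context on the right-hand side;
# the assignment itself, in scalar context, yields the number of values.
my $count = () = "one two three" =~ /\w+/g;
print "$count\n";   # 3
```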
Until next week I remain, your humble and obedient servant,