Perl Lines of the Day from September, 1998

A program is generating an HTML page that includes a menu where
the user can select their home state. The page should have the
appropriate state preselected, as a convenience to the user. The page
might be displayed as a result of submitting a similar page (or even
the same page, if there was an error) or it might not. In the former
case, the program has a hash %in of form input data; in either
case, it has a hash %uinfo of user information from a
database. The context of the Line of the Day is something like this:

Here the selected attribute will be omitted for all but one of the
options. How do we decide whether to include the selected
attribute? The Line of the Day is a peculiar way to write this,
although I didn't realize that until after I had written it. A more
normal way would be:

If there is a prior form submission that includes the
state, we look at that to decide which item to select; if not, we
default to the user information from the database and select the item
we find there.
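A sketch of that more normal approach follows. It is not the original program's code: the function name state_menu is hypothetical, and it assumes that both the form field and the database column are named state and that the caller supplies the list of states.

```perl
use strict;
use warnings;

# Hypothetical helper: build a <select> menu with one state preselected.
# Assumes the form field and the database column are both called 'state'.
sub state_menu {
    my ($in, $uinfo, @states) = @_;

    # Prefer the state from a prior form submission, if there was one;
    # otherwise fall back to the value stored in the database.
    my $selected_state =
        (defined $in->{state} && $in->{state} ne '') ? $in->{state}
                                                      : $uinfo->{state};
    $selected_state = '' unless defined $selected_state;

    my $html = qq{<select name="state">\n};
    for my $state (@states) {
        # Only the matching option gets the 'selected' attribute.
        my $selected = $state eq $selected_state ? ' selected' : '';
        $html .= qq{<option value="$state"$selected>$state</option>\n};
    }
    return $html . "</select>\n";
}

# No prior submission, so the database value (PA) is preselected.
print state_menu({}, { state => 'PA' }, qw(NJ NY PA));
```

Computing $selected_state once, before the loop, keeps the per-option test down to a single string comparison.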

The version without the explicit $selected_state variable
is the one that occurred to me first, and I was surprised by how much
more difficult it was to explain and to understand.

I once wrote a CGI program that needed to display items in a
certain order. Actually, what the client asked for was to be able to
specify that a particular item, that month's featured product, would
be displayed first. I knew it wouldn't stop at that, and that they'd
probably want to be able to specify the order of the first several
products, followed by the rest of the products in no particular order.

I had the client make a new column in their product database called
`order'; I said ``To make an item appear first, put in `1' as the
order; to make it appear second, put in `2', and so on.'' If there was
no order, the item would appear after all the ordered items.

While I was writing that, it occurred to me that they might one day
want to be able to force an item to appear last. Just leaving out the
order number wasn't enough, because those items all appeared together
at the end in a random order. And specifying an order of 1,000,000
wouldn't work because even an item with order 1,000,000 would appear
before the items with no order at all. So I added the feature that
negative order numbers specified a position from the end.

The Line of the Day is the code from the sort comparison function
that sorts items into order. The complete function is:

A `product' here is a reference to a hash that includes the
product's order, name, price, and other fields. This comparison function imposes the following order on the products:

1 2 3 4 ... 0 ... -4 -3 -2 -1
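The complete function isn't reproduced here, but one way to impose that order is sketched below. It is an illustration under assumptions, not the original comparator: it compares a group number with <=> instead of using arithmetic, and the hash fields order and name are assumed from the description above.

```perl
use strict;
use warnings;

# Put each order number into a group: positive orders sort first,
# unordered items (0, empty, or missing) in the middle, negative last.
sub order_group {
    my ($order) = @_;
    return $order > 0 ? 0 : $order == 0 ? 1 : 2;
}

# Sort comparator; like any named comparator it reads $a and $b.
sub by_order {
    # A missing or empty order counts as 0.
    my $ao = $a->{order} || 0;
    my $bo = $b->{order} || 0;

    # Compare groups first; within a group, ascending numeric order
    # gives 1 2 3 ... at the front and ... -3 -2 -1 at the back.
    return order_group($ao) <=> order_group($bo)
        || $ao <=> $bo;
}

my @products = (
    { name => 'widget',   order => -1 },
    { name => 'gadget',   order => 2  },
    { name => 'gizmo'                 },    # no order at all
    { name => 'featured', order => 1  },
    { name => 'sprocket', order => -2 },
);
my @sorted = sort by_order @products;
print join(' ', map { $_->{name} } @sorted), "\n";
# featured gadget gizmo sprocket widget
```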

It's interesting for at least two reasons. One is the use of an
arithmetic computation as a return from a sort comparator. Arithmetic
computations in sort comparators are more common in C than in Perl,
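because C has no <=> operator. For integers you never actually need
<=>, because simple subtraction yields a result with the same sign,
and the sign is all a sort comparator has to get right. That is what
C programmers do, at least when the values are small enough that the
subtraction can't overflow: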

But in Perl, because there is a special <=>, we tend
to forget that what it does is basically subtraction.
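A quick Perl check of that idea, sticking to integers since that is the case the claim covers:

```perl
use strict;
use warnings;

my @nums = (10, -3, 7, 0, 2);

# sort only cares whether the comparator returns a negative number,
# zero, or a positive number, so for integers plain subtraction
# produces the same ordering as <=>.
my @by_spaceship   = sort { $a <=> $b } @nums;
my @by_subtraction = sort { $a - $b }   @nums;

print "@by_spaceship\n";      # -3 0 2 7 10
print "@by_subtraction\n";    # -3 0 2 7 10
```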

The other interesting thing here is that I have to explicitly add 0
to the orders. Otherwise, if a product had no order at all, and I
returned it without converting it to a number first, I might return
the empty string to sort, and Perl's sort, unlike
the rest of Perl, does not recognize the empty string as a synonym for
0.

Someone showed up in comp.lang.perl.misc and wanted to
know how to write a s/// to replace the literal string
$a with something else, say b. He'd tried all sorts
of plausible things, and none of them worked. For example, this obviously doesn't work:

s/$a/b/;

That's because patterns are interpolated like double-quoted strings,
so $a is expanded instead of being taken literally. Here is what he had tried:

s/$a/b/;
s'$a'b';
s/\Q$a\E/b/;
s'\Q$a\E'b';

None of these work! s''' prevents double-quote-like
interpolation on the right-hand side, in the substitution
string, rather than in the pattern. And the ones with \Q...\E
don't work because the value of $a is interpolated
before the \Q...\E escaping takes place.

Of course, what he wanted was just this:

s/\$a/b/;

That's what you get if you apply the \Q...\E in your
brain, and then insert the result from your brain into the program.

Anyway, that raises the question: What if you can't (or won't) do
quotemeta in your head? What then? In that case, you fall
back on the very general solution:

$pat = quotemeta '$a';
s/$pat/b/;

The recipe here is: ``To match any literal string, first
quotemeta it and store the result in a variable. Then
substitute the contents of the variable.''
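A runnable version of that recipe, with an invented sample string. The second substitution shows that \Q...\E does the job too, as long as the literal text is the value of a variable rather than the name $a written into the pattern.

```perl
use strict;
use warnings;

my $text = 'cost: $a dollars';    # single quotes: a literal $a

# Step 1: quotemeta backslashes every non-word character, so $pat
# holds \$a (a backslash, a dollar sign, and an a).
my $pat = quotemeta '$a';

# Step 2: interpolate the escaped string into the pattern.
(my $result = $text) =~ s/$pat/b/;
print "$result\n";    # cost: b dollars

# Same effect: \Q escapes the *value* of $literal, which is '$a'.
my $literal = '$a';
(my $also = $text) =~ s/\Q$literal\E/b/;
print "$also\n";      # cost: b dollars
```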

Of course, you always get some whiner who wants to do it in one
line. There's a general recipe for inlining any code at all into a
double-quoted string, and that's what I used here to get the Line of
the Day.
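One widely used recipe for inlining code into a double-quoted string (and therefore into a pattern) is @{[ EXPR ]}. Applied to this problem it might look like the sketch below, though this is not necessarily the exact line the author wrote:

```perl
use strict;
use warnings;

my $text = 'cost: $a dollars';

# @{[ ... ]} runs the enclosed code, wraps the result in an anonymous
# array, and interpolates that array, so quotemeta is applied inline.
(my $result = $text) =~ s/@{[ quotemeta '$a' ]}/b/;
print "$result\n";    # cost: b dollars
```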

I included this because I liked the way the condition of the
if was itself a conditional ?: expression.

This line appears in my implementation of diff. The function it's in accepts an
optional parameter, which is a reference to a function, $eq.
$eq is used to compare certain items if you specify it; otherwise
the comparison defaults to the builtin eq operator.

One way to implement this would have been by doing something like
this:

my $eq = shift || \&default_eq;
...
sub default_eq { $_[0] eq $_[1] }

But then the user would have to pay for the subroutine call for
default_eq even if they didn't specify their own $eq
function, and that seemed like a shame.

So instead, I inlined the test for $eq. If it's there, the
if condition invokes it on the appropriate arguments, and
otherwise, it just uses the builtin eq without calling any
subroutines.

Whether this is actually a performance improvement over the
subroutine version is not clear.
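Here is a sketch of that pattern inside a hypothetical helper; common_prefix_length is an invented name, not part of the author's diff. The if condition is a ?: expression that calls $eq only when the caller supplied one.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the common prefix of two arrays.
# The optional third argument is a code ref used to compare elements;
# if it is omitted, the builtin eq operator is used directly.
sub common_prefix_length {
    my ($x, $y, $eq) = @_;
    my $n = 0;
    while ($n < @$x && $n < @$y) {
        # Call the user's function if there is one; otherwise use eq
        # with no subroutine call at all.
        if ($eq ? $eq->($x->[$n], $y->[$n]) : $x->[$n] eq $y->[$n]) {
            $n++;
        }
        else {
            last;
        }
    }
    return $n;
}

my @old = qw(apple Banana cherry);
my @new = qw(apple banana date);
print common_prefix_length(\@old, \@new), "\n";    # 1
print common_prefix_length(\@old, \@new,
    sub { lc($_[0]) eq lc($_[1]) }), "\n";          # 2
```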

I once asked Abigail `What use are negative lookahead assertions,
anyway?' And she immediately came up with this extremely useful use.
I was impressed.

What does this do? If $HEADER contains the header of a
mail message, it splits the header into lines. That may seem easy,
but mail headers are a little more complicated than you might think.
Just doing split /\n/m is not enough, because mail headers
sometimes include continuation lines like this:

Those indented lines aren't new headers; the indentation indicates
that they are continuations of the previous lines. The first
Received: header doesn't end at the
...130.132.32.49); it continues onto the next line, and ends
at the -0000. Similarly, the second Received header
spans three lines and ends at -0400.

Reading in a mail message is very easy:

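{ local $/ = "";       # paragraph mode: read the header, up to the first blank line
  $HEADER = <>;
  local $/ = undef;    # slurp mode: read everything that is left
  $BODY = <>;
}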

Now the header is all in one place, and you might like to split it
up into lines. The Line of the Day is a good way to do that. It does
a split. What is the split delimiter? It's a little
subtle. Normally a split delimiter is a string like : or a
pattern like \s+. In that case, split discards the
delimiters and returns the stuff in between.

In this case, the delimiter is the empty string, so split
won't discard anything. But it's not just any empty string; it's a
pattern that will only match the empty string in certain places:
^ forces it to match only empty strings that are at the
beginning of a line, and (?!\s) forces it to match only empty
strings that are not followed by white space. Thus a new header
begins wherever a line does not begin with white space.
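A split of the kind described above might look like the sketch below. The sample header is invented, and the /m modifier is assumed so that ^ can match at the start of every line rather than only at the start of the string.

```perl
use strict;
use warnings;

# A made-up header with one continuation line (it starts with a tab).
my $HEADER = "From: someone\n"
           . "Received: from a.example.com\n"
           . "\tby b.example.com; Mon, 7 Sep 1998\n"
           . "Subject: hello\n";

# /m lets ^ match at the start of every line; (?!\s) rejects lines
# that begin with whitespace. The match is zero-width, so split
# discards nothing and each continuation stays with its header.
my @headers = split /^(?!\s)/m, $HEADER;

print scalar(@headers), " headers\n";    # 3 headers
print $headers[1];                       # the two-line Received header
```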

What good was the ?! here? We could have gotten a similar
effect with /^\S/ instead, and it would split in the right places,
but the program would still behave wrongly. The split delimiter
would no longer be the empty string; instead, it would be the
first character of each header. split throws away the delimiters,
so this split would discard the first character of each header,
leaving us with items like ubject: and ate: in the array.
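The difference is easy to see side by side, using an invented two-line header:

```perl
use strict;
use warnings;

my $HEADER = "Subject: hello\nDate: Mon, 7 Sep 1998\n";

# /^\S/m consumes the first character of each header line, and split
# throws delimiters away, so those characters disappear. The match at
# position 0 has nonzero width, so it also yields an empty first field.
my @broken = split /^\S/m, $HEADER;
# @broken is ('', "ubject: hello\n", "ate: Mon, 7 Sep 1998\n")

# The zero-width lookahead version keeps every character.
my @fixed = split /^(?!\s)/m, $HEADER;
# @fixed is ("Subject: hello\n", "Date: Mon, 7 Sep 1998\n")
```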

When you run Perl with the -T flag, it is in taint
mode. This means that Perl keeps track of whether the data in
your program is trustworthy, and diagnoses and aborts attempts to
perform unsafe operations with untrustworthy data. Data that came
from the user, or that might be under the user's control, is called
tainted and is considered untrustworthy.

It sometimes happens that you need to use tainted data in an unsafe
operation, presumably after validating and checking it carefully. To
prevent Perl from aborting your program, you must first
launder or untaint the data. The usual way to do
that is something like this:

sub untaint { $_[0] =~ /(.*)/s;    # /s so that . also matches newlines
              $1 }

Data that's copied to the $1, $2... variables is
always untainted. According to the perlsec manual page, this
is the only way to launder tainted data.
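A catch-all pattern like (.*) launders anything at all, so the validation the text mentions usually goes into the pattern itself. A sketch, where untaint_filename and its notion of a safe file name are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: accept only a plain file name that starts with a
# word character and continues with word characters, dots, or dashes.
# The captured text in $1 is what comes back; under -T it is untainted.
sub untaint_filename {
    my ($name) = @_;
    $name =~ /\A(\w[\w.-]*)\z/
        or die "Refusing suspicious file name: $name\n";
    return $1;
}

print untaint_filename('report-1998.txt'), "\n";
```

Anything containing a slash, or starting with a dot (such as ..), is rejected instead of being laundered.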

Well, not quite. The taintedness of data is stored in part of
Perl's SV structure. But Perl's hash keys aren't
SVs; they're regular strings. They can't be numbers, or
references, or undef, or blessed, or have any of the other
special properties that Perl scalars have, and in particular they can't be tainted.

That leads us to the Line of the Day: An alternative recipe for
taint laundering: Install the data to be laundered as a key in a hash,
and then retrieve it again with `keys':

Note: When I asked on p5p about this,
Larry said not to use it, because hash keys might become SVs
someday. But this is worth remembering anyway, because some day you
might use untrustworthy data by accident in a way you hadn't meant to,
and Perl might not catch your mistake because you inadvertently
laundered the data by using it as a hash key.
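To recognize the accidental case the note warns about, here is a sketch of the hash-key round trip. It is an illustration, not a recommended technique; it assumes the script is run as perl -T, and HOME is just a convenient example of outside data. Scalar::Util's tainted reports whether a value is tainted.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Run as "perl -T script.pl": under taint mode, values from %ENV are
# tainted. Fall back to a constant if HOME happens to be unset.
my $dirty = defined $ENV{HOME} ? $ENV{HOME} : 'none';

# Store the value as a hash key and read it back with keys. Hash keys
# are plain strings, not SVs, so the copy carries no taint flag.
my %launder = ($dirty => 1);
my ($clean) = keys %launder;

printf "original tainted:      %s\n", tainted($dirty) ? 'yes' : 'no';
printf "hash-key copy tainted: %s\n", tainted($clean) ? 'yes' : 'no';
```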