Conceptual Chunking in Perl

Larry Wall wrote:

... Typically, Perl programmers feel that they
can write code that is more readable in Perl than in other languages,
because they have freedom of choice in how to "chunk" the problem (to
borrow a term from cognitive science).

ozan s. yigit wrote:

you have posted on this topic extensively, and i think there is much
to discuss in there. but i find this little bit bewildering: what on
earth are you talking about? does [eg] python offer less "freedom of
choice" to "chunk" a problem? how so?

Okay, people have asked for such examples before, and I haven't responded,
because the idea of chunking is pervasive in Perl and I couldn't figure
out where to start. :-)

But here goes...

Psychologically speaking, "chunking" is the ability to reduce the
complexity of a problem by making foreground/background or
inside/outside distinctions and concentrating on one or the other. As
such, the main enabler is the ability to define and recognize boundaries
between foreground and background, between inside and outside.
Classically, languages provide relatively few ways to make boundaries,
ranging from the highly abstract object, down through modules and
functions, clear down to loop abstractions, formatting conventions,
statement delimitation, parenthesizing and quoting.

Perl is roughly equivalent to Python on more abstract levels, with some
differences. Perl provides closures, while Python goes more deeply into
some of the metaclass stuff (both of which I think are benign but
relatively useless to mere mortals). I think the Perl module mechanism
is a little more flexible than Python's--it's sufficiently general that
I also use it for the pragma mechanism, because the semantics of
importation are under the module's control, and the normal importation
is merely a matter of reusing the standard export implementation. The
user has a lot of flexibility in deciding which parts of a module's
definitions should be defined how (in C or Perl) and when (immediately
or lazily). There's flexibility in choosing between lexical and dynamic
scoping. There's flexibility in choosing early or late binding. You
can change inheritance on the fly, if you like. You can use objects
where they make sense, and avoid them where they don't. All of this
affects how you decompose your problem, and that in turn gives you
flexibility in chunking.
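
To make the lexical/dynamic choice a bit more concrete, here is a minimal sketch. The names ($level, show_level, with_local, with_my) are made up for illustration and come from no real module; the only point is that "local" changes what a called subroutine sees, while "my" does not.

```perl
use strict;
use warnings;

our $level = "global";            # package variable, reachable by dynamic scoping

# Reads whichever value of the package variable is current at call time.
sub show_level { return $level }

sub with_local {
    local $level = "dynamic";     # temporary value, restored when this sub returns
    return show_level();          # the callee sees the temporary value
}

sub with_my {
    my $level = "lexical";        # a brand-new variable, private to this block
    return show_level();          # the callee still sees the package variable
}

print with_local(), "\n";         # prints "dynamic"
print with_my(), "\n";            # prints "global"
print show_level(), "\n";         # prints "global" again: local was undone
```

Which one you reach for depends on whether you think of the value as belonging to the block you can see or to everything called while the block is running.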

On a less abstract level, Perl lets you choose the psychological boundaries
of loops, for instance. You can name a loop according to what it is
processing. A name is a high-powered way of hiding an abstraction,
mentally speaking:

LINE:
while (<>) {
    next LINE if /^#/;      # Discard comments.
    print;
}

In my mind, I can now pigeonhole that as the LINE loop, and reduce it to
a single little lump of cybercrud, even if the loop is 582 lines long.

Alternatively, you can go with a more customary loop, which gives a
different psychological "feel":

while (<>) {
    next if /^#/;           # Discard comments.
    print;
}

Since it's an anonymous loop, I now rely psychologically more on how it
looks on the screen visually. It has an easily seen beginning and end.
Things don't just "peter out" as they do in languages that use
indentation as syntax. (Editorial opinion: the indentation scheme of
Python is okay in small examples, but doesn't scale very well. It
rapidly breaks down, visually and psychologically, as soon as you get
any construct larger than a screen. It's all very well to argue,
as some have argued, that you should never write a construct larger than
a screen in Python, but then I'll respond that my point about
flexibility in chunking is thereby proven. What if the user wants a
chunk that is larger than the screen? Dangling, open-ended syntax is
pretty useless at the discourse level. I'll go with Aristotle on
that one.)

You can reduce a loop to one line to reduce its "significance" even further:

while (<>) { print unless /^#/ }

You can even pretend there isn't a loop there:

print grep !/^#/, <>;

You can delegate the loop to someone else:

print `grep -v '^#'`;

Well, that's probably enough about "while" loops, though we could
certainly go into the psychological difference between "while" loops,
C-style "for" loops and "foreach" loops. Linguistically, a foreach loop
is functioning as a topicalizer for the interior of the loop.

foreach $line (@lines) {
    print $line;
}

For mental flexibility, Perl gives you an anonymous form:

foreach (@lines) { print }

Since "for" is a synonym for "foreach" in Perl, you sometimes even see it
used strictly as a topicalizer for a single value!
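
A small sketch of that single-value use, assuming a hypothetical variable $name that needs tidying. Inside the block, $_ is an alias for $name, so each operation edits $name in place without naming it again.

```perl
use strict;
use warnings;

my $name = "   Larry Wall  ";

# "for" over one value: $_ is aliased to $name for the duration of the block.
for ($name) {
    s/^\s+//;       # strip leading whitespace
    s/\s+$//;       # strip trailing whitespace
    tr/a-z/A-Z/;    # upcase ASCII letters
}

print "$name\n";    # prints "LARRY WALL"
```

The block reads as "about $name, do the following", which is all a topicalizer is.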

Moving on down the abstraction level, there is psychological value in
having a single way to delimit statements, and making all whitespace
equivalent. This gives the user freedom in how to line things up
vertically within a statement to enhance readability.

The notion of statement modifiers allows people to relegate unwanted
psychological facts to the right side of the screen where they can
be ignored.
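
As a sketch of what that looks like in practice, here is a hypothetical filter (keep_lines is an invented name) in which the verbs line up on the left and the conditions trail off to the right:

```perl
use strict;
use warnings;

# Returns the lines that are neither comments nor blank.
sub keep_lines {
    my @kept;
    for my $line (@_) {
        next if     $line =~ /^#/;     # skip comment lines
        next unless $line =~ /\S/;     # skip lines with no non-whitespace
        push @kept, $line;
    }
    return @kept;
}

print keep_lines("# a comment\n", "\n", "x = 1\n");   # prints "x = 1"
```

Reading down the left edge gives "next, next, push"; the reasons are there on the right for anyone who wants them.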

Within statements, the whole notion of context in Perl is built around
the concept that various operations are semantically "governed" by their
surroundings. The choice of whether to parenthesize says a lot about
how the programmer thinks of it. If the programmer wants to use
the rest of the line as the scope, so to speak, you might see

return print reverse sort bynum values %hash;

Someone who doesn't like line scopes might write something more like

return print(reverse(sort bynum values(%hash)));

Again, this is psychological flexibility. Another person will choose
the (presumably) equivalent

return print sort {$b <=> $a} values(%hash);

To this person, the sort subroutine isn't even a subroutine.

Interpolative contexts are important in Perl. List operators do
automatic list interpolation on their arguments. Double-quoted strings
(and related contexts) provide a very convenient chunking mechanism for
hiding a lot of concatenation. Variables in this context look just the
same as they do in the rest of Perl--that's one reason I put $ and @ on
variables in the first place. (The other is that noun markers like $
and @ allow quick visual figure/ground distinctions, enhancing
readability. A Perl variable is also a kind of "chunk".)
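
For instance (a sketch with made-up variable names), the two strings below are identical, but the second hides all the concatenation inside one quoted chunk. An array interpolated into a double-quoted string is joined with spaces by default.

```perl
use strict;
use warnings;

my $user  = "ozan";
my @langs = ("Perl", "Python");

# Spelled out with explicit concatenation and join:
my $joined = "Hello, " . $user . ": " . join(" ", @langs) . ".";

# The same thing, with the variables interpolated in place:
my $interp = "Hello, $user: @langs.";

print "$interp\n";                              # prints "Hello, ozan: Perl Python."
print "same\n" if $joined eq $interp;           # prints "same"
```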

One could also write reams about the different ways to write a pattern
match in Perl. What other languages let you break up your regular
expression chunks with both horizontal and vertical whitespace, and even
comment each chunk, if you so desire? Or you can do as is traditionally
done and visually encapsulate the whole unspeakable mess on a single
line.
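
Here is a rough sketch of both styles, using an invented "key = value" line format and an invented helper named parse. The /x modifier makes the pattern ignore literal whitespace and allows # comments inside it.

```perl
use strict;
use warnings;

# Chunked: one idea per line, each with its own comment.
my $assignment = qr{
    ^ \s*
    (\w+)           # the key
    \s* = \s*       # equals sign, optional spaces around it
    (.*?)           # the value, as short as possible
    \s* $           # trailing whitespace is not part of the value
}x;

# Traditional: the same pattern on a single line.
my $compact = qr/^\s*(\w+)\s*=\s*(.*?)\s*$/;

# Returns (key, value) on a match, or an empty list otherwise.
sub parse {
    my ($line, $re) = @_;
    return $line =~ $re ? ($1, $2) : ();
}

my ($key, $value) = parse("  color = dark red  ", $assignment);
print "$key => $value\n";       # prints "color => dark red"
```

Both patterns match the same strings; the choice is purely about how you want the reader to chunk it.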

Finally, quote delimiters. Forcing people to use just a few quote
characters forces a lot of noise into a lot of programming languages.
Many UNIX languages suffer from backslashitis and leaning-toothpick
syndrome. Letting people pick their quote characters makes things a
little harder for emacs, to be sure, but lets people encapsulate things
visually the way they may be used to. Why force someone to say

tr("abcdef\"", "ABCDEF'");

when

tr [abcdef"] [ABCDEF'];

is clearer, or even

tr [abcdef"]
   [ABCDEF'];

And note how this interplays well with the free statement formatting.
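
The same freedom applies to substitutions, which is where leaning toothpicks usually show up. A sketch, with an invented path and invented variable names:

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# Slash delimiters force every slash in the pattern to be escaped.
(my $toothpicks = $path) =~ s/\/usr\/local\/bin/\/opt\/bin/;

# Brace delimiters let the paths appear as they are.
(my $braces = $path) =~ s{/usr/local/bin}{/opt/bin};

print "$braces\n";                                  # prints "/opt/bin/perl"
print "same\n" if $toothpicks eq $braces;           # prints "same"
```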

On multi-line quotes, why force someone to use triple quote (ugh)?
Why not make it easier for the person and harder for the computer,
and let the user pick the trailing delimiter? At least the shell's
got this right.

Here's a convenient mental trick. If I know that the text I'm dealing
with contains no blank lines, I often use a blank line as my
final delimiter. So instead of saying

print <<EOF;
Some text.
And some more text.
EOF

I just say my delimiter is nothing

print <<"";
Some text.
And some more text.

and make sure the next line is blank. It works very well as a form of
visual chunking. Python folks in particular should appreciate the idea
of using the absence of something as the final delimiter.