(If you don't follow the latter link, ask yourself this: would you use Perl
5 on another VM if you didn't have access to the CPAN? As long as any CPAN
stack you need includes an XS dependency, you're stuck with the C
implementation of Perl 5. Break that dependency and... you see where this is
going.)

If you read a good book on compilers such as SICP or the Dragon book, or even
a good book on Lisp, which is half of the same thing, you'll eventually run
into the idea that you can represent any program in a tree-like structure.

That is to say, any type of computation you wish to perform can be
represented with the right data structure, and that data structure happens to
be a tree. Even the venerable "Hello, world!" program from K&R is a
tree:

        STATEMENTS
         /      \
     print      exit
       |          |
   "Hello,        0
    world!"

(You can represent this tree in a lot of ways, but you get the idea.) As
SICP explains, the execution model governs how you traverse this tree.
Obviously in this tree, you start at the topmost node, then evaluate leftmost,
depth-first, until you reach the end.
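
To make that traversal concrete, here is a minimal sketch in Python (not Perl, and not how any real Perl 5 implementation works). The Node class, the evaluate function, and the output list are all names invented for this illustration, and exit is simulated by returning a status code rather than ending the process.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a program tree: an operation plus its child nodes."""
    op: str
    children: list = field(default_factory=list)
    value: object = None  # only used by constant (leaf) nodes


def evaluate(node, output):
    """Walk the tree from the top, leftmost child first, depth-first."""
    if node.op == "const":
        return node.value
    # Evaluate every child, left to right, before acting on this node.
    args = [evaluate(child, output) for child in node.children]
    if node.op == "statements":
        return args[-1] if args else None
    if node.op == "print":
        output.append(args[0])  # collect output instead of writing to stdout
        return None
    if node.op == "exit":
        return args[0]  # simulated: hand back the status, don't really exit
    raise ValueError(f"unknown op: {node.op}")


# The "Hello, world!" tree from the diagram above.
hello = Node("statements", [
    Node("print", [Node("const", value="Hello, world!")]),
    Node("exit", [Node("const", value=0)]),
])

output = []
status = evaluate(hello, output)
```

Swapping evaluate for a different walker (one that emits code, say, instead of running it) changes the execution model without touching the tree.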

Even though your processor likely doesn't execute programs by traversing
trees and your language's favorite runtime doesn't either, tree structures are
still very useful in compilers because they're simple data structures that are
easy to traverse and produce and manipulate.

This is one reason that fans of Lisp think that Lisp is the ultimate
programming language: you write it by writing this tree directly as source code
and you can manipulate that tree with source code as if it were a tree, because
it is.
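
A rough sketch of that idea, using Python nested lists to stand in for Lisp's parenthesized lists. The evaluate and swap_operator functions and the OPS table are assumptions made up for this example; real Lisp macros are considerably more capable.

```python
import operator

# The program (* (+ 1 2) 4), written directly as a tree of nested lists.
program = ["*", ["+", 1, 2], 4]

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a nested-list expression with two-argument operators."""
    if not isinstance(tree, list):
        return tree  # a bare number evaluates to itself
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


def swap_operator(tree, old, new):
    """Return a copy of the tree with every `old` operator replaced by `new`."""
    if not isinstance(tree, list):
        return tree
    op, *args = tree
    return [new if op == old else op] + [swap_operator(a, old, new) for a in args]


before = evaluate(program)                          # (1 + 2) * 4
after = evaluate(swap_operator(program, "+", "*"))  # (1 * 2) * 4
```

The transformation never touches program text: the program is already the data structure, so rewriting it is ordinary list manipulation.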

Many of the rest of us don't want to spend the rest of our lives writing
trees by hand, but sometimes we do want to do things with source code
without having to write our own compilers, or at least our own parsers.

Do you know why people say "Parsing
Perl is Hard"? That's because it is. It's not impossible, but the
Perl 5 parser is a complicated program that mixes up the traditional roles of
lexing and parsing such that replicating that parse completely is difficult, at
best. This is why writing a syntax highlighter for Perl 5 is more difficult
than writing one for Lisp.

Do you know how something like Devel::Declare works?
I do, and I don't want to explain it to you. It's not particularly yucky magic,
but it's not particularly lovely magic either.

Do you know how Perl 5's optimizer works? It's very limited.

Do you know how Perl 5 source filters work? Not that well! That's why the Switch module was
deprecated in the commit after it became a core module.

Do you know how PPI works? Most of
the time, very well, but with lots of magic and a few very well understood edge
cases and not a lot of speed.

All of these problems have the same root cause: it's exceedingly difficult
to manipulate a Perl 5 program as anything other than text, because the one
thing that unambiguously understands that program won't share its
understanding.

(That's not entirely true; a few years ago, Larry added a
compile-time option to Perl 5 to produce an annotated tree from the
parser/lexer/compiler, but no one's done much with it.)

A traditional compiler parses a document into a tree structure, then
manipulates that tree into at least one and probably more trees and finally
emits code in another language. It's very patriotic (from tree to shining
tree). This pattern is no accident; it's the basis of many programs (everything
is a compiler).

This process is so well understood that patterns exist for treating this
process as a pipeline of tree transformations. As long as you know what kind of
tree you're going to get and what kind of tree you need to produce, you can add
your own transformation step. (If that sounds like Plack middleware, there's no coincidence there
either.)
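
Here is a minimal sketch of such a pipeline in Python. The tuple-based tree format, the two passes (fold_constants and drop_multiply_by_one), and the emit step are all invented for illustration; a real compiler would have many more node types and passes.

```python
from functools import reduce

# Trees are nested tuples: ("num", 3), ("var", "x"), ("add", l, r), ("mul", l, r).


def fold_constants(tree):
    """Pass 1: replace an operation on two literal numbers with its result."""
    if tree[0] in ("add", "mul"):
        left, right = fold_constants(tree[1]), fold_constants(tree[2])
        if left[0] == "num" and right[0] == "num":
            if tree[0] == "add":
                return ("num", left[1] + right[1])
            return ("num", left[1] * right[1])
        return (tree[0], left, right)
    return tree


def drop_multiply_by_one(tree):
    """Pass 2: rewrite x * 1 and 1 * x to plain x."""
    if tree[0] in ("add", "mul"):
        left, right = drop_multiply_by_one(tree[1]), drop_multiply_by_one(tree[2])
        if tree[0] == "mul":
            if right == ("num", 1):
                return left
            if left == ("num", 1):
                return right
        return (tree[0], left, right)
    return tree


def emit(tree):
    """Final step: emit source text in the target language from the tree."""
    kind = tree[0]
    if kind == "num":
        return str(tree[1])
    if kind == "var":
        return tree[1]
    symbol = "+" if kind == "add" else "*"
    return f"({emit(tree[1])} {symbol} {emit(tree[2])})"


def run_pipeline(tree, passes):
    """Feed the output tree of each pass into the next one."""
    return reduce(lambda current, step: step(current), passes, tree)


# x * (3 + -2): folding turns the right side into 1, then pass 2 removes it.
source_tree = ("mul", ("var", "x"), ("add", ("num", 3), ("num", -2)))
optimized = run_pipeline(source_tree, [fold_constants, drop_multiply_by_one])
code = emit(optimized)
```

Each pass only needs to know the shape of the tree it receives and the shape it must return, so adding a third pass means adding one more function to the list.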

A formalized AST in Perl 5, representing an official separation between the
lexing/parsing and execution phases of Perl 5, made available to any and all
programs would let people write better syntax highlighters, sure, but also
better IDEs (finding function declarations and associating them with source
code would be easier) or debuggers (see again "finding things") or optimizers (a
pipeline of tree transformations) or transliterations to other VMs or languages
(still not entirely easy but much more possible) or
serialization mechanisms (avoid parsing; dump an AST to the execution engine)
or even little languages atop Perl with their own parsers which compile to the
Perl 5 AST and run as if they had been Perl all along.

Think about that last point for a moment.

(Think about that last point in the context of syntax weirding mechanisms
like MooseX::Declare, or
anything which uses string eval.)
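
For a sense of what that looks like where an official AST does exist, here is a small sketch using Python's standard ast module as an analogy. Perl 5 offers no equivalent API, which is the point. The postfix mini-language and the rpn_to_ast and run helpers are invented for this example.

```python
import ast

BINARY_OPS = {"+": ast.Add, "-": ast.Sub, "*": ast.Mult}


def rpn_to_ast(source):
    """Parse a postfix expression such as "2 3 + 4 *" into a Python AST."""
    stack = []
    for token in source.split():
        if token in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(ast.BinOp(left=left, op=BINARY_OPS[token](), right=right))
        else:
            stack.append(ast.Constant(value=int(token)))
    if len(stack) != 1:
        raise SyntaxError("malformed postfix expression")
    # Wrap the expression and fill in the line/column info compile() requires.
    return ast.fix_missing_locations(ast.Expression(body=stack[0]))


def run(source):
    """Hand the generated tree to the host compiler and execute the result."""
    tree = rpn_to_ast(source)
    return eval(compile(tree, filename="<rpn>", mode="eval"))


result = run("2 3 + 4 *")  # evaluates like the Python expression (2 + 3) * 4
```

The little language has its own parser, but no string eval and no source filter: it hands the host a tree, and the host runs it as if it had been native code all along.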

Are you sold yet?

Here's the bad news: this is a lot of work. It needs expertise and it needs
research and planning. It needs at least one champion, and it probably needs
funding. The first approach will probably fail. It will take longer than anyone
wants. The only way it will happen is if someone stubborn appears and says
"I'll do that!" or "I'll fund that!" and gets just enough support to keep going
past the difficult parts.

9 Comments

As you will know, perl5 already has an AST. The optree. It can even be fully transformed back to equivalent source code via B::Deparse.

The problem is more that the AST is not documented at all, and changes are left to CPAN, while the modules which did those analyses, optimizations and transformations were not updated for the big 5.10 optree rewrite. op_seq was thrown away, and optree changes are now actively blocked at run-time, even without threads, yet most signature parsers, for example, only work at run-time.

Any attempt to document the optree was met with silence.

Any attempt to advance compile-time optree optimizations with normal optimization techniques, like function analysis for our slow parts (exceptions? locals, lexicals?, return context and types?), function inlining, method resolution, method inlining, tailcalls, types, or constant folding, is either ridiculed, blocked or ignored. Or it is left to CPAN, which would slow things down instead of optimizing them.

The language is also by far not advanced enough to allow this, and discussions are not possible.

There are no proper constants yet (besides the syntax-only CONSTSUB): no const lexicals, const classes, or const @ISA. Everything is allowed at run-time, and no attempts are made to improve the situation.
Signatures and a new class syntax would solve a lot, but the current proposals ignore those possibilities.

I'd like to implement all that, but I'm waiting for unmade decisions. And I see no way that decisions are made amongst the uninformed. The deciders don't even listen to perl6, who solved these problems already some time ago.

The difference between an abstract syntax tree and the optree is twofold: the optree is neither abstract nor concerned with syntax. Another problem is that it's not really possible to create something to hand to the Perl 5 runtime without copying and pasting and modifying big chunks of the Perl 5 parser. While you might argue in a very technically precise sense that it exists and is of a nature with an AST, it's not usable in the sort of way someone might want to use an AST.

(I believe any decent AST for Perl 5 would give you much more stability than the optree does.)

As for Perl 6 solving these problems and more, I'm reserving judgment on the quality of its implementation design decisions until an implementation is usable for my purposes.

flavio's perlito already creates a perl5 AST, and larry's P5Grammar is in work. Personally I prefer P5Grammar over the optree over perlito's AST, but perlito has nice emitters and is easy perl5.

The perl5 optree is very problematic to optimize because it is not in standard AST format :) And a hack. But it is very performant.
And I doubt that P5Grammar and perlito can be made faster than the optree. They are very expressive and high-level.

A new low-level parser and AST optimizers would definitely be a good idea. I was toying with the idea of a new perl5 vm or nqp backend (called p2), based on vmkit (which is based on LLVM and MMTk). Not parrot, as it is not performant enough. But any fresh start seems to be better. Or nqp.

At OSCON 2012 I got to hear Rob Pike talk about the go fix utility they wrote for the Go language. It takes their AST and examines it for old, deprecated or bug-fixed core code and *automatically* replaces it with up-to-date, fixed NEW code.

I was sitting in that session with my mind completely blown. *kaboom*

If we ever had something like that in Perl, people's heads would spin around. Obviously, Go doesn't have 20+ years of legacy to deal with, but an AST would be really amazing.