Sun, 20 Dec 2015

The way you'll often hear that phrased is that top-down parsing is
looking, starting at the top,
and bottom-up parsing is looking, starting at the bottom.
But that is misleading, because the input is at the bottom --
at the top there is nothing to look at.
A usable top-down parser
must
have a bottom-up component,
even if that component is just lookahead.

A more generous, but still accurate, way to describe the top-down
component of parsers is "prediction".
And prediction is, indeed, a very useful component of a parser,
when used in combination with other techniques.

Of course, if a parser does nothing but predict,
it can predict only one input.
Top-down parsing must always be combined with a bottom-up
component.
This bottom-up component may be as modest as lookahead, but it
must
be there or else top-down parsing is really not parsing at all.

So why is top-down parsing used so much?

Top-down parsing may be unusable in its pure form,
but from one point of view that is irrelevant.
Top-down parsing's biggest advantage is that it is highly flexible --
there's no reason to stick to its "pure" form.

A top-down parser can be written as a series of subroutine calls --
a technique called recursive descent.
Recursive descent
allows you to hook in custom-written bottom-up logic at every
top-down choice point,
and it is a technique which is
completely understandable to programmers with little or no training
in parsing theory.
When dealing with recursive descent parsers,
it is more useful to be a seasoned, far-thinking programmer
than it is to be a mathematician.
This makes recursive descent very appealing to
seasoned, far-thinking programmers,
and they are the audience that counts.

Switching techniques

You can even use the flexibility of top-down to switch
away from top-down parsing.
For example, you could claim that a top-down parser could do anything my
own parser
(Marpa)
could do, because a recursive descent parser can call
a Marpa parser.

A less dramatic switchoff, and one that still leaves the parser with a good claim to be
basically top-down, is very common.
Arithmetic expressions are essential for a computer language.
But they are also
among the many things
top-down parsing cannot handle, even with ordinary lookahead.
Even so, most computer languages these days are parsed top-down --
by recursive descent.
These recursive descent parsers deal with expressions
by temporarily handing control over to an bottom-up operator
precedence parser.
Neither of these parsers is
extremely smart about the hand-over and hand-back
-- it is up to the programmer to make sure the two play together nicely.
But used with caution, this approach works.

Top-down parsing and language-oriented programming

But what about taking top-down methods into the future of language-oriented programming,
extensible languages, and grammars which write grammars?
Here we are forced to confront the reality -- that the effectiveness of
top-down parsing comes entirely from the foreign elements that are added to it.
Starting from a basis of top-down parsing is literally starting
with nothing.
As I have shown in more detail
elsewhere,
top-down techniques simply do not have enough horsepower to deal with grammar-driven programming.

Perl 6 grammars are top-down -- PEG with lots of extensions.
These extensions
include backtracking, backtracking control,
a new style of tie-breaking and lots of opportunity
for the programmer to intervene and customize everything.
But behind it all is a top-down parse engine.

One aspect of Perl 6 grammars might be seen as breaking
out of the top-down trap.
That trick of switching over to a
bottom-up operator precedence parser for expressions,
which I mentioned above,
is built into Perl 6 and semi-automated.
(I say semi-automated because making sure the two parsers "play nice"
with each other is not automated -- that's still up to the programmer.)

As far as I know, this semi-automation of expression handling
is new with Perl 6 grammars, and it
may prove handy for duplicating what is done
in recursive descent parsers.
But it adds no new technique to those already in use.
And features
like

mulitple types of expression, which can be told apart
based on their context,

n-ary expressions for arbitrary
n, and

the autogeneration of multiple rules, each allowing
a different precedence scheme,
for expressions of arbitrary arity and associativity,

all of which are available and in current use in
Marpa,
are impossible for the technology behind Perl 6 grammars.

I am a fan of the Perl 6 effort.
Obviously, I have doubts about one specific set of hopes for Perl 6 grammars.
But these hopes have not been central to the Perl 6 effort,
and I will be an eager student of the Perl 6 team's work over the coming months.