Marpa resources

Sun, 22 Jul 2012

"One of the secrets to mathematical problem solving is that one needs to place a high value on partial progress,
as being a crucial stepping stone to fully solving the problem."
--
Terence Tao

Once an error is found,
a traditional parser is
pretty much lost.
This is true for even
state of the art compilers and interpreters.
One small typo results
in many screens of useless
diagnostics.
If you're an old hand,
you scroll back over these,
knowing that, for the better quality compilers,
the first few lines often have something to
do with the real problem.
With Marpa, we can do better than this.

Robust error reporting
is equivalent to the problem of finding code interspersed
among arbitary text.
For example,
consider this mixture of text and Perl fragments.

Note: line:column figures include preceding whitepace
The next line is a perl fragment
{42;{1,2,3;4}}
Code block from 3:5 to 3:13
Code block from 2:33 to 3:14
The next line is a perl fragment
{42;{1,2,3,4}}
Hash from 7:5 to 7:13
Code block from 7:5 to 7:13
Code block from 6:33 to 7:14
The next line is a perl fragment
{42;{;1,2,3;4}}
Code block from 12:5 to 12:14
Code block from 11:33 to 12:15
The next line is a perl fragment
{42;+{1,2,3,4}}
Hash from 16:6 to 16:14
Code block from 15:33 to 16:15

I have written the prototype of a utility
(ucurly.pl)
that finds Perl fragments
scattered among other material,
and parses them.
To test the accuracy of the parse,
I have it look for one of Perl's parsing ambiguities ---
anonymous hash constructors that could also be code blocks.
For the example above,
ucurly.pl finds what there is to be found:

My Marpa-based Perl parser,
as it currently stands,
is a partial Perl parser.
On select problems, it is 100%.
For example,
it understands all of the output from Data::Dumper.
And it scored perfectly in the example in this blog post.
It will parse perhaps 90% of a typical, complex Perl program.

At this point in the Marpa-based Perl parser's evolution,
rather than add Perl syntax,
I decided to use it to investigate partial parsing.
Some readers will have observed that "90% coverage"
is a synonym with "useless" in current parsing practice.
That's because traditional parsing algorithms are close to binary.
Traditional parsers only understand input if it is correct or nearly so.

When partial parsing is possible,
things are more interesting.
For example, I turned
ucurly.pl,
my ambiguity-finding utility,
loose on
CGI.pm,
a complex bit of Perl code by anyone's standard.
My utility failed to parse a lot of correct Perl code
in CGI.pm,
producing a lot of false negatives.
But this did not prevent it from singling out
line 461:

@result = map {ref $_ ? $_ : $self->_decode_utf8($_) } @result;

This line is ambiguous -- the first argument to map
could be either an anonymous hash constructor,
or a code block.
My guess, and I believe, the Perl's parser's guess,
is that it's a code block.
But it'd be nice to have a utility that spots these,
so that all doubt can be removed: