Other Resources

Posts about Marpa: an annotated list

Practical general parsing makes writing a DSL much simpler than it has been in the past.
The posts in this series of tutorials are among my most popular.
Although
you can join the series starting at a topic of special interest,
the easiest way to read (or at least skim)
them is in the order listed.

Making DSLs even simpler.
The best place to start.
An updated version of one of my most popular posts,
it describes Marpa's Scanless interface (SLIF).

A self-parsing and self-lexing grammar.
A tutorial that walks through the Scanless interface's own self-defining and self-hosting
grammar.
Reading a grammar that is defining itself can cause dizziness, but if you tolerate
that sort of thing,
this makes an excellent advanced tutorial.

BNF to AST
This post is based on the chapter on the Language Design Pattern,
from the very influential
Design Patterns book.
In that book, the authors (widely known as the Gang of Four)
did not attempt to parse the language of their example --
there was
no simple way to produce an AST straight from BNF.
Now there is.
This tutorial contains the Gang of Four example worked out in full with a Marpa-based
parser, and
a sequel
talks about some of the implications of easier parsing.

Marpa and procedural parsing 2013.
"Procedural parsing" is a fancy term for giving up on your parsing algorithm and just hacking
out the parse.
With the currently favored parsing algorithms, a lot of this type of code is
necessary, and programmers have come to expect a parser to allow for it.
Marpa tries to offer the best of both worlds:
a declarative parser which knows exactly where it is in your BNF at all times,
but one which is willing to step out of the way and let custom code take over,
at the same time sharing what it knows with the custom code.
Perl heredocs are an example of a language feature which requires procedural
parsing.
Parsing these correctly requires jumping back and forth in the input.
The tutorial in this post shows how to handle them straightforwardly using Marpa's SLIF.
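The jumping around can be sketched without Marpa at all. The toy lexer below is my own illustration (the token format and regexes are invented for this example, and it handles only one heredoc style): it finishes lexing the line that contains the heredoc operators, then detours to collect the bodies before resuming on the line after the terminator.

```python
# A sketch of why heredocs force a lexer to jump around in the input:
# the heredoc operator appears mid-line, but its body starts on the
# NEXT line, so the lexer must finish the current line, detour to
# collect the body, then resume.  Illustrative code, not Marpa's.
import re

def lex_with_heredocs(source):
    lines = source.split("\n")
    tokens, i = [], 0
    while i < len(lines):
        line, pending, pos = lines[i], [], 0
        while pos < len(line):
            m = re.match(r'<<"(\w+)"', line[pos:])
            if m:
                pending.append(m.group(1))        # body comes later
                tokens.append(("HEREDOC", m.group(1)))
                pos += m.end()
            elif line[pos].isspace():
                pos += 1
            else:
                m = re.match(r'\S+', line[pos:])
                tokens.append(("WORD", m.group(0)))
                pos += m.end()
        i += 1
        for tag in pending:          # the "jump": collect each body
            body = []
            while i < len(lines) and lines[i] != tag:
                body.append(lines[i])
                i += 1
            i += 1                   # skip the terminator line
            tokens.append(("HEREDOC_BODY", tag, "\n".join(body)))
    return tokens

src = 'print <<"A" <<"B" ;\nfirst body\nA\nsecond body\nB\nprint done ;'
for tok in lex_with_heredocs(src):
    print(tok)
```

Note that two heredocs can be opened on one line, and both bodies are then collected in order, which is exactly the back-and-forth the post is about.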

Marpa and combinator parsing.
Marpa allows variable-length and ambiguous tokens.
By combining this with external parsing,
and generalizing lexers to combinators,
Marpa becomes a better combinator parser --
more powerful and more flexible.

If you're looking for tutorials, you'll want to start, and probably to
end, with those based on the SLIF interface.
The best of mine for the beginner are above, in this section.
The
section on programming language design
also contains some posts whose examples work as SLIF-based tutorials.

While SLIF-based tutorials should be strongly preferred, my
pre-SLIF tutorials,
described below,
may still have some value.
They contain many useful examples,
and most of the code in them carries over to SLIF line-for-line.

About Marpa in general

Why Marpa works: table parsing.
Marpa is different from the other widely used parsers.
The others use stacks, state machines or both.
Marpa is a table parser.
This post explains the advantages of table parsing,
but also the challenges faced in implementing it.

The solved problem that isn't, is.
In an excellent post, Laurence Tratt described parsing as "the solved problem that isn't".
Tratt expressed my reasons for creating Marpa very well.
I claim that,
with Marpa, the parsing problem is, in fact, now solved.
In this post I state, point by point, why I believe
I can accurately make that claim.

A grammar that exemplifies, describes and parses itself:
This post contains the grammar for Marpa's new BNF interface,
a grammar that is both self-description and self-example.
In the 1970s, self-reference was very much the fashion.
But this grammar is not just for looks -- it drives its own compiler
and will be the basis of its own development:
it is handy enough as an interface
to make dealing with a circular dependency worthwhile.

Marpa and the Ruby Slippers:
An older post,
but one with a nice explanation of one of Marpa's significant new features:
Ruby Slippers parsing.

Fast, handy languages:
Which languages are fast (that is, linear) for Marpa?
Also, when should you NOT use Marpa?

Linear? Yeah, right:
Marpa is linear for all the deterministic context-free grammars, and
therefore all the classes of grammar currently in practical use,
or so I claim.
Why should you believe that?

Why I Stuck with Perl:
I liked Python very much, and at the time I started
Marpa, Python looked like a better choice in terms
of mindshare. In this post, I explain why the initial
Marpa implementation uses Perl.
This post has been very popular, even
being translated into Japanese.

Parsing timeline and history

In the process of writing Marpa, I read and thought a lot
about the various approaches to parsing,
and how they developed.
When I came to share my thoughts, I discovered quite a few people
were also interested -- the posts in this section are my
most popular.

Parsing: a timeline
A timeline of parsing research, describing how other algorithms pushed the parsers
in the Earley lineage
(like Marpa)
aside, only for things to come down twisted.
This is the most technical post I've ever done.
Much to my surprise,
its
first version
was a runaway hit.
Within its first two weeks it became my most popular post, by far.
The development of the various parsing algorithms is an interesting story,
one that has never really been told as a story before.
Readers apparently liked this.
[ Because of its length,
the current version of the "timeline"
is no longer a blog post --
it's a free-standing web page. ]

Parsing: top-down and bottom-up
If "Parsing: a timeline" was about "when and how?",
then this is about "why?"
It describes the concepts behind the two major traditional approaches to
parsing and confronts some questions:
Why did
the invention of bottom-up parsing create so much
enthusiasm?
Why, despite bottom-up's abandonment by practitioners,
does it still play such a big role in classrooms and in textbooks?
What does this say about the way forward?
Is there one?

Like the "timelines",
this is a popular post,
in this case mostly thanks to the Google bots --
as of this writing it comes up in high position on
organic searches.
If this post interests you,
you may also want to look at the blog post
Top-down parsing is guessing.

Parsers
and Useful Power
This post looks at the question "What do users want in a parser?" from a historical perspective.
In particular, it looks at the contest between the Irons parser and recursive descent.
The Irons parser is declarative and far more powerful than recursive descent.
In terms of parsing at linear speed, there were tradeoffs,
and in their pure forms, neither algorithm could parse the
all-important arithmetic expressions.

But recursive descent could incorporate custom
hacks,
and those custom hacks could handle arithmetic expressions.
Today recursive descent is everywhere,
while to find a mention of the Irons parser,
you need to make an exhaustive search of
the more thorough textbooks.

Why is
parsing considered solved?
"On what grounds would someone say that parsing is
'solved'? To understand this, we need to look at the
history of Parsing Theory.
In fact, we'll have to start
decades before computer Parsing Theory existed, with a now
nearly-extinct school of linguistics, and its desire to put
itself on a strictly scientific basis."
This post is a timeline in itself and
a revised version of it will eventually
be incorporated into
the v3 timeline.

Marpa versus Perl 6 grammars and PEG

In these blog posts, I discuss Perl 6 grammars,
and compare them with Marpa.
Perl 6 grammars are a PEG variant.
I believe that Perl 6 grammars and PEG have a place when used
as a "better" regular expression engine.
But it can be shown that
users (and language designers) whose hopes go beyond
that are doomed to disappointment.
In particular,
if your interest is language-oriented programming,
PEG and Perl 6 grammars
lack the horsepower and the programmability
to advance the state of the art.

Grammar reuse
Language-oriented programming requires the ability to piece grammars
together from other grammars.
The Perl 6 syntax makes this convenient, but the underlying parse engine
does not adequately support it.

Top-down parsing is guessing
Perl 6 grammars have a top-down, PEG, parse engine.
In this post I describe at a very basic level what it means to parse top-down
and argue that the current successes of top-down cannot be taken as
evidence that the top-down approach can power language-oriented programming.

What are the reasonable computer languages
If you are doing language-oriented programming, what kinds of language should you
support?
A clue is provided by going back to the research of the 1960's and 1970's,
when the researchers took an 'anything is possible' approach to language design,
and asked for what they really wanted.
These researchers did not "know", as we think we do today,
that they were demanding the impossible.

PEG: ambiguity, precision and confusion
PEG is a new notation for a notoriously tricky
algorithm that goes back to the earliest computers. In
its PEG form, this algorithm acquired a seductive new
interface, one that looks like the best of extended BNF
combined with the best of regular expressions. Looking at
a sample of it, you are tempted to imagine that writing
a parser has suddenly become a very straightforward
matter. Not so.

I'm a fan of the Perl 6 effort.
And I certainly should be their supporter,
after the many favors they've done for me and the Marpa community over the years.
The considerations of these posts will disappoint the hopes for LOP
applications of the native Perl 6 parser.
But Perl 6 is much more than that.
It would be a mistake to read these posts as
part of a dismissal of the Perl 6 effort as
a whole.

What if languages were free?
In 1980, George Copeland wrote
an article titled "What if Mass Storage were Free?"
Afterwards, the cost of mass storage fell dramatically.
Copeland saw the first signs of this, and ran a thought experiment:
What might happen if costs fell to the point where mass storage was, in effect, free?
This post runs a similar thought experiment on languages.
What might happen when the time and effort required to create a new language
falls toward zero?

Marpa allows things to be done which
could not be done before.
Many of these things are significant for language design --
they allow languages to take radically new and aggressive approaches
to their syntax.

Language
Design: exploiting ambiguity.
Actual ambiguity is a show-stopper.
But in the present state of the art,
even potential ambiguity is totally avoided in language design.
Which only makes sense, right?

Well, actually, no.
Our workhorse languages, the ones we humans use to communicate with each other,
are full of potential ambiguities, and allowing ambiguity makes those languages more
powerful and flexible. Could the same work for computer languages?
This post explores that possibility.

Evolvable
languages.
The parsers in heaviest use today (LALR and recursive descent) parse only
a very limited subset of what you can express in BNF.
That's a barrier that a language starts bumping up against almost immediately,
and bumps up against more and more frequently as it adds features.

But Marpa lets you keep extending your language: (essentially) as long as you can
describe your language extension in BNF,
or in BNF plus some hacks, you can still
parse the language in linear time.
"Language design has been like shooting
crap in a casino that sets you up to win a lot of the first rolls
before the laws of probability grind you down.
Marpa changes the game."

Significant newlines?
Or semicolons?
Currently a programming language designer must choose: either make whitespace semantically significant,
or use explicit punctuation to separate statements.
But Lua does without either, by keeping the syntax restricted and tiny.
And with the power of Marpa, you can also do without either, but at the same time have a language which
is as large and as syntactically
rich as you want.

Reporting mismatched delimiters.
Delimiters are parentheses and brackets, but also quote marks and even HTML
and XML tags -- anything that explicitly marks the beginning and end of some text.
Delimiters are widely used,
easy to get wrong and,
in current practice,
annoying to fix, in large part because current parsers are not good at diagnosing
delimiter mismatches.
Worse, a mismatched delimiter will almost certainly make the rest of a compiler's
error messages so unreliable that they are effectively useless.

Marpa allows more accurate reporting of delimiter mismatches than traditional
parsers,
and in most cases Marpa can recover and go on to accurately report other errors.
Read this blog post to find out how.
Hint: the Ruby Slippers get involved.
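The flavor of the technique can be conveyed with a toy. The sketch below is my own illustration, a plain stack matcher rather than Marpa's Earley-based mechanism: on a mismatch, the checker reports exactly which opener is at fault, then pretends the expected closer was present so that scanning can continue and later errors can still be found.

```python
# A stack-based sketch of Ruby-Slippers-style recovery from mismatched
# delimiters.  This illustrates the idea only; Marpa consults its
# parse tables for the expected token, while this toy uses a stack.
PAIRS = {")": "(", "]": "[", "}": "{"}              # closer -> opener
CLOSER_FOR = {v: k for k, v in PAIRS.items()}       # opener -> closer

def check_delimiters(text):
    stack, problems = [], []        # stack holds (opener, position)
    for pos, ch in enumerate(text):
        if ch in CLOSER_FOR:        # an opener: remember it
            stack.append((ch, pos))
        elif ch in PAIRS:           # a closer: does it match?
            if stack and stack[-1][0] == PAIRS[ch]:
                stack.pop()
            elif stack:
                opener, opos = stack.pop()
                problems.append(
                    f"pos {pos}: got '{ch}', expected '{CLOSER_FOR[opener]}' "
                    f"to close '{opener}' opened at pos {opos}")
                # The Ruby Slippers move: pretend the expected closer
                # was there, then re-read the actual character against
                # the repaired stack.
                if stack and stack[-1][0] == PAIRS[ch]:
                    stack.pop()
            else:
                problems.append(f"pos {pos}: unmatched '{ch}'")
    for opener, opos in stack:
        problems.append(f"end of input: '{opener}' at pos {opos} never closed")
    return problems

for p in check_delimiters("([1+2)"):
    print(p)
```

On "([1+2)" this reports the '[' at position 1 as the culprit, supplies the missing ']', and still matches the outer parentheses, instead of cascading into nonsense.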

The Marpa-based HTML reformatter,
html_fmt,
is the Marpa application that I use the most.
In fact, as I am typing this HTML,
I am using it repeatedly as a filter,
much in the same way that you'd use
GNU indent
for C code or
perltidy
for Perl.

When I left off with the first series, I mentioned that an HTML parser really should
be configurable -- the user/application should be able to decide which variant of HTML
they want to parse.
The result is a parser driven by a configuration file that allows the user to reshape
HTML.
And, by the way, the configurable version is almost twice as fast.

Marpa v. Perl regexes: some numbers
pits Marpa directly against Perl regexes. Marpa will never beat regular expressions at their own game, but regular expressions are often taken out of their game, and in those cases the results can get interesting.

Marpa v. Perl regexes: a rematch
revisits the comparison of the earlier post.
In that benchmark, much of the advantage of Perl regexes came, not from faster parsing,
but from lower overhead.
This post contains a more direct comparison of the two parse engines.

Marpa v. Parse::RecDescent: some numbers.
Recursive descent is currently very popular.
This post takes an interesting example from another post,
and rewrites it from Parse::RecDescent into Marpa.
Marpa has equal time complexity, and its parse engine is written in heavily optimized C,
while Parse::RecDescent is pure Perl.
The result is what you'd expect.

Yacc and its derivatives are considered the standard in parser
generators. Yacc is part of a great series of traditions it was my
privilege to watch being formed, and I have the greatest respect
for its inventors. But to my mind yacc is one tradition more honored
in the breach than in the observance. In this series I describe
what will be required for a parsing algorithm to displace yacc's
LALR, and why I think the Marpa algorithm has what it takes and
more.
The first three requirements are described in my first post,
Killing Yacc: 1, 2 & 3
:

Requirement 1: Handle Arbitrary Context-Free Grammars in O(n^3)

Requirement 2: Handle "Average" Grammars in Linear Time

Requirement 3: A Sound Theoretical Basis

Error reporting has long been overlooked, and it is something
at which yacc was astonishingly bad. I looked at the issue of
development-time error reporting in
Why the Bovicidal Rage?:

Both hand-written recursive descent and Marpa meet, to various
degrees, all five of the previous requirements. The last requirement,
stated in
Bovicide 6: The Final Requirement,
was the tie-breaker. Hand-written
recursive descent has many strengths, but it will never be available
as a library.

Requirement 6: Available as a Library

The "Perl and parsing" series

My "Perl and parsing" series used Perl to illustrate parsing theory
and parsing theory to investigate Perl. I am not one of those who
believes that mathematics takes place in a vacuum -- it follows trends
and even what are more accurately called fashions. Therefore this
series has a lot of discussion of how current parsing theory came
to be.

Parsing with Ruby Slippers:
This is the post where I debuted the "Ruby Slippers" parsing technique, using both HTML and an example from Perl to illustrate it.

Parsing Perl 3: Perl and Minimalism:
I had intended a couple of posts about Perl & parsing. At this point I realized the topic was expanding on me. This post is about the influence of minimalism on computer languages and parsing theory. Larry Wall is perhaps the only language designer who has managed to escape the influence of minimalism, and that is the reason Perl is so different from other languages.

Minimalism: Philosophy, Tradition, or Lunacy?
An explanation of why the programmers of my generation and prior -- the ones who shaped the field -- are, almost without exception, staunch minimalists. Not numbered as part of the series, but it fits into the discussion of minimalism.

Perl and Parsing 4: The Golden Age of Right Parsing:
Having nibbled at philosophy, I now charge into a history of parsing theory. My approach is unusual. I treat parsing theory as what it is: a human endeavor. This post describes the rise of LR, or right parsing.

Perl and Parsing 5: Rewind:
If you require evidence that parsing theory can be as much about trends, ideology and fashion as it is about mathematics and technology, here it is. The dominant parsing algorithm for production compilers today is hand-written recursive descent, exactly the one that was displaced in 1975, when the technology was almost unimaginably less powerful.

Perl and Parsing 6: Error Handling:
A discussion of Perl error detection. Despite the
limits of its underlying LALR parse engine, the Perl
interpreter can seem psychic at times. I show how the
conjuring is done, and give some other examples where
the limits imposed by LALR parsing show up very
clearly.

Perl and Parsing 7: Do List Operators have Left/Right Precedence?:
The documents describe Perl's list operators in terms
of left- and right-precedence. Despite the fancy
names, these are not terms from parsing theory and
are not well-defined. In this post I show that,
whatever these terms mean, the description is wrong,
and I propose an alternative.

Perl and Parsing 9: "Use" and the Ruby Slippers:
While Marpa is the first parser to allow it in anything
close to its full power and generality, Ruby Slippers
parsing is not unprecedented. Perl parses its use
statement by dummying up tokens based on the grammar.
But LALR is so unhelpful that, in order to use
the Ruby Slippers, Perl has to simulate the parse
inside the lexer.

Perl and Parsing 10: "Use" the Easier Way:
In this post I use Marpa to parse Perl's use statement.
Marpa is a lot better at the Ruby Slippers technique,
but here the problem can be solved even more simply
with another of Marpa's extended capabilities: ambiguous
tokens. Ambiguous tokens are tokens whose type is left
uncertain. It is left up to the parser to decide which
token to use, something which Marpa can do easily.
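The idea can be conveyed with a toy that is nothing like Marpa's internals: the lexer below hands the parser a set of candidate types per word, and a deliberately naive parser picks whichever assignment its grammar accepts. All names here are invented for the illustration.

```python
# A sketch of ambiguous tokens: the lexer does not decide whether
# "use" is a keyword or an identifier -- it offers both candidates,
# and the parser, which knows what shape it expects, resolves the
# ambiguity.  Illustrative only; not Marpa's API or algorithm.
from itertools import product

KEYWORDS = {"use", "no"}

def lex(words):
    # every word could be an identifier; keywords get a second
    # candidate type; ";" is unambiguous
    return [({"KEYWORD", "IDENT"} if w in KEYWORDS else {"IDENT"})
            if w != ";" else {"SEMI"}
            for w in words]

def parse_use_statement(candidates):
    # expected shape of the statement: KEYWORD IDENT SEMI
    expected = ["KEYWORD", "IDENT", "SEMI"]
    if len(candidates) != len(expected):
        return None
    # try every combination of candidate types (a naive stand-in
    # for the parser's knowledge of what it expects next)
    for choice in product(*candidates):
        if list(choice) == expected:
            return choice        # the parser resolved the ambiguity
    return None

print(parse_use_statement(lex(["use", "strict", ";"])))
# ('KEYWORD', 'IDENT', 'SEMI')
```

The toy brute-forces the choice; the point of Marpa's version is that an Earley parser always knows which token types it can accept at each position, so no brute force (and no backtracking) is needed.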

Perl and Parsing 11: Are all Perl programs parseable?.
The answer is no. Because I was the first to formally
prove it, I initially got the blame, and now get the
credit, for discovering that Perl parsing is undecidable.
While a very interesting result, its significance can
be overstated. In particular, while undecidable parsing
has definite benefits for the Perl user, no downside
has ever been demonstrated.

Perl and Parsing 12: Beating up on the "use" statement.
I had spotted a gotcha in the parsing of the "use"
statement, which I saved up from the previous
two posts. In the meantime chromatic noticed another.
Piling on is really not fair but, hey, can you really
blame me for unlucky timing?

The
syntax of English is undecidable.
Many of those most interested in Marpa are from the natural language processing (NLP) community.
In taking on natural language,
as in taking on any other computing task,
a reasonable first question is, "Is it decidable?".
This post claims to answer that question.

Parsing Ada Lovelace.
I wrote this post when I started exploring parsing ambiguity.
Following this line of inquiry eventually resulted in a new way of handling ambiguity in programming languages -- allowing
potential ambiguities in the grammar, but not allowing actual ambiguities.
(An actual ambiguity, when it occurs, is handled in much the same way as a syntax error.
And it can be fixed, for example, by adding parentheses, or whatever other punctuation eliminates
the ambiguity.)
My primary concern is programming language issues, not natural language issues,
but I find going back and forth between the two very fruitful.

What parser do birds use?
Research shows the Japanese Great Tit, a bird whose brain
weighs less than 1/20 of an ounce,
uses a language that is beyond the capabilities
of parsers in standard use today.
I think future generations of programmers will be astonished at
the way in which the current generation limits itself.

Behind Marpa is a lot of mathematics,
a lot of it old, but also some new.
The posts below are for those curious about the Theory side of Marpa.
(The definitive source is
my paper on the theory behind Marpa, but that assumes the reader
is comfortable with the mathematics of Parsing Theory.)

Jay Earley's Idea:
A user-friendly description of Earley's algorithm.
Marpa's algorithm is much more complicated, but at
heart, Marpa is an Earley parser.
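For readers who want to see the shape of it, here is a minimal Earley recognizer of my own, a sketch of the textbook predict/scan/complete cycle rather than anything from Marpa; the grammar and token names are invented for the example.

```python
# A minimal Earley recognizer.  Each item is (lhs, rhs, dot, origin):
# a rule, how far into it we have parsed, and where it started.
GRAMMAR = {                      # nonterminal -> list of right-hand sides
    "Sum":     [["Sum", "+", "Product"], ["Product"]],
    "Product": [["Product", "*", "Factor"], ["Factor"]],
    "Factor":  [["digit"]],
}
TERMINALS = {"+", "*", "digit"}

def recognize(tokens, start="Sum"):
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, tuple(rhs), 0, 0))
    for i in range(len(tokens) + 1):
        added = True
        while added:                       # iterate to a fixed point
            added = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] not in TERMINALS:
                    # PREDICT: expand the nonterminal after the dot
                    for prod in GRAMMAR[rhs[dot]]:
                        new = (rhs[dot], tuple(prod), 0, i)
                        if new not in chart[i]:
                            chart[i].add(new); added = True
                elif dot == len(rhs):
                    # COMPLETE: advance items waiting on this nonterminal
                    for p_lhs, p_rhs, p_dot, p_org in list(chart[origin]):
                        if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                            new = (p_lhs, p_rhs, p_dot + 1, p_org)
                            if new not in chart[i]:
                                chart[i].add(new); added = True
        if i < len(tokens):
            # SCAN: advance items whose dot sits at the next input token
            for lhs, rhs, dot, origin in chart[i]:
                if dot < len(rhs) and rhs[dot] == tokens[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[-1])

# "2 + 3 * 4", with every digit lexed as the terminal "digit"
print(recognize(["digit", "+", "digit", "*", "digit"]))   # True
print(recognize(["digit", "+", "*", "digit"]))            # False
```

The chart of item sets is the "table" of table parsing: the recognizer never commits to one rule, it simply carries every viable item forward, which is why left recursion and ambiguity cause it no trouble.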

Is Earley parsing fast enough?:
In 1968, Earley's algorithm was considered too slow to be practical for production.
Since then the algorithm has improved and computers have become over a billion times faster.
Maybe, after 45 years, the old rule of thumb on Earley parsing speed needs revisiting.

A warning to the reader:
The posts in this section are my most misunderstood,
and therefore, so far, can be called my least successful.
These are posts in which I outline future directions,
as plainly and simply as I know how to,
uncluttered by details.
The problem seems to be that, without technical details
and code examples,
many readers take my words
as pleasant speculations,
and not as I intend them,
as descriptions of things
that can be done now, today.

Parsing on your new hyper-quantum computer
Marpa is a non-deterministic parser, and those used to coping with recursive descent are
used to thinking in terms of a single thread.
With non-determinism, you become free of all that backtracking and lookahead,
and your parser becomes simpler and more modular.
This requires a re-visioning of parsing, but it is very similar to the re-visioning
you'd need to do if you were suddenly delivered new non-deterministic hardware
and asked to program efficiently for it.

These tutorials were written before the SLIF interface was created.
Today, you'd want to use the SLIF in most of these cases.
But most of their contents are still relevant,
and most of the actual code would simply be carried over.

Precedence parsing made simpler.
Defining grammars in terms of precedence is an attractive technique.
General parsing makes it a lot simpler.
Here's a precedence parsing DSL where you don't have to define your
operators, much less describe them as infix, postfix, circumfix, etc.
This means efficient precedence parsing can be conveniently extended to n-ary
operations, and can even allow implied operators.

A Marpa DSL tutorial: Error reporting made easy.
Your quickly written
domain-specific language can, as of its first draft,
have error reporting whose helpfulness and precision
exceeds that of carefully hand-crafted production
compilers. This tutorial shows how.

A Marpa tutorial: pattern searches.
We use regular expressions for pattern searching these days.
But what if your search target is not a regular expression?
This tutorial shows how to use Marpa to search text
files for arbitrary context-free expressions.

A Marpa tutorial: iterative parser development.
Unanchored search and error reporting are examples of the same general problem:
finding a desired pattern/language,
when it is embedded in other, arbitrary, text.
This approach can also be used as a test-driven method
for parser development.

You start with a small subset,
and to test how far you've succeeded,
you "search" for it in the text you are trying to parse.
As you add syntax, you "find" more and larger "targets" in your texts.
When all "searches" of valid texts
find a single "target" which spans the entire text,
the language is finished.

This tutorial gives an example of one iteration in the development
cycle of a parser.
It takes the language of the previous post
(a small subset of Perl expressions)
and extends it a bit.

What? No Lexer!:
A generation of programmers have grown up identifying
lexing with parsing. Marpa is not a lexer, and Marpa::XS
itself does not contain a lexer.
The tutorials above contain very nice lexers,
written in a few short lines of Perl.

This is Ocean of Awareness's
new home. This blog had been hosted at
blogs.perl.org,
but I have succumbed to the lure of static blogging.
I have not yet decided how to deal with comments at this new blog location.
If the post is Marpa-related,
the Marpa mailing list
is a good place to comment.