On Thu, Jan 28, 2010 at 09:20:20AM -0600, Ira Baxter wrote:
> Parser generators are only a small part of the tools one would want for
> compiler construction. Why people believe they are a significant part
> is a complete mystery to me, especially if they've had a compiler
> class or attempted to build a compiler.
>
> Well, there are toolsets that are designed to support compiler
> construction. The New Jersey Machine toolkit comes to mind. There's
> a quite good list at http://catalog.compilertools.net/.

In fact, every programming language is different, and for non-imperative
languages in particular there are several, often very different,
intermediate representations, abstract machines and optimisation
techniques, so it is more difficult to offer a comprehensive compiler
construction toolkit.

> > sadly, it does not take one long to find, if implementing a compiler
> > for a non-trivial language (such as the main C-family languages),
> > that the parser is not really the big source of complexity (but, at
> > least with C and C++, it can be a big source of slowness, but this
> > is not quite the same issue...).
>
> The *language* C++ is the "source of complexity". The problem of parsing
> it has been killed dead several times. You can do it (clumsily) with
> LALR parsers and tangling of symbol tables and parsing actions.
> You can do it by recursive descent and similar tangling (GCC, I think EDG).
> You can do it using GLR parsers, with *complete* separation
> of parsing and symbol table collection (our DMS Software Reengineering
> Toolkit does this), which means you can write a grammar that almost
> mirrors the reference grammar.
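The GLR idea of keeping every viable parse alive can be illustrated with a toy (a minimal Python sketch of my own, nothing to do with DMS): each "parser" below returns *all* ways of consuming input from a position, so an ambiguous statement like `a * b ;` comes back with both of its C++ readings, the declaration of `b` as pointer-to-`a` and the multiplication of `a` by `b`, to be disambiguated later from symbol-table information rather than during parsing.

```python
# Hypothetical "all parses" combinator sketch: a parser maps (tokens, pos)
# to a list of (tree, next_pos) results; ambiguity simply yields >1 result.

def tok(t):
    def p(toks, i):
        return [(t, i + 1)] if i < len(toks) and toks[i] == t else []
    return p

def name(toks, i):
    return [(toks[i], i + 1)] if i < len(toks) and toks[i].isalpha() else []

def seq(label, *parts):
    def p(toks, i):
        states = [((), i)]
        for part in parts:
            states = [(acc + (t,), k) for acc, j in states
                                      for t, k in part(toks, j)]
        return [((label,) + acc, j) for acc, j in states]
    return p

def alt(*parsers):
    def p(toks, i):
        return [r for parser in parsers for r in parser(toks, i)]
    return p

# stmt -> decl | expr-stmt; the two productions cover the same token string,
# so only semantic (symbol table) knowledge can pick the right tree.
decl = seq('decl', name, tok('*'), name, tok(';'))  # "a *b;" as declaration
expr = seq('expr', name, tok('*'), name, tok(';'))  # "a * b;" as product
stmt = alt(decl, expr)

print(stmt("a * b ;".split(), 0))   # two complete parses survive
```

A real GLR parser shares the common work between the two alternatives via a graph-structured stack instead of duplicating it as this toy does, but the observable result is the same: a parse forest, not a single tree.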

You can very easily do it (though I never attempted to do so) with a
scannerless parser. The last time I looked at it, I found that most of
the complexity of parsing C++ source code comes from the artificial
separation of lexer and parser that has been the prevalent dogma of
the last four decades in compiler construction.

An excellent scannerless parser generator for boolean grammars (a
superset of the context-free grammars and a subset of the
context-sensitive grammars) is sbp [1].

> Further, AFAIK, GLR parsers aren't necessarily slow. (The DMS
> parser isn't particularly fast, but then we capture comments and
> source format information as we go). Adrian Johnstone and his
> students have done a bunch of work on tuning GLR parsers. I think
> it was Scott McPeak that implemented an optimization that makes GLR
> parsers run like LALR(1) [in terms of computation effort] in the
> highly frequent case that there are not multiple parses in a
> particular part of the grammar. And Pennello showed how to "compile"
> LALR parsers to machine code for extremely fast parsing ("Very fast
> LR parsing", ACM Sigplan July 1986). [We'll get around to composing
> this set of ideas someday]. Once you do this, much of the front end
> slowness is in the lexer.

I doubt that the machine code emitted directly by a parser generator
would be faster than the code generated by gcc, so if you really want
it to be fast, you should compile to C instead. You also gain
portability when doing so.

A partial evaluator could perhaps also boost performance
significantly.

You may also have a look at [2]; it covers extensible and modular
compilers in chapter four.