Chris Clark (compres@world.std.com) writes:
> Arch Robison presented a very good case for iterators to solve just
> these problems at the 94 OOPSLA workshop of Object Oriented
> Compilation techniques. His paper and others can be found at
> ftp://ftp.parr-research.com/pub/oopsla.

Thanks for the citation, Chris. Two more years of experience using iterators
for traversing an IR have convinced me that iterators are the way to go.
I'll summarize some of the arguments here. By iterator, I mean
"iterator with cursor". That is, something that you can write
C-style loops in such as:

for( iterator i = root; i; i++ )
    inspect( *i );

even though the traversal may be over a tree. I'll take "visitor"
to mean a procedure that applies some visitor function to each node
of an IR. By "explicit recursive traversal", I mean writing this
sort of thing:

1. Reuse - explicit recursive traversal is not reusable. You have to rewrite
the same recursions for each pass, which in a complicated IR can be
painful. Visitors win here too.
I suppose attribute grammars win even more since you don't have
to write traversal code. On the other hand, the last time I wrote
attribute grammars, I remember writing "auxiliary" attributes to
write functionally what would have been trivial to write procedurally.
(This might be the "grass is greener on the other side" syndrome.
I can't remember the examples any more.)

2. Control over iteration order - with an iterator, this problem is trivially
solved by passing "mode flags" to the iterator's constructor.
I suppose that mode flags solve the order problem with visitors too.

3. Convenience - this is where visitors start to lose. With iterators,
I can write a triply nested loop without difficulty in a few lines.
With visitors, I have to write a bunch of functions, one for each
level of nesting. (Okay, if you have lambda abstractions in your
language, this is not a problem. But I have to write my stuff in C++.)
The result is that an optimizer can look more like good ole FORTRAN
and less like LISP :-)

4. Computational complexity - this is where recursive traversals and attribute
grammars lose. What is the computational complexity of a traversal?
If I write:

I know I have cubic complexity, assuming (amortized) constant time
for the iterator primitives. With a recursive traversal, I have to
study the recursions very carefully to figure out the complexity.
And in a real production compiler, these recursions are going to be
spread out over *pages* of code, not a few lines of code.
With attribute grammars, I probably have to have the attribute-grammar
compiler tell me how clever it's been in compiling my grammar.
With iterators, I simply look at nesting. In production compilers,
knowing the computational complexity is essential.

So why isn't everyone using iterators? There are some drawbacks:

5. You need a language that has enough polymorphism to let you conveniently
write and use a reusable iterator. Sure, you could do it in C, but the
hassles from the lack of polymorphism would probably make it a pain.

6. Certain kinds of information passing between IR nodes is more suited
to recursive traversal. Visitors are probably even worse with regard
to these problems. (How do "visits" communicate between themselves?)
For this reason, we're not religious about iterators here. Our optimizer
uses both iterators and recursive traversal. But probably 4 out of 5
traversals use iterators for the simple reason that writing a for-loop is
much simpler than writing a recursion.

In some cases, iterators force using a different algorithm than would
have been used for explicit recursive traversal. To a pure theoretician,
picking an algorithm based on this difference might seem to be putting
the cart before the horse, but in a production optimizer it becomes
one of many engineering choices based on pragmatics.
And the iterators are definitely easier to maintain.
Explicit recursive traversals seem to involve adding more and more
parameters to the recursive function over time.

Changing topics here. As others have noted, there is a tension between
the OO and procedural organization of a compiler.
As Greg Morrisett put it:

> > What we need are languages that allow programmers to view the organization
> > of their programs in different ways in order to support different kinds
> > of modifications. Ideally, when adding a new kind of primitive to the
> > AST, the system would present the OO version, so that all I have to
> > add is one new definition, with appropriate methods. But when I go to
> > add a new transformation or optimization, I'd rather see the
> > "old-fashioned" view of my program where I code one big procedure that
> > switches on the different kinds of AST objects. This way, I don't have
> > to touch all of the classes to add new methods.

I find that with experience, I can organize a compiler to have the better of
both worlds, though not the best. (I'd need the aforementioned multiview
language for that.) The trick is to write procedural transformations and
optimizations, but avoid having them operate on concrete AST nodes. For
example, instead of a pass operating on "add", "subtract", "multiply", and
"shift" nodes, it might operate on "referentially transparent nodes".
Building inclusion hierarchies of properties is useful -- they allow some
passes to paint with a broader brush. I.e., "no references to memory" is a
subproperty of "no writes to memory". (Don't need those pointy-headed OO
class hierarchies for this - look-up tables suffice.) So even though
transforms are written procedurally, adding a new kind of referentially
transparent node is not a problem. Finding the right set of abstract
properties takes time. There are some in our optimizer that are motivated
by no higher principle than "Gee, this property is needed here and here and
here!" Of course, to add a new kind of node with radically different
properties not anticipated in the original design requires much effort. But
such trouble is probably inescapable no matter how the compiler is
organized, because it probably invalidates fundamental assumptions of the
transforms.

This is the sort of pragmatic practice that gets one excommunicated by
the religious purists on both sides of the OOP wars.