Sat, 1 October 2011

This blog post is a mirror of a post on the Project Fortress Community
Blog by Guy Steele. The original posting is no longer available. I
retrieved this copy of the text from the Wayback Machine.

Because the snapshot that the Internet Archive took of the
original post was made on December 6, 2009, the only comments that
appear are those that were made leading up to that date.

November 30, 2009: Why Object-Oriented Languages Need Tail Calls

[Note: The original blog post used the term “tail-recursion”, but in
response to feedback on December 1, I have replaced this term with
“tail calls” to emphasize that the issue has to do with multiple
objects calling each other’s methods, not just the limited case of an
object invoking its own methods or the even more limited case of a
single method invoking itself (whether on the same object or another
object). Also, some minor errors in the code examples have been
corrected as of December 3.]

We recommend an interesting essay from the recent ACM OOPSLA
Conference: William R. Cook’s “On Understanding Data Abstraction,
Revisited” (OOPSLA 2009).

Cook makes an important distinction between abstract data types and
object-oriented programming, and argues that they have related but
complementary characteristics.

In this blog post we extend one of his examples in order to make a
completely different point: object-oriented programming languages need
tail calls correctly implemented, not just as a “trivial and optional”
optimization of the use of stack space that could be achieved by using
iteration statements, but in order to preserve object-oriented
abstractions.

Following Cook, we begin by considering an abstraction of “sets of
integer values”. At a minimum, we would like to be able to (a) obtain
an empty set, (b) adjoin an integer to a set to create a new set, (c)
construct a set that is the union of two given sets, (d) inquire
whether a set is empty, and (e) inquire whether a set contains a
specified integer. We can describe this abstraction as a Fortress API:
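The Fortress API itself did not survive in this snapshot. Purely as a
stand-in (hypothetical names, in Python rather than Fortress), the
abstraction can be sketched as an abstract interface:

```python
# Hypothetical Python stand-in for the lost Fortress API (not the
# original code): the operations of the "set of integers" abstraction.
from abc import ABC, abstractmethod

class IntSet(ABC):
    @abstractmethod
    def isEmpty(self) -> bool:
        """(d) Inquire whether this set is empty."""

    @abstractmethod
    def adjoin(self, x: int) -> "IntSet":
        """(b) A new set: this set plus the integer x."""

    @abstractmethod
    def union(self, other: "IntSet") -> "IntSet":
        """(c) A new set: the union of this set and another."""

    @abstractmethod
    def contains(self, y: int) -> bool:
        """(e) Inquire whether this set contains y."""
```

Operation (a), obtaining an empty set, would be supplied by a concrete
implementation of this interface.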

(To facilitate comparison to the examples in Cook’s paper and also to
the way one might code these examples in the Java programming
language, we use dotted methods throughout these examples. A more
Fortress-like style would be to define operators for the “union” and
“contains” operations.)
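The Fortress operator definitions are likewise not preserved here. As
a loose analogue only (hypothetical Python, overloading the language’s
own “union” and “contains” syntax rather than defining Fortress
operators):

```python
# Hypothetical sketch of "operator style" in Python: `|` for union
# and `in` for membership, in place of dotted method calls.
class Empty:
    def __contains__(self, y):
        return False
    def __or__(self, other):
        return other

class Adjoin:
    def __init__(self, s, x):
        self.s, self.x = s, x
    def __contains__(self, y):
        return y == self.x or y in self.s
    def __or__(self, other):
        return Union(self, other)

class Union:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __contains__(self, y):
        return y in self.a or y in self.b

s = Adjoin(Empty(), 1) | Adjoin(Empty(), 2)
print(2 in s, 3 in s)   # True False
```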

We use several different sorts of object to represent sets. Each
implements the same abstract interface. One of Cook’s points is that,
in a truly object-oriented style of programming, the implementation of
each object uses only the abstract interface to communicate with other
objects, allowing the implementations to be quite independent of each
other, and allowing new implementations to be added easily at any
time. For example, Cook shows that infinite sets can easily be
represented using the same abstract interface, and we can readily
generalize his example and translate it into Fortress code as follows:
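The Fortress code for these representations is missing from the
snapshot. The following Python sketch (a hypothetical reconstruction,
not Steele’s code) conveys the structure: each representation,
including the infinite set IntegerMod, talks to the others only
through the shared interface.

```python
# Hypothetical Python sketch of the lost Fortress example. Every
# representation communicates with the others only via isEmpty/contains.
class Empty:
    def isEmpty(self): return True
    def contains(self, y): return False

class AdjoinObject:
    def __init__(self, s, x):
        self.s, self.x = s, x
    def isEmpty(self): return False
    def contains(self, y):
        # Either answer here, or pass the buck to self.s: a tail call.
        return y == self.x or self.s.contains(y)

class UnionObject:
    def __init__(self, s1, s2):
        self.s1, self.s2 = s1, s2
    def isEmpty(self): return self.s1.isEmpty() and self.s2.isEmpty()
    def contains(self, y):
        # The second query is in tail position.
        return self.s1.contains(y) or self.s2.contains(y)

class IntegerMod:
    """The infinite set {..., -n, 0, n, 2n, ...}."""
    def __init__(self, n): self.n = n
    def isEmpty(self): return False
    def contains(self, y): return y % self.n == 0

s = Empty()
for i in range(500):            # a linear chain of 500 AdjoinObjects
    s = AdjoinObject(s, i)
u = UnionObject(s, IntegerMod(7))
print(s.contains(13), u.contains(7_000_000))   # True True
```

In CPython, which does not implement proper tail calls, growing the
chain to a million objects (as in the Fortress example) overflows the
stack at exactly the `contains` query the post goes on to describe.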

Yes, we know lots of ways this code can be optimized, but let us
please focus on the behavior of the constructed set, which is a linear
chain of one million objects of type AdjoinObject. When we ask
whether the set s contains 13, the first AdjoinObject will check its
x value, see that it is not 13, and then pass the buck to the next
AdjoinObject. Observe that this “passing of the buck” is a tail
call; to see this, note that the short-circuit Boolean OR operator is
equivalent to the use of an if statement:
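The if-statement form shown in the original post is missing from this
snapshot. In a Python rendering (hypothetical, not the original
Fortress), the equivalence looks like this, with the recursive query
in tail position in both versions:

```python
# The short-circuit OR form and the explicit if form of contains
# are equivalent; in both, the recursive query is a tail call.
class Node:
    def __init__(self, s, x):
        self.s, self.x = s, x
    def contains(self, y):
        return y == self.x or self.s.contains(y)
    def contains_if(self, y):
        if y == self.x:
            return True
        else:
            return self.s.contains_if(y)   # tail position

class Leaf:
    def contains(self, y): return False
    contains_if = contains

chain = Node(Node(Leaf(), 13), 5)
print(chain.contains(13), chain.contains_if(13))   # True True
```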

If the underlying implementation fails to implement tail calls without
unnecessary growth of stack, and also is unable to support nearly a
million nested stack frames, then execution of this query will fail.

Many programmers, including those familiar with the style of the Java
programming language, will object that the original design of the
implementation was wrong-headed: if we want iterative behavior, we
should use an iterative statement such as while rather than method
calls. This approach is easily captured in Fortress for these
AdjoinObject searches:

object AdjoinObject(s: IntSet, x: ZZ) extends IntSet
  contains(y: ZZ) = do
    u: IntSet := self
    (*) The next statement has an interesting idea but is erroneous.
    while (NOT u.isEmpty) AND: (y =/= u.x) do u := u.s end
    NOT u.isEmpty
  end
end

But there is a problem here; our implementation uses not only
AdjoinObject instances but also instances of UnionObject and
IntegerMod. In order to maintain the iterative structure of this
implementation of the contains method, we will need to make it
handle these other kinds of objects as well. This requires the
implementation of AdjoinObject to be aware of the implementation
details of these other kinds of objects, violating our object-oriented
abstractions. Every time we want to invent a new implementation of
IntSet, we will also need to add a case to this while loop in
AdjoinObject. (In point of fact, the while statement shown above
contains static type errors. These could be remedied by using
typecase. Trying to satisfy the typecase without using an else
clause would then point up the need to handle all the other possible
implementations of IntSet.)
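To make the point concrete in a Python rendering (hypothetical, not
the original Fortress): the iterative contains is forced to dispatch
on every concrete representation, peeking at fields of objects other
than the receiver.

```python
# Hypothetical sketch: an iterative contains must know the internals
# of every representation of the abstraction, breaking encapsulation.
class Empty:
    pass

class Adjoin:
    def __init__(self, s, x):
        self.s, self.x = s, x
    def contains(self, y):
        u = self
        while True:
            if isinstance(u, Adjoin):     # peeks at another object's fields
                if y == u.x:
                    return True
                u = u.s
            elif isinstance(u, Empty):
                return False
            else:
                # Every new representation (UnionObject, IntegerMod, ...)
                # forces another case to be added here.
                raise TypeError("unknown IntSet representation")

s = Empty()
for i in range(5):
    s = Adjoin(s, i)
print(s.contains(3), s.contains(9))   # True False
```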

We could try to abstract the iterative process: If only our sets
supported Java-style iterators, we could just enumerate all the
elements of the set—oops, can’t do that, because some of our
represented sets are infinite. And Java-style iterators have problems
of their own, not least that they rely upon side effects and therefore
introduce synchronization issues in parallel code.

Even without the other kinds of objects, we can see that this latest
implementation of contains for AdjoinObject violates
object-oriented abstraction: the implementation inspects the field x
not only for the instance on which the contains method was invoked,
but the field x of other instances as well. Thus instances of
AdjoinObject are abstract with respect to “the outside world”, but
not with respect to each other. As Cook points out,
this is typical of an abstract data type: there is conceptually a
single implementation of the data type, and the code for the abstract
data type is privy to these details for purposes of accessing all
instances of the data type. This has its advantages; in particular, it
caters to certain kinds of optimizations.

In contrast, the object-oriented style makes every object fully
abstract, even to other instances of the same “kind” of object. This
also has its advantages; in particular, it easily supports multiple
implementations and code refactoring. If we want this latter sort of
abstraction and want to code iterations over chained objects in a
fully abstract and modular manner, then proper support of tail calls
is indispensable. This is even more important for the implementation
of transitions in an object-oriented finite-state machine of
indefinitely long lifetime.

For another example of the importance of tail calls, see Automata via
Macros
by Shriram Krishnamurthi.

Thank you, Guy, for emphasizing the importance of proper tail-call
implementation (let’s call it PITCH here) for software design and
refactoring. When I delivered my ECOOP
keynote
in 2004, I made the PITCH point slightly differently using the Design
Patterns
book. I developed some code using the patterns from the book in
Java. I requested and received confirmation that the code was truly
object-oriented and truly according to Their Book. Then I tested it and it
worked fine. Then I stress-tested it (on a not so large input), and it
failed with stack overflow.

The next step was to demo the same code in the PLT
Scheme class system, which looked very
similar to the Java code. And it ran great on those stress tests and
even larger ones. So I asked the 300-400 people in the audience what
the problem was and why this slow, interpreted language Scheme did it
right and Java, C#, Eiffel, Python, etc. got it wrong.

Not one person had any insight, so after a minute of playing with them,
I told them about the story of “Smalltalk to Actors to Scheme and how
PITCH had ‘come along for the ride’” (you may recall our email
exchange in the spring of 2004 to confirm the story of ‘apply’). And I
did reinforce how much knowledge the OO research community has
forgotten over the years.

Sadly, the reaction was surprisingly hostile from some corners in the
room. One person was especially outspoken in arguing that we all know
that we just need three or four special constructs – and they may
have to be map or for-each or such – and we’d be in business. Given
that it was a public exchange, it was difficult to point out to him
how messed up his argument was.

While I have received many positive emails about the talk (my slides
are on-line and people find them time and again), this message about
PITCH deserves to be repeated and repeated and repeated for OO
audiences. Let’s keep going at it.

gls – 2009-12-03 10:44

Yes, Faré, the third paragraph of your post is quite pertinent. I
especially liked this remark:

[I]n programs for which proper tail calls matter, the choice is
conspicuously not between proper tail calls and improper tail calls,
it is a choice between proper tail calls and explicit central
stateful loops. And the absence of debugging information is constant
when you transform tail calls into stateful loops.

Tim, recur is only for self-recursion, which is nice but doesn’t allow
for modular design.

And trampolines provide you full PTC at the cost of a (local) encoding
that must be followed by every single bit of software that you want to
use PTC with. Admittedly, though, you can probably statically
determine which pieces of software may be involved in higher-order or
otherwise modular tail recursion, and systematically encode those
parts of the software with trampolines. Kind of sucks, but works. Now
why should the user have to do manually what the computer could
automatically do a better job of?

In any case, if you want to use third-party libraries and they failed
to enforce that convention, you lose. Whereas if the use of
trampolines is to become universal, you may as well put PTC in the
language implementation and save users the hassle of dealing with
trampolines.
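The trampoline encoding under discussion can be sketched in a few
lines of Python (a minimal illustration, not code from the thread):
tail calls return thunks instead of calling, and a driver loop runs
them in constant stack space. The cost is that every participating
function must follow the convention.

```python
# A minimal trampoline: a function in tail position returns a
# zero-argument thunk; the driver loop keeps invoking thunks until
# a non-callable final answer appears. Stack depth stays constant.
def trampoline(thunk):
    while callable(thunk):
        thunk = thunk()
    return thunk

def contains(node, y):
    if node is None:
        return False
    if y == node[1]:                       # node = (rest, x)
        return True
    return lambda: contains(node[0], y)    # suspended tail call

s = None
for i in range(100_000):                   # far beyond Python's recursion limit
    s = (s, i)
print(trampoline(contains(s, 13)))         # True
```

Direct recursion here would overflow CPython’s stack; the trampoline
turns the same logic into a loop, at the price of imposing the
thunk-returning convention on every caller and callee.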

jyasskin – 2009-12-03 13:50

It’s really not convincing to show that a set implementation nobody
would ever use works better with TCO. Further, your “even more
Fortress-like style” with a union operator replacing adjoin isn’t
helped by TCO: build the set in the wrong order, and you still blow
out the stack.

The objection to TCO that you’re trying to answer is that it’s rarely
helpful in the real world. So you have to find real-world examples to
refute that. An impractically-slow set isn’t a real-world example.

gls – 2009-12-03 14:09

jyasskin, your point is well-taken, and I will even take it a step
further: it is even more in the spirit of Fortress to use data
structures such as relatively balanced trees that are amenable to
parallel traversal than to use long linear-chain structures that
require sequential traversal. To that end, a “real” implementation of
UNION would automatically rebalance the UnionObject tree. (See Jan
Maessen’s earlier posts in this blog about implementing treaps.)

On the other hand, I believe that the overheads of parallel processing
are inherently large enough that there will always be a place for
sequential processing, if only near the low-level leaves of the trees,
and I believe that there will almost certainly be a sequential aspect
to the highest level of any real-world program, if only “Wanna do it
again?” I believe that tail calls are important for sequential
programs that are organized according to certain principles of modular
structure.

I wish I could fit a real-world example onto a blog page. The best I
can do is to exhibit a “toy” example that illustrates the basic idea.

ccshan – 2009-12-03 16:56

jyasskin– Most operating systems make tail calls between user code
and kernel code. The context switch is usually implemented with a bit
of assembly code, precisely because C doesn’t have tail calls (but
machine code does). All these operating systems are real-world
examples of the necessity of tail calls.

rsc – 2009-12-03 22:07

ccshan– It’s certainly a thought-provoking analogy, but I don’t think
it withstands scrutiny.

The user → kernel transition is almost never a tail call: there are
results coming back from the kernel. The exception is the exit system
call, the final switch a process makes.

The kernel → user transition is a little like a tail call, but more like
a return from a function call (not a tail return, just a return). The
exception is process creation, the first switch the kernel makes.

Regardless of how well the analogy works, OS context switches would
still be implemented in assembly even if C had tail calls, because C
doesn’t give you access to the instructions necessary to change the
processor privilege level and virtual memory state (but machine code
does). I wouldn’t use that as an example of the necessity of exposing
those very low-level hardware details, so I don’t think it makes sense
to use it as an example of the necessity of tail calls either.

metageek – 2009-12-04 07:07

gls wrote:

I wish I could fit a real-world example onto a blog page.

I bet you can find something in the graphics world to do it. For
example, given a tree of non-overlapping bounding volumes, find the
smallest volume that matches a given point. That involves walking down
the tree, asking each child, “does your bounding volume contain this
point?”; when one replies yes, you recurse into it. Since no more than
one child can say yes, that’s a tail call.

I think this is a plausible scenario from video games, used to find
the game-world object at a given point. However, I’m not sure how
often they actually do that, as opposed to finding the game-world
object that overlaps with a given volume (i.e., collision detection).
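A sketch of that scenario (hypothetical code, with 1-D intervals
standing in for bounding volumes): because at most one child can
contain the point, the descent into that child is a tail call.

```python
# Hypothetical sketch of the bounding-volume descent: since the
# children's volumes do not overlap, at most one child contains the
# point, and the recursive descent into it is in tail position.
class Volume:
    def __init__(self, lo, hi, children=()):
        self.lo, self.hi, self.children = lo, hi, children
    def contains(self, p):
        return self.lo <= p < self.hi
    def smallest_containing(self, p):
        for child in self.children:
            if child.contains(p):
                return child.smallest_containing(p)  # tail call
        return self                                  # no child matched

world = Volume(0, 100, [
    Volume(0, 50, [Volume(10, 20)]),
    Volume(50, 100),
])
v = world.smallest_containing(15)
print(v.lo, v.hi)   # 10 20
```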

ggchappell – 2009-12-04 09:04

Guy, this is a thought-provoking article that makes an excellent
point.

On the other hand, I think that jyasskin’s point about UNION is also a
good one, and your response is not really adequate. Doesn’t the
proposed tree-rebalancing implementation of UNION have exactly the
same problem as the iterative implementation of ADJOIN? I.e., it
requires knowledge of the internal structure of other objects. So if
we added a new union-like object to the mix, then the new & old union
objects could not rebalance each other.

Perhaps we can summarize all this by saying that TCO permits the use
of black-box abstractions when a property of an object depends on a
property of another object; however, TCO is not sufficient to handle
the case when a property of an object depends on properties of two
other objects.

Thus, your example neatly illustrates not only why TCO is a Good
Thing, but also its limitations.

Derek Elkins – 2009-12-04 09:30

I have two other examples
here. The
first example is perfectly reasonable code, though not in common
(mainstream) object-oriented style (arguably it should be). The
second is just a sketch but is a very pragmatic example. Hopefully,
this (Guy’s) blog post provides the explanation that my post left
implicit.

dasht – 2009-12-04 18:13

Just for kicks… and while I like the article I’m replying to… I
have an argument for the conclusion that gls is mistaken here: his
argument doesn’t (quite) hold up. It’s over on LtU:

“Steele’s argument bugs me. I don’t think it is quite right: We can
construct a Cooke style OO language that does not have TCO yet which
can solve the integer set problem Steele poses. [….]”

ccshan – 2009-12-04 20:02

rsc– I was talking about calls made to a continuation. Think about
how fork() and exit() work. It is because exit() is a tail call that
all my finished processes aren’t sitting around waiting for their
calls to exit() to return.

You only missed one (I'm the guy who mostly ran the site; we have live backups, including one on the laptop I use):

Faré -- 2009-12-07 01:56

As the devil's advocate, I'll give *valid* arguments why PTC might not be as important in practice as it is in theory:

1- Indeed, with explicit trampolines, you can actually achieve PTC at the cost of merely an order of magnitude in performance and in syntactic complexity.

2- In current languages, programmers are already so burdened with low-level considerations that they don't pay a huge premium to do without PTC. Just like they used to do fine without GC, they can do fine without PTC.

3- As part of (manually) parallelizing algorithms, programmers already spend time refactoring their control and data structures to avoid deep towers of sequential computations, instead balancing things into shallow trees that can be readily parallelized.

4- If we eventually get more declarative languages, then sure, an SSC (sufficiently smart compiler) will handle PTC, but that might be the least (and least interesting) of its magic.

As to why the move (to java.net), it has a lot to do with standardizing infrastructure, increasing security, and sufficient maturity in the Kenai infrastructure to host our stuff. But it didn't do blogs the way we were accustomed to, and lacked the mighty-fine Fortress display macro (a Trac plugin that invoked emacs, latex, and ghostscript behind the scenes). And we're really really busy trying to get the compiler whipped into shape.