July 29, 2003

Why go with CPS for Parrot?

This came up on the perl6-internals list a while ago. I ought to reply there, so I will, but I figure the reasons might be useful to other people, so I'm going to put it up here as well.

If you read back through the blog, it's pretty clear that while I like CPS, I wasn't convinced that CPS was the way to go, but if you look at the parrot code now, we use CPS. Why?

First, it's important to know why I wasn't going to go with CPS. That's pretty straightforward--fear and uncertainty. I was uncertain they were sufficiently useful, and afraid that if we did go with them we'd end up with an engine that nobody would create code for and nobody'd be able to work on because we were using a scheme (pardon the pun) that noone understood. I don't mind pieces of something that only a dozen or two folks at any one time feel comfortable working on--you don't need that many people poking at the internals of the garbage collector--but we need to make sure that things are at intelligible to at least some folks.

The uncertainty about their utility has been dealt with. Partly because of some of the needs that parrot has, and I'll talk about that a bit later, and partly because I've just gotten more comfortable with them. I had a fair amount of lingering discomfort with them that needed dealing with. The fear about scaring people off with them is gone as well--I understand them, I know how to explain them to people quickly and reasonably simply, and in a way that can deal with the knee-jerk fear often associated with them. (I'm convinced at this point that all of the fear people have about continuations is a direct result of how they're taught and what's associated with them, though that's a rant for another day.

The reasons to use them fall out of two needs--the need for optimizations, and the amount of context that needs to be preserved across function calls.

Optimizations are easy--having a CPS makes doing tail calls dead-simple. We can't always do tail calls, of course--they can have some issues with destruction timing guarantees and while we don't explicitly make those, it's not nice to surprise people when they haven't asked for it. Still, it's there for the using, and it means the difference between a deeply recursive algorithm dying a horrible death and running to completion. Besides, in a managed environment (and we are managed--while everything will ultimately be open to inspection there'll be some limits to what you can actually change) it can be quite difficult to implement tail calls without engine help. It's trivial to provide it, difficult to make it happen if we don't, and useful for the optimizer. Not a tough choice.

Calling context is another big one. We have gobs of stacks, registers, context, and whatnot that need preserving, putting us far past the "stick the return value on the stack and branch off to wherever" style of calling code. Not only is there this stuff that's required at the user level, the interpreter has a lot of stuff it potentially needs to save--what runloop you had, what your security context was, what flags were in effect, and so on. What we were going to do, pre-CPS, was, in the caller:

saveallcallrestoreall

and the callee would do a:

return

The call would push all the interpreter info onto the call stack, along with the return point. Return uses that information to put things back on return. With a CPS style, that turns into:

saveallcallccrestoreall

with the callee doing

invoke Px

with Px being the register holding the return continuation in it. Generally P1, but that's not required. (In both cases, the sub/method to be invoked has to be in P0 so no parameters are needed for the call) You'll note two differences: We spell return "invoke" in the CPS version and give it a parameter, and give call the "cc" suffix. Ooohhh, scary!

We could, of course, make the state saving explicit, rather than rolling it up with the call in the first example, but since we must save the state in all cases, there's little point in doing that. Mandatory state saving is fine, making it mandatory but not automatically doing it is silly. Someone'll forget and things will go bang, or we'll get odd security holes.

Interestingly, it also means that if you squint a bit and don't think too hard about it (and generally you won't) the only difference between a CPS scheme and the old scheme is we pass in the "return object" as a parameter, rather than put it on the stack. We could even rename the ops if we so chose, though the old jsr/ret style of calling is being preserved for a variety of reasons, so we're not going to.

Security is something of a concern--since a sub/method boundary is also a privilege boundary (which is to say that code on one side of a sub may have more or fewer privs than the other side) all the state saving, restoring, and respecting has to be handled by the interpreter. That's a good reason to not even give the option of managing interpreter state manually, since that's dangerous. It may be possible with sufficient bytecode verification and proper security design, but that's a lot more work than just managing it all ourselves and not giving user code the opportunity to screw it up. (That, after all, is the job of the interpreter writers! :)

Going CPS isn't entirely without extra effort, but its effort that's hidden under the interpreter covers, where it ought to stay. Compiler writers and people generating assembly by hand don't have to deal with it, as they ought not have to.

Without CPS, you have to know how to compile both calls and returns. With CPS, you only have to compile calls. It simplifies the implementation and opens up possibilities for optimization. For example, all the optimizations you might have performed on calls can now be applied to returns as well. And a CPS foundation should help with implementing things like coroutines, generators, and exceptions -- making their implementation much simpler, easier to test, and exposing them to optimizations.

In a way CPS reminds me of RISC and register machines. In the stack based machines, you have a strict order in which you perform operations. With registers, you have to make things a bit more explicit, but it means you have a simpler system. In CISC, you implement common operations (like call/return) as instructions. In RISC, you convert complex operations into simple ones (call becomes push + jump). Simpler implementations can be faster, more easily optimized, and more flexible. Once you've converted to CPS, then a "function call" is simply a jump. It's no longer an expensive thing, because you don't need to figure out call stacks, pushing/restoring registers, etc. Simple is good :)