Friday, February 15, 2013

Last time I talked about how the Scala compiler currently translates try/catch expressions used as arguments to method or constructor calls into separate method definitions, which in turn may require boxing of mutable variables. That's because an exception in the try would blow away the local stack of stuff needed to make the original method/constructor call. So I proposed a plan for the Scala compiler to translate such expressions differently: evaluate each try/catch argument into a local variable before the call, then pass the variables along.

The Flaw

It was a great plan. A beautiful plan. It had only one flaw. Under some circumstances it would change the behavior of some constructor calls. To understand why, we need to look at some JVM bytecode.

At the bytecode level, the constructor call

val foo = new Foo(expr1, expr2,...)

looks like(1)

new Foo
dup
// compute the result of expr1 and leave on stack
...
// compute the result of expr2 and leave on stack
...
invokespecial Foo.init
store foo

Which may need some explanation. The first instruction, "new Foo", creates a new, uninitialized Foo object but does not invoke its constructor. A reference to the uninitialized object is left on the top of the local stack. Then dup duplicates the top of the stack so now there are two references to the uninitialized object. Then a series of instructions computes the results of the arguments to the constructor, leaving each result at the top of the stack in turn (and pushing down the previous stuff). Finally, Foo's constructor is called which consumes all the constructor args and one reference to the Foo object. After the constructor is done, what's left behind on the stack is one reference to the now initialized Foo object. The store instruction consumes that and stuffs it into the foo variable.

If expr2 were a try/catch and my proposed transformation were to kick in, then that would look like
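A sketch of the transformed bytecode, with the argument results hoisted into temporaries (the names t1 and t2 are mine):

```
// compute the result of expr1 and store it
...
store t1
// compute the result of expr2 (the try/catch) and store it
...
store t2
new Foo
dup
load t1
load t2
invokespecial Foo.init
store foo
```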

At first glance that would appear to solve the problem nicely: the order of effects between expr1, expr2, and Foo's constructor is all preserved. Right? Sigh.

You see, the bytecode instruction "new Foo" potentially has a visible side effect. That's because it can cause Foo's static initializers to fire if they haven't already. In the original, inline version, the static initializers would have their effect before expr1 and expr2. In the second version the static initializer might happen after. Here's a bit of Java code that demonstrates the difference.
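A sketch of such a demonstration (the class and method names are mine). In the inline form, Foo's static initializer fires before expr1 and expr2; if the arguments were hoisted to come before the new, the first log entry would move later.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrder {
    static final List<String> log = new ArrayList<>();

    static class Foo {
        static { log.add("Foo static init"); }
        Foo(int a, int b) { log.add("Foo constructor"); }
    }

    static int expr1() { log.add("expr1"); return 1; }
    static int expr2() { log.add("expr2"); return 2; }

    public static void main(String[] args) {
        // The "new" instruction fires Foo's static initializer first,
        // then expr1 and expr2 are evaluated, then the constructor runs.
        Foo foo = new Foo(expr1(), expr2());
        System.out.println(log);
        // prints [Foo static init, expr1, expr2, Foo constructor]
    }
}
```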

I wrote that in Java because Scala doesn't have static initializers (the equivalent is done via constructors on declared objects). Which might make you think "then who cares." But Scala has to inter-operate with Java. So we need a plan.

Plan 1 (which won't work)

It's tempting to try to stuff the uninitialized object into a variable before dealing with the args.
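In bytecode, that idea would look something like this (the temporary names are mine):

```
new Foo
store tmp        // ref to an uninitialized Foo in a local variable
// compute the result of expr1 and store it
...
store t1
// compute the result of expr2 (the try/catch) and store it
...
store t2
load tmp
load t1
load t2
invokespecial Foo.init
load tmp
store foo
```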

That looks beautiful but, sadly, it is expressly forbidden by the JVM spec to have a ref to an uninitialized object in a local variable when a catch block is executed. The bytecode verifier will reject that approach outright.(2)

Plan 2 (which is annoying, quite probably slow, and may not work everywhere)

So we need to force class initialization early. One way to do that is Class.forName. So now we have this
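A sketch of what the emitted bytecode might look like (temporary names are mine):

```
// force Foo's static initializers to run early
ldc "Foo"
invokestatic Class.forName
pop
// compute the result of expr1 and store it
...
store t1
// compute the result of expr2 (the try/catch) and store it
...
store t2
new Foo
dup
load t1
load t2
invokespecial Foo.init
store foo
```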

That's a lot of code coming out of one measly constructor call! To some extent that can be mitigated by creating a static helper method in the runtime library and calling that instead. But that doesn't entirely eliminate the performance concern. How fast is Class.forName? The answer will take some benchmarking, but I'll bet right now it's a whole lot slower than code without it.
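A hedged sketch of what such a runtime helper might look like (the names Statics and ensureInitialized are invented for illustration, not part of the actual Scala runtime):

```java
class Statics {
    // Force a class's static initializers to run, the way the "new"
    // instruction would. The three-argument overload of Class.forName
    // lets us request initialization explicitly.
    static void ensureInitialized(Class<?> clazz) {
        try {
            Class.forName(clazz.getName(), true, clazz.getClassLoader());
        } catch (ClassNotFoundException e) {
            // Can't happen: the Class object in hand is already loaded.
            throw new AssertionError(e);
        }
    }
}

// A probe class whose flag records whether static initialization ran.
class Probe {
    static boolean initialized = false;
    static { initialized = true; }
}
```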

Even ignoring performance concerns, Class.forName has a potentially fatal flaw: security managers might very well prevent it. So library A happens to trigger this transformation, and company B can't use the library because of a policy decision restricting Class.forName.

Plan 2b (which might help a bit with Plan 2's slowness but definitely won't work everywhere)

A solution related to Plan 2 is to use sun.misc.Unsafe.ensureClassInitialized. It's a much simpler interface than Class.forName and is likely closer to the underlying mechanism that the "new" instruction uses to do class initialization. The problem is that it's not portable to all JVMs, and it's even more likely to be hidden away by a security manager.

Plan 3 (which is so crazy it just might work)

A realistic possibility, which is a bit crazy, is to create an uninitialized object and then immediately throw it away. That forces static initialization but never calls the constructor.
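A sketch of that bytecode (temporary names are mine):

```
// create a throwaway uninitialized Foo just to force static initialization
new Foo
pop
// compute the result of expr1 and store it
...
store t1
// compute the result of expr2 (the try/catch) and store it
...
store t2
new Foo
dup
load t1
load t2
invokespecial Foo.init
store foo
```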

Which looks distinctly WEIRD, but it's valid bytecode and gets the job done. The annoying factor with this one is that it can only be expressed in bytecode, so either the whole transformation has to be done at a very low level or we would have to add a new node type to the Scala AST which is used solely to express "force static initialization of such and such class" and then let the code gen phase translate that to a new/pop pair.

There's also a bit of concern that it's TOO weird. It's valid by the spec, but it's nothing javac would ever spit out so it's entirely possible that some JVM implementation somewhere would be very confused by it.

Plan 4 (which is scary, but has some merit)

The final possibility is to ignore the problem. Scalac could just reorder expr1 and expr2 to come before the new and leave it at that. It's not something that most Scala programmers would ever even notice. The very few it does affect can manually do whatever translation they want to get their static initializers to fire at the right time. Perhaps the -Xlint flag could issue a warning about static initializer order when the transformation takes place.

But then again, the reason for the reordering will be pretty non-obvious, so the few it does affect will likely struggle with it for a while, then decide Scala is just broken without ever seeing what's behind -Xlint.

What Now?

Hopefully this has been an illuminating romp into some of the innards of compiling for the JVM. The immediate plan is that Scala will continue to turn problematic try/catch expressions into method definitions, which will occasionally mean a bit of extra boxing. In the grand scheme of things that perf hit is probably small enough and rare enough to keep this whole experiment at a low-ish priority. Still, when I have some time I'd like to go see what other JVM languages with try/catch expressions are doing here. They might have figured something out that I haven't.

Footnotes

(1) I'm using javap as a guide for how to disassemble bytecode, but I'm keeping variable names and class names and ignoring details about variable slots and the constant pool that aren't important for this discussion.

(2) Why is it not valid? The simple answer is "because the spec says so." Specifically, it says "There must never be an uninitialized class instance in a local variable in code protected by an exception handler." But the underlying reason for that restriction isn't entirely clear to me.

JVM bytecode is statically typed. The JVM's type system is just very peculiar by the standards of mainstream statically typed languages. The type checking is (a part of) the bytecode verification process. In the verifier, every variable and stack slot gets one static type over a static region of the bytecode. In different regions those types are allowed to be different, but when flow through the two regions converges all the types must be consistent. Now, the verifier thinks of an uninitialized Foo as a distinct type from an initialized Foo. Calls to a constructor (init) are the magic that turns an uninitialized, and thus unsafe to use, Foo in one region into one that is initialized, and thus safe to use, in another region. Which is all good.

Except for some reason the uninitialized-to-initialized transition has been special-cased for exception handlers rather than treated as just another type-state change to a variable that must converge consistently. Perhaps there was concern over the runtime cost of a more complicated verification rule. Or perhaps there was some soundness hole they didn't think they could plug. I dunno. Whatever the cause, I have to go to war with the army I have, not the army I wish I had.