Wednesday, August 13, 2008

My Ideas for Java Closures

As someone who writes a lot of code using closure-like mechanisms ... in the form of lots of inline inner classes ... I have a few ideas of what I want in a solution. I think I'm writing some powerful and elegant code today, but that elegance in function is undermined by some severe awkwardness in its expression as Java code.

It really comes down to conciseness. I can accomplish pretty much everything I need using inner classes and holder objects, such as AtomicInteger and friends. But it ends up being more code than I'd like.
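For illustration, here is the kind of code this produces today. The `Visitor` interface and `forEach` helper are hypothetical (neither is a JDK API). An `AtomicInteger` acts as the mutable holder, because the anonymous inner class can only capture final local variables.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class HolderToday {

    // Hypothetical single-method callback interface, for illustration only.
    interface Visitor<T> {
        void visit(T value);
    }

    // Hypothetical helper that applies a visitor to each element of a list.
    static <T> void forEach(List<T> values, Visitor<T> visitor) {
        for (T value : values) {
            visitor.visit(value);
        }
    }

    // Counts the strings longer than minLength, using today's inner-class style.
    public static int countLongerThan(List<String> words, final int minLength) {
        // The counter must be a final reference to a heap object so the inner class can update it.
        final AtomicInteger count = new AtomicInteger();
        forEach(words, new Visitor<String>() {
            public void visit(String word) {
                if (word.length() > minLength) {
                    count.incrementAndGet();
                }
            }
        });
        return count.get();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "tapestry", "closure", "java");
        System.out.println(countLongerThan(words, 4)); // prints 2
    }
}
```

Only the `if` statement is the actual logic. Everything else is the ceremony the rest of this post is about.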

What I want (to borrow Stu's term) is to emphasize the essence of my logic, and strip away the ceremony: the naming of the interface (it should be known from context), the types of parameters (just the names, please), the list of thrown exceptions, etc.

First of all, let's constrain the problem. Closures would be defined in terms of an interface, an interface that contains a single method. Attempting to use the concise syntax with an interface that contains multiple methods would simply be a compiler error.
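The compiler would apply this rule at compile time, but the rule is simple enough to sketch with reflection as a rough approximation. An interface qualifies if it declares exactly one abstract method, not counting abstract redeclarations of public `java.lang.Object` methods (such as `Comparator.equals`). The class and method names are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ClosureCandidates {

    // True when the type is an interface with exactly one abstract method,
    // ignoring abstract redeclarations of public java.lang.Object methods.
    public static boolean isClosureCandidate(Class<?> type) {
        if (!type.isInterface()) {
            return false; // classes are ruled out entirely
        }
        Set<String> signatures = new HashSet<String>();
        for (Method m : type.getMethods()) {
            if (!Modifier.isAbstract(m.getModifiers()) || redeclaresObjectMethod(m)) {
                continue;
            }
            signatures.add(m.getName() + Arrays.toString(m.getParameterTypes()));
        }
        return signatures.size() == 1;
    }

    // True if java.lang.Object has a public method with the same name and parameter types.
    private static boolean redeclaresObjectMethod(Method m) {
        try {
            Object.class.getMethod(m.getName(), m.getParameterTypes());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isClosureCandidate(Runnable.class));             // true
        System.out.println(isClosureCandidate(java.util.Comparator.class)); // true
        System.out.println(isClosureCandidate(java.util.Iterator.class));   // false
    }
}
```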

Second, the closure block should have free read/write access to parameters and local variables in the enclosing method. This can easily be accomplished with syntactic sugar: the variables can be converted into references to holder objects, such as AtomicInteger, that are stored on the heap. Today, local variables and parameters shared with an inner class must be final, to indicate that it is safe to share references to the object between the enclosing method and any inner classes.
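Here is a sketch of that desugaring, under stated assumptions. The closure syntax in the comment is hypothetical, and `IntRef` is an invented, non-synchronized holder class. The local `total` becomes a field of a heap object, and both the enclosing method and the generated inner class reach it through a final reference.

```java
public class DesugarSketch {

    // Hypothetical single-method interface the closure would implement.
    interface IntVisitor {
        void visit(int value);
    }

    // Hypothetical holder the compiler might generate for a shared, mutable local.
    static final class IntRef {
        int value;
        IntRef(int value) { this.value = value; }
    }

    static void each(int[] values, IntVisitor visitor) {
        for (int v : values) {
            visitor.visit(v);
        }
    }

    // Source the programmer might write (hypothetical syntax, not valid Java):
    //     int total = 0;
    //     each(values, { v -> total += v });
    //     return total;
    //
    // What the compiler could generate instead:
    public static int sum(int[] values) {
        final IntRef total = new IntRef(0);  // "int total = 0" moved to the heap
        each(values, new IntVisitor() {
            public void visit(int v) {
                total.value += v;             // "total += v" rewritten as a field update
            }
        });
        return total.value;                   // reads also go through the holder
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4})); // prints 10
    }
}
```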

The hard part about sharing local variables in this way isn't implementing the syntactic sugar; it's updating the information provided to the debugger, so that the debugger can undo the syntactic sugar and make variables in the enclosing method, or in the closure block, appear to be local.

Lastly, syntax. I think Groovy has the right syntax here. The important part is for the compiler to actually help out rather than complain from the sidelines. Today's Java compiler has all the type information, but just uses it to build fancy error messages about what you should have typed. It should be using that type information to avoid the necessity of all the extra typing (that is, keyboard entry, not types in the Ruby/Groovy sense of the word).

When a closure is passed to another method, the parameter type determines all the compiler should need to know. Likewise, when a closure is assigned to a variable, the variable defines the closure interface. Thus:

The generic type of the list, Widget, informs the generic type of the Comparator. Thus w1 and w2 are fully typed, as instances of Widget. The return type of the closure is pegged as int (also from the Comparator interface).

Side note: I'd also like to see a lot of other streamlining of Java, such as a Groovy-style implicit return.

In other words, the compiler could expand the above into the following:
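As a sketch of that expansion, assume a hypothetical `Widget` class with an `int` id. The fully spelled-out Java for sorting a `List<Widget>` by id looks roughly like this. The concise form in the comment is an assumed Groovy-style rendering and is not valid Java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class WidgetSort {

    // Hypothetical Widget class with an integer id.
    public static final class Widget {
        private final int id;
        public Widget(int id) { this.id = id; }
        public int getId() { return id; }
    }

    // Concise form (assumed Groovy-style syntax, not valid Java):
    //     Collections.sort(widgets, { w1, w2 -> Integer.compare(w1.getId(), w2.getId()) });
    public static void sortById(List<Widget> widgets) {
        // Everything except the comparison is ceremony the compiler could infer:
        // the interface name, its type argument, the method name, parameter and return types.
        Collections.sort(widgets, new Comparator<Widget>() {
            public int compare(Widget w1, Widget w2) {
                // Integer.compare avoids the overflow that subtracting ids can cause.
                return Integer.compare(w1.getId(), w2.getId());
            }
        });
    }

    public static void main(String[] args) {
        List<Widget> widgets = new ArrayList<Widget>();
        widgets.add(new Widget(3));
        widgets.add(new Widget(1));
        widgets.add(new Widget(2));
        sortById(widgets);
        for (Widget w : widgets) {
            System.out.print(w.getId() + " "); // prints 1 2 3
        }
        System.out.println();
    }
}
```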

See, the essence is still there: the comparison of the widget ids. But it's now occluded by all that ceremony about interface names, method names, return values, generic types, and so forth.

A caveat: there are edge cases where we'll need to identify the closure interface type. This occurs when a method to be passed a closure is overloaded. The compiler should be able to narrow the candidates based on parameter count, checked exceptions thrown inside the closure, and other factors ... but it may be necessary to implement an alternative syntax. For example:

submit() is overloaded to accept a Callable as well as a Runnable; a Callable can return a value. Again, minimal syntax here: the interface name followed by the implementation of the closure method. This is comparable to Groovy's as keyword.

Now, it's easy to get carried away and put in stuff like Python-style co-routines (the yield keyword), or want a syntax to allow the closure to force a return value from the enclosing block or method. <sarcasm>Yep, let's add a few more alien concepts to the language; people love that.</sarcasm>

These are also not function objects, so you can't easily do magic things such as currying (pre-supplying some of a function's parameters to create a new function that takes fewer parameters). An "interface" that's curried is a whole new interface, and that's OK by me.
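Here is a small sketch of that point, using hypothetical interfaces `IntBinaryOp` and `IntUnaryOp`. Currying becomes a hand-written adapter that returns an implementation of a different, narrower interface.

```java
public class CurrySketch {

    // Hypothetical two-argument and one-argument function interfaces.
    public interface IntBinaryOp {
        int apply(int a, int b);
    }

    public interface IntUnaryOp {
        int apply(int b);
    }

    // Pre-supplies the first argument, producing an implementation of a different interface.
    public static IntUnaryOp curry(final IntBinaryOp op, final int a) {
        return new IntUnaryOp() {
            public int apply(int b) {
                return op.apply(a, b);
            }
        };
    }

    public static void main(String[] args) {
        IntBinaryOp subtract = new IntBinaryOp() {
            public int apply(int a, int b) {
                return a - b;
            }
        };
        IntUnaryOp tenMinus = curry(subtract, 10);
        System.out.println(tenMinus.apply(3)); // prints 7
    }
}
```

Each arity needs its own interface and its own `curry` overload. That is the cost of not having function types.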

I'm more modest. I don't particularly want to add new features to the Java language, just new syntax for the compiler, to let it do the ugly plumbing. That's kind of my theory for Tapestry's relationship to Java and the Servlet API as well, and it works.

A few side notes:

Annotations could be used to help the compiler out. For example, if a common interface should be usable as a closure but has multiple methods, an annotation could tell the compiler which method of the interface the closure implements; any other methods (in the generated inner class) would be no-ops or (perhaps guided by additional annotations) failures.
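The compiler would do this statically, but the described behavior can be sketched at runtime with `java.lang.reflect.Proxy`. Everything named here is hypothetical: the `@ClosureMethod` annotation, the `Body` interface standing in for the closure block, the two-method `ChangeListener` interface, and the `adapt` helper. Calls to the annotated method run the body, and calls to any other method are no-ops. The sketch assumes those other methods return void.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class AnnotatedClosure {

    // Hypothetical marker: "this is the method a closure implements".
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface ClosureMethod {}

    // Hypothetical stand-in for the closure block itself.
    public interface Body {
        Object invoke(Object[] args);
    }

    // Hypothetical two-method interface; only one method is closure-eligible.
    public interface ChangeListener {
        @ClosureMethod
        void changed(String newValue);

        void reset();
    }

    // Routes the annotated method to the body; every other interface method is a no-op.
    public static <T> T adapt(Class<T> iface, final Body body) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) {
                if (method.getDeclaringClass() == Object.class) {
                    // Minimal handling of equals/hashCode/toString on the proxy itself.
                    if (method.getName().equals("equals")) return proxy == args[0];
                    if (method.getName().equals("hashCode")) return System.identityHashCode(proxy);
                    return "closure proxy";
                }
                if (method.isAnnotationPresent(ClosureMethod.class)) {
                    return body.invoke(args);
                }
                return null; // no-op for void methods
            }
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        final StringBuilder log = new StringBuilder();
        ChangeListener listener = adapt(ChangeListener.class, new Body() {
            public Object invoke(Object[] a) {
                log.append(a[0]);
                return null;
            }
        });
        listener.changed("hello");
        listener.reset(); // ignored
        System.out.println(log); // prints hello
    }
}
```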

Annotations on fields and parameters (or perhaps on the method) could help the compiler decide how to share visible variables: do we need to use a thread-safe approach (such as something based on AtomicReference) or simple, non-synchronized holder objects? The compiler could do some escape analysis as well to choose the best solution, but in many cases, it's on the coder's shoulders to make the right decision.
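Here is a sketch of why the choice matters, assuming the closure runs on several threads at once. A counter held in an `AtomicInteger` stays correct. The same increments applied as a bare `value++` on a plain holder's `int` field could lose updates. The method name and thread counts are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedCounter {

    // Starts `threads` workers that each increment one shared counter `perThread` times.
    public static int countConcurrently(int threads, final int perThread) throws InterruptedException {
        // Thread-safe holder: incrementAndGet is atomic, so no updates are lost.
        // A plain holder doing "value++" would be cheaper but could lose increments here.
        final AtomicInteger counter = new AtomicInteger();
        List<Thread> workers = new ArrayList<Thread>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < perThread; j++) {
                        counter.incrementAndGet();
                    }
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join(); // join also makes the workers' updates visible to this thread
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countConcurrently(4, 10000)); // prints 40000
    }
}
```

When escape analysis or an annotation shows that the variable never leaves the current thread, the plain holder is the cheaper, adequate choice.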

From what I can tell, my ideas are closest to the Concise Instance Creation Expressions (CICE) proposal by Bob Lee, Doug Lea, and Josh Bloch. However, in my opinion, the compiler can do much more in terms of providing type information. I think CICE stumbles in that it allows for creation of classes, not just interfaces, and gets bogged down in defining closure types and parameter types ... again, things that the compiler can work out on its own, given the mandate of single-method interfaces.

Likewise, CICE requires that variables visible to the inner block be marked "public". To me, this is something the compiler can analyze: it can identify any assignments to variables and promote such variables to stored-on-the-heap status. I don't see the need for final either; the compiler has plenty of ability to determine whether a parameter or local variable is ever updated within the body of a method (and the body of any inner classes or closures of that method).

My basic concept is: Less is More. Support fewer cases but do so more cleanly, more concisely, and more understandably. Simplify. Let the compiler do more work. Reduce the density and complexity of the Java code. Expose the essence.

10 comments:

So, which of the proposals best fits your ideas? Well, none is an exact match. CICE doesn't address the meaning of 'this' within the closure, but neither do you. FCM can be thought of as having two variants: a simple one where there are no function types and a complex one where there are. BGGA has a whole set of other features, including non-local returns, which you seem to be arguing against.

None of the main three proposals suggest dropping the type from the closure arguments. I suspect that is just too big a step for Java.

Overall, you've used a syntax like BGGA but with the 'return' semantics of FCM and CICE. Perhaps you could outline what 'this' should mean in a closure, and then we can pin down your thoughts...

BTW, the hard part about non-final local variables is that they are now potentially not thread-safe. How would you feel about having to use 'volatile' on local variables?

I still think what I'd prefer to use is closer to CICE than FCM. 'this' would certainly be the closure object, just as today 'this' is the inner class instance; there is certainly a challenge there.

One challenge is name conflicts: when a closure parameter name or local variable conflicts with a method (or intermediate closure) local variable, or an instance variable of the containing object.

That's a problem with inner classes today, which have an awkward and infrequently used syntax to refine what "this" means. I think the best approach is to not worry about "this"; there are reasonable ways to avoid the naming conflicts that might occur ... such as assigning this to a local variable visible to the closure.
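Here is a small example of both approaches. Inside an anonymous inner class, plain `this` means the inner instance, so reaching the enclosing object takes the qualified form `NamedCounter.this`, or the workaround of a local alias assigned beforehand. The `NamedCounter` class is hypothetical.

```java
public class NamedCounter {

    private final String name;

    public NamedCounter(String name) {
        this.name = name;
    }

    public String describe() {
        return "Counter " + name;
    }

    // Returns two tasks that reach the enclosing NamedCounter in two different ways.
    public Runnable[] tasks(final StringBuilder out) {
        final NamedCounter self = this; // workaround: alias the outer "this" in a local
        Runnable qualified = new Runnable() {
            public void run() {
                // Plain describe() here would call the inner method below, so qualify it.
                out.append(NamedCounter.this.describe()).append(';');
            }

            public String describe() { // shadows the outer method inside this class
                return "inner";
            }
        };
        Runnable aliased = new Runnable() {
            public void run() {
                out.append(self.describe()).append(';');
            }
        };
        return new Runnable[] {qualified, aliased};
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        for (Runnable r : new NamedCounter("c1").tasks(out)) {
            r.run();
        }
        System.out.println(out); // prints Counter c1;Counter c1;
    }
}
```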

As I mentioned, I think the default behavior when a shared variable can be updated is to share an AtomicReference. To reiterate: the compiler can detect this and fall back to simply copying the reference if the variable is never modified. Lastly, an annotation could direct the compiler to use a non-synchronized/non-volatile storage object when appropriate.

I think that as often as we can say "it's the same logic (as inner classes) with just fewer characters" the better off everyone will be. Inner classes are still something of a mystery to a surprising number of Java developers, and adding a new competing feature that introduces conflicting ideas is dangerous.

Additionally, I'm not saying that the types of a closure's arguments should be "dropped"; I'm simply saying they should not be restated when the method definition in the interface already declares them.

I'll be thinking about this some more ... just an idle exercise perhaps.

"It should be using that type information to avoid the necessity of all the extra typing"

Yes, but be careful. Sometimes the types of the parameters should be inferred from the interface that's being used, and sometimes the interface that's being used should be inferred from the types of the parameters instead. You probably want to allow either direction, or choose one, carefully.

"Now, it's easy to get carried away and put in stuff like Python style co-routines (the yield keyword)"

This seems to fit wonderfully in C#, the closest language to Java that I know.

"These are also not function objects, so you can't easily do magic things such as currying"

Well, they could be function objects. BGGA defines function types, and you could write a generic curry method over them (though primitive types probably get in the way somewhat).

"My basic concept is: Less is More. Support fewer cases but do so more cleanly, more concisely, and more understandably."

Actually, the fewer cases you support, the less clean, less concise and less understandable your closures proposal ends up. Consider Scheme's lambda, it supports everything and it has syntax that takes less than 30 seconds to understand.

I'm very happy to see that I'm not the only one thinking that the current closure proposals are over the top. I'm perfectly content with the use of anonymous inner classes, and a more concise syntax for the special case where there's only one method to implement would go a very long way in increasing the expressive power of Java.

Non-local returns, especially, are something I find very un-Java-like, with only very limited usefulness. The main reason Java is so widely used is that it limits the number of ways things can be accomplished without reducing the expressive power of the language. This makes Java code and constructs easier to read and understand, because there are only a few ways to express most concepts.

Good and interesting post, but why are you so sure that interface implementation is the best abstraction for closure instances? It seems very much like old habits from passing Runnables and ActionListeners around.

What you are trying to achieve by passing a closure to a method is to pass a referable and invokable piece of program code (an instruction sequence) that the invoked method should apply to data elements fabricated in its inner workings.

There is already a construct in Java for labeling instruction sequences and invoking those sequences from various places: methods. It is even type-safe. How about allowing method declarations as parameters to a method? That would be a "single-method interface", wouldn't it (except it is questionable whether it's an interface or an object)?

Before Java, we used Simula at the university I once went to. The concepts in the two languages are pretty much the same (except that Simula looks 30 years older and has Algol-like syntax). In addition to Java's "anywhere you can declare something you can declare a class (or interface)" and "anywhere you can pass something you can pass class or interface instances", Simula had the same rules for methods: inner methods and type-safe method parameters. That pretty much solved all the everyday requirements I've seen for closures, with a few, mostly esoteric, exceptions:

I see there is a need for nameless methods: curly brackets around some instructions, written where you invoke the method. I guess these are to methods what anonymous classes are to other classes.

I also see that there is no need to specify parameter types when they can be inferred.

Overall, I think you pinpoint what should be the cornerstone of the closure discussion: it's all about making things you can already do a little more convenient. Sometimes it seems like people are more concerned with maximizing the number of new language features added under the closure umbrella. That should not be the focus.

Also I think JodaStephen is addressing the most difficult aspect of all: what is supposed to happen when you pass a closure to a different thread?

Wouldn't introducing type inference just for closures make the Java syntax as a whole less consistent?

Second, one glaringly obvious pain point that the closure proposal should address (for me at least), and that you explicitly exclude, is the highly tedious plumbing required for Swing event handling. Any thoughts on how this would tie in?