Sunday, October 22, 2006

This post discusses a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. The latest version of the proposal and a prototype can be found at http://www.javac.info/.

We've just completed a major revision and simplification of the Closures for Java specification. Rather than posting the specification on this blog, we have put it on its own web page. There are two versions of the specification: one with function types and one without. A pulldown menu at the top of the specification displays only the text relevant to the version you want to see. Keeping the specification in this form lets us maintain the two parallel versions more easily, so we can compare them and defer the decision on whether to include function types until later. These specifications are the starting point for our prototype.

There are two significant changes in this revision. First, there is a completely new syntax for closures and function types. Using the new syntax, with function types, you can write

{int,int => int} plus = {int x, int y => x+y};

As you can see, we're proposing to add the new "arrow" token => to the syntax. Just to be perfectly clear, this code declares a variable of function type

{int,int => int}

which is a function that takes two ints and yields an int. The variable is named "plus" and it is assigned the closure value

{int x, int y => x+y}

which is a closure that receives two ints (named x and y) and yields their sum.
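For comparison only: the closest runnable analogue in present-day Java (8 and later, long after this post) is a lambda assigned to a functional interface. This is a sketch of a comparable idiom, not the proposal's semantics. `IntBinaryOperator` plays the role of the function type {int,int => int}, and the lambda plays the role of the closure; the class name `PlusDemo` is hypothetical.

```java
import java.util.function.IntBinaryOperator;

final class PlusDemo {
    // Analogue of: {int,int => int} plus = {int x, int y => x+y};
    // IntBinaryOperator stands in for the function type {int,int => int}.
    static final IntBinaryOperator PLUS = (int x, int y) -> x + y;

    public static void main(String[] args) {
        // The value is invoked through the interface's single method.
        System.out.println(PLUS.applyAsInt(2, 3)); // prints 5
    }
}
```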

The second major change has to do with the treatment of "restricted" closures. We've done away with the "synchronized" parameters from the previous revision of the specification. Instead, an interface can extend a marker interface to restrict the closure conversion to that interface. If an interface does not extend the marker interface, closures converted to it are not restricted.

Another important change is to the meaning of a function type. It is now defined to be a system-provided interface type, and it is provided in a way that gives the required subtype relations among function types. That means that in order to invoke a value of function type, instead of simply placing arguments in parens after the function value, you use its "invoke" method. This significantly simplifies the name lookup rules for variables of function type. In fact, now there are no special rules at all.
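The idea that a function type is an ordinary interface whose method is named `invoke` can be approximated in today's Java with a hand-written generic interface. The names `Function2` and `InvokeDemo` below are hypothetical stand-ins, not the system-provided interface the specification describes. The point is only that calling the value is an ordinary method call, so no special name-lookup rules are involved.

```java
// Hypothetical stand-in for the system-provided interface behind {A,B => R}.
interface Function2<A, B, R> {
    R invoke(A a, B b);
}

final class InvokeDemo {
    static final Function2<Integer, Integer, Integer> PLUS = (x, y) -> x + y;

    public static void main(String[] args) {
        // Not plus(2, 3): the value is invoked through its method, like any object.
        int sum = PLUS.invoke(2, 3);
        System.out.println(sum); // prints 5
    }
}
```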

As always, your feedback and ideas are welcome.

20 comments:

The previous version (0.2) was much better, in my opinion. It seemed almost perfect.

The proposed syntax with the => token looks strange, because it is not similar to existing Java constructs. Everywhere else in Java, arguments are in parentheses (in method and constructor calls), and it's possible to keep this for closures. I also think it's much nicer to keep the arguments outside the curly braces. Also, your post about Tennent's Correspondence Principle for closures was great, and the new specification throws it out. Please revert the syntax to 0.2.

I also think function types are not useful, because it's easy to define an interface wherever you need a function type. Function types also make code less readable, while interfaces make code self-documenting. Compare

{ T => U } (if we have new syntax)

with

interface Transformer<T, U> {U transform(T t);}

It's absolutely clear what the second type is intended for, and impossible to tell what the first one does.
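A minimal sketch of the two styles under discussion, in present-day Java, where `java.util.function.Function<T, R>` plays roughly the role of the function type { T => U }. The `Transformer` interface is the commenter's, with the parameter type corrected to `T`; the class `Lengths` is hypothetical.

```java
import java.util.function.Function;

// Named interface: the type name and method name document the intent.
interface Transformer<T, U> {
    U transform(T t);
}

final class Lengths {
    // Same behaviour, two declarations: a named interface vs. a generic function type.
    static final Transformer<String, Integer> NAMED = s -> s.length();
    static final Function<String, Integer> GENERIC = s -> s.length();

    public static void main(String[] args) {
        System.out.println(NAMED.transform("closure")); // prints 7
        System.out.println(GENERIC.apply("closure"));   // prints 7
    }
}
```

The lambda is identical in both cases; what differs is only whether the declared type carries a domain-specific name.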

Note the missing line. close() flushes the OutputStream, and if flush() fails (because of a network error, for example), the user of the original code won't see any error, but the data won't be written to disk.

Examples should be error-free, because many people learn to code by reading examples.

In my opinion, the intrinsic error is catching the exception silently in the finally block. At least log it; but preferably, if closeAtEnd is meant to be an API function, it should throw IOException and leave the handling to the client.
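The specification's closeAtEnd example is not reproduced here, so the following is a hypothetical helper in present-day Java rather than the specification's code. It follows the suggestion above: run a block on a `Closeable`, always close it, and let any `IOException` from `close()` (which is where a failed flush surfaces) reach the caller instead of swallowing it. The names `CloseAtEnd` and `IOBlock` are assumptions. Using try-with-resources also records a `close()` failure as a suppressed exception when the body has already thrown.

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseAtEnd {
    // Hypothetical block type whose body may throw IOException.
    interface IOBlock<T> {
        void run(T resource) throws IOException;
    }

    // Runs body, then always closes the resource. A failure in close()
    // (e.g. a failed flush) propagates to the caller instead of being ignored.
    static <T extends Closeable> void closeAtEnd(T resource, IOBlock<T> body)
            throws IOException {
        try (T r = resource) {
            body.run(r);
        }
    }

    public static void main(String[] args) {
        Closeable failingClose = () -> { throw new IOException("flush failed"); };
        try {
            closeAtEnd(failingClose, r -> { /* write data here */ });
            System.out.println("no error");
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage()); // prints "caught: flush failed"
        }
    }
}
```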

Neal,

generally I would agree with Stepan that the syntax would be a bit hard to get used to. I'd prefer round brackets for the arguments.

I actually wasn't too keen on the previous syntax for closures, using parentheses for the arguments: (int x){ x+2 }. The reason I didn't like it was that I found it difficult to visually parse, as it seemed to be two code fragments stuck together rather than a single unit. So in the middle of a big chunk of code, it wasn't always easy to pick out.

With the new syntax, the closure is contained entirely within braces: { int x => x + 2 }, which for me is easier to visually parse as a single unit, as it's always delimited by the surrounding braces.

Yes - it is different from current Java, but I think the syntax is fairly simple and understandable, and might be familiar from other languages. And for a new construct such as this, there's bound to be some tradeoff between ease of use and consistency with current syntax.

Re functional types: I'm not convinced yet one way or the other (presumably the spec writers aren't either...).

However, I don't think it's as bad as Stepan suggests. In an actual usage of a transformer object, you might have:

public void addTransformer({ T => U } transformer) { ...

So it's clear here what the usage is.

Personally, I think that functional types can be useful to define a quick function inline. Currently, it can be a bit of a pain to have to define a new interface every time you want to do this - it seems rather wasteful and arbitrary. You can then potentially end up with lots of interfaces defined in different places which have the same parameter types and return types; but they've all got to be defined separately, and they're all incompatible.
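A short sketch of the incompatibility described above, in present-day Java. `IntMapper` and `IntTransform` are hypothetical interfaces with identical shapes: the same lambda can target either one, but a value of one type cannot be assigned to the other without an explicit adapter such as a method reference.

```java
// Two hypothetical interfaces with the same parameter and return types.
interface IntMapper { int map(int x); }
interface IntTransform { int apply(int x); }

final class Incompatible {
    static final IntMapper DOUBLE = x -> x * 2;

    // IntTransform t = DOUBLE;  // does not compile: unrelated interface types
    static final IntTransform ADAPTED = DOUBLE::map; // an explicit adapter is needed

    public static void main(String[] args) {
        System.out.println(ADAPTED.apply(21)); // prints 42
    }
}
```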

The downside could be similar to what Russel said. Also, if you end up passing this function object through your system a lot, then a) it's verbose to declare and b) arguably Stepan's argument applies more here.

Question for Neal: would it be possible to define a class that implements a function type? For example, if you want a reusable transformer class, can you "implement" the function type { T => U }?

I'd be strongly in favour of the functional specification. It looks more concise (e.g. in the closeAtEnd example) and, I think, would greatly improve the expressiveness of the language. I am involved in a large enterprise system, and we use Java in a functional style for map and filter operations.
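The answer for the proposal is not given in this thread. For comparison only: in present-day Java, where the rough equivalent of { T => U } is the ordinary interface `java.util.function.Function<T, U>`, a reusable class can simply implement it. `TrimTransformer` and `TrimDemo` are hypothetical examples.

```java
import java.util.function.Function;

// A reusable, named class that implements a function-style interface.
final class TrimTransformer implements Function<String, String> {
    @Override
    public String apply(String s) {
        return s.trim();
    }
}

final class TrimDemo {
    public static void main(String[] args) {
        Function<String, String> f = new TrimTransformer();
        // Instances compose with other functions just as a lambda would.
        Function<String, Integer> trimmedLength = f.andThen(String::length);
        System.out.println(trimmedLength.apply("  abc  ")); // prints 3
    }
}
```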
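The map-and-filter style described here is what the streams API in present-day Java (8 and later) provides. A minimal sketch, with the hypothetical method `evenSquares`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class MapFilterDemo {
    // Keep the even numbers, then square them.
    static List<Integer> evenSquares(List<Integer> xs) {
        return xs.stream()
                 .filter(x -> x % 2 == 0)
                 .map(x -> x * x)
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(Arrays.asList(1, 2, 3, 4))); // prints [4, 16]
    }
}
```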

The new syntax for function types is much clearer than the previous one: you can quickly see that it is a block, with its arguments and what it returns. In my opinion it's a good choice, although I still prefer using interfaces only, because I think that fits better with the current Java language. On the other hand, function types allow a shorter notation, so I can't decide between the two options (with or without function types).

I must admit it's still hard for me to understand the whole proposal, especially the use of the Unreachable type (maybe the formal, mathematical way the proposal is written is the reason; I'm not used to compiler-language notation).

Nice, it looks like the proposal is becoming more and more usable. I was a bit surprised by the move to "=>", but as a Groovy user and one of the people who said "yes" to the "->" syntax, I am of course happy to see that here.

As already mentioned, you should consider switching completely to "->", because "=>" looks a lot like "greater than or equal to". Of course, in Java that operator is written ">=", and since neither the parameter types nor the "=>" is optional so far, there is no real problem yet. But consider this: if you ever want to make the types in the parameter list optional, you might run into issues with ">=". Sadly, the "=>" can't be dropped entirely in general, because it would collide with the array initializer syntax.

Anyway, I am looking forward to the day I can write this: {int=>int} x = {long a => (int) a*2}
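That example relies on a closure taking a `long` being usable where the function type supplies an `int` argument, i.e. contravariance in the argument type via widening. Present-day Java method references allow a similar adaptation: a method taking `long` can implement `IntUnaryOperator`, because each `int` argument is widened. (A lambda written with an explicit `long` parameter would not compile here, since explicitly typed lambda parameters must match exactly.) Note also that `(int) a*2` parses as `((int) a) * 2`, because the cast binds tighter than multiplication. `WideningDemo` and `doubleIt` are hypothetical.

```java
import java.util.function.IntUnaryOperator;

final class WideningDemo {
    // Accepts a long, like {long a => ...}; the cast applies before the multiplication.
    static int doubleIt(long a) {
        return (int) a * 2;
    }

    // Compiles: each int passed to applyAsInt is widened to long for doubleIt.
    static final IntUnaryOperator X = WideningDemo::doubleIt;

    public static void main(String[] args) {
        System.out.println(X.applyAsInt(21)); // prints 42
    }
}
```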

This is one of a class of problems that have been solved before. The question is how best to transfer the good parts of previous solutions to the Java language without bringing along any serious flaws.

I guess what you want is A) to be able to send a "method parameter" to a method, and B) to have a way to declare the local code fragment that you want to send to the other method.

Simula-67 solved B) by allowing declaration of methods inside methods, just as classes can be defined anywhere: any block-level named construct allows the declaration of any construct, limiting the visibility of the declaration to the enclosing block. A method declared inside a method is just as local as the variables.

Part A) was solved by allowing method parameters to methods: On the parameter list, something similar to an interface method declaration occurred. This introduced the name of the method-parameter in the scope of the defining method.

When calling a method that takes a method (closure) parameter, the name of the method was put in the argument list, without any parentheses or similar. The callee can then call the passed closure as if it were a local method.

The beauty of this solution is that there are no function pointers: no reference to the closure can be passed out of the thread (or call stack). Since the scope of the caller _must_ be on the stack when the closure executes, access to local non-final variables in the closure (inner method) is safe.

I suggest you look into this solution, whose elegance lies in its simplicity. It would solve the common case for closures in a way that does not introduce any "strange" syntax: it only allows existing syntax in two new places. Passing closures out of the stack must still be solved as it is today, by creating objects, since these are the only things you can keep a reference to.

Eirik: we have looked at this solution. Unfortunately, a solution along those lines fails to address many of the use cases for closures. For example, the approach can't be used for control abstraction because, among other reasons, the lexical binding of the "return" statement is not captured from the enclosing context.
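To illustrate what it means for `return` not to be lexically bound to the enclosing context, here is a sketch with present-day Java lambdas, which do not capture the enclosing method's `return` either: inside a lambda, `return` ends only the current invocation of the lambda. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class ReturnBindingDemo {
    // With a built-in loop, return exits the whole method at the first negative.
    static int visitedWithLoop(List<Integer> xs) {
        int visited = 0;
        for (int x : xs) {
            visited++;
            if (x < 0) return visited;
        }
        return visited;
    }

    // With forEach and a lambda, return only ends the current lambda call,
    // so iteration continues over every element.
    static int visitedWithForEach(List<Integer> xs) {
        int[] visited = {0}; // lambdas cannot assign captured locals, so use an array cell
        xs.forEach(x -> {
            visited[0]++;
            if (x < 0) return; // does NOT leave visitedWithForEach
        });
        return visited[0];
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, -2, 3, 4);
        System.out.println(visitedWithLoop(xs));    // prints 2
        System.out.println(visitedWithForEach(xs)); // prints 4
    }
}
```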

and that is considered invalid syntax, because its effects usually are captured by the rules to select the correct method to be invoked. However with functional types you need to be able to express this wildcard, for the same reasons you need to do it with interfaces, as in the example at the beginning. The same would apply to extends-wildcards, but those can be worked around through type variables.

ralf: the spec already addresses this. a function type is covariant in its return type and contravariant in its argument types, so there is no need to use wildcards. In fact, a function type is an abbreviation for an interface type in which wildcards are used for the type parameters expressing the reference-typed arguments and returns.
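A sketch of the wildcard encoding described in this reply, using present-day Java's `java.util.function.Function` as a stand-in for a function type: declaring a parameter as `Function<? super T, ? extends R>` makes it contravariant in the argument and covariant in the result. `VarianceDemo` and `applyTo` are hypothetical.

```java
import java.util.function.Function;

final class VarianceDemo {
    // Accepts any function whose argument type is a supertype of T
    // and whose result type is a subtype of R.
    static <T, R> R applyTo(T value, Function<? super T, ? extends R> f) {
        return f.apply(value);
    }

    public static void main(String[] args) {
        Function<Object, String> describe = o -> "value=" + o;
        // Object is a supertype of Integer; String is a subtype of CharSequence.
        CharSequence s = VarianceDemo.<Integer, CharSequence>applyTo(7, describe);
        System.out.println(s); // prints value=7
    }
}
```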

I understand you do not want to get into scheme-style continuations. However, I don't like the name of the "RestrictedClosure" tag - even though the possibilities of a "RestrictedClosure" closure are "restricted", the receiver's usage of the closure actually becomes unrestricted.

At first I hoped for the "RestrictedClosure" tag to be related to stack discipline and escape analysis. As I realize now it's not directly related to optimization or implementation - even an untagged closure may be invoked out of scope. So each non-local break or return will need to be able to detect whether its original lexical scope still exists, unless the compiler is able to optimize this away.

I suppose non-local exits would have to be implemented using Exceptions? After all, a "break" statement might pop off several stack frames. Java code executing in these frames would still expect try/finally blocks to be executed. This might make break and friends too expensive to be used frequently in practice.
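One way such an exit could work, sketched in present-day Java as an assumption rather than the prototype's actual strategy: the "break" becomes an unchecked exception thrown inside the closure and caught at the loop's lexical target. The finally block shows that the intervening frames still run their cleanup, and constructing the exception with `writableStackTrace` set to false skips stack-trace capture, which addresses part of the cost concern. `NonLocalBreak` and `sumUntilNegative` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NonLocalBreakDemo {
    // Hypothetical control-flow exception standing in for a non-local "break".
    // Disabling the stack trace avoids the most expensive part of throwing.
    static final class NonLocalBreak extends RuntimeException {
        NonLocalBreak() {
            super(null, null, false, false);
        }
    }

    // Sums elements until the first negative one, "breaking" out of forEach.
    static int sumUntilNegative(List<Integer> xs, List<String> log) {
        int[] sum = {0};
        try {
            xs.forEach(x -> {
                try {
                    if (x < 0) throw new NonLocalBreak();
                    sum[0] += x;
                } finally {
                    // Finally blocks in the frames being unwound still run.
                    log.add("visited " + x);
                }
            });
        } catch (NonLocalBreak b) {
            // The "break" target: control resumes after the loop.
        }
        return sum[0];
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(sumUntilNegative(Arrays.asList(1, 2, -1, 5), log)); // prints 3
        System.out.println(log); // prints [visited 1, visited 2, visited -1]
    }
}
```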

Have you thought about typing rules that enforce stack life times? (like VAR parameters in Pascal, or "yield" style iterators) Break and Return are stack-oriented anyway, and pretending that they are not would lead to continuations, which you are not willing to introduce; so the consistent thing to do might be special syntax and/or typing rules for blocks that can only be invoked or passed on to other block parameters. Certainly sufficient for loops.

About Me

Neal Gafter is a Computer Programming Language Designer, Amateur Scientist and Philosopher.
He works for Microsoft on the evolution of the .NET platform languages.
He also has been known to kibitz on the evolution of the Java language.
Neal was granted an OpenJDK Community Innovators' Challenge award for his design and implementation of lambda expressions for Java.
He was previously a software engineer at Google working on Google Calendar, and a senior staff engineer at Sun Microsystems, where he co-designed and implemented the Java language features in releases 1.4 through 5.0. Neal is coauthor of Java Puzzlers: Traps, Pitfalls, and Corner Cases (Addison-Wesley, 2005). He was a member of the C++ Standards Committee and led the development of C and C++ compilers at Sun Microsystems, Microtec Research, and Texas Instruments.
He holds a Ph.D. in computer science from the University of Rochester.