Friday, August 25, 2006

Some people have been reacting to this proposal as if it
represents Sun's plans for the future of the language. At this
stage it is just some ideas being drawn up to address a number of
requirements in a clean and uniform framework. Other people are
working on alternative proposals, and I expect you'll see those
soon. It is too early to place bets on what, if anything, Sun
will do in this space for JDK7.

With the insights described in my last blog entry, we no longer
need two separate syntactic forms for closures. We also don't
need the "labelled" nonlocal return statement. We're updating the
draft spec with this and other changes, making it more precise,
and adding a bit of rationale. I hope to augment the brief
rationale that will appear in the draft with further explanation
here.

Closures, Inner Classes, and Concurrency

Our "solution" to concurrency issues in the original version of
the proposal was quite unsatisfying. Part of the problem was that
we could see no way to satisfy the requirements of that community
while preserving some of the most useful features of closures and
without violating the language design principles. Since then,
we've been reexamining the issue, and I think we've found a very
tidy solution.

Asynchronous use cases

To understand the solution, it is useful to characterize the
use cases for closures into two very distinct categories. The
first category, which I'll call asynchronous, are those
in which the user-privided code is not called immediately by the
API, but rather is saved to be executed from another thread or at
a later time. This includes asynchronous notification frameworks,
the concurrency framework's executors, and timer tasks. Generally
speaking these fall into a pattern in which the caller of the API
controls what will be done, but the API controls
when it will be done. These kinds of Java APIs are
widespread in practice, and are often expressed using
single-method interfaces or abstract classes. Because the code
passed into the API is invoked when its context no longer exists,
it is generally inappropriate to allow nonlocal control-flow.
Because it may be invoked from another thread, it is often
inappropriate to allow such code to access mutable local state.

The closure proposal attempts to address this category of use
case by providing the closure conversion, which converts
the user-written chunk of code into an anonymous inner class.

To make this very concrete, let's see how this would affect
they way you can submit a task to a java.util.concurrent.Executor.
The way you'd write it today (and this way of writing it will
always be available) is this:

This is a little better, though not necessarily enough by
itself to justify closures. This works because although
Executor.execute takes a Runnable as an
argument, and we wrote a closure, the closure conversion
converts it to a Runnable. Essentially, the closure
conversion builds an anonymous instance of Runnable whose
run method just invokes the closure. The closure
conversion is implemented at compile-time, generating exactly the
same code as in the first case, so there is no runtime overhead.

If we adopt an alternative invocation syntax that we're working
on to allow an abbreviated form of invocation statement for
methods involving closures (we're still working on the
specification), you would be able to write it something like this:

If this syntax doesn't mean anything to you, just imagine that
the block appearing at the end of the invocation of
ex.execute is turned into a closure and then passed as
the last argment (the only one, in this case) to execute.
That's basically what the abbreviated invocation syntax does.

This is a significant improvement compared to what you have to
write today. You could think of these uses cases as a shorthand
for creating anonymous class instances. The analogy isn't exact,
as certain details necessarily differ to get the langauge
construct to be well defined. Most importantly, the binding of
the names inside the block of code have to be based on the scope
in which the code appears, which in this case is the method
sayHelloInAnotherThread, not in the scope of the
anonymous class that will be created. Why? Well, if the meaning
of a name appearing in a block of code changes when you use it in
code passed in to method like this, then it would be a fruitful
source of Java
Programming Puzzles. That is especially true if the
scope from which the names are inherited isn't even mentioned in
the code. It gets worse when you consider that the method name to
which the closure is being passed might be overloaded, so you
can't tell when you're typechecking the block which interface you
are building.

Some people believe that what you can do with interfaces and
classes is enough, and would prefer to see no more (and possibly
less) than some abbreviated syntax for writing anonymous class
creation expressions. The most significant difference between
that approach and closures appears in the other category of use
cases.

Synchronous use cases

The other category I'll call the synchronous use
cases, in which the closures you pass to an API element are
invoked by your own thread strictly before that API element
returns control to you. There aren't many of these APIs appearing
in the JDK, usually because such APIs are rather awkward to write.
Anonymous class instances sometimes satisfy the requirement for
this category of use cases, and when they don't programmers resort
to various contortions to get their job done, or simply leave the
job half done. Let's see how far we can get, though, using only
interfaces for this category. I'll give a single example, again
motivated by the concurrency framework.

The Java synchronized statement is convenient and easy
to use, but for many purposes has been superceded by the
Lock interface in the concurrency framework. Here is how
you currently use a Lock to synchronize a snippet of
code:

Let's try to write this using only interfaces and classes, and
perhaps taking advantage of the syntactic abbreviation that we might
think of as being implemented by translation into inner classes.
The first thing we'll need is an interface that represents the
block of code that we're passing in. java.lang.Runnable
might just do it. So our first try at writing this method might
look something like this:

It would be nice if this just worked, but it doesn't quite.
The reason is that the run method of Runnable
isn't declared to throw any exception types, and specifically it
doesn't throw CheckedException. If we make something
almost like Runnable but in which the run method throws
Exception, that doesn't quite work either because the
implementation of withLock would have to declare that it
throws Exception too, and we aren't catching it in
callBigHonkingWhileHoldingLock.

The immediate problem is that the withLock method
doesn't provide exception transparency, which means that
the method throws the same exception type as the method passed in
to it. In order to get that, we'd have to make a version of
Runnable that can also throw some unspecified exception
type. We can do that with generics:

Assuming the closure conversion can automatically figure out
that it should be creating an instance of
RunnableWithException<CheckedException>, which it
should, this works just fine.

Things are a bit more complicated when the thing you're running
can throw two checked exception types; you really want to write a
version of withLock that doesn't care how many exceptions are
thrown in the block, they are all just propogated out. Let's take
it as a given that the generic type system can be extended to do
that (I don't think it's too difficult). Now we have a version of
withLock that is completely exception-transparent, and
can be invoked on a block of code that throws any set of exception
types, which will have to be declared or caught by the caller of
withLock. So far so good.

Let's take another slightly more complex example of a method
that we might want to refactor to use the new locks:

But this won't compile. The first problem is that the variable
c isn't final, but it is being accessed within an inner
class. We can easily fix that in this case by making c
final because the variable isn't assigned anywhere. If the
variable were assigned somewhere (but not in the block), we could
create a final local variable to hold a copy of the variable's
value where it is used, and use that other variable inside the
block.

There is a worse problem: the return true statement
inside the block is returning from the
RunnableWithException.run method, not
containsFred but the run method returns
void. We could solve that, perhaps, by making a new
version of Runnable-like interface that returns a
boolean. Or better yet perhaps we could add another generic
type parameter to indicate the return type of the interface's
method, and have the withLock method return the value to
its caller that was returned by the block.

So far so good... well, not quite as good as one would hope,
because the refactoring isn't straightforward and requires serious
thought from the programmer. Did you notice that the final return
statement had to move inside the block?
But Java programmers generally
speaking are smart people, and they can work out how to do this.
I'm sure they enjoy (as much as I do) spending their time puzzling
out how to coerce the language into doing what they want.

Performing a straightforward refactoring of this code to use a
method like withLock runs into a number of problems.
Let's look at the snippet of code that would become the body of
the method RunnableWithExceptionAndReturn.run

This can't possibly be the body of any method: it returns a
value in the middle of a loop, but it also falls off the end of
its execution. In order to refactor this method to use something
like withLock, we have to start reorganizing the code to
avoid these problems. We could move reportMissingFred(c);
into the synchronized block, but that changes the behavior of the
program. Another idea is to use a boolean state
value that is returned to the caller in addition to the count, to
tell the caller whether or not the block fell off the end, but
where would we put it? We can't make it a local variable in
countBeforeFred because the block can only use final
variables. Perhaps we could use a ThreadLocal. We could
put a length-one final boolean array in the local scope of
countBeforeFred, and keep the boolean value in element
zero.

If the body of the countBeforeFred were much more
complicated in its control structure at all, for example
containing a break from inside the block to outside it, this kind
of refactoring might become prohibitively complex. Realistically,
we would just give up and stop trying to use withLock at
all, effectively inling it. After all, it is only a few lines of
locking code, and it's an idiom that programmers should be
expected to learn, right? The idiom is even documented in the
javadoc of the Lock interface.

We've only touched the surface, though. The method
withLock is perhaps simple enough that you won't mind
having to inline it except in the simplest of circumstances when
it does you the least good. You might feel some discomfort at
repeating code, especially such boilerplate code, but there really
isn't any good alternative, and after all it really isn't that
much code.

However, some synchronous APIs are naturally more complex, and
it doesn't take much more complexity in the API before it simply
doesn't make sense for the caller to repeat code from the
implementation of the API to avoid these problems. Consider, for
example, an API that automatically closes your streams for you.
Instead of writing

You would write an invocation of a library method closeAtEnd to
get the same effect:

{
closeAtEnd(Stream s : openStream()) {
yourCodeHere();
}
}

You would probably go to much greater lengths to avoid inlining
the API method closeAtEnd than withLock, but the
same problems are are unavoidable. These are just two specific
examples of particular APIs of the synchronous closure variety; if
the API implementation is much more complicated, or contains
details in its implementation that callers should not be aware of,
the problem becomes acute because by-hand inlining is no longer an
option. If you've ever used
java.security.AccessController.doPrivileged, for example,
you know the pain.

While we've shown how to achieve exception transparency, we
haven't quite achieved control transparency, which means
that control constructs like break, continue,
and return have the same meaning when enclosed by a
closure. The idea is that you should be able to wrap a block of
code in an invocation of withLock, closeAtEnd,
or any other API that takes a block of code, and the meaning of
the control constructs appearing in that code should be the same.
Ideally, the only difference is that the block is executed while a
lock is held, or a stream is closed after the block, or more
generally whatever is specified and implemented by the API. You
should be able to break from the block to get out of a
loop or switch statement that might enclose the invocation of
withLock or closeAtEnd. We've looked at
examples where return is problematic, but the same
problems occur for other control constructs. Similarly, and for
the same reasons, in synchronous APIs you want to be able to refer
to and modify local variables in the enclosing scope; because the
block is executed synchronously and by the same thread, there are
no more concurrency issues than there are for an ordinary block
statement.

If you believe that these two methods - closeAtEnd and
withLock - are the only synchronous APIs worth having,
now or ever, you might be tempted to add some very specific
language features to Java to get precisely the desired
functionality, and dismiss the rest of our closure proposal. We
believe that would be a mistake, because we believe that
synchronous closures would greatly simplify the lives of
programmers. There are not many APIs of this sort today, mainly
because they can't be made to work very well with the language in
its current state.

The problem to be solved for synchronous closures is that the
programmer should not have to transform or contort his code at all
in order to wrap it in a closure to pass it to one of these APIs.

Bridging the Chasm

We're left with two seemingly irreconcilable sets of
requirements. Asynchronous closures should not be allowed to
refer to local variables in the enclosing scope, do not support
nonlocal control-flow, do not require exception transparency, and
generally speaking are done today using interfaces or abstract
classes. Synchronous closures, on the other hand, should be
allowed to refer to locals in enclosing scopes, should support
nonlocal control-flow, and may require exception transparency, but
generally speaking don't appear in the Java APIs because they are
not expressible. These use cases would seem to have so little in
common that one might not expect a single language construct to
satisfy both sets of requirements. But we believe a small
modification to the currently published proposal, in addition to the
change described in my previous blog post, satisfies both
sets of requirements. This is a very tidy result, because having
a single uniform language feature is very much preferable to
having two.

The change is very simple, but the change is not to the
specification of function types or closure expressions, it is to
the specification of the closure conversion, which allows
you to write a closure - a block of code - where a one-method
interface is expected. We add the additional restrictions that a
closure to which the closure conversion is applied must not
reference any non-final local variables from enclosing scopes, and must not use
break, continue, or return to transfer
control out of the closure, or the program is in error. These are the same
restrictions you have now with anonymous inner classes.

How does this solve both sets of problems? Well, first
consider the asynchronous use cases. These are expressed using
interfaces, and therefore in order to use these APIs with the
closure syntax, the block must be subjected to the closure
conversion. Consequently, if there is any use of a nonlocal
non-final variable, the compiler must reject the program.
Similarly, the compiler must reject the program if there is
nonlocal control-flow. The things that you will be able to do are
essentially the same as what you can do today, but with a much
simpler syntax and clearer semantics.

However, APIs for the synchronous use cases have largely not
been written yet, so they can be written using the new function
types. A closure is an expression of function type, so
the closure conversion is not required, and its restrictions do
not apply. Consequently, the block is allowed to use nonlocal
non-final variables, and can use nonlocal control-flow. The block
has essentially the same meaning as it would if it were not
wrapped by a closure. In the rare cases that an existing API is
expressed using an interface but is a synchronous use case - for
example java.security.PrivilegedExceptionAction - the
library writer can write a small utility method that converts a
closure to an implementation of the interface, enabling the client
to take advantage of the full generality of the feature.
To implement withLock using closures, with full exception- and
control-transparency, the JDK authors could write this:

Once you have support for synchronous closures, however, there
are many opportunities for the addition of extremely useful APIs
to the JDK as library methods, such as withLock,
closeAtEnd, and a method that simplifies the use of
java.security.AccessController.doPrivileged. More
importantly, synchronous closures enable Java programmers - not
the JDK authors, but ordinary Java programmers - to freely refactor
and abstract over pieces of their code in simple ways that are not
currently possible for seemingly arbitrary reasons.

Refactoring

This isn't just about writing and reading programs, it is about
refactoring programs too. Your ability to factor common code by
moving it into a separate "routine" is currently limited by how
the block passed by the caller can interact with the
caller. Closures, and synchronous closures in particular, simply
remove that limit. When a program is in need of a small
refactoring, you are free to factor just the code in common,
rather than revisiting the design of the basic data structures and
organization of your application. That was the point of my post What's
the Point of Closures. Some people missed the point, and
thought the program was merely flawed from the beginning and
should have been rewritten. Yes, it was flawed, as are most
programs in the real world. Not every programmer can get
everything right the first time, or can rewrite thousands of lines
to solve a localized issue. The whole point of such examples of
refactoring is that the original program is flawed in some
way. If the program were not flawed then refactoring might
not be required. The refactorings supported by closures enable the
programmer to get everything right in the end by small refinements
to the program, step by step.

And more...

I believe that the additional expressiveness of the language
with closures will enable people to discover and use new and
powerful kinds of design patterns to organize their code in ways
not possible today, making things simple that currently seem
complex. I have some more ideas in this direction, which I'll
write about at another time when I've had more sleep.

Wednesday, August 23, 2006

I just had a very productive meeting with coauthors
of the closure proposal. I think we made a lot of progress
addressing most of the specific concerns people have with the
proposal (with the exception of subjective comments like "I don't
think you should do this kind of thing to Java").

One issue in particular has been nagging us, and in spite of
Gilad's insistence that we comply with Tennent's
Correspondence and Abstraction Principles (described in
detail in the now out-of-print book Principles of Programming
Languages by R.D.Tennent), Gilad reluctantly admitted before
we published the first draft that there doesn't seem to be a way
of doing that for returning from a closure without introducing
inconsistencies with the rest of the language. In the end we
discovered that there is indeed a way to do it, and it seems to be
rather cleaner than any of the alternatives.

Setting aside the technical details of Tennent's approach,
the principle applies directly to the closure construct. A
closure can be used to abstract over an expression or a
statement. The principle dictates that an expression or
statement, when wrapped in a closure and then immediately invoked,
ought to have the same meaning as it did before being wrapped in a
closure. Any change in semantics when wrapping code in a closure
is likely a flaw in the language.

To put this more concretely, an expression

expr

should have the same meaning as the expression that wraps it in a closure and then evaluates it:

(():expr}()

And a block statement

{ stmt; ... }

ought to have the same meaning as creating and invoking a closure that wraps the statement:

((){ stmt; ... })()

Tennent's principles are very powerful because violations of
them tend to show up in the language as flaws, irregularities,
unnecessary restrictions, unexpected interactions or
complications, and so on. While exploring the language design
space for closures, every time we were tempted to violate them we
found that there were very real and unpleasant consequences; they
are very powerful tools for detecting bad smells in a programming
language design. The principles apply no less when extending an
existing programming language, because what you are doing in that
case is designing the next version of a language. We discovered
that when a bad smell is detected by these principles, it is often
straightforward to illustrate specific problems. The problems we
identified also arise in many supposedly "simpler" proposals that
attempt to provide merely a shorthand notation for construcing
anonymous class instances. I'll show two specific issues that we
identified based on bad smells, examples of the real problems
caused, and how we solved them.

The meaning of this

Before we look at the question of how to return from a closure,
and to help understand how the principles are applied, let's look
at the application of Tennent's principles to the question of the
meaning of this within a closure. One possibility would
be that it designates the closure object itself. That might be
useful if you want to define a recursive closure, as you now have
a convenient way of invoking yourself. An alternative is that it
designates the anonymous class instance that is created if the
closure conversion is applied. That seems natural if you
like to think of the closure expression as a shorthand for
creating an anonymous class.

Unfortunately, both of these alternatives get you into trouble,
and we can show specific ways that the language no longer fits
together when you take these approaches. The former approach, for
example, doesn't allow you to analyze the types of expressions
appearing within the closure, because the closure type depends on
the type of the returned expression, but the returned expression
might use "this", and so may depend on the type of the closure.
The latter approach has a similar chicken-and-egg problem, because
the type of interface you're creating may depend on the types
within the closure, but the types within the closure depend on the
type of the interface you're creating.

To see the problem more
simply, one issue is that the expression this no longer
designates the same thing when wrapped by a closure. In order to
abstract (i.e., wrap a closure around) a piece of code that uses
this or uses any names that are inherited from Object,
you would have to systematically rewrite those names to qualify
them in some way to ensure that they refer to the appropriate
object. In other words, you'd have to change every this
to EnclosingClassName.this and every
toString() to
EnclosingClassName.this.toString(). To be most useful as
an abstraction facility, programmers should be free to wrap a
piece of code in a closure without worrying that its semantics
will change subtly. To satisfy Tennent's Correspondence Principle, a
closure should introduce into the scope of the closure no names
that are not explicitly declared by the user - that is, only the
closure's parameters.

Returning from a closure

The principle also applies to the problem of returning from a
closure. The current draft proposal suggests that we use the
existing return syntax for returning from a closure, but
that violates the principles; you can detect the bad smell by
observing that this statement

{ if (...) return; }

doesn't mean the same thing as this:

(){ if (...) return; } ()

The former returns to the enclosing method (not shown), while
the latter returns from the closure, effectively doing
nothing.

One idea, desribed in the further ideas section of the draft
spec, is to introduce a new syntax for returning from a closure,
and reserve the return statement for returning from a
method, function, or constructor. We toyed with the statement
syntax

^expression;

Setting aside aesthetics, this seems to satisfy the
requirements, because this isn't legal syntax outside a closure.
Existing code does not have statements of this form, so wrapping
such code could not change its meaning. However, on further
reflection we realized that this syntax and all of the other new
kinds of return statements that we considered have the same flaw
as using return: while existing code doesn't use
this new statement form, once we introduce it into the language we
should expect people to start using it in new code. And
once they do, any code that includes this statement cannot be
wrapped in a closure without changing its meaning. Once we
identified the bad smell this way, it wasn't hard to come up with
realistic examples where it gets in the way of using the
language.

Having eliminated any form of return statement, what are you
left with? Referring back to Tennent's principles is instructive:
you should be able to replace an expression expr with

(){expr} ()

This is the proposed syntax for returning a value from a
closure. It is simpler than the existing spec because no rules
are needed to infer the return type; it is the type of the
expression. Of course, we will want to allow a list of statements
instead of an expression to allow the construct to close over
statements (in which case the closure returns void), and
allow a list of statements to precede a trailing expression (in
which case the final expression is the result of the closure). An
expression without a trailing semicolon isn't a legal statement
form, so there is no confusion with any existing constructs. This
nicely satisfies Tennent's principles. My only regret is that
there is no single way to both designate the returned value and
return from a closure before its end, which would be slightly more
consistent with Java's existing control constructs.

And more...

We made progress on a number of issues:

We know how to address concerns about how the closure
conversion will interact with APIs like concurrency and
asynchronous notification systems where the closure isn't just
called by the current thread; in those APIs you don't want to
allow nonlocal control-flow or capturing nonfinal variables.

We think we can add support for type inference of exception
types to allow writing APIs in which the set of exceptions from
a block (closure) that is passed in to the API can propogate out
of the API while being strongly type-checked.

We think it should be possible to write a class that
"implements" a function type, as suggested by "crazy" Bob Lee,
so that you can define a custom toString() to make
debugging easier or make the implementation serializable.

We started looking at a number of specific use cases that
have been suggested, such as a library that allows you to
execute a block of code while holding a
java.util.concurrent.locks.Lock, unlocking it at the end, or a
library that closes your streams (or other Closable things)
after your block of code is finished.

We've also made progress on detailing the set of
implementation techniques that might be used to implement the
facility.

Saturday, August 19, 2006

A number of people have asked me: What's the point of closures? Can't
you accomplish pretty much the same thing with the existing language
constructs? What does it really buy you?

If you haven't programmed using closures, and you've gotten used to the
Java idioms for the past ten years, it might be hard to see what closures
really buy you. This is my attempt to give you a glimpse of that, by
way of an extended example.

The problem

Suppose you are working with an application that maintains a list of
documents, and for each individual in some set a list of document annotations.
For the sake of this example, suppose the two lists are parallel. That is,
if you take the list of annotations from an individual, the values correspond
elementwise with the elements of the document list. To make this more
concrete:

It is possible that abstracting in this way allowes us to avoid repeating
this loop throughout the code, if the code frequently needs
to ask this kind of
question. Abstracting common code is good because, among other things,
it allows us to reduce the number of things we need to change
when we refactor
code. For example, if we were to change the representation of a person's
annotations to be a Map<Document,DocAnnotation>
instead of List<DocAnnotation>, or make a DocAnnotation contain
a reference to the Document,
we would have to change this kind of loop everywhere it
appears. So
having a single place where the loop is written is a good thing,
as there are fewer places in the code that depend on the exact
representation of the data.

Abstracting the loop in JDK5

The next step is to try to abstract the loop itself. Java provides some
convenient looping constructs, including the recently introduced for-each
loop with its Iterable interface. We can use that, but we need to
introduce a type over which we're iterating:

So far so good, but this solution has an unfortunate feature: it
constructs the entire list even if the answer can be found by looking at
only the first annotated document. We can do slightly better by
constructing a lazy iterator instead of an eager list. We do that by
rewriting docsWithAnnotations as follows:

Now, without changing personAnnotatedOnKeyword, the same
loop works without the
overhead of building the entire List<DocAndAnnotation>.

This program does, however, produce a large number of small, transient
garbage objects. The Iterator, Iterable, and DocAndAnnotation objects are
all short-lived and simply exist to convey data from one part of the program
to another. This is not necessarily a problem; HotSpot has a number of
garbage-collection algorithms that are good at allocating and reclaiming
short-lived objects. But in some applications it could contribute toward
a performance bottleneck.

Can the looping code be refactored to avoid all these small allocations?
The answer is yes, and there is a standard idiom for doing that in Java.
The idea is to turn the control structure of the loop inside-out.
Rather than having the personAnnotatedOnKeyword perform the
iteration, we have a library method perform the iteration and pass the
values to a snippet of code provided by personAnnotatedOnKeyword.
That would look something like this:

This solves the transient memory allocation problem (if it even was
a problem), but this version of the program again unnnecessarily iterates
through all of the documents even if it needs to iterate through only a few
to compute its result. We can fix that by modifying the
WithDocumentAndAnnotation interface's method to return a
boolean, which indicates whether or not iteration should continue
or whether the loop should abort. This may enable many of the old clients
of the docsWithAnnotations method to migrate to the new API, but
it may not satisfy the needs of all of them. Presumably we could modify
the interface further as we discovered different patterns of control-flow
used by clients that iterate through documents and their annotations.
We leave the details as an exercise to the reader.

The only transient objects created in this version are the closure
object itself and the two iterators used in the implementation of
docsWithAnnotations. As you can see, the caller of
docsWithAnnotations can use control-flow operations like
return without the implementation of docsWithAnnotations
having to anticipate what kinds of control-flow will be required.

Now, in the interest of full disclosure I should mention that the
return statement within the closure is likely to be implemented
under the covers using exceptions. That's one more transient object we
should count. As for the performance of using an exception in this way,
I'm told by HotSpot engineers that an exception used this way
is largely optimized away by the VM if found in performance-critical code.
The most expensive part
of exception handling by far
is capturing the stack trace when creating the
exception, and we would want to create exceptions for this purpose
without stack traces.

That really is it. All of the design iterations we went through to
handle variations on this problem in JDK5 simply don't arise as issues.
That's the beauty of closures: they allow you to easily
abstract away aspects of code that would otherwise require complex
contortions.

A slight update on my current feelings about the draft spec. I
think we are likely to drop the special syntax for a nonlocal return,
and have a different syntax for returning from a closure. This is
hinted at in the Further Ideas section. "return" would mean
return from the enclosing method or function, and if you're inside a
closure you would write a statement something like "^ expression;" to
return a value from the closure. Most value-returning closures are
unlikely to need this syntax, since they can often be expressed using
the "(args) : expression" form.

Friday, August 18, 2006

I'm co-author of a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. An abbreviated version of the original proposal is reproduced below. The latest version of the proposal and a prototype can be found at http://www.javac.info/.

Gilad Bracha, Neal Gafter, James Gosling, Peter von der Ahé

Modern programming languages provide a mixture of primitives for composing programs. C#, Javascript, Ruby, Scala, and Smalltalk (to name just a few) have direct language support for function types and inline function-valued expression, called closures. A proposal for closures is working its way through the C++ standards committees as well. Function types provide a natural way to express some kinds of abstraction that are currently quite awkward to express in Java. For programming in the small, closures allow one to abstract an algorithm over a piece of code; that is, they allow one to more easily extract the common parts of two almost-identical pieces of code. For programming in the large, closures support APIs that express an algorithm abstracted over some computational aspect of the algorithm. We propose to add function types and closures to Java. We anticipate that the additional expressiveness of the language will simplify the use of existing APIs and enable new kinds of APIs that are currently too awkward to express using the best current idiom: interfaces and anonymous classes.

Function Types

We introduce a new syntactic form:

Type

Type ( TypeList ) { throwsThrowsTypeList }

ThrowsTypeList

Type

ThrowsTypeList

ThrowsTypeList VBAR Type

VBAR

|

These syntactic forms designate function types. A function type is a kind of reference type. A function type consists of a return type, a list of argument types, and a set of thrown exception types.

Note: the existing syntax for the throws clause in a method declaration uses a comma to separate elements of the ThrowsTypeList. For backward compatibility we continue to allow commas to separate these elements in method and function declarations, but in function types we require the use of the '|' (vertical-bar) character as aseparator to resolve a true ambiguity that would arise when a function type is used in a type list. For uniformity of syntax, we also allow the vertical-bar as a separator in the throws clause of method and function definitions, and as a matter of style we recommend that new code prefer the vertical-bar.

Local Functions

In addition to function types, we introduce local functions, which are one way to introduce a name with function type:

BlockStatement

LocalFunctionDeclarationStatement

LocalFunctionDeclarationStatement

TypeIdentifierFormalParameters { throwsThrowsTypeList } Block

A local function declaration has the effect of declaring a final variable of function type. Local functions may not be declared with a variable argument list. Local functions may invoke themselves recursively.

Note: this syntax omits annotations, which should be allowed on local functions.

Example

Combining these forms, we can write a simple function and assign
it to a local function variable:

Namespaces and name lookup

The Java programming language currently maintains a strict separation between expression names and method names. These two separate namespaces allow a programmer to use the same name for a variable and for methods. Local functions and closure variables necessarily blur the distinction between these two namespaces: local functions may be used as expression values; contrariwise, variables of function type may be invoked.

A local function declaration introduces the declared name as a variable name. When searching a scope for a method name, if no methods exist with the given name then local functions and variables of the given name that are of function type are considered candidates. If more than one exists (for example, function-typed variable declarations are inherited from separate supertypes), the reference is considered ambiguous; local functions do not overload.

When searching a scope for an expression name, local functions are treated as variables. Function names and values can therefore be used like other values in a program, and can be applied using the existing invocation syntax. In addition, we allow a function to be invoked from an arbitrary (for example, parenthesized) expression:

Primary

PrimaryArguments

Anonymous Functions (Closures)

We also introduce a syntactic form for constructing a function value without declaring a local function (precedence is tentative):

Expression3

Closure

Closure

FormalParametersBlock

Closure

FormalParameters : Expression3

Example

We can rewrite the assignment to plus2b in the previous example using an anonymous function:

int(int) plus2b = (int x) {return x+2; };

Or, using the short form,

int(int) plus2b = (int x) : x+2;

The type of a closure

The type of a closure is inferred from its form as follows:

The argument types of a closure are the types of the declared arguments.

For the short form of a closure, the return type is the type of the expression following the colon. For a long form of a closure, if the body contains no return statement and the body cannot complete normally, the return type is the type of null. Otherwise if the body contains no return statement or the form of return statements are without a value, the return type is void

Otherwise, consider the set of types appearing in the return statements within the body. These types are combined from left to right using the rules of the conditional operator (JLS3 15.25) to compute a single unique type, which is the return type of the closure.

The set of thrown types of a closure are those checked exception types thrown by the body.

Example

The following illustrates a closure being assigned to a variable with precisely the type of the closure.

Subtyping

A function type T is a subtype of function type U iff all of the following hold:

Either

The return type of T is either the same as the return type of U or

Both return types are reference types and the return type of T is a subtype of the return type of U, or

the return type of U is void.

T and U have the same number of declared arguments.

For each corresponding argument position in the argument list of T and U, either both argument types are the same or both are reference types and the type of the argument to U is a subtype of the corresponding argument to T.

Every exception type in the throws of T is a subtype of some exception type in the throws of U.

Exception handling

The invocation of a function throws every checked exception type in the function's type.

It is a compile-time error if the body of a function can throw a checked exception type that is not a subtype of some member of the throws clause of the function.

Reflection

A function type inherits all the non-private methods from Object. The following methods are added to java.lang.Class to support function types:

Note: unlike java.lang.reflect.Method.invoke, Class.invokeFunction cannot throw IllegalAccessException, because there is no access control to enforce; the function value designates either an anonymous or local function, neither of which allows access modifiers in its declaration. Access to function values is controlled at compile-time by their scope, and at runtime by controlling the function value.

The type of null

We add support for null and the type of null in function types. We introduce a meaning for the keyword null as a type name; it designates the type of the expression null. A class literal for the type of null is null.class. These are necessary to allow reflection, type inference, and closure literals to work for functions that do not return normally. We also add the non-instantiable class java.lang.Null as a placeholder, and its static member field TYPE as a synonym for null.class.

Referencing names from the enclosing scope

Names that are in scope where a function or closure is defined may be referenced within the closure. We do not propose a rule that requires referenced variables be final, as is currently required for anonymous class instances. The constraint on anonymous class instances is also relaxed to allow them to reference any local variable in scope.

Note: Some who see concurrency constructs being the closure construct's primary use prefer to either require such referenced variables be final, or that such variables be explicitly declared for sharing, perhaps by requiring them be declared volatile. We reject this proposal for a few reasons. First, concurrency has no special role in the need for closures in the Java programming language; the proposal punishes other users of the feature for the convenience of these few. Second, the proposal is non-parallel with the closest existing parallel structure: classes. There is no constraint that a method may only access, for example, volatile fields of itself or other objects or enclosing classes. If compatibility allowed us to add such a rule to Java at this time, such a rule would obviously inconvenience most programmers for very little benefit. Third, marking such variables volatile, with all the semantic meaning implied by volatile, is neither necessary nor sufficient to ensure (and hardly assists!) appropriate use in a multithreaded environment.

Non-local transfer

One purpose for closures is to allow a programmer to refactor common code into a shared utility, with the difference between the use sites being abstracted into a local function or closure. The code to be abstracted sometimes contains a break, continue, or return statement. This need not be an obstacle to the transformation. A break or continue statement appearing within a closure or local function may transfer to any matching enclosing statement provided the target of the transfer is in the same innermost ClassBody.

Because the return statement within a block of code is given new meaning when transformed by being surrounded by a closure, a different syntactic construct is required to return from an enclosing function or method. The following new form of the return statement may be used within a closure or local function to return from any enclosing (named) local function or method, provided the target of the transfer is in the same innermost ClassBody:

NamedReturnStatement

returnIdentifier : ;

NamedReturnStatement

returnIdentifier : Expression ;

No syntax is provided to return from a lexically enclosing closure. If such non-local return is required, the code should be rewritten using a local function (i.e. introducing a name) in place of the closure.

If a break statement is executed that would transfer control out of a statement that is no longer executing, or is executing in another thread, the VM throws a new unchecked exception, UnmatchedNonlocalTransfer. (I suspect we can come up with a better name). Similarly, an UnmatchedNonlocalTransfer is thrown when a continue statement attempts to complete a loop iteration that is not executing in the current thread. Finally, an UnmatchedNonlocalTransfer is thrown when a NamedReturnStatement attempts to return from a function or method invocation that is not pending in the current thread.

Closure conversion

We propose the following closure conversion, to be applied only in those contexts where boxing currently occurs:

There is a closure conversion from every closure of type T to every interface type that has a single method with signature U such that T is a subtype of the function type corresponding to U.

We will want to generalize this rule slightly to allow the conversion when boxing or unboxing of the return type is required, e.g. to allow assigning a closure that returns int to an interface whose method returns Integer or vice versa.

Note: The current Java idiom for capturing a snippet of code requires the use of a one-method interface to represent the function type and an anonymous class instance to represent the closure:

This may be nearly good enough from the point of view of how concise the usage is, but it has one more serious drawback: every creation of a Runnable this way requires that two objects be allocated instead of one (one for the closure and one for the Runnable), and every invocation of a method constructed this way requires an extra invocation. For some applications -- for example, micro-concurrency -- this overhead may be too high to allow the use of the closure syntax with existing APIs. Moreover, the VM-level optimizations required to generate adequate code for this kind of construct are difficult and unlikely to be widely implemented soon.

The closure conversion is applied only to closures (i.e. function literals), not to arbitrary expressions of function type. This enables javac to allocate only one object, rather than both a closure and an anonymous class instance. The conversion avoids any hidden overhead at runtime. As a practical matter, javac will automatically generate code equivalent to our original client, creating an anonymous class instance in which the body of the lone method corresponds to the body of the closure.

Example

We can use the existing Executor framework to run a closure in the background:

Further ideas

We are considering allowing omission of the argument list in a closure when there are no arguments. Further, we could support a sugar for calls to functions whose last argument is a zero argument closure:

void foo(T1 p1, ..., Tn pn, R() pn+1) {...}

could be called as

T1 a1; ... Tn an;
foo(a1, ..., an){...};

where the call is translated to

foo(a1, ..., an, {...});

In the special case where there is only one argument to foo, we also would allow

We are also experimenting with generalizing this to support an invocation syntax that interleaves parts of the method name and its arguments, which would allow more general user-defined control structures that look like if, if-else, do-while, and so on.

This doesn't play well with the return statement being given a new meaning within a closure; it returns from the closure instead of the enclosing method/function. Perhaps the return from a closure should be given a different syntax:

^ expression;

With this, we probably no longer need the nonlocal return statement.

Acknowledgments

The authors would like to thank the following people whose discussions and insight
have helped us craft, refine, and improve this proposal:

About Me

Neal Gafter is a Computer Programming Language Designer, Amateur Scientist and Philosopher.
He works for Microsoft on the evolution of the .NET platform languages.
He also has been known to Kibbitz on the evolution of the Java language.
Neal was granted an OpenJDK Community Innovators' Challenge award for his design and
implementation of lambda expressions for Java.
He was previously a software engineer at Google working on Google Calendar, and a senior staff engineer at Sun Microsystems,
where he co-designed and implemented the Java language features in releases 1.4 through 5.0. Neal is coauthor of
Java Puzzlers: Traps, Pitfalls, and Corner Cases (Addison Wesley, 2005). He was a member of the C++ Standards
Committee and led the development of C and C++ compilers at Sun Microsystems, Microtec Research, and Texas Instruments.
He holds a Ph.D. in computer science from the University of Rochester.