It has been proposed that we add looping methods to Collection and Map that take advantage of closures, but it would be incompatible to add methods to an existing interface. We can accomplish the same thing using static methods. With one small additional twist to the closures proposal, we could make it even easier to write a method that iterates over maps. Here is what we want our client to look like:

Notice the "for" qualifier on the declaration of eachEntry, and the for keyword on its invocation. Methods defined this way and then invoked using the control abstraction syntax would require the use of the for keyword on the invocation. The compiler arranges such control abstractions to act like loops, by properly interpreting break and continue statements within the body of the closure. Although this example doesn't use break or continue statements, the mere presence of the for keyword on the invocation helps the reader understand the code. In this way user-defined control constructs could be written that appear and act as loops.

27 comments:

Roman Elizarov
said...

Now that is cool. By the way, there is some parallel between syntactic sugar for variable arity arguments that were added in Java 5.0 and nominal closure proposal that you are working on. Both have a special treatment for last method argument. However, in a variable arity syntax this is signified by a special treatment at call site (support for variable arity itself) and by a special declaration of the last method argument (ellipsis). Maybe for consistency abbreviated invocation syntax needs similar treatment. That is, it shall be allowed only for methods that were specially retrofitted with their last parameter declared in some special way. Strategically placed keyword "do" might do the trick.

Can't say I like this much. We've already got 2 different forms of FOR:

1. for (int i; ...; ...)

2. for (Object o : ...)

To add this third form just confuses the situation:

3. for something(Object o1 , Object o2 : ...)

I keep trying to read the something() as a plain method call possibly returning an iterator but the 2nd "parameter" is of course impossible.

I don't know it seems like there is too much "overloading" taking place of "visual structures" that were once simple.

I put the words between quotes because I have no good way of describing them differently.

What I mean is that loops and method calls are structures that are visually easily recognized and distinguished. In this example those structures are mixed and their meaning subtly changed which seems very confusing to me.

1) Am I guessing right, that the loop method must be void? I mean, I guess so, "for" is a statement, no expression. But what is the usage of such looping methods? If I can use only statements, then I won't have really a usage in combination with asynchronous closures. Ever thought about making for/while/do expressions? That would not break compatibility with older versions. But of course that would introduce the problem of break/continue/return in a totally new matter.

So what is the difference between your idea and a normal loop? I can't use map/find because the loop is no expression. In these cases I have to use a surplus local variable instead, even if the result is being processed directly. Using asynchronous closures here is not impossible but kind of useless. I mean you can't have break/continue/return then, and you can't have the nice access to local variables, which is needed because the loop is no expression.

2) Ever thought about removing the "synchronized" requirement from your closure proposal and use "for" instead. Not only I think that "for" is much better to read, you also don't have to use "synchronozed" in such a strange matter.. maybe I would use "do" instead of "for". Anyway, if you really ditch the "synchronized" and use "for" instead I would see this idea as a good progress for the closure proposal. Of course only if the eachEntry method can be used without "for" as normal method call expression in case I don't handle synchronous closures.

Comments are a bad place to have flame ware, but I’ll still make a comment on comments posted here. IMHO, it is a very bad to turn Java into functional language by adding full-blown closures with map, reduce, and all other functional constructs. Unfortunately, that is what so many people are trying to do. However, Java is so good and popular as a language because it is an imperative language. Let keep it this way, but let us extend this language by adding our own imperative constructs, like our own loops, etc.

What's with the apparent requirement for a separate static method and subject? No chance to pass around or decorate. Why not go for an Iterable type approach in keeping with the enhanced for loop. So we can use it as:

FWIW, I think the enhanced for loop syntax is the wrong way around. The declaration gets 'executed' every iteration, but the iterable expression does not. It should have looked something like:

List<String> strings; ...for (strings : final String str) { ...}

Bit late now.

BTW: I can't see how in the suggested syntax the compiler knows what to do with continue. Not supporting continue is my favoured option. My first attempt at the above had Iteration returning an Iterator-like object that could be controlled by code generated within the calling method (like classic JSP tags). However, as anyone who has written class JSP tags will know, that causes even more boilerplate.

It is much less common to iterate over a Map's entries than its keys or values. And when you need to do it, it isn't that bad. Calling it "an ugly exercise in boilerplate" is unfair to the Collections API.

Like many of Josh's comments on closures, rather than address the requirement of a particular use case he imagines an alternative use case in which a solution isn't necessary. In my example, different maps may have different names for the same droid; a name is not an inherent property of the droid. While there are certainly alternative use cases that don't benefit from this solution, as Josh rightly points out, that doesn't negate the importance of addressing these use cases.

tom: regarding "I can't see how in the suggested syntax the compiler knows what to do with continue"

The block closure passed to a "for" method invocation is treated as follows: a "continue" appearing in it brances to the bottom of the closure body. A "break" branches out of the invocation just past the invocation statement. In this way breaks and continues appearing in the body act as they would in a loop.

Perhaps I didn't make myself clear. I don't think this use case is realistic. Moreover, I don't think the language is improved by allowing programmers to define their own loop constructs. The purpose of the for-each construct was to allow programmers to define their own "loopable" types, which drastically reduces the need for defining custom loop constructs. So the benefit in allowing custom loop constructs is small, but the costs are not.

One of Java's historical strengths has been its transparency: What you see is pretty much what you get. As a consequence, any Java programmer can easily read and maintain code written by any other. We take this property for granted, but there are many languages that do not have it. Allowing programmers to define their own looping constructs encourages the formation of dialects, reducing transparency. And it adds to the complexity of a language that historically prided itself on simplicity.

That is not to say that I am against providing such facilities in general. On the contrary, I think they are a great boon to many existing languages, such as Lisp and Smalltalk. I would seriously consider providing them in a new language being designed from scratch. But I do not think they belong in Java at this point in its development.

francis: I realize you're being sarcastic, but the basic closures proposal already allows you to do that kind of thing. The only reason something special is worth doing for loops is that they introduce a new binding for the break and continue statements.

I can certainly think of a recent use case which I think mirrors your example to some extent - one in which a parent entity maintained a map of child entities for all defined primary keys (a finite set), but where there was not necessarily an explicit child entity for each possible key and so there was some rule whereby the parent would use a default child entity if one did not exist for a particular key, potentially resulting in multiple primary keys mapping to the same value.

The implementation was such that all possible key/value pairs were actually held in the Map (the rule was applied at the time the values were stored rather than retrieved), and both the parent's idea of what the key was, and the value's other attributes, could be used when queried. Perhaps most pertinently, the entities could be queried by some property other than the key, such as a the value's 'date' attribute being within a particular range (which is far more difficult to use as a key).

So there's code that queries by the Map's key, there's code that queries by the values' properties, and there's code that queries by both, and they all look different.

In retrospect I'm sure it could have been implemented in a better way, but in any case the 80/20 rule applies here: I use the collections API constantly, and 80% (or more) of that usage is trivial and easy to understand. The code that fits into the 'other 20%' is much like the paragraphs above which describe my use case; overly verbose, probably not sufficiently clear, and ultimately far more likely to be misunderstood and to contain inaccuracies.

Finding ways to cut through the unnecessary verbosity and to unify solutions to similar problems must be the goal (ie. querying the data - not dealing with the fact that it's in a java.util.Map) and like many people I do see something like myCollectionOrMap.find(a closure) and myCollectionOrMap.each(a closure) as being a potentially simpler way of dealing with the more complicated (more bug-ridden) use cases.

Where your idea actually fails for me though is its lack of API discoverability; to improve upon the collections API in a way that the majority of Java programmers will routinely take advantage of, it needs to be discoverable by an IDE so that it may be presented to them when they hit that '.' after the identifier (or something similar). Sad fact is that too many of the candidates I've interviewed for a Java programmer role have a 'deep understanding of the collections API' which begins and ends at ArrayList (or worse, Vector) - and too few of the decent ones have the time or opportunity to fully explore what's there; most of the code that ends up in production is written under pressure, and as IDEs have become more 'intelligent' they've narrowed the scope of commonly used documentation from the package-level to the type-level.

In my experience many programmers don't even use many of of the existing static methods on java.util.Collections when they should, because they either didn't know about them, or 'forgot'. Me included I expect.

So even if everyone loved the syntax, and the idea, I think there's a bigger problem to solve before an idea like this would actually make enough of a difference to be worthwhile. Personally I'm undecided about the syntax, and I have no backwards-compatible solutions to the discoverability problem I think it suffers, but I like the idea.

Is all this -really- neccessary? You can get 95% of all this (extensive!) functionality with fancy notation for anonymous classes and coding 'java-style' instead of looking at java as if it's some sort of domb stepchild of LISP.

Take a look at CICE for example (http://crazybob.org/2006/10/java-closure-spectrum.html).

I like that -way- better. Java6 is moving toward more flexibility on the JVM front with eg invokeVirtual. I'd strongly prefer java, the language, keeps its current vibe, but JVM, the platform, gains abilities like this by way of supporting entirely different languages.

A lot of enterprisey folks have already gotten a shocker with java5 (which only recently entered the J2EE scene, really). Shaking up things with yet another MAJOR re-structuring of the JLS itself, with, in fact, -far- further reaching effects than the java5 changes ever had... I dunno. Actually, I -do- know. I'm against it.

Neal, I hope this is no offending question.. but what do you think about C# 3.0 lambda functions? I mean they are readable, they are easy and they won't really require new keywords. Of course something like Func can not exist in Java, because an interface that I can use like Func<A> and Func<A,B> is not doable with Java generics. Or am I wrong here?

Another thing is method pointers. Maybe you want to read this small article from Sundararajan

They could be added to Java very easily, no new keyword required, typing can be done at compile time.

But this leads back to the single interface for all types of closures (and method pointers). At the beginning I had the hope that adding closures to Java would help to prevent the interface explosion. But now I don't think this will happen, because when we do that basing on interfaces, then we don't get a general interface for all types of closures.

The names are always implying a functionality and I think that you wouldn't consider two interfaces as equal if one is named Runnable and the other is named DiskFormatingTaks. Even if they have both run methods of the same kind.

Of course that means we forget about break/continue/return, (a)synchronous closures and such. The thing gets even better with currying..

But as I said, the central problem is the inavailability of the generic Func interfaces. And the way covariance is hanlded in generics doesn't make the life more easy

Jochen: I think this is the most important basic question on this line of designing closures: do you add function types at the same time? As you observe, there are good reasons to do so. There are other reasons that you didn't touch upon, and I'll talk more about those later.

We've played with the syntax a lot, and it isn't clear we've settled on the best one yet.

There is no reason to abandon break/continue/return with lambdas. They are still necessary in a language in which these control statements are scoped, as they are in Java. Otherwise you're only closing over part of the lexical environment.

I am surely nobody of the "don't do it, because it is too difficult" fraction ;)

I think adding function pointers or something that does look like a function automatically sets the semantics for break/continue/return. return = return from the function/closure, not from the method like you suggested. break/continue only valid if in the function is a loop and break/continue are inside this loop.

Only when seeing closures as blocks it makes sense to give break/continue/return a different meaning. But if you really want that, then you shouldn't restrict the access to lokal variables and don't make a difference between the synchronous or asynchronous case when accessing local variables.

But normal blocks don't have parameters! And closures without parameters are kind of half done only. When defining a closure like (String str, int i){str+i}, then it looks for me like a method, where I removed the name, the modifier and the return type.

I know Smalltalk, Ruby and of course Groovy as examples of languages with block closures. And all these languages are defining the parameters inside the block. I really think that is because of good reasons. They are not really functions, they are blocks you can give parameters - so the semantic is slightly different. The way Java allows to initialize arrays makes it a bit difficult here. Groovy removed the array initialization syntax in favor of the block based closures and uses the list syntax for arrays instead. A compiler with a big lookahead can surely solve that without problems, but I guess you don't want that in Java.

But even if all these problems are solved, for example by directly using the {|(parameterlist|statementlist} syntax from Ruby, we still face that fundamental problem of interface explosion.

A simple convert function would help here much, but overloading the cast operator is a no go. It violates the rule, that a cast doesn't change the reference. But any function to make that cast is too much.

Perhaps Josh would find this use case more realistic if he knew that of the 21 for-each loops appearing in his recent joint programming project with me and Lexi, 4 of the loops (19%) were of this form.

Hmm. After thinking some more, I'll go back to anti-CICE and back to supporting the for loop construct. For example, here's the CICE version of the Droid thing (and I've frequently looped on entries, too, plus all the other alternatives):

In all, the for loop thing is far more readable and comprehensible than this CICE version. But I still dislike the "throws E" boilerplate. Exceptions in Java are crazy.

But in general, current lack of easy abstraction in Java leads to more concrete code. And more code duplication. But more concrete code also has its benefits. Just DRY folk like me have trouble swallowing it.

And the for loop is interesting still because it's the one control structure in Java that is callbacky in style (well, since Java 5 for loops, at least). I might prefer no need for "for" in front, though, so more like your older examples.

I have to say that the direction the Java language is taking is leaving a sour taste in my mouth. I loved how the language used to be clear. I felt it was one of its strongest suits. How I could look at arbitrary code from where ever and know EXACTLY what it did.

As the language evolves I find myself having to look at the language specification more. Which I don't think is good. It's slowly turning into C++, which frankly is a mess of a language.

Closures work brilliantly in Ruby so I love the idea of having them in Java, primarily to use instead of for loops. However this doesn't look natural to me. I'd much rather see:

for eachEntry(String droidName, Droid droid : map) { // do something}

written as:

eachEntry(map) {String droidName, Droid droid => // do something}

To me that clearly shows map being passed into eachEntry and that droidName and droid are parameters of the code block. I don't like the original syntax because it's not a for loop, it's a method taking a map and a block. As such it is obscuring what is actually going on.

I know it's blasphemy, but I think it is time for a new Collections package. It is a normal package after all (java.util) and has several other flaws due to its legacy:- Stack is not an interface- Map#get takes an Object, not a subclass of K... to name just a few.

The Jakarta Commons Collections (see http://commons.apache.org/collections/) have a lot of nice Ideas (even something the named "Closure" ;-) that are worth a thought.

About Me

Neal Gafter is a Computer Programming Language Designer, Amateur Scientist and Philosopher.
He works for Microsoft on the evolution of the .NET platform languages.
He also has been known to Kibbitz on the evolution of the Java language.
Neal was granted an OpenJDK Community Innovators' Challenge award for his design and
implementation of lambda expressions for Java.
He was previously a software engineer at Google working on Google Calendar, and a senior staff engineer at Sun Microsystems,
where he co-designed and implemented the Java language features in releases 1.4 through 5.0. Neal is coauthor of
Java Puzzlers: Traps, Pitfalls, and Corner Cases (Addison Wesley, 2005). He was a member of the C++ Standards
Committee and led the development of C and C++ compilers at Sun Microsystems, Microtec Research, and Texas Instruments.
He holds a Ph.D. in computer science from the University of Rochester.