Tuesday, 10 April 2007

Java Control Abstraction for First-Class Methods (Closures)

Stefan Schulz, Ricky Clarkson and I are pleased to announce the release of Java Control Abstraction (JCA).
This is a position paper explaining how we envisage the First-Class Methods (FCM) closures proposal being extended to cover control abstraction.

Java Control Abstraction

So, what is control abstraction? And how does it relate to FCM? Well, it's all about being able to add methods to an API that appear as though they are part of the language. The classic example is iteration over a map. Here is the code we write today:
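For illustration, a minimal sketch of that kind of loop in plain Java, assuming a Map<String, Integer>. The class, method and variable names are illustrative and are not taken from the paper:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapIterationToday {
    // Builds "key=value" lines by walking the entry set, the usual idiom
    // when no control abstraction is available.
    static String describe(Map<String, Integer> map) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            String key = entry.getKey();
            Integer value = entry.getValue();
            out.append(key).append('=').append(value).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new LinkedHashMap<>();
        ages.put("alice", 30);
        ages.put("bob", 25);
        System.out.print(describe(ages));
    }
}
```

The entry-set boilerplate (getKey, getValue) is what a map-iteration control abstraction aims to hide.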

The API method that declares such a control abstraction has a few unique features.
Firstly, its parameter list has two halves separated by a colon.
The first part consists of a method type which represents the code to be executed (the closure).
The second part consists of any other parameters.
The closure block is invoked in the same way as in FCM.

The allowed syntax that the developer may enter in the block is not governed by the rules of FCM.
Instead, the developer may use return, continue, break and exceptions and they will 'just work'. Thus in JCA, return will return from the enclosing method, not back into the closure. This is the opposite of FCM. This behaviour is required because the JCA block has to act like a built-in keyword.
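To make the contrast concrete: Java lambdas (added in Java 8, after this post was written) happen to follow the same local rule as FCM, where return leaves only the closure body. A JCA block would instead return from the enclosing method. The sketch below shows only the local behaviour, with hypothetical names:

```java
import java.util.Arrays;
import java.util.List;

class LocalReturnDemo {
    // Counts the non-empty names. The 'return' inside the lambda leaves only
    // the lambda body, so it behaves like 'continue' and does not return
    // from countVisited.
    static int countVisited(List<String> names) {
        int[] visited = {0};
        names.forEach(name -> {
            if (name.isEmpty()) {
                return; // local return: skips this element only
            }
            visited[0]++;
        });
        return visited[0];
    }

    public static void main(String[] args) {
        System.out.println(countVisited(Arrays.asList("a", "", "b")));
    }
}
```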

One downside of the approach is that things can go wrong because the API writer has access to a variable that represents the closure.
The API writer could store this in a variable and invoke it at a later time after the enclosing method is complete.
However, if this occurred, then any return/continue/break statements would no longer operate correctly, as the original enclosing method would no longer be on the call stack, and a weird and unexpected exception would be thrown.
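The failure mode can be imitated in today's Java. The sketch below assumes that a non-local return is implemented by unwinding with an exception. That is an assumption made for illustration, since the paper does not prescribe an implementation, and all names are hypothetical:

```java
class StaleBlockDemo {
    // Hypothetical stand-in for a non-local 'return': unwinds to the enclosing method.
    static final class NonLocalReturn extends RuntimeException {
        final Object value;
        NonLocalReturn(Object value) {
            super(null, null, false, false); // no message, no stack trace needed
            this.value = value;
        }
    }

    static Runnable stored; // the API writer keeps the block around (the mistake)

    // A badly written control abstraction: it stores the block instead of running it.
    static void badAbstraction(Runnable block) {
        stored = block;
    }

    // The enclosing method: the only frame that knows how to complete the transfer.
    static String enclosing() {
        try {
            badAbstraction(() -> { throw new NonLocalReturn("early"); });
            return "normal";
        } catch (NonLocalReturn r) {
            return (String) r.value;
        }
    }

    // Runs the stored block after enclosing() has already finished.
    static String runLater() {
        String result = enclosing(); // the block has not run yet
        try {
            stored.run(); // enclosing() is no longer on the call stack
            return result;
        } catch (NonLocalReturn escaped) {
            return "unexpected " + escaped.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(runLater());
    }
}
```

Because the handler in enclosing() is gone by the time the block runs, the transfer surfaces at whoever invoked the stored block.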

The semantics of a pure FCM method invocation are always safe, and there is no way to get one of these unexpected exceptions.
But, for JCA control abstraction we could find no viable way to stop the weird exceptions.
Instead, we have chosen to specifically separate the syntax of FCM from the syntax of control abstraction in JCA.

Our approach is to accompany the integration of control abstraction into Java by a strong set of messages.
Developers will be encouraged to use both FCM callbacks and JCA control abstractions. However, developers would only be encouraged to write FCM style APIs, and not JCA.

Writing the API part of any control abstraction (including JCA) is difficult to get right (or more accurately easy to get wrong).
As a result, some coding shops may choose to ban the writing of control abstraction APIs, and having a separate syntax makes such a ban easy for tools to enforce.
It is expected, of course, that the majority of the key control abstractions will be provided by the JDK, where experts will ensure that the control abstraction APIs work correctly.

Summary

This document has taken a while to produce, especially by comparison with FCM.
In the end this indicated to us that writing a control abstraction is probably going to be a little tricky irrespective of what choices the language designer makes.
By separating the syntax and semantics from FCM we have clearly identified the control abstraction issue in isolation, which can only be a good thing.

I think this is a strong misstatement: "However, developers would only be encouraged to write FCM style APIs, and not JCA. ... Writing the API part of any control abstraction (including JCA) is difficult to get right (or more accurately easy to get wrong)."

I think it's quite easy to get right for most common cases (especially with proper language support which also isn't too fancy). But having special syntax for this could make it easy to know what you are doing and therefore easier to get right. So maybe it is better to have a special syntax separate from other forms such as what you've proposed. And the compiler might be able to help out too with the distinct syntax.

Considering exception swallowing: I think this should be the exception (pun noticed after the fact, honestly) rather than the rule. Almost _always_ it would be safe to assume that exceptions thrown in the block propagate outside the block. Anything else would be confusing.

Actually I think that "swallowing" exceptions would be fairly simple. If the control abstraction method (CAM) accepts a block type which itself throws an exception, but the CAM doesn't in turn throw that exception, then the CAM is expected to swallow the exceptions declared in the block:
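As a rough sketch of that rule in present-day Java rather than JCA syntax (the Block interface and the attempt method are hypothetical names): the block type declares a checked exception, the method has no throws clause, and so the method must deal with the exception itself.

```java
import java.io.IOException;

class SwallowDemo {
    // Block type that declares a checked exception E.
    interface Block<E extends Exception> {
        void run() throws E;
    }

    // No 'throws' clause here, so the method has to handle whatever the
    // block throws. Returns true if the block completed normally.
    static boolean attempt(Block<? extends Exception> block) {
        try {
            block.run();
            return true;
        } catch (Exception e) {
            return false; // swallowed by design
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt(() -> { throw new IOException("boom"); }));
    }
}
```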

@Tom, The statement that JCA is hard may be too strong. However, it definitely is easy to get wrong. At the very least there should be a checklist of things to check when implementing such a method.

Ideally, tools like checkstyle or PMD could then encode the rules into their checking to help further. This is a key advantage of using a different API declaration syntax - that tools can easily identify it and apply specific rules.

@Matthew, Yes, you've worked through examples of swallowing exceptions. These are an essential part of closures.

Thanks, Matthew, for bringing up examples. I thought the BGGA proposal explained well enough why it makes sense to swallow exceptions. Matching the exceptions a block may throw against those the abstraction method does not throw is only half the story. An abstraction method may also take closures that throw more than the declared exceptions. Otherwise, one would need to declare a whole set of methods, or keep adding to the exceptions allowed for a block. BGGA does provide such a mechanism, although generics are not necessarily the best tool for it. Some collector mechanism may suffice, which would also avoid overriding the generics when applying control abstraction. For example:

Where ... is a collator for passed through exceptions. This is necessary to allow developers to deliberately define if the abstraction method will take exceptions (and which ones) or pass exceptions.

@Tom, I'm not sure I agree with Stephen on that specific statement. But in the end, it is more difficult to get JCA right than FCM style closures. Handling non-local transfer in particular, i.e., break, continue, and return statements, causes real headaches.

Break, continue, and return can be made to work automatically except in the cases of execution in separate threads (e.g., "invokeAndWait()"). Or in cases of deferred execution which shouldn't be done, and that was already discussed well. Synchronous work on the current thread would be the common use case, and it would work automatically and easily.

For swallowing exceptions, the more I think about it, the more I think it should _never_ be done by control abstraction.

And that's what's nice about FCM assuming asynchronous and JCA assuming synchronous. Exceptions can just work correctly automatically in each case. FCM can assume you won't throw them to the surrounding block by default, and JCA can assume you will. Problem solved.

@Stefan, I have read the BGGA proposal but have been following FCM more closely, so forgive me if I ask redundant questions.

I did understand the semantics of exception transparency, but thank you for taking time to spell things out.

I got the impression while reading the proposal that exception transparency was implicit. In other words, by using the block syntax, any exception that the block throws is also thrown by the control abstraction method. (Unless the block type in the CAM signature throws a specific exception type, in which case the CAM would be required to catch that exception.)
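For comparison, exception transparency can be approximated in current Java with a generic exception parameter, roughly the generics-based route mentioned for BGGA. This is only a sketch with hypothetical names (Block, twice), not syntax from either proposal:

```java
import java.io.IOException;

class TransparentDemo {
    // Block type parameterised by the checked exception it may throw.
    interface Block<E extends Exception> {
        void run() throws E;
    }

    // E is taken from the block, so callers see exactly the block's
    // checked exception and nothing broader.
    static <E extends Exception> void twice(Block<E> block) throws E {
        block.run();
        block.run();
    }

    static String demo() {
        try {
            twice(() -> { throw new IOException("disk"); });
            return "no exception";
        } catch (IOException e) { // only IOException has to be handled here
            return "caught " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A single type parameter carries only one exception type, which is the limitation the collector idea in the earlier comment is trying to address.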

However I didn't see any mention of the "..." syntax in the JCA proposal. Are you just clarifying intent with your example, or are the ellipses actually part of the intended syntax?

@Tom, it's difficult to explain in a short comment, but break, continue, and return unfortunately do not work automatically, although it looks as if they would. The block of a JCA is passed as an inner method to the abstraction method, which can handle it like any other inner method, i.e., call it, store it, loop over it, or apply it concurrently (e.g., for matrix operations). There is and should be no restriction on what can be done, as this would limit the expressiveness of closures in general. Hence, all the problems of non-local transfer as described in BGGA can appear. It takes rather more than this short paragraph to explain; much more information is given in a couple of posts on Neal's blog, though.

@Matthew, JCA is a position paper, not a proposal. We suggest a syntax that fitted our requirements in order to clarify the application of control abstraction. It is by no means complete, nor does it cover all the features a control abstraction may provide or need to provide in the end. So, yes, the "..." is only an option and a syntax I used to clarify my explanation, as is the syntax stated in the JCA document. The syntax given in BGGA is another option.

My "never swallow" statement was off base. If the main purpose of a block is to swallow certain exceptions (or in other clear cases), then that sounds okay. For example (perhaps in a unit test framework using closures instead of annotations):
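A sketch of what such a test helper could look like in present-day Java, where swallowing the exception is the method's whole purpose. The names (Block, throwsExpected) are hypothetical and do not come from any existing test framework:

```java
import java.io.IOException;

class ExpectDemo {
    // Block that may throw any checked exception.
    interface Block {
        void run() throws Exception;
    }

    // Returns true only if the block throws an instance of 'expected'.
    // Any exception from the block is caught here and never propagates.
    static boolean throwsExpected(Class<? extends Exception> expected, Block block) {
        try {
            block.run();
        } catch (Exception e) {
            return expected.isInstance(e);
        }
        return false; // the block completed without throwing
    }

    public static void main(String[] args) {
        System.out.println(
            throwsExpected(IOException.class, () -> { throw new IOException("x"); }));
    }
}
```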

You can handle a non-local return, break, and continue without needing the complication of a control abstraction by naming the non-local method to return from. This is the opposite way round to BGGA, where you have to stop non-local returns. E.g.:

@Tom, on the syntax, it is important to remember that block is just an ordinary variable, so it should be prefixed by a genuine Java type. In this case, the type is being restricted to be a method type. This is necessary, as you could assign the block to an instance variable, or put it in a hashmap or do all sorts of weird stuff (most of which are of course a bad idea).

What this syntax does suggest is that the alternative method type syntax from FCM may make sense:

#(String return int throws Exception)

hence:

public static eachEntry(for #(K, V) block: Map map) { ... }

On the for, personally I'm not 100% convinced it's needed on the application side, but it is needed in the API to define whether the method captures continue/break.

I understand it's just a variable. I was proposing being inconsistent with type names in this case. Out of the options presented, I like your current spec better than "#(String return int ...)". So, I'll also prefer "for #(void(K, V)) block: ..." if you want to stay consistent.

I reviewed a bit better and saw that semantically you were using "for" like BGGA. Sorry for the error. Concerning that, I like the BGGA placement of "for" better. So, combining the two styles gives this:

void for eachEntry(#(void(K, V)) block, Map map) { ...

I do love to see "throws X" go away. And, beating a dead horse, I'd like this even better (just assume this applies as a general caveat on "#()" typing syntax until I mention otherwise - and I'll try to avoid mentioning it again for a while):

"We also considered and rejected the option of not assigning a variable name to the block being passed to the control abstraction method definition. This option increases the safety of the overall solution, effectively preventing many possible problem areas. However, it removes the ability to perform some important use cases, so was not a viable option."

I'm curious, which use cases are prevented when no variable name is assigned to the block?

@Matthew, It is possible to implement a multithreaded execution around items in a list, where all the processing is completed before the control abstraction syntax is completed. This is hard to achieve without a variable.
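A rough illustration of that use case in present-day Java, using a Consumer in place of a JCA block. The method name eachParallel and the pool size are assumptions made for the sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class ParallelEachDemo {
    // Runs 'block' once per item on a small thread pool and returns only after
    // every invocation has finished. Having a 'block' variable is what lets the
    // method hand the same closure to several worker threads.
    // Exceptions thrown by the block are captured in the ignored Futures.
    static <T> void eachParallel(List<T> items, Consumer<? super T> block) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (T item : items) {
                tasks.add(() -> { block.accept(item); return null; });
            }
            pool.invokeAll(tasks); // blocks until all tasks are complete
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for blocks", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger sum = new AtomicInteger();
        eachParallel(Arrays.asList(1, 2, 3, 4), sum::addAndGet);
        System.out.println(sum.get());
    }
}
```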

Also, something as simple as implementing one method that delegates to another is impossible without a block variable.