What is a "fully featured closure"? Request for comments.

We are having an argument on the Wikipedia article on closures about whether C# 2.0 and ECMAScript closures are "fully featured" closures or not. The crux of the argument against them being true closures seems to be that, in both languages, they do not capture the lexical bindings of return/break/continue statements from the outer scope. You can read the entire discussion here.

I would appreciate any comments on the issue from people familiar with the subject - figured of all places, this would be the best one to look for any such ;) - either in this thread, or on the discussion page of the article.

I'm not sure that's the right question to ask. Closures are not a language feature, they're an implementation technique for a language feature (first-class functions).

An analogy to the debate you're having is one over whether C++ and Java have fully-featured vtables. What you really should be talking about instead is the semantics of dynamic dispatch in those languages (and I guess you would be, but only indirectly).

So the real question is about the scope rules of those languages. Are return/break/continue lexically scoped (if so, they would have to be captured in "real" closures), or are they only "sort-of" lexically scoped? If closures in implementations of the languages you're talking about don't capture those things, there's nothing wrong with the closures, but there might be something wrong with the scope rules of the language.

I don't know of any language where first-class functions close over return. Return corresponds roughly to the current continuation. The usual rule for evaluating a lambda abstraction is:

eval [[fn x => E]] e k = k (fn v k' => eval [[E]] e{x->v} k')

Where e is the environment and e{x->v} is e extended with a binding of x to v. That is, the value of a lambda abstraction is a function expecting an argument v and continuation k', which when applied will evaluate the body in an extended environment, and continue with k'.

This function closes over e: it is a free variable in fn v k' => .... It does not close over k (there is no continuation free in fn v k' => ...; instead, a continuation is received from the application site).
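To make the rule concrete, here is a minimal sketch of it as a CPS evaluator in Python. The tuple encoding of terms and the names eval_cps, closure, and k2 are assumptions made for illustration; they do not come from the post. The thing to notice is that the function value mentions env freely but receives its continuation k2 as a parameter.

```python
# Tiny CPS evaluator for a lambda calculus with integer literals.
# Terms are tuples (a hypothetical encoding):
#   ("lit", n), ("var", name), ("fn", param, body), ("app", f, arg)

def eval_cps(term, env, k):
    tag = term[0]
    if tag == "lit":
        return k(term[1])
    if tag == "var":
        return k(env[term[1]])
    if tag == "fn":
        _, param, body = term

        # The function value closes over env (env is free in this def),
        # but it takes its continuation k2 from the application site.
        def closure(v, k2):
            return eval_cps(body, {**env, param: v}, k2)

        return k(closure)
    if tag == "app":
        _, f, arg = term
        # Evaluate the operator, then the operand, then apply with the
        # continuation of the application itself.
        return eval_cps(
            f, env,
            lambda fv: eval_cps(arg, env, lambda av: fv(av, k)),
        )
    raise ValueError(f"unknown term: {term!r}")


# ((fn x => fn y => x) 1) 2: the inner function remembers x from its
# defining environment.
const_term = (
    "app",
    ("app", ("fn", "x", ("fn", "y", ("var", "x"))), ("lit", 1)),
    ("lit", 2),
)
result = eval_cps(const_term, {}, lambda v: v)
```

Passing the identity function as the initial continuation just hands back the final value.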

Closing over k would be weird. Since functions still receive an argument and a continuation (I presume that non-first-class functions return normally to their caller, i.e. they must be passed a continuation), that would mean that first-class functions are:

eval [[fn x => E]] e k = k (fn v k' => eval [[E]] e{x->v} k)

That is, first-class functions do not return to their caller, but rather return a value to k, the continuation of the lambda expression (so, in a strongly-typed language, all functions would have to return functions of their same type).

The short answer is no, fully-featured closures do not need to close over continuations. Nor do they close over the store in a language with mutable state.

I don't know of any language where first-class functions close over return.

Just to clarify: Smalltalk does that, and it seems that Java 7 will do as well. I rather dislike it, as it forces a strong distinction between top-level function definitions, which introduce a continuation, and nested definitions, which close over that continuation. But it's out there nonetheless.

At runtime, if a break statement is executed that would transfer control out of a statement that is no longer executing, or is executing in another thread, the VM throws a new unchecked exception, UnmatchedNonlocalTransfer. Similarly, an UnmatchedNonlocalTransfer is thrown when a continue statement attempts to complete a loop iteration that is not executing in the current thread. Finally, an UnmatchedNonlocalTransfer is thrown when a return statement is executed if the method invocation to which the return statement would transfer control is not on the stack of the current thread.

My approach seems similar to the Java proposal with UnmatchedNonlocalTransfer, which makes it an error to use a continuation that is no longer active. However, there is a provision for returning a value and keeping the current continuation valid (yield).

I believe that not allowing continuations to outlive the scope, but offering a syntactically different mechanism for yield, makes for clearer code while maintaining a good range of expressibility. I think the world should just forget about call/cc ;)

A return statement always returns from the nearest enclosing method or constructor.

That is, in the terminology of Landin's SECD machine, return denotes the dump, not the current continuation.

It still isn't captured in closures. In the third encoding of Landin's J operator (the second one on page 13) here you can see that functional values (ie, fn x c d => [[t]] (fn x d => d x) (fn x => c x d)) do not have a free dump identifier.

I don't think it's the job of an encyclopedia to define what is a real closure if there are multiple definitions in practical use. Apparently, people don't agree what a closure should be exactly, so the article should explain the different points of view. Also, the article should explain the debate on what is a real closure. This debate should emphasize that both semantics are actually reasonable, i.e. the first is not necessarily inferior to the second. This discussion should explain what the impact of a hidden return/break/continue binding is.

The real problem with the current article is that it is completely unclear to an outsider why C# and ECMAScript do not have 'real closures': the explanation that 'the lexical binding of the "return" statement is hidden inside of an anonymous method' is very cryptic.

ECMAScript has fully-featured closures, by any accepted definition, and I think it's a great candidate to use for examples on the Wikipedia page.

The issue about whether "the lexical binding of the return statement is hidden inside of an anonymous method" seems poorly-defined at best. As the quote says, return is (usually) a statement, and as such also typically a primitive construct, not a lexical variable. Any claim about the "lexical binding" of return would have to be motivated with respect to the semantics of the language in question, as would any connection between the semantics of return and the definition of the term "closure".

In the ECMAScript case, the issue doesn't apply at all, since return operates consistently in all functions, simply returning from the current function. At worst, you could argue that ECMAScript's return statement is well-designed. :) It has no bearing on whether ECMAScript has fully-featured closures.
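As a small illustration of "return simply returns from the current function", here is a sketch in Python, whose return behaves the same way in nested functions. The helper names for_each and first_negative_with_local_return are made up for the example.

```python
def for_each(items, callback):
    # Calls callback on every item and ignores whatever it returns.
    for item in items:
        callback(item)


def first_negative_with_local_return(numbers):
    def visit(n):
        if n < 0:
            return n  # returns from visit only; for_each discards it

    for_each(numbers, visit)
    # Always reached, even when the list contains a negative number,
    # because the inner return never leaves the outer function.
    return None
```

So a callback cannot use return to cut the iteration short; it would need some other mechanism, such as an exception or a flag.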

The issue in question seems to arise only in languages with a distinction between methods and closures, which choose not to follow the ECMAScript approach and instead somehow restrict the use of the return statement within closures. Both the Java and C# versions in question impose such restrictions: in the C# case, return is apparently not available within anonymous methods, whereas in the Java (7?) case, return within a closure will apparently return from the nearest enclosing method when that makes sense, and throw an exception otherwise. In an informal, marketing-oriented sense, one can see how the latter might be described as being "more full-featured" than the former, but that doesn't actually have anything to do with the definition of the term "closure". [Edit: for some factual corrections of this last paragraph, see this comment.]

...but I'm not sure I follow why anyone has an issue with the "return" statement, as I'm not sure where the situation arises where a return is not just syntactic sugar for assigning and exiting the value of the function that contains the statement. Or put another way, Scheme and ML don't have explicit return statements for their functions - but they do have implicit return calls that basically return the value of the last expression within the function.

In the proposed semantics for Java, 'return' returns from the nearest enclosing method, even if the 'return' is invoked from within a closure defined inside that method. That creates a problem if the closure is evaluated after the enclosing method has already returned. Java throws an exception in this case. C# apparently avoids this (in v2.0?) by preventing 'return' from being used inside its anonymous methods, and in v3 apparently treats 'return' the way ECMAScript does (although I'm summarizing based on a Wikipedia discussion thread, believe at your own risk...)
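The behaviour described for the Java proposal can be approximated in Python with exceptions. This is only a sketch under assumptions: with_nonlocal_return, ret, and the two example functions are invented names, and the exception-based mechanism is a stand-in rather than how any Java implementation works. Only the exception name UnmatchedNonlocalTransfer is taken from the proposal text quoted earlier in the thread.

```python
class UnmatchedNonlocalTransfer(Exception):
    """Raised when 'return' targets an invocation that already finished."""


class _NonlocalReturn(Exception):
    """Internal signal that unwinds the stack to the target invocation."""

    def __init__(self, token, value):
        super().__init__()
        self.token = token
        self.value = value


def with_nonlocal_return(body):
    """Run body(ret); calling ret(v) returns v from this invocation."""
    token = object()
    active = True

    def ret(value):
        if not active:
            raise UnmatchedNonlocalTransfer("enclosing invocation has returned")
        raise _NonlocalReturn(token, value)

    try:
        return body(ret)
    except _NonlocalReturn as signal:
        if signal.token is token:
            return signal.value
        raise  # belongs to some outer invocation
    finally:
        active = False


def first_negative_with_nonlocal_return(numbers):
    def body(ret):
        def visit(n):
            if n < 0:
                ret(n)  # exits the whole search, not just visit

        for n in numbers:
            visit(n)
        return None

    return with_nonlocal_return(body)


def leak_return():
    # Smuggles 'ret' out of an invocation that has already completed.
    saved = []
    with_nonlocal_return(lambda ret: saved.append(ret))
    return saved[0]
```

Calling the function returned by leak_return() raises UnmatchedNonlocalTransfer, which is the "closure evaluated after the enclosing method has already returned" case.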

That code wouldn't work as written under the Java closures proposal, because as you say, the use of 'return' would attempt a return from the enclosing "method". To make it work, you'd have to omit the 'return' keyword.

I didn't have my coffee early enough this morning. I read "nearest enclosing method" as "nearest dynamically enclosing method" rather than "nearest statically enclosing method".

In the proposed semantics for Java, 'return' returns from the nearest enclosing method, even if the 'return' is invoked from within a closure defined inside that method. That creates a problem if the closure is evaluated after the enclosing method has already returned.

Right, that problem is that the closure itself has no access to the 'continuation' described by 'return'. It seems odd to describe this state of affairs as "closing over the return".

Ooh, that didn't even occur to me. Now that you mention it, I rather like the idea of returning from the nearest dynamically enclosing method, in a perverse sort of way -- it would actually work more consistently than the current Java proposal. Internally, when a closure is invoked, the invoking method's continuation would be passed to it, and that would be used if 'return' was invoked within the closure. (This'd be a great feature to add if one thinks that the ability to reason about code is overrated.)

Right, that problem is that the closure itself has no access to the 'continuation' described by 'return'. It seems odd to describe this state of affairs as "closing over the return".

A closure captures a block of code - the block statements and the expression - parameterized by the closure's formal parameters. All free lexical bindings - that is, lexical bindings not defined within the closure - are bound at the time of evaluation of the closure expression to their meaning in the lexical context in which the closure expression appears. Free lexical bindings include references to variables from enclosing scopes, and the meaning of this, break, continue, and return.

This seems like a rather suspect rationalization to me. This text says that "the meaning of this, break, continue, and return" are "free lexical bindings" within the closure, and thus their meaning is "bound at the time of evaluation of the closure expression to their meaning in the lexical context in which the closure expression appears". I'd love to see a semantic description of Java that demonstrates this. It would be quite different from any semantic description of Java that I'm familiar with. ;)

The Java Language Specification is quite clear that the break and continue statements break or continue from the nearest statement of the right kind. Similarly, it is clear that the return statement returns from the nearest enclosing method or constructor.

Sure. That doesn't somehow make 'break', 'continue', or 'return' into lexically-bound names, though — at least, as I said, not in any definition of Java semantics that I've seen.

...but I'm not sure I follow why anyone has an issue with the "return" statement, as I'm not sure where the situation arises where a return is not just syntactic sugar for assigning and exiting the value of the function that contains the statement.

When you model blocks in Algol 60 as the direct application of lambda the ultimate. That was the motivation for call/cc's daddy, Landin's J operator (and why the SECD machine has a C and a D, and why J is so clumsy to use in practice).

...but I'm not sure I follow why anyone has an issue with the "return" statement, as I'm not sure where the situation arises where a return is not just syntactic sugar for assigning and exiting the value of the function that contains the statement. Or put another way, Scheme and ML don't have explicit return statements for their functions - but they do have implicit return calls that basically return the value of the last expression within the function.

Syntactic sugar or not, return is lexically scoped, and introduces a "name" for producing a result from the function. Using the same name ("return") in every function causes the names in inner functions to be shadowed from outer functions. This has a huge practical impact on the expressive power of lambda in the programming language. What would be the impact on the usability of Smalltalk blocks, for example, if the return form "^expression" returned from the innermost block?

One reason Scheme and ML don't have explicit return statements is that this avoids the implicit shadowing.

Syntactic sugar or not, return is lexically scoped and introduces a "name" for producing a result from the function.

Could you provide a reference for this? This seems to appeal to a possible definition of Java semantics which would involve 'return' being lexically bound within methods, but this doesn't match any semantic description of Java that I've ever seen, and certainly isn't what the Java Language Specification says.

I may well be misunderstanding what you're suggesting, but it sounds like you want "return" to capture the continuation of the surrounding method when you create a closure. So if you do something like:

int method(params) {
    field = {x => return x}; // make a closure and store it in a field
    // ...
    // ... do some work ...
    // ...
    return something;
}

Then it looks like you can call the closure again after a call to method returns, and the draft spec you wrote says that "a closure may outlive the target of control transfers appearing within it." Does that mean you save the whole continuation to make that work? IOW, are you adding full first-class continuations to Java?

I haven't been following this thread closely, but this all sounds a lot like "escape continuations" or "one-shot continuations" to me. You can, to a first approximation, implement return as an unhygienic macro that binds an escape continuation:

The problem with this, though, is if you want proper tail calls: an application of return to an expression will eagerly evaluate the argument before returning, so dynamically nested tail calls will accumulate stack.

A simple (if inefficient) workaround is to trampoline the let/ec expression, always returning a thunk from inside the expression and applying it outside the expression. I used this trick to create a tail-recursive version of let/ec in PLT Scheme. The subtle thing with this is that any side effects, such as exceptions, will occur in the dynamic context of the caller, not the body. So you probably don't want to use this when your context frame has exception handlers installed in it.
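The thunk-returning trick and its caveat about exception handlers can be sketched in Python. Everything here is an assumption for illustration: call_ec stands in for let/ec and is built on exceptions, and the names call_ec_trampolined and body_with_handler are invented. Python has no proper tail calls, so the sketch only shows where the deferred work runs, not the stack savings.

```python
class _Escape(Exception):
    """Carries a value out to the matching call_ec frame."""

    def __init__(self, token, value):
        super().__init__()
        self.token = token
        self.value = value


def call_ec(body):
    """Call body(escape); escape(v) makes call_ec return v immediately."""
    token = object()

    def escape(value):
        raise _Escape(token, value)

    try:
        return body(escape)
    except _Escape as signal:
        if signal.token is token:
            return signal.value
        raise


def call_ec_trampolined(body):
    """Like call_ec, but body always produces a thunk.

    The thunk is applied here, after the escape has unwound the stack,
    so its work happens outside body's dynamic context.
    """
    thunk = call_ec(body)
    return thunk()


def body_with_handler(ret):
    try:
        ret(lambda: 1 // 0)  # the division is deferred, not run here
    except ZeroDivisionError:
        return lambda: "handled inside body"
    return lambda: "no escape"
```

Calling call_ec_trampolined(body_with_handler) lets the ZeroDivisionError reach the caller: the handler inside the body never sees it, because the thunk runs only after the body has been exited.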

We're probably going to use that same trick for the reference implementation of ECMAScript 4 in SML. We'll model tail calls by raising an exception TailCallException of unit -> VALUE. And return will not introduce a tail position underneath a try-block. This way exceptions will be raised underneath the correct handlers, but return does not introduce tail positions underneath a try in its own function body.

(BTW, the presence of return means that the insinuation on the ECMAScript Wikipedia entry that SML's support for tail recursion somehow helps us with tail recursion in ES4 is bogus. Tail recursion in a language with return is very different from tail recursion in an expression-based language.)

... all of this neatly echoes a very recent discussion of how to model labeled break-with-value in BitC. In particular, we were exploring whether introducing them required any new core semantics. The answer turned out to be yes, because we need to allow local polymorphic exception declarations in order to use exceptions as a simulation of non-local break with value. I should perhaps add that the language has escape continuations only (by intent).

return ... and introduces a "name" for producing a result from the function.

I think this one at least is quite debatable. Sure, if you consider it a name, then it would make sense to say that it's lexically scoped. Even so, it is an implicitly declared name in such a case, so it would make just as much sense that it is implicitly redeclared in nested closures. There's no good reason to prefer "implicitly declared on the first nesting level, and closed over on all following levels" over "always implicitly declared on every level" - it's only a matter of convenience, and one could come up with situations where either one is convenient, and the other is inconvenient.

Aside from that, I'm rather uncomfortable with treating "return" as a kind of lexically scoped name. On the "gut feeling" level, it certainly doesn't feel like one. It also hints at some even more uncanny ideas, such as "else" being a name in the lexical scope of "if"...

What would be the impact on the usability of Smalltalk blocks, for example, if the return form "^expression" returned from the innermost block?

It would still have the full expressiveness of Scheme lambdas, which I do not think anyone would consider deficient.

Typically, the if-else construct is a special form in Scheme and cond is defined in terms of if-else (although the inverse can also be done). But in an eager language, the if-else construct can't be defined in terms of the language - which is why a special form is required.

Be that as it may, I still don't understand what any of this has to do with the notion of proper closures. I'd stipulate that Standard ML can be comfortably labeled as a language that does closures correctly, but it doesn't have continuations, return or call/cc (discounting that SML/NJ provides them). So I'd propose that the languages in question have closures, but they may or may not have a continuation mechanism that one defines as optimal.

If you want your 'myIf' to have the same semantics as the native 'if', you have to use the native 'if' inside of 'myIf' (assuming that (a) the condition is of the native boolean type and (b) there is no other way to destruct booleans; predicates like Scheme's eq? are ruled out because they return ... native booleans).

It would still have the full expressiveness of Scheme lambdas, which I do not think anyone would consider deficient.

Actually, it wouldn't! Counterintuitive but true. Scheme lambda allows you to take two forms s1 and s2, and write a function "myif" that acts as an "if" used this way
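A rough Python rendering of the "myif" idea, and of what goes wrong when return means "return from the innermost function", might look like this. The function names are invented for the example, and Python stands in for both Scheme and the hypothetical Smalltalk variant, since its return always exits the innermost function.

```python
def myif(condition, then_thunk, else_thunk):
    # Only the chosen thunk is called, so the other branch never runs.
    return then_thunk() if condition else else_thunk()


def sign_with_if(n):
    if n < 0:
        return "negative"
    else:
        return "non-negative"


def sign_with_myif(n):
    # Expression style: the chosen thunk's value becomes myif's value,
    # which the enclosing function then returns itself.
    return myif(n < 0, lambda: "negative", lambda: "non-negative")


def sign_statement_style(n):
    # Statement style: the branches of sign_with_if are wrapped as-is.
    def then_branch():
        return "negative"  # returns from then_branch only

    def else_branch():
        return "non-negative"  # returns from else_branch only

    myif(n < 0, then_branch, else_branch)
    return "fell through"  # reached every time
```

The expression-style version matches the native if, while the statement-style version always falls through: wrapping a branch that contains return in a function changes what that return exits.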

I was responding to the claim that Scheme lambdas are *more* expressive than Smalltalk blocks without nonlocal return, specifically because Scheme lambdas can allegedly emulate Scheme's 'if' expression. I have just 'proved' that the same emulation is possible in Smalltalk too.

I think this one at least is quite debatable. Sure, if you consider it a name, then it would make sense to say that it's lexically scoped. Even so, it is an implicitly declared name in such a case, so it would make just as much sense that it is implicitly redeclared in nested closures.

I agree, that's the crux of this. As Neal put it in his blog entry, there are various additional "lexically scoped language constructs" in a language like Java which need to be captured by closures. The constructs that are most like ordinary names are simple to deal with: variable, method, type, and label names. All of these are unambiguous because the rule is that if a name isn't explicitly shadowed, its bindings are inherited from the environment in which the closure was created.

Handling 'break' and 'continue' within closures is also relatively straightforward, because their referents are not introduced implicitly. The problem arises with 'return', because return deals with continuations, and a closure has a continuation just as a method does. Lexical scoping doesn't provide a clear basis for deciding which continuation 'return' should refer to -- if anything, based on lexical scoping alone, it makes more sense that 'return' in a closure should refer to the continuation of the closure.

What would be the impact on the usability of Smalltalk blocks, for example, if the return form "^expression" returned from the innermost block?

It would still have the full expressiveness of Scheme lambdas, which I do not think anyone would consider deficient.

One way that Smalltalk's return (i.e. "^") is used within blocks is to escape from a method iterating over a collection, for example. In Scheme, call/cc can be used for that. If you changed the semantics of "^" within blocks, you'd need to use an alternative approach in those cases -- perhaps exceptions, which I'm guessing many Smalltalk programmers wouldn't like.

It should be noted though that 'break', 'continue' and 'goto' from within the closure also effectively deal with continuations, if they are to be implemented in a proper way (such that the closure can outlive the environment it was created in).

Lexical scoping doesn't provide a clear basis for deciding which continuation 'return' should refer to

Tennent's correspondence principle provides clear support for capturing the meaning of return from the enclosing context. It also illustrates why Steele's "Lambda the Ultimate Imperative" is undermined by any other treatment of return.

It's not that cut and dried though. We could say that the history of closures from Landin up until Java 7 provides clear support for only capturing free first-class values. And myif is hardly the killer app of "Lambda the Ultimate Imperative" ;) We occasionally do return closures, not just apply them immediately or pass them as arguments. (Think of a trampoline. In Scheme or SML you can build a thunk out of any expression in tail position, return the thunk, and invoke it with a trampoline. Define "tail position" for a sequence of Java statements with respect to a method in the obvious way. Now, you can't generally wrap that sequence in a closure, return it and then invoke it.)

At the least, some of the program transformations we use and love are not generally possible with Java 7 closures. Lambda lifting isn't, because there is no way in Java to pass the value of "return" as an argument. Likewise, defunctionalization isn't, because there is no way to store the value of "return" in an object.

And for related reasons, lambda dropping isn't, because you can't rename the binding of "return" in a method when dropping a closure into the scope of that method. Likewise, refunctionalization is possible but not completely straightforward, because a return from the apply method in the first-order case morphs into a return from some enclosing method and not a return from the closure in the higher-order case.
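For readers who haven't seen defunctionalization, here is a minimal Python sketch of the transformation when closures capture only ordinary values. The names Adder, Scaler, and apply_fn are hypothetical. It works because each free variable is a first-class value that can be stored in a record; a captured "return" has no such value to put in a field, which is the obstacle described above.

```python
from dataclasses import dataclass


# Higher-order version: each lambda captures its free variable.
def make_adder(n):
    return lambda x: x + n


def make_scaler(factor):
    return lambda x: x * factor


# Defunctionalized version: each lambda becomes a record holding its
# free variables, and one first-order function interprets the records.
@dataclass(frozen=True)
class Adder:
    n: int


@dataclass(frozen=True)
class Scaler:
    factor: int


def apply_fn(fn, x):
    if isinstance(fn, Adder):
        return x + fn.n
    if isinstance(fn, Scaler):
        return x * fn.factor
    raise TypeError(f"not a defunctionalized function: {fn!r}")


def map_closures(fns, x):
    return [f(x) for f in fns]


def map_records(fns, x):
    return [apply_fn(f, x) for f in fns]
```

Both versions compute the same results; only the representation of the function values differs.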

It seems to me that by capturing this second-class citizen "return" there are fewer correctness-preserving transformations available, compared to leaving the return statement to mean "return from this function-like thing"? Doesn't UnmatchedNonlocalTransfer raise a red flag?

Anyway, we all know that the real problem lies with return, not closures. :)

Both the Java and C# versions in question impose such restrictions: in the C# case, return is apparently not available within anonymous methods, whereas in the Java (7?) case, return within a closure will apparently return from the nearest enclosing method when that makes sense, and throw an exception otherwise.

It's a tad different there. C# 2.0 anonymous methods already allow the use of "return", and treat it the same way ECMAScript closures do. C# 3.0 provides new syntax for these, in which the final expression is implicitly returned. The Java designers deliberately introduced a syntax in which there's no way to explicitly return a value from the closure (it's always the result of the final expression), which allows them to reuse the "return" keyword the way they do.

I'm rejuvenating this long dead topic because I've been reading through the draft of the new Scala book and started thinking about the word "closure" and wondering why it's generally defined so narrowly - most definitions seem to focus on function values. In fact, this draft of the Scala book does so as well.

However, in Scala class definitions may be nested within other lexical scopes (e.g. methods, functions, and classes). Free variables in instances of such nested classes will capture bindings from their enclosing scope and, unlike Java, will do so without restriction. It strikes me that instances of nested classes follow the spirit of "closure" even though they're not functions (except perhaps in an abstract OO calculus sense).
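The same effect can be seen outside Scala. Below is a small Python sketch (the names make_account and Account are invented for the example) in which instances of a class defined inside a function capture a binding from the enclosing scope, and can even update it.

```python
def make_account(initial_balance):
    balance = initial_balance  # local variable of the enclosing function

    class Account:
        # 'balance' is free in both methods; it is captured from the
        # scope of make_account rather than stored on the instance.
        def deposit(self, amount):
            nonlocal balance
            balance += amount
            return balance

        def current(self):
            return balance

    return Account()
```

Each call to make_account creates a fresh binding, so two accounts never share a balance, just as two function closures created by separate calls would not.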

Is the word "closure" too entrenched as meaning a function value with captured lexical bindings? If so, is there a word or short phrase that can be used to describe the more general concept of any value (function or not) with captured lexical bindings?

It's easy to conclude that an OO object can be a closure if you go down the path of "of course it can be a closure because really it's a lambda in front of a bunch of lambdas plus a lot of sugar." But the question is what happens when you stick to the abstraction level defined in the language, where classes are assumed to be primitives. Does the word closure lose value if it's broadened to include such constructs?

In general, I'd call them all closures, and I'd say "function closure" if I wanted to be more specific. I think it's totally reasonable to say, for example, that nested class definitions close over their lexical environment.

If you really want to be precise, though, probably you ought to be careful to distinguish the scoping rule from the implementation technique. In that case, "closure" might not really be the right word for any of these things, depending on how they're implemented. In common practice, though, it's probably too late to avoid questions like "Does Language X have closures?"

The question of whether the word "closure" means the general concept vs the specific implementation technique seems to have long since been decided in favor of the concept. Probably because the terms used for the general concept were too verbose.

Thanks all. Of course, the way of natural languages means that what we think is the correct definition may or may not reflect popular usage. A quick informal survey of postings on the net even suggests that people are using the word "closure" to mean "anonymous function."

All of these definitions are from the Common Lisp Hyperspec, except the one for continuation, which is from Wikipedia:

binding

an association between a name and that which the name denotes. "A lexical binding is a lexical association between a name and its value."

environment

a set of bindings

closure

a function that, when invoked on arguments, executes the body of a lambda expression in the lexical environment that was captured at the time of the creation of the lexical closure, augmented by bindings of the function's parameters to the corresponding arguments

continuation

A continuation reifies an instance of a computational process at a given point in the process's execution. It contains information such as the process's current stack (including all data whose lifetime is within the process e.g. "local variables"), as well as the process's point in the computation.

scope

the ... textual region of code in which references to ... a binding ... can occur
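To tie the definitions of binding, environment, closure, and scope together, here is a short Python sketch. The names make_counter and increment are made up, and Python's function objects are only one possible realization of the Hyperspec wording.

```python
def make_counter(start):
    count = start  # a binding (count -> value) in this call's environment

    def increment(step=1):
        # 'count' is free in increment. Its scope is the text of
        # make_counter, and the closure keeps the environment that was
        # current when increment was created.
        nonlocal count
        count += step
        return count

    return increment


counter_a = make_counter(0)
counter_b = make_counter(100)
```

counter_a and counter_b share the same code but captured different environments, so incrementing one does not affect the other.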

You may be interested in our Getting Started page. For those specific questions, I recommend the Essentials of Programming Languages by Friedman and Wand. It has a good introduction to those concepts; although I don't think that it's a great introduction to continuations. (But then again, I don't know of a truly great introduction to continuations...)