Monday, July 23, 2012

Seeking Closure in the Mirror

I've discussed mirror-based reflection many times in the past, in this blog and in talks. And of course I'm not the only one - you can read Allen Wirfs-Brock's posts on mirrors in Javascript. In this post, I want to focus on a particular kind of mirror that has not received much attention. Before I get too deep into the details, a few words of background.

You cannot get at the internals of a function: you can only apply it to various arguments and see how it responds. This is sometimes known as procedural abstraction. Among other things, it is the basis for object-based encapsulation.

Most languages that call themselves object-oriented do not actually support object-based encapsulation. One of the ways they get by despite this defect is to rely on procedural abstraction directly. Perhaps the most notable example of this is Javascript. The only way to encapsulate anything in Javascript is to put it inside a function. Elaborate design patterns leverage Javascript’s closures to provide encapsulation.

You can see from the above that procedural abstraction is absolutely fundamental. There appear to be circumstances where we might nevertheless wish to breach the defenses of procedural abstraction.

Consider implementing a database interface in the style of LINQ, or Ruby on Rails, or Glorp. The underlying model is that the database consists of collections, and that these collections are accessed via standard functional operations such as filter, map, reduce etc. The arguments to these operations include closures. For example, you might write a query such as:

cities.filter(function(city){return city.name === 'Paris';});

and get back a collection of answers that included Paris, Texas, and perhaps some other cities. To implement this interface on top of a database, you might want to transform this code into a SQL query. To do that you need to understand what the closure is doing. In .NET, for example, the type system is designed to coerce a literal closure into an abstract syntax tree representing the expression inside it, which can then be compiled into SQL.

Of course, it might be that you cannot reasonably compile the code into a SQL query at all. We will assume that the system is allowed to fail in any case it deems too hard, but we’d like to cope with as many situations as we can.

The LINQ approach relies on static typing, but this is not essential, and in fact has drawbacks.

For example, the static approach precludes the following:

query(f) {return cities.filter(f);}

A more general alternative is to dynamically derive the AST of the closure body. Regardless, it seems I need a way to get the AST (or at least the source) of a closure - something that procedural abstraction is of course designed to preclude.
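In Javascript specifically, the source half of this is already within reach, since Function.prototype.toString returns the text of a user-defined function. The following is only a sketch of that one point: sourceOf and the stub query are illustrative names, not part of any library, and nothing here recovers the values of closed-over variables.

```javascript
// Function.prototype.toString() returns the source text of a user-defined
// function, even when the function arrives through a parameter.
function sourceOf(f) {
  return Function.prototype.toString.call(f);
}

// Hypothetical stand-in for the dynamic query(f) case: the closure is not a
// literal at the point where it is consumed, yet its source is recoverable.
function query(f) {
  return { predicateSource: sourceOf(f) };
}

const result = query(function (city) { return city.name === 'Paris'; });
console.log(result.predicateSource);
```

Turning that text into an AST would still require a parser, which is left out of the sketch.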

Even if I can get the source or AST, that isn’t always enough. Suppose I want to write

var cityNames = ['Paris', 'London', 'New York'];

cities.filter(function(city){
  return cityNames.includes(city.name);
});

I need the value of cityNames in order to execute the query. In general, I need to get at the scope of the executing closure.
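A small Javascript experiment makes the gap concrete. It rebuilds a function from its source text alone and shows that the rebuilt copy has lost its binding for cityNames. The names makeFilter, original and rebuilt are made up for this illustration.

```javascript
function makeFilter() {
  var cityNames = ['Paris', 'London', 'New York']; // local, captured below
  return function (city) { return cityNames.includes(city.name); };
}

const original = makeFilter();
console.log(original({ name: 'Paris' })); // true

// Rebuild a function from the source text alone. Functions created with the
// Function constructor only see the global scope, so the captured
// cityNames binding is not carried over.
const rebuilt = new Function('return (' + original.toString() + ');')();

let failure = null;
try {
  rebuilt({ name: 'Paris' });
} catch (e) {
  failure = e;
}
console.log(failure && failure.name); // "ReferenceError"
```

The source survives the round trip; the scope does not.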

Smalltalk and its relatives allow you to do this. How do they get around procedural abstraction? Well, in the case of closures, they basically throw procedural abstraction out the door. Every Smalltalk closure will gladly provide you its context, which is a reified scope that will allow you to find out what all the variables used in the closure are.

Obviously, this is not a very secure solution. One way we can usually reconcile security and reflection is via mirrors, and that is the focus of this post. Given an object mirror that has full access to the closure object's representation, you should be able to get all the information you need. This still has the drawback that the representation of closures is exposed as a public API.

In this case, we want a ClosureMirror. Essentially, there needs to be an object with the magical ability to see into the closure, overcoming procedural abstraction. The closure itself must not allow this; it must be impenetrable. The capability to look inside it must be a distinct object that can be distributed or withheld independently (exercise for the reader: find another way to solve this problem).

Concretely, a ClosureMirror needs to be able to provide the source code of the closure it is reflecting and a map from identifiers to values that describes the closure’s current scope.
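To make the shape of such an API tangible, here is a rough Javascript sketch. It is not the Newspeak or Dart API: makeClosureMirrorSystem, register and reflect are hypothetical names, and because plain Javascript cannot discover closed-over variables, the sketch assumes the code that creates a closure lists them explicitly. Javascript also exposes source text through toString, so only the scope is really being guarded here.

```javascript
// A sketch of a mirror system for closures. The registry is private to the
// mirror system; merely holding a closure gives no access to it.
function makeClosureMirrorSystem() {
  const registry = new WeakMap(); // closure -> { source, scopeThunk }

  return {
    // Called where the closure is created. scopeThunk returns the current
    // values of the closed-over variables (an assumption of this sketch:
    // the creator has to name them).
    register(closure, scopeThunk) {
      registry.set(closure, { source: closure.toString(), scopeThunk });
      return closure;
    },
    // The reflective capability: regular object in, mirror out.
    reflect(closure) {
      const entry = registry.get(closure);
      if (entry === undefined) throw new Error('closure is not reflectable');
      return {
        get source() { return entry.source; },
        get scope() { return new Map(Object.entries(entry.scopeThunk())); },
      };
    },
  };
}

const mirrors = makeClosureMirrorSystem();

function makeFilter() {
  const cityNames = ['Paris', 'London', 'New York'];
  return mirrors.register(
    function (city) { return cityNames.includes(city.name); },
    () => ({ cityNames })
  );
}

const filter = makeFilter();
const mirror = mirrors.reflect(filter);
console.log(mirror.source);
console.log(mirror.scope.get('cityNames'));
```

Code that receives only filter can call it but cannot reach the registry; code that is also handed reflect can look inside. Handing out reflect separately from register is what makes the capability something that can be distributed or withheld.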

Another situation where closure mirrors would be handy is serialization. If you need to serialize an object that includes a closure, you again need access to the closure’s scope.
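As a rough illustration of the serialization case in Javascript, suppose the source text and a scope snapshot have already been obtained (the two things a ClosureMirror would provide). The helpers serializeClosure and reviveClosure are hypothetical, and the sketch only handles scope values that survive a JSON round trip.

```javascript
// Serialize a closure given its source text and a snapshot of its scope.
function serializeClosure(source, scope) {
  return JSON.stringify({ source, scope });
}

// Rebuild the closure: each scope entry becomes a parameter of a wrapper
// function, so the revived body sees the same names bound to copied values.
function reviveClosure(text) {
  const { source, scope } = JSON.parse(text);
  const names = Object.keys(scope);
  const wrapper = new Function(...names, 'return (' + source + ');');
  return wrapper(...names.map((n) => scope[n]));
}

const cityNames = ['Paris', 'London', 'New York'];
const inCityNames = function (city) { return cityNames.includes(city.name); };

const wire = serializeClosure(inCityNames.toString(), { cityNames });
const revived = reviveClosure(wire);
console.log(revived({ name: 'London' })); // true
console.log(revived({ name: 'Berlin' })); // false
```

The revived closure works on copies, so later changes to the original cityNames are not seen by it, and since reviving evaluates code it should only be done with text from a trusted source.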

I have not seen closure mirrors discussed elsewhere. As far as I know, the only implementation was done as part of the Newspeak-to-Javascript compiler. We are also considering it in the context of the Dart mirror system. The Newspeak-on-Javascript implementation of closure mirrors is rather naive and inefficient. One reason for this inefficiency is that Javascript provides no support whatsoever to do this sort of thing. In any case, the idea is new and virtually untested, but I think it has potential.

17 comments:

I've faced similar challenges due to my interest in mobile code - i.e. need to send a function to a remote host, so need a representation of the closure. My solutions have generally been to construct a function and an AST representation for that function in parallel, i.e. instead of having `A -> B`, I have `Fun A B` and functions to extract the underlying function or representation. (Conversely, I could inject a function into `Fun` by pairing it with metadata. I had left a lot to programmer discipline.)

Using this technique, I could only access closures I had explicitly prepared for such use, but that was quite sufficient for my use cases.

Another interesting technique I've seen is from Conal Elliott's Tangible Values, in particular his use of arrows for deep application and acquisition of information about the underlying implementation of a function. Conal's purpose involved rendering of a tangible value composed of other tangible values.

Conal has also done a lot of work for modeling compilers in the code. I.e. using `Exp A -> Exp B` we can extract structure of B by injecting a variable (e.g. `Var "A"`) for `Exp A`, and thus we can compile a closure from these functions. It does come at a cost to security, but might be sufficient for LINQ and such.
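For readers who want to see the variable-injection idea in the Javascript of the post, here is a toy sketch. The node constructors Var, Lit, Field and Eq, and the toSql renderer, are invented for illustration; it handles string literals only and is not how LINQ or any real library does it.

```javascript
// Tiny expression language: every value is an AST node, and the helpers
// build bigger nodes instead of computing results.
const Var = (name) => ({ kind: 'var', name });
const Lit = (value) => ({ kind: 'lit', value });
const Field = (target, name) => ({ kind: 'field', target, name });
const Eq = (left, right) => ({ kind: 'eq', left, right });

// The query is an ordinary function from expression to expression.
const isParis = (city) => Eq(Field(city, 'name'), Lit('Paris'));

// Applying it to a placeholder variable reveals its structure.
const ast = isParis(Var('city'));

// Toy renderer to a SQL-like condition (string literals only).
function toSql(e) {
  switch (e.kind) {
    case 'var': return e.name;
    case 'lit': return "'" + String(e.value).replace(/'/g, "''") + "'";
    case 'field': return toSql(e.target) + '.' + e.name;
    case 'eq': return toSql(e.left) + ' = ' + toSql(e.right);
    default: throw new Error('unsupported node: ' + e.kind);
  }
}

console.log(toSql(ast)); // city.name = 'Paris'
```

The function stays opaque; its structure is observed only through what it does to the injected variable.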

All these techniques are pretty much the same - spatial-temporal separation of representations. Your use of mirrors suggests support from the language implementation. Though, I suppose we could model mirrors without language support by use of sealer/unsealer pairs (mirror holds unsealer).
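A sketch of the sealer/unsealer idea in Javascript, under the assumption that the closure's creator cooperates by attaching sealed metadata. The names makeSealerPair, withSealedScope and makeReflect are hypothetical.

```javascript
// A sealer/unsealer pair: boxes made by seal can only be opened by the
// matching unseal. The box itself exposes nothing.
function makeSealerPair() {
  const contents = new WeakMap(); // box -> sealed value, private to the pair
  return {
    seal(value) {
      const box = Object.freeze({});
      contents.set(box, value);
      return box;
    },
    unseal(box) {
      if (!contents.has(box)) throw new Error('not sealed by this pair');
      return contents.get(box);
    },
  };
}

const { seal, unseal } = makeSealerPair();

// The closure's creator attaches its source and scope in sealed form.
function withSealedScope(closure, scope) {
  closure.sealedScope = seal({ source: closure.toString(), scope });
  return closure;
}

// The mirror side holds the unsealer.
function makeReflect(unsealer) {
  return (closure) => unsealer(closure.sealedScope);
}

const cityNames = ['Paris', 'London'];
const f = withSealedScope((city) => cityNames.includes(city.name), { cityNames });

const reflect = makeReflect(unseal);
console.log(reflect(f).scope.cityNames);
```

Anyone can see that f carries a sealed box, but only a holder of unseal (here, the reflect function) can open it.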

> The LINQ approach relies on static typing, but this is not essential, and in fact has drawbacks. For example, the static approach precludes the following:

> query(f) {return cities.filter(f);}

I'm not sure what you mean. Maybe the fact that the type of f must be Expression<Func<...>> (the AST type) rather than plain Func? That is indeed limiting, but it isn't due to static typing; it's because at runtime a compiled Func may not have enough info to be decompiled into the right AST. When writing a LINQ query I sometimes want to make sure my exact AST reaches the LINQ provider, and compiler optimizations can get in the way.

It would be easier if we could get via reflection the exact original AST (or source code!) that was compiled. But that's not a problem due to static typing per se. And it would go against your wider point in this post.

Now, about your position that lack of encapsulation is bad. There are things we do today that rely on reflection access to object structure, which would be much harder to accomplish if objects were message-passing blackboxes. Serialization, various object graph operations that rely on finding references to an object, etc.

Similarly, if functions were not encapsulated - if reflection routinely provided the AST of every function and the captured name-values of every closure, and compilation did not lose any source code information - then some things we do anyway would become much easier. For instance, development tools that analyze source code (but would then be able to analyze the compiled code), live code editing, code versioning.

Encapsulation does have benefits. It's just not clear to me whether they outweigh the negatives, and what is a good balance - whether encapsulating functions and not objects, or something else.

How about a two-tier system? Some designs introduce software modules at a scale larger than objects. Concurrent agents are a good example. Agents could be the unit of encapsulation. Then ordinary functions and objects and their state could be manipulated via reflection, but any state-holding object could be referenced by one owning agent only. Communications between agents would be by message passing, and an agent's state would be encapsulated from other agents. Agents could then represent external components, etc. A special mirror might provide privileged access to treating other agents in the same process as ordinary transparent code, to allow debuggers etc.

Why is this better than 'mirrors all the way down'? In my opinion, allowing all code to reflect on objects it creates and on existing objects in its locality (its agent) would remove the need for most mirror management. Otherwise passing the necessary mirrors to the code that needs them might end up like checked exceptions: so tedious to manage that many developers declared all functions to throw Exception. Equally, if you call a method and have to pass a mirror because it calls another that calls another that might in some configurations use an implementation that needs a mirror, then you get used to just passing a mirror everywhere.

a. Re: the tedium of mirrors. What I found (after several iterations) is that I wanted to pass regular objects into the mirror API and get mirrors out. This reduces the pain a good deal. The Newspeak mirror API is structured this way.

b. Allowing "open access" within an actor, using it as a security boundary, is in fact what we do in Dart. It's debatable if that is the best answer, but it's an answer. But encapsulation goes beyond security (e.g., software engineering), and so do mirrors (e.g., to distribution and deployment).

c. Mirrors are useful in providing a line between encapsulated access at the base level, and open access at the meta level. How open can be controlled based on what mirrors one provides.

To summarize: I believe in very strong base level encapsulation based on procedural abstraction (i.e., per object privacy). Mirrors serve as capabilities at the meta level, that can be provided to circumvent this encapsulation in a controlled manner. I've discussed this extensively many times - I won't repeat it all here.

Indeed, what you describe is quite similar to the implementation of ClosureMirror in the ns2js compiler. The problem is it is rather expensive. It's especially bad if you want ClosureMirrors to be available on demand - then you have to wrap every closure in a special structure rather than utilize the native closures directly.

Consequently, I am indeed suggesting that the underlying system should support this as a primitive.

I am referring to your blog post where you say "The capability to look inside it must be a distinct object that can be distributed or withheld independently". You are talking about a ClosureMirror instance responsible for looking inside a particular closure, right? How would you usually get a ClosureMirror instance, and how would you prevent a user from getting one (i.e. withhold it)?

A key property of mirrors is that they are separate from the objects they reflect. So having a closure (or any object) does not imply you have a mirror for it.

How this manifests concretely depends on the setting you are in.

In Newspeak, all code outside a module must be passed in from the outside: you don't get access to the Mirror module unless it has been explicitly given to you. And one can choose to give a Mirror module that doesn't support ClosureMirrors, or only produces such mirrors on closures nested within the receiving module etc. In short, everything is sandboxed by design. This blog contains several posts about this topic (why imports are a bad idea, why Newspeak makes dependency injection frameworks unnecessary etc.).

Another approach is taken by Dart. In Dart, an isolate (a kind of actor) is a security boundary. If you want to reflect across isolates, you will need access to the other isolate's MirrorSystem object. To get it, you will need a port on the other isolate that is willing to give you such a thing.

Always providing an AST is an option as well I suppose. But unless queries are built in to the language (which seems to be the case in your system), someone still needs an API to derive these ASTs conveniently. Closure mirrors are such an API, and you need to get at the bindings of closed-over variables as well as the ASTs.

It can also be done via macros and such - but even then you still need an API to get at bindings.

Agreed. In fact, I abbreviated considerably how a speech action is represented. The complete representation is closer to a triple: (queryFn, AST, map of (string, AST)), where the third element is exactly the 'closed over' (aka free) variables of the query.

I would have to admit to being fairly nervous about providing a general API to access free variables of a closure. I think I would need to see some strong evidence for the need for that API.

Macros are one way (in fact, Star uses macros). By signaling something 'going on' in the surface language, that has the merit of being able to strongly constrain access to a closure's free variables.