Activity

Brandon Bloom
added a comment - 17/Jun/12 4:40 PM
I'd also like to extend this for Symbols and Keywords.
I've been experimenting with fleshed-out Symbol and Keyword objects with interning. I've found that I need emitters, macros, and functions. With the approach here, I could eliminate the emitters and instead have the analyzer produce invocation forms.

Raphaël AMIARD
added a comment - 18/Jun/12 1:20 PM
I think this is an interesting patch. It would be worth adapting it to the decoupled emitters. It raises the question of how it would be possible to share part of the emitters between backends.

David Nolen
added a comment - 18/Jun/12 3:25 PM
I don't see many benefits falling out of this patch. How to best emit language primitives like literals and constants may vary from host to host - perhaps emitting bytecode directly will work best for some implementations.

Brandon Bloom
added a comment - 18/Jun/12 5:18 PM
Couldn't macros emit bytecode via a mechanism similar to the js* form?
My goals with this are:
1) Move some optimizations from the emit phase further down the pipeline. For example, consider choosing the best associative data structure to create. Why should {:foo "bar"} be optimized to an ObjMap or ArrayMap but (hash-map :foo "bar") not be? Why should that optimization be implemented in such a way that it cannot be reused by alternative backends?
2) Operate at a higher level. Prefer working with Clojure forms over target-language code fragments (either strings or bytecodes). This is where the code length savings come from.
If we continue with this approach, I see 4 or 5 more places where analyzer & emitter code can be replaced with shorter, simpler macros, which are more readily reused by alternate backends.
The one implication (downside?) this approach has for consumers of the analyzer or API is that they may need to do a little extra work when considering :invoke operations for static analysis and the like. However, that seems likely for most analyzers anyway, so this would be a matter of (defmethod handle-special-form :map) vs (defmethod handle-invoke :hash-map).
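The trade-off in the last paragraph can be sketched roughly as follows. This is a toy illustration, not the ClojureScript analyzer: the AST node shapes (the :op, :f, :keys, :vals and :args keys) and the multimethod names handle-node and handle-invoke are assumptions made for the example. It shows a consumer that recognizes map construction either through a dedicated :map node or through an :invoke node whose function is hash-map.

```clojure
;; Toy AST nodes; their shape is an assumption for illustration only.
(def literal-node {:op :map, :keys [:foo], :vals ["bar"]})
(def invoke-node  {:op :invoke, :f 'hash-map, :args [:foo "bar"]})

;; Second-level dispatch on the symbol being invoked.
(defmulti handle-invoke :f)

(defmethod handle-invoke 'hash-map [node]
  {:kind :map-construction
   :entries (count (partition 2 (:args node)))})

(defmethod handle-invoke :default [_]
  {:kind :call})

;; First-level dispatch on the node's :op.
(defmulti handle-node :op)

;; Special-form style: the analyzer produced a dedicated :map node.
(defmethod handle-node :map [node]
  {:kind :map-construction
   :entries (count (:keys node))})

;; Invocation style: map construction shows up as an ordinary call,
;; so the consumer has to look at which function is being invoked.
(defmethod handle-node :invoke [node]
  (handle-invoke node))
```

Under these assumptions, (handle-node literal-node) and (handle-node invoke-node) reach the same result; the only extra work for the consumer is the second dispatch on the invoked symbol.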

Michał Marczyk
added a comment - 18/Jun/12 5:53 PM - edited
Re: 1, I don't think we should be "optimizing" hash-map or array-map (or similar) calls. These functions are a documented way of requesting a map of a particular type (see the docstrings), which I think should not be removed. If anything, we might want to introduce an obj-map function to create arbitrarily large ObjMaps on request (in fact I'll look into that, but that is a separate discussion).
Additionally, the fact that {} is optimized to be an ObjMap in CLJS goes to show that any map-emitting macro will need to be rewritten for each target platform (ObjMap only makes sense when targeting JS, so this optimization simply won't be applicable to other backends). If so, and assuming hash-map & Co. retain the behaviour advertised in their docstrings, there's not much gain in implementing this in a macro over just writing a bunch of emitters.
As for decoupling emitters – I think it's perfectly fine for them not to be decoupled; they are the layer closest to the platform, after all. Certainly if there's some code which turns out to look the same across multiple platforms it might be worth moving it upwards in the stack (not necessarily, though – moving it sideways, to a utility namespace / library, might turn out to be more appropriate), but I have a feeling this is an issue best decided once there actually are multiple backends in place and the various costs and benefits can be judged properly.
Now, the story might well be different if we were to introduce some generic factory functions – "create a map of some type", "create a set of some type" etc. – if (and only if!) they were meant for public consumption. Then implementing a bunch of compiler macros around those new factories and letting them handle data structure literals would save some duplicate work. I don't want to pronounce an opinion on the usefulness of such generic factory functions at this time – just pointing out the possibility.

Brandon Bloom
added a comment - 23/Jun/12 5:14 PM
> These functions are a documented way of requesting a map of a particular type
D'oh! You're right.
> we might want to introduce an obj-map
I see you did just that with CLJS-322 – nice.
> the story might well be different if we were to introduce some generic factory functions
There are already some generic factory functions. 'set, for example, is documented as "Returns a set of the distinct elements of coll." despite always returning a PersistentHashSet. Similar for vector and some others. It seems like map is the only core data structure that realistically has several reasonable choices for a default representation.
> implementing a bunch of compiler macros around those new factories and letting them handle data structure literals would save some duplicate work
So all this was somewhat inspired by tagged_literals.clj – You'll see that those functions are effectively macros which take a form and, generally, return an invocation form.
In my mind, I see Clojure's sugar syntax as a strict expansion transformation.
For example, ^:m {:x [@y 'z/w true]} is simply a shortcut for:
(with-meta (make-map (keyword "x") (vector (deref y) (symbol "z" "w") Boolean/TRUE)) (make-map (keyword "m") Boolean/TRUE))
This sort of thing already happens for @ derefs, # lambdas, etc.
In theory, this could be implemented at a level lower than the compiler. You could, for instance, define a reader "desugar" mode which only returns lists and primitives instead of vectors, maps, etc. This would greatly reduce the number of special forms in the compiler, since all of these boil down to invocations with macros.
Emit methods could be replaced with macros for at least these things: vars, maps, vectors, sets, nil, bools, regexes, keywords, symbols, metadata, and empty lists.
The result would be a significant reduction in the amount of code in the compiler for a proportionally smaller increase in the amount of code in per-language macros and maybe the reader.
> I have a feeling this is an issue best decided once there actually are multiple backends in place
I'll grant you that.
I've said my piece on the topic and don't feel very strongly about this particular patch. I just wanted to spark the discussion about reusing more bits of the compiler between backends. In my mind, it's almost always preferable to transform lists rather than emit strings. I tried that, and the result was fewer responsibilities for the analyzer, plus macros that were easier to work with than emit methods.
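The "desugar" idea above can be sketched as a small function over already-read data. This is a rough illustration under stated assumptions, not part of any patch: desugar and make-set are hypothetical names, make-map is borrowed from the example earlier in this comment, and the sketch only emits the forms as data without defining what make-map or make-set do. It also leaves symbols, booleans, nil and quoted forms untouched, and it works on the reader's output, so the original order of map and set elements is not recovered.

```clojure
(defn desugar
  "Rewrite reader data structures into plain invocation forms.
  Hypothetical sketch: only keywords, maps, vectors, sets and metadata
  are handled; everything else is returned unchanged."
  [form]
  (let [expanded
        (cond
          (keyword? form) (if-let [kw-ns (namespace form)]
                            (list 'keyword kw-ns (name form))
                            (list 'keyword (name form)))
          (map? form)     (cons 'make-map
                                (mapcat (fn [[k v]] [(desugar k) (desugar v)])
                                        form))
          (vector? form)  (cons 'vector (map desugar form))
          (set? form)     (cons 'make-set (map desugar form))
          ;; Quoted forms are left alone in this sketch.
          (and (seq? form) (= 'quote (first form))) form
          (seq? form)     (map desugar form)
          :else           form)
        ;; Ignore position metadata that a reader may attach to lists.
        m (not-empty (dissoc (meta form) :line :column))]
    (if m
      (list 'with-meta expanded (desugar m))
      expanded)))
```

For instance, (desugar (with-meta {:x [1 2]} {:m true})) produces a with-meta invocation wrapping a make-map invocation, in the same spirit as the hand-expanded example above.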

Brandon Bloom
added a comment - 16/Aug/12 9:34 PM
One other advantage of function application over special-casing maps/sets/etc. is that argument evaluation order is well defined for function application (left-to-right). The Clojure reader returns unordered maps & sets, so without changing the reader, we have no way of knowing what order map key-value pairs or set elements were originally in. I filed a bug on that. I think we need to make the reader extensible, so that we can say how to create the return values from their child expressions. In the case of the ClojureScript compiler, we do care about order, so we'd want to return either a (make-map ...) form directly, or a sorted-map by read-order. Same goes for sets.
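The ordering point can be illustrated with a small sketch, assuming Clojure on the JVM and using clojure.core/read-string. The helper written-keys is a hypothetical name, and make-map is used only as a head symbol in a list that is read and never evaluated. An invocation form is read as a list, so the written order of its arguments is available to the compiler, whereas a map literal is read as a map value whose iteration order carries no such guarantee.

```clojure
;; The same nine entries, written once as a literal and once as an
;; invocation form. `make-map` is only a symbol here; it is never called.
(def literal-src "{:a 1 :b 2 :c 3 :d 4 :e 5 :f 6 :g 7 :h 8 :i 9}")
(def form-src    "(make-map :a 1 :b 2 :c 3 :d 4 :e 5 :f 6 :g 7 :h 8 :i 9)")

(def as-map  (read-string literal-src)) ; a map value
(def as-form (read-string form-src))    ; a list, in written order

(defn written-keys
  "Keys of an invocation form such as (make-map k1 v1 k2 v2 ...),
  in the order they appeared in the source."
  [invocation-form]
  (take-nth 2 (rest invocation-form)))

;; (keys as-map) may come back in any order, depending on the map
;; implementation; (written-keys as-form) follows the source text.
```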