Description

Summary: Duplication of ast nodes in various keys and in :children is problematic for some users. In particular, it doesn't play nicely with printing. A solution is needed which preserves "The AST is data. Period."

The attached patch makes no changes to how the sub nodes are currently stored outside of the :children key. Instead, it replaces :children values with a vector of keys into the main ast node. This preserves child ordering and allows a children function to be defined as:

The attached patch has two caveats: 1) Many (but not all?) blocks will now be present in the children hierarchy as 'do forms. 2) Interleaved forms are intentionally bugged with respect to ordering. The :children vector for those forms match the actual behavior, not the expected behavior. This can be fixed along with http://dev.clojure.org/jira/browse/CLJS-288

I realize that, but it was one approach suggested and seemed entirely viable, so I tried it. From what I can tell, it seems like a solution that meets all of the criteria that came up during the discussion.

Did I overlook a requirement?

Who is currently using :children and could weigh in on whether or not this still meets their needs?

Brandon Bloom
added a comment - 03/Jun/12 11:24 AM I realize that, but it was one approach suggested and seemed entirely viable, so I tried it. From what I can tell, it seems like a solution that meets all of the criteria that came up during the discussion.
Did I overlook a requirement?
Who is currently using :children and could weigh in on whether or not this still meets their needs?

Ah OK, you're right. It's the same problem there that I ran into with :statements and analyze-block: There is a pseudo AST node that needs to be traversed down into.

The easy fix for that here would be to merge {:children [:init]} into each binding, but the binding hash will be missing :op and :form. For analyze-block, I was able to use :op :do and :form (list 'do ...), but there isn't an obvious analogy here without inventing a :binding op or relaxing the requirement for :op and :form.

Interestingly, each successive binding can be treated as a single nested binding. What if we defined let as a macro which expands to a series of let1 forms? (let [x 1 y 2] (* x y)) --> (let1 [x 1] (let1 [y 2] (* x y))

That may increase the depth of the AST, but it would really simplify traversal into binding clauses. Each let op would have :name, :init, and children as [:init].

In general, it may be worth while to consider doing the same thing with the various analyze-blocks situations, such that 'do is the only place multiple statements can occur.

Brandon Bloom
added a comment - 03/Jun/12 12:09 PM Ah OK, you're right. It's the same problem there that I ran into with :statements and analyze-block: There is a pseudo AST node that needs to be traversed down into.
The easy fix for that here would be to merge {:children [:init]} into each binding, but the binding hash will be missing :op and :form. For analyze-block, I was able to use :op :do and :form (list 'do ...), but there isn't an obvious analogy here without inventing a :binding op or relaxing the requirement for :op and :form.
Interestingly, each successive binding can be treated as a single nested binding. What if we defined let as a macro which expands to a series of let1 forms? (let [x 1 y 2] (* x y)) --> (let1 [x 1] (let1 [y 2] (* x y))
That may increase the depth of the AST, but it would really simplify traversal into binding clauses. Each let op would have :name, :init, and children as [:init].
In general, it may be worth while to consider doing the same thing with the various analyze-blocks situations, such that 'do is the only place multiple statements can occur.

Brandon Bloom
added a comment - 03/Jun/12 12:16 PM Er, I mean: a hypothetical let1 op would have children [:init :statements :ret]
However, if you also expanded the implicit do blocks, you'd be able to simplify to [:init :body] or something like that.

Forgive me for repeatedly commenting, but it also occurs to me that you can change 'do to a macro which expands to a series of 'do2 forms:

(do x y z) --> (do2 x (do2 y z)) or (do2 (do2 x y) z)

That do2 op could have children [:left :right] and all other :statements and :ret children could be simplified down to 'do2 expansions.

With that in mind, the children fn becomes simply (defn children [ast] ((apply juxt (:children ast)) ast)) and a lot of analyze and emit code gets much simpler by not having to map over many collections all over the place.

Kinda a wild idea and probably not the right forum for it, but worth suggesting. Thoughts?

Brandon Bloom
added a comment - 03/Jun/12 12:34 PM Forgive me for repeatedly commenting, but it also occurs to me that you can change 'do to a macro which expands to a series of 'do2 forms:
(do x y z) --> (do2 x (do2 y z)) or (do2 (do2 x y) z)
That do2 op could have children [:left :right] and all other :statements and :ret children could be simplified down to 'do2 expansions.
With that in mind, the children fn becomes simply (defn children [ast] ((apply juxt (:children ast)) ast)) and a lot of analyze and emit code gets much simpler by not having to map over many collections all over the place.
Kinda a wild idea and probably not the right forum for it, but worth suggesting. Thoughts?

Every node (i.e. an map with :op, :form and :env keys, called an "Expression Object" in the docstring for analyze) in ClojureScript corresponds to a self-contained expression. A binding form is not self-contained, it is a part of the let expression.

Another similar example is functions. There is a node {:op :fn ...} but there is no {:op :fn-method} even thou there are maps under the :methods key describing function methods. The same argument goes for this: A function method is not self-contained, it is part of the function and should not be a node.

The nodes (expression objects) which make up the ClojureScript AST seems to be really well thought out and I would be careful in making the proposed change.

Jonas Enlund
added a comment - 05/Jun/12 12:44 AM - edited Every node (i.e. an map with :op, :form and :env keys, called an "Expression Object" in the docstring for analyze) in ClojureScript corresponds to a self-contained expression. A binding form is not self-contained, it is a part of the let expression.
Another similar example is functions. There is a node {:op :fn ...} but there is no {:op :fn-method} even thou there are maps under the :methods key describing function methods. The same argument goes for this: A function method is not self-contained, it is part of the function and should not be a node.
The nodes (expression objects) which make up the ClojureScript AST seems to be really well thought out and I would be careful in making the proposed change.