Description

Given a leiningen/cljsbuild project that defines multiple outputs, the advanced compilation preprocessor that identifies keywords to be inlined into constants_table.js fails to emit keywords for files that have already been processed by a previous build. This causes, for example, the :parents keyword used by 'derive' to refer to an undefined symbol.

In my actual project I'm using leiningen profiles to build up a list of cljsbuild compilation targets: two versions of the main clojurescript js (one with :advanced, one with :whitespace), and two versions of my unit tests (one with :advanced, one with :whitespace). I found that after switching to 1978 the :advanced versions didn't load successfully, and I managed to track the issue back to missing keyword constants in the generated code.

I found that I could replicate the problem in a minimal project by building two targets from the same leiningen invocation: the first one with :whitespace and the second with :advanced. This produces two cljsbuild-compiler-N directories. The first (:whitespace) does not contain a 'constants_table.js' file, the second does. The 'constants_table.js' seems to be missing all of the constants that were used to produce the first cljsbuild-compiler-N output. This is a regression - 1934 does not exhibit this problem. My guess is that it's related to the incremental compilation changes.

It is important to our development process to be able to produce multiple clojurescript outputs from the same leiningen invocation: we expect our builds to produce at least a main.js file and a tests.js file.

Happy to get in touch via irc/email/mailing list if that would be preferable to you.

Stephen Nelson
added a comment - 29/Oct/13 8:25 PM In my actual project I'm using leiningen profiles to build up a list of cljsbuild compilation targets: two versions of the main clojurescript js (one with :advanced, one with :whitespace), and two versions of my unit tests (one with :advanced, one with :whitespace). I found that after switching to 1978 the :advanced versions didn't load successfully, and I managed to track the issue back to missing keyword constants in the generated code.
I found that I could replicate the problem in a minimal project by building two targets from the same leiningen invocation: the first one with :whitespace and the second with :advanced. This produces two cljsbuild-compiler-N directories. The first (:whitespace) does not contain a 'constants_table.js' file, the second does. The 'constants_table.js' seems to be missing all of the constants that were used to produce the first cljsbuild-compiler-N output. This is a regression - 1934 does not exhibit this problem. My guess is that it's related to the incremental compilation changes.
It is important to our development process to be able to produce multiple clojurescript outputs from the same leiningen invocation: we expect our builds to produce at least a main.js file and a tests.js file.
Happy to get in touch via irc/email/mailing list if that would be preferable to you.

David Nolen
added a comment - 29/Oct/13 8:44 PM - edited Have you tried with master? I committed a fix for CLJS-645, which should fix the issue where the constants table is not present or placed in the wrong location as far as dependency order.
As far as building two builds from the same Lein process part of the issue may very well be with cljsbuild - cljs.closure/build has a reset flag which should be set for different builds.

Stephen Nelson
added a comment - 29/Oct/13 8:59 PM Just tried master: it exhibits the same problem for my project and for the example project attached above.
Has the reset flag behaviour changed recently? I'm using the latest version of cljsbuild - if it's a cljsbuild bug it wasn't being triggered with prior versions of clojurescript.

David Nolen
added a comment - 29/Oct/13 9:10 PM - edited There wasn't a reset flag before. This will need to be handled by cljsbuild.
We still need more information. Does the build succeed if it is clean, that is, if you delete all files generated by previous builds?

David Nolen
added a comment - 29/Oct/13 9:59 PM I need more instructions on how to recreate. If I build only one of the builds I can't run it alone. Can you build a clean one of each by itself. Thanks.

I'm not sure what you mean by 'running it alone'. The 'first' build (pretty.js) is only present to trigger the bug. The 'second' build (main.js) is loaded by the 'index.html' file in the project root. Loading this in chrome after running 'cljsbuild once' results in an exception.

I'm not sure what you mean by 'running it alone'. The 'first' build (pretty.js) is only present to trigger the bug. The 'second' build (main.js) is loaded by the 'index.html' file in the project root. Loading this in chrome after running 'cljsbuild once' results in an exception.

Sorry if I wasn't clear, you keep building both targets together - this is unlikely to work. But in order to isolate the issue I want to know that the constant table problem isn't present if you just build one target, i.e. `lein cljsbuild once first`, check the constants_table.js, is it correct?

David Nolen
added a comment - 29/Oct/13 11:18 PM Sorry if I wasn't clear, you keep building both targets together - this is unlikely to work. But in order to isolate the issue I want to know that the constant table problem isn't present if you just build one target, i.e. `lein cljsbuild once first`, check the constants_table.js, is it correct?

Sorry for not being clear. You're completely correct, this problem only occurs when building both targets together. Compiling main.js independently results in a correct constants_table.js.

Compiling only one target is not an acceptable resolution for my team, and I'd imagine we're not the only ones in that situation. A workaround is to do the :advanced build first, but it's fragile: the 'new' cljsbuild syntax orders builds by id so users need to pick appropriate IDs to correctly order builds and avoid triggering the bug.

It seems reasonable that the appropriate resolution of this issue means changing cljsbuild rather than clojurescript, but multiple builds are a key feature of cljsbuild vs calling cljsc directly. What are the consequences of setting the reset flag on compile performance, particularly on caching/incremental builds?

Stephen Nelson
added a comment - 30/Oct/13 12:07 AM Sorry for not being clear. You're completely correct, this problem only occurs when building both targets together. Compiling main.js independently results in a correct constants_table.js.
Compiling only one target is not an acceptable resolution for my team, and I'd imagine we're not the only ones in that situation. A workaround is to do the :advanced build first, but it's fragile: the 'new' cljsbuild syntax orders builds by id so users need to pick appropriate IDs to correctly order builds and avoid triggering the bug.
It seems reasonable that the appropriate resolution of this issue means changing cljsbuild rather than clojurescript, but multiple builds are a key feature of cljsbuild vs calling cljsc directly. What are the consequences of setting the reset flag on compile performance, particularly on caching/incremental builds?

I don't have any bright ideas about sharing information across builds. However I think we could probably support parallel builds easily by allowing cljsbuild to pass along the needed dynamic vars (instead of resetting). Until then cljsbuild will absolutely need to pass the reset flag for a different build.

David Nolen
added a comment - 30/Oct/13 12:27 AM I don't have any bright ideas about sharing information across builds. However I think we could probably support parallel builds easily by allowing cljsbuild to pass along the needed dynamic vars (instead of resetting). Until then cljsbuild will absolutely need to pass the reset flag for a different build.

Just found out about the reset flag here, via a cljsbuild issue opened by Stephen. Most of the state that necessitates its existence has been added very recently, and none of it helps makes the compiler or analyzer easily or effectively used from a tooling perspective.

Now that I've stepped back, I'm guessing that this is why using e.g. Austin to drive multiple ClojureScript REPLs sometimes produces flaky results, depending upon just how divergent the respective loaded codebases are. [I'm personally still on 1845 for my important projects because I hadn't been able to sniff out the problem.] Any other repeated usage of the compiler is likely to produce incorrect results (e.g. compiling CouchDB views or plv8 stored procs).

The reset flag doesn't help (much) in any of these scenarios, since its effects are racy, and many compiler use-cases are applied in multithreaded environments.

Dynamic vars that contain values that are updated/used only through a particular application of the analyzer or compiler are fine, but all of the stateful vars (i.e. anything that contains an atom that gets set!-ed) need to be pulled, and replaced with some kind of compilation environment (an atom containing a map) that is passed into the various entry points in cljs.compiler and cljs.closure as needed. This would allow the compiler to provide reasonable defaults for any state that isn't set/provided by the caller, something that is largely a PITA to ensure when using dynamic vars. We're only looking at three vars here AFAICT, so the actual change shouldn't be particularly complex.

Chas Emerick
added a comment - 30/Oct/13 4:39 AM Just found out about the reset flag here, via a cljsbuild issue opened by Stephen. Most of the state that necessitates its existence has been added very recently, and none of it helps makes the compiler or analyzer easily or effectively used from a tooling perspective.
Now that I've stepped back, I'm guessing that this is why using e.g. Austin to drive multiple ClojureScript REPLs sometimes produces flaky results, depending upon just how divergent the respective loaded codebases are. [I'm personally still on 1845 for my important projects because I hadn't been able to sniff out the problem.] Any other repeated usage of the compiler is likely to produce incorrect results (e.g. compiling CouchDB views or plv8 stored procs).
The reset flag doesn't help (much) in any of these scenarios, since its effects are racy, and many compiler use-cases are applied in multithreaded environments.
Dynamic vars that contain values that are updated/used only through a particular application of the analyzer or compiler are fine, but all of the stateful vars (i.e. anything that contains an atom that gets set!-ed) need to be pulled, and replaced with some kind of compilation environment (an atom containing a map) that is passed into the various entry points in cljs.compiler and cljs.closure as needed. This would allow the compiler to provide reasonable defaults for any state that isn't set/provided by the caller, something that is largely a PITA to ensure when using dynamic vars. We're only looking at three vars here AFAICT, so the actual change shouldn't be particularly complex.
Does this sound sensible?

David Nolen
added a comment - 30/Oct/13 9:02 AM I think every top level atom will need to be accounted for? I count 8.
While putting everything into a single atom sounds appealing, I think putting namespaces in there is a lot of breakage - will need some discussion with the community.

Do you expect namespaces-related breakage because you know of downstream tools that touch that var, or because refactoring it into a compiler environment would be a big/gnarly change?

Yes, I count 8 7 as well, but I don't think every atom needs to move in order to make this reset mechanism unnecessary. Insofar as only recent statefulness is causing some difficulty, we can start there. But anyway, here a (maybe pedantic, but I'm writing to learn) survey of the globally-stateful bits I can see in the compiler, writ large:

cljs.analyzer/*unchecked-if*: it's always toggled to prevent warnings for a particular range of emitted code. AFAICT, it's completely internal to the compiler. Except for how it's used in core.cljs (which I don't quite understand yet?), the toggling could be replaced with dynamic binding.

cljs.analyzer/-cljs-macros-loaded: Strictly an analyzer internal AFAICT, though I don't see any usage of the macros or fn that touch it?

cljs.analyzer/*constant-table*: A big one that needs to be carted around in any compiler "environment". It's a little odd to me that this is in the analyzer at all (more of an emission optimization, right, thus why it defaults false and only comes into play w/ :advanced?) actually, though I see a TODO talking about pushing out related logic. Also a recent addition, so the refactoring shouldn't cause any downstream breakage.

cljs.analyzer/constant-counter: An internal gensymming facility, can stay as is (and should probably be marked ^:private.

cljs.analyzer/namespaces: used throughout the compiler, as well as in the core REPL ns. Been around for a long time though, so if it stays put for a while, that'll be OK.

cljs.compiler/compiled-cljs: A cache of compilation results that looks to be important w.r.t. generating source maps? Seems like a straightforward refactoring to touch a private dynamic var.

cljs.compiler/ns-first-segments: looks like it's only used for controlling js name munging? Sharing this among different compilation processes seems harmless. Also been around for a while. Nice, this recent commit eliminated this atom, reimplementing on top of cljs.analyzer/namespaces.

cljs.closure/compiled-cljs: Another cache, this one of goog provides/requires; should be easy to lift into a compilation environment

The sanest way to move these bits of state into a compilation environment would probably be to bind a private dynamic var in each namespace in each of the entry points into the compiler/analyzer, so that existing signatures (e.g. compile-file) don't have to change.

Chas Emerick
added a comment - 30/Oct/13 10:02 AM - edited Do you expect namespaces-related breakage because you know of downstream tools that touch that var, or because refactoring it into a compiler environment would be a big/gnarly change?
Yes, I count 8 7 as well, but I don't think every atom needs to move in order to make this reset mechanism unnecessary. Insofar as only recent statefulness is causing some difficulty, we can start there. But anyway, here a (maybe pedantic, but I'm writing to learn) survey of the globally-stateful bits I can see in the compiler, writ large:

cljs.analyzer/*unchecked-if*: it's always toggled to prevent warnings for a particular range of emitted code. AFAICT, it's completely internal to the compiler. Except for how it's used in core.cljs (which I don't quite understand yet?), the toggling could be replaced with dynamic binding.

cljs.analyzer/-cljs-macros-loaded: Strictly an analyzer internal AFAICT, though I don't see any usage of the macros or fn that touch it?

cljs.analyzer/*constant-table*: A big one that needs to be carted around in any compiler "environment". It's a little odd to me that this is in the analyzer at all (more of an emission optimization, right, thus why it defaults false and only comes into play w/ :advanced?) actually, though I see a TODO talking about pushing out related logic. Also a recent addition, so the refactoring shouldn't cause any downstream breakage.

cljs.analyzer/constant-counter: An internal gensymming facility, can stay as is (and should probably be marked ^:private.

cljs.analyzer/namespaces: used throughout the compiler, as well as in the core REPL ns. Been around for a long time though, so if it stays put for a while, that'll be OK.

cljs.compiler/compiled-cljs: A cache of compilation results that looks to be important w.r.t. generating source maps? Seems like a straightforward refactoring to touch a private dynamic var.

cljs.compiler/ns-first-segments: looks like it's only used for controlling js name munging? Sharing this among different compilation processes seems harmless. Also been around for a while. Nice, this recent commit eliminated this atom, reimplementing on top of cljs.analyzer/namespaces.

cljs.closure/compiled-cljs: Another cache, this one of goog provides/requires; should be easy to lift into a compilation environment

The sanest way to move these bits of state into a compilation environment would probably be to bind a private dynamic var in each namespace in each of the entry points into the compiler/analyzer, so that existing signatures (e.g. compile-file) don't have to change.

All I mean by "compilation environment" is an atom containing a map. The compiler/analyzer can then swap! into it exactly as they're swap!-ing into top-level atoms, except each one will be using e.g. update-in to update a particular slot in the map. The most straightforward thing to do would be to dynamically bind this atom to private local vars in each part of the pipeline (analyzer, closure, compiler).

(With this in place, callers can deref to retain particular compiler/analyzer states, support introspection at a point-in-time, ship the contents of e.g. the :namespaces slot to editors for code completion, roll back to prior states to back out of attempting to load an ill-formed file/namespace, etc.)

Aside from advanced use cases, this puts the lifecycle of the compiler/analyzer state completely in each caller's hands, and should eliminate the possibility of any races. I'll have to take a closer look at the different REPL implementations (i.e. there's a REPL env protocol, but there's a lot of impl details that assume that all envs are actually records [or, types that implement either ILookup and/or Seqable]), but the compilation environment atom should just be a part of each REPL environment.

Chas Emerick
added a comment - 30/Oct/13 11:22 AM All I mean by "compilation environment" is an atom containing a map. The compiler/analyzer can then swap! into it exactly as they're swap!-ing into top-level atoms, except each one will be using e.g. update-in to update a particular slot in the map. The most straightforward thing to do would be to dynamically bind this atom to private local vars in each part of the pipeline (analyzer, closure, compiler).
(With this in place, callers can deref to retain particular compiler/analyzer states, support introspection at a point-in-time, ship the contents of e.g. the :namespaces slot to editors for code completion, roll back to prior states to back out of attempting to load an ill-formed file/namespace, etc.)
Aside from advanced use cases, this puts the lifecycle of the compiler/analyzer state completely in each caller's hands, and should eliminate the possibility of any races. I'll have to take a closer look at the different REPL implementations (i.e. there's a REPL env protocol, but there's a lot of impl details that assume that all envs are actually records [or, types that implement either ILookup and/or Seqable]), but the compilation environment atom should just be a part of each REPL environment.

The main thing here will be not break the intended use of namespaces or if we make it clear what the new mechanism will be. The namespaces atom was designed to support use cases served by vars in Clojure JVM.

David Nolen
added a comment - 30/Oct/13 11:38 AM The main thing here will be not break the intended use of namespaces or if we make it clear what the new mechanism will be. The namespaces atom was designed to support use cases served by vars in Clojure JVM.

Just continuing an IRC thread - no putting anything into &env. Clojure is ambient and ClojureScript is explicitly staged, via complier macros you can reach the previous stage and you should have an interface similar to what's provided ambiently in Clojure. What's available in &env should remain analogous to Clojure - information about the current compilation unit.

David Nolen
added a comment - 30/Oct/13 12:31 PM Just continuing an IRC thread - no putting anything into &env. Clojure is ambient and ClojureScript is explicitly staged, via complier macros you can reach the previous stage and you should have an interface similar to what's provided ambiently in Clojure. What's available in &env should remain analogous to Clojure - information about the current compilation unit.

Until I hear otherwise, I'm taking the liberty of changing/removing vars that have been introduced since 1847 (e.g. *track-constants* has been simplified away [since the analyzer really shouldn't care about whether constants are going to be emitted separately as an optimization]).

Chas Emerick
added a comment - 31/Oct/13 7:03 AM Going to be picking away at this over the next couple of days. People can follow along here:
https://github.com/cemerick/clojurescript/tree/643
(That'll get rebased whenever, so don't depend on any SHAs.)
Until I hear otherwise, I'm taking the liberty of changing/removing vars that have been introduced since 1847 (e.g. *track-constants* has been simplified away [since the analyzer really shouldn't care about whether constants are going to be emitted separately as an optimization]).

Note, I don't see how you can move *constants-table* (and thus *track-constants*) outside of the analyzer without putting multiple passes into the compiler. I do not recommend changing the current approach, unless of course you have some cool idea in mind

David Nolen
added a comment - 31/Oct/13 7:14 AM - edited Note, I don't see how you can move *constants-table* (and thus *track-constants*) outside of the analyzer without putting multiple passes into the compiler. I do not recommend changing the current approach, unless of course you have some cool idea in mind

The table isn't "outside of the analyzer"; it's still obviously responsible for populating it. The commit I've pushed so far just makes it so that the table is always populated, and it's just up to cljs.compiler/cljs.closure to emit it if they are so instructed via the opts.

Don't worry, I definitely don't want to be breaking any new ground here. Just pulling together all of the threads of state so the compilation process can be (more) idempotent (ignoring the massive details of the filesystem and macro runtime).

Chas Emerick
added a comment - 31/Oct/13 7:24 AM The table isn't "outside of the analyzer"; it's still obviously responsible for populating it. The commit I've pushed so far just makes it so that the table is always populated, and it's just up to cljs.compiler/cljs.closure to emit it if they are so instructed via the opts.
Don't worry, I definitely don't want to be breaking any new ground here. Just pulling together all of the threads of state so the compilation process can be (more) idempotent (ignoring the massive details of the filesystem and macro runtime).

Added a draft patch, which can also be viewed on my 643 github branch.

The commit message details the changes made (though I'll probably reorganize it for clarity). Essentially, all "compiler" (analyzer, compiler, closure) state has been lifted up into "compiler environment" atoms/maps held in private *compiler-env* vars in each namespace (there isn't one because there are multiple entry points in all three namespaces, and it seems bizarre/wrong for the canonical "compiler environment" to be in the analyzer).

This passes all tests, and the command-line REPL works as expected.

Known TODOs at this point include:

ensuring all entry points set up their respective *compiler-env* bindings properly (only the REPL and closure/build do so right now)

making sure that this change can be accommodated by downstream tools, e.g. cljsbuild, austin, cljs-noderepl, etc

Hoping to get a patch review at this point to make sure there aren't any fundamental problems before continuing on with the above.

Chas Emerick
added a comment - 02/Nov/13 8:08 AM Added a draft patch, which can also be viewed on my 643 github branch.
The commit message details the changes made (though I'll probably reorganize it for clarity). Essentially, all "compiler" (analyzer, compiler, closure) state has been lifted up into "compiler environment" atoms/maps held in private *compiler-env* vars in each namespace (there isn't one because there are multiple entry points in all three namespaces, and it seems bizarre/wrong for the canonical "compiler environment" to be in the analyzer).
This passes all tests, and the command-line REPL works as expected.
Known TODOs at this point include:

ensuring all entry points set up their respective *compiler-env* bindings properly (only the REPL and closure/build do so right now)

making sure that this change can be accommodated by downstream tools, e.g. cljsbuild, austin, cljs-noderepl, etc

Hoping to get a patch review at this point to make sure there aren't any fundamental problems before continuing on with the above.

Addendum: I'd personally like to introduce a new namespace where a single *compiler-env* var lives, and eliminate the other three vars. Let me know if that sounds sane. Also looking for better names for *compiler-env*, and perhaps this new namespace.

Chas Emerick
added a comment - 02/Nov/13 8:14 AM Addendum: I'd personally like to introduce a new namespace where a single *compiler-env* var lives, and eliminate the other three vars. Let me know if that sounds sane. Also looking for better names for *compiler-env*, and perhaps this new namespace.

Overall the patch looks good. I think *compiler-env* is fine, and I agree that a creating a namespace for it is best. I don't think we need to spend too much time on the namespace name as it's not intended to be used directly. Let's please drop the ^:private stuff, it's ugly and unnecessary, and needlessly Clojure JVM specific (think bootstrapping a ClojureScript in ClojureScript without reified Vars). Just doc *compiler-env* as private.

Great stuff, a huge improvement and a big step forward. Thanks a million Chas!

David Nolen
added a comment - 02/Nov/13 10:48 AM - edited Overall the patch looks good. I think *compiler-env* is fine, and I agree that a creating a namespace for it is best. I don't think we need to spend too much time on the namespace name as it's not intended to be used directly. Let's please drop the ^:private stuff, it's ugly and unnecessary, and needlessly Clojure JVM specific (think bootstrapping a ClojureScript in ClojureScript without reified Vars). Just doc *compiler-env* as private.
Great stuff, a huge improvement and a big step forward. Thanks a million Chas!

Chas Emerick
added a comment - 02/Nov/13 8:46 PM Hrm, I could have sworn that cljs warned on private symbols at some point? Anyway, I'll pull the private meta.
Thanks for taking a look. I'll get on the downstream testing, hopefully have something to be applied before Monday morning.

New draft2 patch that introduces a cljs.env namespace, eliminating the three separate *compiler-env* vars.

lein-cljsbuild can be used as-is with this change, though e.g. cljsbuild auto won't benefit from the fast incremental compiles (each compilation just starts with a fresh compiler env). I've already made the change to cljsbuild locally to use the same compiler-env for each :build, yielding the incremental speedup.

REPL support is turning out to be trickier. The command-line REPL is fine, as is node-cljsrepl, but the browser REPL (and austin) are more complicated w.r.t. the dynamic state of cljs.env/*compiler*, with their preload mechanisms, putting initial script compilation in futures, etc. Working on it.

Chas Emerick
added a comment - 03/Nov/13 8:00 PM New draft2 patch that introduces a cljs.env namespace, eliminating the three separate *compiler-env* vars.
lein-cljsbuild can be used as-is with this change, though e.g. cljsbuild auto won't benefit from the fast incremental compiles (each compilation just starts with a fresh compiler env). I've already made the change to cljsbuild locally to use the same compiler-env for each :build, yielding the incremental speedup.
REPL support is turning out to be trickier. The command-line REPL is fine, as is node-cljsrepl, but the browser REPL (and austin) are more complicated w.r.t. the dynamic state of cljs.env/*compiler*, with their preload mechanisms, putting initial script compilation in futures, etc. Working on it.

New draft3 patch attached. It addresses all known impact of introducing the compiler environment concept on REPLs:

command-line Rhino works as expected

browser-REPL has been modified as necessary, now creates one new compiler environment per browser-REPL environment due to its multithreaded precompilation, etc

cljs-noderepl works without modification, which is nice

Piggieback and Austin need to be modified in various ways, the latter similar to the changes the stock browser-REPL required. Piggieback does a bunch of stuff differently than cljs.repl/repl, so its changes are unique, but not significant. I've already completed and tested these locally, and will push them out in conjunction with a release that contains this patch.

I'm relatively comfortable with the patch at this point. Also, I really, really don't want to rebase it over conflicting changes again.

Chas Emerick
added a comment - 05/Nov/13 9:36 AM New draft3 patch attached. It addresses all known impact of introducing the compiler environment concept on REPLs:

command-line Rhino works as expected

browser-REPL has been modified as necessary, now creates one new compiler environment per browser-REPL environment due to its multithreaded precompilation, etc

cljs-noderepl works without modification, which is nice

Piggieback and Austin need to be modified in various ways, the latter similar to the changes the stock browser-REPL required. Piggieback does a bunch of stuff differently than cljs.repl/repl, so its changes are unique, but not significant. I've already completed and tested these locally, and will push them out in conjunction with a release that contains this patch.

I'm relatively comfortable with the patch at this point. Also, I really, really don't want to rebase it over conflicting changes again.