Thursday, December 15, 2011

Experimental support for Common JS and AMD/require.js modules in Closure Compiler

I recently stumbled over a thread on the jQuery mailing list about how to modularize jQuery, which keeps getting bigger with every version even though not everybody uses every feature. Some argued for changing jQuery to support "dead code elimination" via Google Closure Compiler's advanced optimizations, which would strip unused code from people's projects; others wanted to use AMD/require.js modules instead, which make it possible to load only the required dependencies.
Having just done a little project on Closure Compiler at work, I figured it might be possible to support both of those ideas equally. And so I got coding…

How it works

With my change, Closure Compiler (CC) gets experimental first-class support for both Common JS and AMD modules. This means that CC knows about these types of modules and performs special transformations and optimizations for them.
The high level goals are:

Concatenate all modules into a single large file.

Automatically order modules so that dependencies are fulfilled.

Make it really easy for CC to apply its built-in optimizations.

Step 1: Transform AMD to Common JS modules

Add --transform_amd_modules to the command line options of CC to transform AMD modules to Common JS modules. In this first step, basically

define(['foo'], function(foo) {
  return {
    test: function() {}
  };
});

gets transformed to:

var foo = require('foo');
module.exports = { test: function() {} };

From now on we don't have to worry about the peculiarities of asynchronous AMD anymore. This step by itself might be useful to some people, e.g. if you want to use AMD code in Node.js directly.

Step 2: Process Common JS modules

Add --process_common_js_modules to the command line options of CC to enable specific processing of Common JS modules.
Most Common JS implementations (e.g. Node.js) implement modules by wrapping all code of a module in a closure like this:

(function(require, exports, module) { /* your module code */ })(…)

The problem is that this module pattern is really hard for Closure Compiler to optimize: with function calls and scopes involved, everything becomes dynamic and hard to understand statically.
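To make the wrapping concrete, here is a rough sketch of a closure-based loader. It is not how Node.js is actually implemented; the sources table, the cache and the load function are hypothetical names used only for this example.

```javascript
// Hypothetical mini loader showing the closure-wrapping approach.
// `sources` maps module names to source text; the contents are made up.
var sources = {
  'foo': 'exports.greet = function (who) { return "hello " + who; };',
  'example/baz': 'var foo = require("foo");' +
                 'exports.test = function () { return foo.greet("baz"); };'
};
var cache = {};

function load(name) {
  if (cache[name]) return cache[name].exports;
  var module = { exports: {} };
  cache[name] = module;
  // Each module body runs inside its own function scope, so its variables
  // stay private, but `require` is now an ordinary call resolved at runtime.
  var wrapper = new Function('require', 'exports', 'module', sources[name]);
  wrapper(load, module.exports, module);
  return module.exports;
}

var baz = load('example/baz');
```

Because require is just a function argument here, a static optimizer cannot easily tell which module a given call resolves to, or which exported properties are never used.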
This is why I implemented a transformation for Common JS modules that allows them to be safely concatenated into a single JS file without the use of closures. It works by renaming all global symbols in a module so that they can never conflict with those of a different module.
The following Common JS module named "example/baz":

Notice how exports just becomes module$example$baz while require('foo') gets turned into module$foo. As you can see, exports (and by proxy module.exports) and require both get converted into direct references to the specific module. All global variable and function names get suffixed with the module name, so that they can no longer conflict with any other module.
Note that while these sources seem really verbose, Closure Compiler will, of course, make all variable names really short later in the compilation process.
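A hand-written illustration of this renaming might look as follows. It is not actual compiler output: the module body, the counter variable and the exact form of the suffix ($$module$example$baz) are assumptions made for the example; only module$example$baz and module$foo are names taken from the description above.

```javascript
// Stand-in for the already-processed module "foo" (hypothetical content).
var module$foo = { greet: function (who) { return 'hello ' + who; } };

// Before (assumed CommonJS source of "example/baz"):
//   var foo = require('foo');
//   var counter = 0;
//   exports.test = function () { counter++; return foo.greet('baz'); };

// After: no closure, no require call, every top-level name is module-specific.
var module$example$baz = {};               // was: exports
var foo$$module$example$baz = module$foo;  // was: var foo = require('foo')
var counter$$module$example$baz = 0;       // was: var counter = 0
module$example$baz.test = function () {
  counter$$module$example$baz++;
  return foo$$module$example$baz.greet('baz');
};
```

Since everything is now a plain global with a unique name, the usual renaming and dead code elimination passes can treat the concatenated file like any other script.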

Step 3: Managing dependencies

Add --common_js_entry_module=foo/bar.js to your command line options to specify your "base" or "main" module. Starting from this module, the system figures out the dependencies and includes only those in the final output, in the right order.
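The general idea can be sketched as a depth-first walk over the dependency graph that emits each module after its dependencies. This is a sketch of the technique, not the code inside CC; the deps graph and the orderModules function are hypothetical, and rejecting circular dependencies is a simplification chosen for the example.

```javascript
// Hypothetical dependency graph: module name -> names it requires.
var deps = {
  'foo/bar': ['example/baz', 'util'],
  'example/baz': ['foo'],
  'foo': ['util'],
  'util': [],
  'unused': ['util']
};

// Returns the modules reachable from `entry`, dependencies first.
function orderModules(graph, entry) {
  var ordered = [];
  var state = {}; // name -> 'visiting' | 'done'
  function visit(name) {
    if (state[name] === 'done') return;
    // Simplification for this sketch: treat cycles as an error.
    if (state[name] === 'visiting') throw new Error('Circular dependency at ' + name);
    if (!Object.prototype.hasOwnProperty.call(graph, name)) {
      throw new Error('Unknown module ' + name);
    }
    state[name] = 'visiting';
    graph[name].forEach(function (dep) { visit(dep); });
    state[name] = 'done';
    ordered.push(name); // emitted only after all of its dependencies
  }
  visit(entry);
  return ordered;
}

var order = orderModules(deps, 'foo/bar');
```

Modules that are never reached from the entry module, like 'unused' here, simply do not appear in the result.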

How to use it

On Performance

I'd argue that if you need to load some JS, doing it in a single request usually wins. Having *ALL* your JS in one file is, however, usually not a good idea. You want to load things incrementally. How to do that within the framework of what I described above is left as an exercise to the reader.