Making JavaScript Fast, Part 2

David Chisnall continues his discussion of some of the techniques a modern JavaScript implementation uses for speed. In the second part of the series, he considers some ways of optimizing compiled JavaScript code.

Like this article? We recommend

Part 1 of this series showed how a basic JavaScript implementation could work, and some of the techniques for making it faster. In this article, we'll take a look at some advanced techniques related to code generation.

Traces and JITs

The next step up from interpreted bytecode is just-in-time (JIT) compilation, in which code is compiledoften from bytecodeto native code just before it's needed. A lot of virtual machines that claim to do JIT compilation really do load-time compiling; they compile an entire program when it's run. A real JIT compiler (often just called a JIT) will compile each code fragmenttypically each function or methodas the first time it's called. By compiling the code a little bit at a time, it spreads the cost of compilation over the first run of the program, rather than concentrating it at the start. A lot of JIT compilers will cache the generated code for the next time the program runs. This is much less common with JavaScript in a browser, where programs often are run once and then discarded.

This is a big advantage for web scripts, where a "program" is typically compiled every time it's run, and you don't want to have to wait for the whole thing to compile before you can run any of it. Another option is background-compiling, in which the code is compiled in one thread and interpreted in another thread. As the compiler finishes each function, it updates the interpreter to call the compiled version instead of interpreting it.

You can even combine these two approaches by keeping the functions to be compiled in an ordered list and moving them closer to the head every time they're run in the interpreter. This technique will result in the frequently called functions (the ones that will have the biggest impact on overall performance) being compiled first. If a function is never called, no one cares if it isn't compiled. Some virtual machines will have a threshold of a small number of calls (for example, 16) before a function is compiled. This design lets you avoid wasting time compiling code that runs at the start and then is never run again; instead, it will run in the interpreter the first time, and you don't gain anything from compiling it after that.

So far, I've talked about compiling functions or methods. These are useful blocks of code for the programmer to ponder, but they don't often correspond directly to the flow of execution through a real program. It's very unusual to have a program that runs one function, another, another, and so on. Most programs do a bit of work in one function, call another, and do a bit of work there. The second function may call a third, do a bit of work there, and then have it return. The program then spends a bit of time in the second function before returning to the first. If you never had to maintain this code, you could create a much more efficient version of it by rearranging the boundaries between functions. It would be completely unreadable, but it would be faster.

This is the idea behind tracing VMs, such as the Tamarin JavaScript implementation that originates with Adobe Flash. Flash has an advantage over most JavaScript applications: The code for the ActionScript (a dialect of JavaScript) distributed with a Flash program is already parsed and is shipped as bytecode. The plug-in can begin interpreting it immediately. Then, rather than compile the functions, the virtual machine runs the code in the emulator and produces traces (blocks of code with a single entry point and potentially more than one exit point), which are then turned into single blocks of machine code.

This technique can be very effective. Often, functions contain several code paths, but only one is commonly used. A tracing VM can join together the commonly used paths through a short series of functions, creating a single non-branching block of executable code. While you're using this "hot path," the code will run much faster than it would have with the original structure.