“Hello, Internet!”

SquirrelFish is fast—much faster than WebKit’s previous interpreter. Check out the numbers. On the SunSpider JavaScript benchmark, SquirrelFish is 1.6 times faster than WebKit’s previous interpreter.

Figure: SunSpider runs per minute. Longer bars are better.

What Is SquirrelFish?

SquirrelFish is a register-based, direct-threaded, high-level bytecode engine, with a sliding register window calling convention. It lazily generates bytecodes from a syntax tree, using a simple one-pass compiler with built-in copy propagation.
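As a rough illustration of what "lazily generates bytecodes" can mean, here is a minimal sketch in Python. It is not SquirrelFish code: the class `LazyFunction` and the helper `flatten` are hypothetical names, and `flatten` is only a stand-in for a real compiler. The point is simply that a function keeps its syntax tree around and pays for compilation only if it is actually called.

```python
def flatten(tree):
    """Stand-in for a real compiler (an assumption for this sketch):
    list the tree's operations in post-order."""
    if not isinstance(tree, tuple):
        return [("operand", tree)]
    op, left, right = tree
    return flatten(left) + flatten(right) + [("op", op)]


class LazyFunction:
    """Holds a syntax tree and generates bytecode only on first use."""

    def __init__(self, tree):
        self.tree = tree
        self.bytecode = None      # nothing is generated up front
        self.compile_count = 0    # how many times we actually compiled

    def ensure_compiled(self):
        if self.bytecode is None:
            self.bytecode = flatten(self.tree)
            self.compile_count += 1
        return self.bytecode


called = LazyFunction(("+", "x", "y"))
never_called = LazyFunction(("+", "a", "b"))

called.ensure_compiled()
called.ensure_compiled()          # second call reuses the cached bytecode

print(called.compile_count)       # compiled once
print(never_called.bytecode)      # still None: never compiled
```

A script that defines many functions but calls only a few of them never pays the compilation cost for the rest.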

SquirrelFish owes a lot of its design to some of the latest research in the field of efficient virtual machines, including research done by Professor M. Anton Ertl, et al, Professor David Gregg, et al, and the developers of the Lua programming language.

The Implementation of Lua 5.0 (Outlines the implementation of a real-world register-based bytecode engine, with a sliding register window calling convention)

I’ve also pored over stacks of terrible books and papers on these topics. I’ll spare you those.

Why It’s Fast

Like the interpreters for many scripting languages, WebKit’s previous JavaScript interpreter was a simple syntax tree walker. To execute a program, it would first parse the program into a tree of statements and expressions. For example, the expression “x + y” might parse to

  +
 / \
x   y

Having created a syntax tree, the interpreter would recursively visit the nodes in the tree, performing their operations and propagating execution state. This execution model incurred a few types of run-time cost.
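For readers who have not seen one, a syntax tree walker can be sketched in a few lines of Python. This is an illustration only, not the old WebKit interpreter: the class names `Node`, `Variable` and `Add` and the `evaluate` method are made up for the example.

```python
class Node:
    """Base class for syntax tree nodes (hypothetical, for illustration)."""

    def evaluate(self, env):
        raise NotImplementedError


class Variable(Node):
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        # Look the variable up in the current environment.
        return env[self.name]


class Add(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, env):
        # Recursively visit both children, then combine their results.
        return self.left.evaluate(env) + self.right.evaluate(env)


# The tree for "x + y" from the diagram above.
tree = Add(Variable("x"), Variable("y"))
result = tree.evaluate({"x": 2, "y": 3})
print(result)
```

Every `evaluate` call here is a dynamically dispatched method call, which is the Python analogue of the virtual function call per node visit.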

First, a syntax tree describes a program’s grammatical structure, not the operations needed to execute it. Therefore, during execution, the interpreter would repeatedly visit nodes that did no useful work. For example, for the block “{ x++; }”, the interpreter would first visit the block node “{…}”, which did nothing, and then visit its first child, the increment node “x++”, which incremented x.

Second, even nodes that did useful work were expensive to visit. Each visit required a virtual function call and return, which meant a couple of indirect memory reads to retrieve the function being called, and two indirect branches—one for the call, and one for the return. On modern hardware, “indirect” is a synonym for “slow”, since indirection tends to defeat caching and branch prediction.

Third, to propagate execution state between nodes, the interpreter had to pass around a bunch of data. For example, when processing a subtree involving a local variable, the interpreter would copy the variable’s value between all the nodes in the subtree. So, starting at the “x” part of the expression “f((x) + 1)”, a variable node “x” would return x to a parentheses node “(x)”, which would return x to a plus node “(x) + 1”. Then, the plus node would return (x) + 1 to an argument list node “((x) + 1)”, which would copy that value into an argument list object, which, in turn, it would pass to the function node for f. Sheesh!
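To make that chain of hand-offs visible, here is a small instrumented tree walker in Python. It is a sketch under assumptions, not the old interpreter: the node classes, the `visit` helper and the `log` list are all invented for illustration, and the function f is just a lambda stored in the environment.

```python
def visit(child, parent, env, log):
    """Evaluate child, then record that its value was handed to parent."""
    value = child.evaluate(env, log)
    log.append(f"{child.label} -> {parent.label}")
    return value


class Var:
    def __init__(self, name):
        self.name = name
        self.label = name

    def evaluate(self, env, log):
        return env[self.name]


class Const:
    def __init__(self, value):
        self.value = value
        self.label = str(value)

    def evaluate(self, env, log):
        return self.value


class Paren:
    def __init__(self, inner):
        self.inner = inner
        self.label = f"({inner.label})"

    def evaluate(self, env, log):
        # Does no real work: just passes the inner value along.
        return visit(self.inner, self, env, log)


class Plus:
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.label = f"{left.label} + {right.label}"

    def evaluate(self, env, log):
        return visit(self.left, self, env, log) + visit(self.right, self, env, log)


class ArgList:
    def __init__(self, args):
        self.args = args
        self.label = "(" + ", ".join(a.label for a in args) + ")"

    def evaluate(self, env, log):
        # Copies every argument value into a fresh list object.
        return [visit(a, self, env, log) for a in self.args]


class Call:
    def __init__(self, func_name, args):
        self.func_name = func_name
        self.args = args
        self.label = func_name + args.label

    def evaluate(self, env, log):
        values = visit(self.args, self, env, log)
        return env[self.func_name](*values)


log = []
tree = Call("f", ArgList([Plus(Paren(Var("x")), Const(1))]))
result = tree.evaluate({"x": 4, "f": lambda v: v * 10}, log)

for line in log:
    print(line)
print(result)
```

The log records five separate hand-offs for one small expression, and the parentheses node contributes one of them without computing anything.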

In our first rounds of optimization, we squeezed out as much performance as we could without changing this underlying architecture. Doing so allowed us to regression test each optimization we wrote. It also set a very high bar for any replacement technology. Finally, having realized the full potential of the syntax tree architecture, we switched to bytecode.

SquirrelFish’s bytecode engine elegantly eliminates almost all of the overhead of a tree-walking interpreter. First, a bytecode stream exactly describes the operations needed to execute a program. Compiling to bytecode implicitly strips away irrelevant grammatical structure. Second, a bytecode dispatch is a single direct memory read, followed by a single indirect branch. Therefore, executing a bytecode instruction is much faster than visiting a syntax tree node. Third, with the syntax tree gone, the interpreter no longer needs to propagate execution state between syntax tree nodes.
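The following Python sketch shows the general idea of lowering a tree to register bytecode and then running it in a dispatch loop. Everything in it is an assumption made for illustration: the `Compiler` class, the `run` function, the tuple encoding and the opcode names `add` and `ret` are not SquirrelFish's real opcodes. Note also that Python cannot express direct threading; the loop below uses ordinary comparisons, whereas a direct-threaded engine written in C or C++ jumps straight to the address stored in each instruction.

```python
class Compiler:
    """One-pass compiler from a tiny expression tree to register bytecode."""

    def __init__(self, local_names):
        # Locals occupy the first registers; temporaries come after them.
        self.registers = {name: i for i, name in enumerate(local_names)}
        self.next_free = len(local_names)
        self.code = []

    def compile(self, node):
        """Return the register that holds node's value."""
        if isinstance(node, str):
            # A local already lives in a register: emit no instruction.
            return self.registers[node]
        if node[0] == "paren":
            # Grouping is purely grammatical, so it disappears here.
            return self.compile(node[1])
        op, left, right = node
        assert op == "+", "this sketch only supports addition"
        lhs = self.compile(left)
        rhs = self.compile(right)
        dst = self.next_free
        self.next_free += 1
        self.code.append(("add", dst, lhs, rhs))
        return dst


def run(code, registers):
    """Execute bytecode: read one instruction, dispatch, repeat."""
    pc = 0
    while True:
        op, a, b, c = code[pc]
        pc += 1
        if op == "add":
            registers[a] = registers[b] + registers[c]
        elif op == "ret":
            return registers[a]


compiler = Compiler(["x", "y"])
result_reg = compiler.compile(("+", ("paren", "x"), "y"))
compiler.code.append(("ret", result_reg, None, None))

registers = [2, 3] + [None] * (compiler.next_free - 2)
print(compiler.code)
print(run(compiler.code, registers))
```

The tree had four nodes (plus, parentheses, x, y), but the compiled program has a single `add` followed by `ret`: the nodes that did no useful work left no trace in the bytecode.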

The bytecode’s register representation and calling convention work together to produce other speedups, as well. For example, jumping to the first instruction in a JavaScript function, which used to require two C++ function calls, one of them virtual, now requires just a single bytecode dispatch. At the same time, the bytecode compiler, which knows how to strip away many forms of intermediate copying, can often arrange to pass arguments to a JavaScript function without any copying.
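Here is one way to picture a sliding register window, as a Python sketch. It assumes a single flat list of register slots shared by all frames, in the spirit of the Lua 5.0 design mentioned earlier; the actual SquirrelFish frame layout may well differ, and the function names `caller` and `callee_double` are invented for the example.

```python
def callee_double(slots, base):
    """Callee whose register 0 is slots[base]; the caller wrote it there."""
    slots[base + 1] = slots[base] * 2      # r1 = r0 * 2
    return slots[base + 1]


def caller(slots, base):
    slots[base + 0] = 20                   # r0: local variable x
    # Evaluate the argument "x + 1" straight into r1. That slot will
    # become the callee's r0, so no separate argument copy is needed.
    slots[base + 1] = slots[base + 0] + 1
    # Slide the window: the callee's frame begins at the caller's r1.
    return callee_double(slots, base + 1)


slots = [None] * 8
result = caller(slots, 0)
print(result)
print(slots[:3])
```

Because the caller's outgoing argument slot and the callee's first register are the same storage, the call needs only a new base index rather than a freshly built argument list.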

Just the Beginning

In a typical compiler, conversion to bytecode is just a means to an end, not an end in itself. The purpose of the conversion is to “lower” an abstract tree of grammatical constructs to a concrete vector of execution primitives, the latter form being more amenable to well-known optimization techniques.

Therefore, though we’re very happy with SquirrelFish’s current performance, we also believe that it’s just the beginning. Some of the compile-time optimizations we’re looking at, now that we have a bytecode representation, include:

constant folding

more aggressive copy propagation

type inference—both exact and speculative

specialization based on expression context—especially void and boolean context

peephole optimization

escape analysis

This is an interesting problem space. Since many scripts on the web are executed once and then thrown away, we need to invent versions of these optimizations that are simple and efficient. Moreover, since JavaScript is such a dynamic language, we also need to invent versions of these optimizations that are resilient in the context of an unknown environment.
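As an example of how simple such an optimization can be once the program is a flat instruction list, here is a sketch of single-pass constant folding in Python. It is illustrative only: the function name `fold_constants`, the tuple encoding and the opcodes `load` and `add` are assumptions, and the pass handles only straight-line code with no jumps.

```python
def fold_constants(code):
    """One linear pass: replace an add of two known constants with a load."""
    known = {}      # register -> constant value known at this point
    folded = []
    for op, dst, a, b in code:
        if op == "load":
            # "load dst, constant": dst now holds a known value.
            known[dst] = a
            folded.append((op, dst, a, b))
        elif op == "add" and a in known and b in known:
            value = known[a] + known[b]
            known[dst] = value
            folded.append(("load", dst, value, None))
        else:
            # dst now holds a value we cannot predict at compile time.
            known.pop(dst, None)
            folded.append((op, dst, a, b))
    return folded


code = [
    ("load", 0, 2, None),   # r0 = 2
    ("load", 1, 3, None),   # r1 = 3
    ("add", 2, 0, 1),       # r2 = r0 + r1   (foldable)
    ("add", 3, 2, 4),       # r3 = r2 + r4   (r4 is a run-time variable)
]
optimized = fold_constants(code)
for instruction in optimized:
    print(instruction)
```

The pass touches each instruction once, which matters for scripts that run only once. It leaves the now-unused loads of r0 and r1 in place; removing them would be the job of a separate dead-code pass.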

We’re also looking at further optimizing the virtual machine, including:

Extra Bonus Updates

We’ve got some extra bonus info: very early draft documentation of the SquirrelFish VM’s opcodes. If you know about VMs, you may find this enlightening; if you don’t, you may find it simpler than you expect.

In addition, we have a detailed comparison of Safari 3.1 vs. SquirrelFish. Looking at the individual tests, it is interesting to see which sped up the most. If you look at this comparison to Safari 3.0, you can see that we’ve sped up 4.34x overall since Safari 3, and have improved some kinds of code by over an order of magnitude.

Have you ever wondered whether it makes sense, or is even possible, to cache this byte-compiled code instead of caching the original source file? Have you done any experiments in this field? Applications built with huge JavaScript frameworks (like qooxdoo) may have multiple large JS files of more than 1MB each (un-gzipped). Storing a “byte compiled” version in the cache may make sense for files of that size.

@wpbasti: An issue with that is that we optimise lookup of global values, which may not be valid if the load order of such files is different. That said it is possible that in the future we may be able to appropriately update such references. Another thing to consider is of course the fact that we don’t actually compile functions until they’re called, and even then the time to compile any given function is typically tiny compared to the time required to execute it.

It’s great to finally see a post about SquirrelFish on the WebKit blog. I made a short post to kick off a new blog that I will hopefully use to talk about ongoing JavaScriptCore development. The first post includes some SunSpider numbers for the bleeding edge versions of different browsers, which may be of interest to people reading this post.

@iFrodo: it should be there, have you checked the context menu, or the Develop menu? Or are you referring to Drosera? I ask because Drosera was recently killed off as we have now integrated the debugger with the web inspector.

Since I figure an incorrectly-submitted bug report beats none: In Safari 3.1.1 (but not Firefox 3 RC 1), trying to set the body of my GMail vacation message to “I’ll be away” makes it save as “I’ll b away”, reproducibly. If I try to set the away message “‘ABCDEFGHIJ” (note the leading quote), the “A” is dropped. (My vacation subject is “I’m away until June 17” if that’s necessary to repro.)

Hope this is really a WebKit bug and I’m not being silly. It’s a great product.

I must confess I’m very happy about the commotion WebKit in general and SquirrelFish in particular are causing in the Javascript engine realm. Seemingly, new wine will again not go into old wineskins, and it needed a fairly fresh endeavor to bring Javascript closer to a level it deserves. In contrast, the Mozilla project seems to suffer from a certain stiffness in that regard, despite Tamarin and all that, and the amount of love Firefox’s engine receives leaves me disappointed. That is even more surprising since all of the browser’s chrome runs on top of it. All experiences from similar runtime environments (e.g. Emacs, Eclipse, basic operating systems,…) seem to be ignored and have to be gathered again. When will they start running multiple interpreter instances (or at least worker threads) in the browser, to isolate chrome and different pages from each other?! Will WebKit do it? – Anyway, way to go, WebKit!

But the SunSpider benchmark only reports interpretation time. For JavaScript, most scripts are compiled on the fly, so compilation time is also important. The results would be more convincing if they considered both interpretation time and compilation time on SunSpider. That said, once code caching is taken into account, speeding up the interpreter is more important.

@Mark Rower and @Oliver: When I traced the SunSpider test cases on SpiderMonkey before, it seemed to me that by the time it executed the first line to record the time, the script was already compiled. I thought this was the same for SquirrelFish. Sorry for my confusion. Thanks.

This is really impressive! I’ve seen some comparisons to Tamarin around the web, and in my own testing with apps it seems both solid and fast.

I’ve attempted to use squirrelfish with the latest public beta with a number of existing javascript benchmarks and I’m seeing a number of tests with results of 0 which, when I repeat the tests, either stay at zero or go to 16 ms. Is there a lower limit on measurability using the timing functions?

Are there known functional differences between the prior engine (JavaScriptCore? is that what it was called?) and SquirrelFish? Even things that used to be broken that you fixed.

Are there areas where we should expect dramatic speed increases that should change how JS developers design code? There are certainly costly choices that we now avoid.

Oh, and I hope that we’ll see this on the iPhone. That’s a device whose javascript performance could use a speedup.

[...] SquirrelFish – So awesome. Those webkit guys just make my day every frickin time. Too lazy to click the link? SquirrelFish is a new superfast JS vm runtime. Benchmarks show it faster than Tamarin at the moment even. Not much need for explanation here. The better performance runtimes we get for the open web, the better it can compete against proprietary competition! Ok, I guess that’s enough for now. I really don’t want to turn this into a news aggregation blog, regurgitating things that I think are cool. You can just go to Ajaxian to see where I get MY news from. However, news regurgitation is easy, and I needed to write something. Also, I feel like such a negative nancy sometimes and I thought a positive post would be nice for a change. [...]

(off topic)
Hello, the fast fish sounds great. But I don’t need more speed. I need a Safari that can relax.

Right now Safari uses 85% CPU and the only thing I am doing with the browser is writing these lines. Is it Flash that hangs from a previous page? Or is it bugs in Safari? There is no Flash as far as I can see. I have done a process sample if anyone is interested.

I am an editor/writer and don’t know much about programming, but I do know that people don’t like pages that start up fans like an old DC-3. I understand a programmer who wants the application to have as much power as possible, but the overall experience is too hot. I have talked with colleagues, and it is a problem that a page triggers heat and fans. We fear it’s a turn-off for readers. One option we discussed is no Flash on the front page. Right now I am on a PowerBook, but I guess the thermal management is similar on other laptops.

So if Safari and/or flash and other apps can do things slower, but cooler, I will support that.

@eflaten: “Higher speed” when it comes to computers means it uses the CPU less. It is the same thing, so yes, you _do_ need higher speed.

As for flash: if the flash player is doing something in an inefficient way that is unfortunately outside the scope of what the WebCore team does. If you can indeed show that flash is to blame then you probably need to complain to Adobe.

[...] fastest content rendering around as well as nippy JavaScript execution with the state of the art SquirrelFish VM. The JavaScript SDK is available independently of the web renderer for sandboxed client-side game [...]

There is a bootleg store in China called “Squirrel–shaped Fish”. In China it’s standard procedure for stores to illegally use names and logos from large international brands. With this store, they stole the Lacoste alligator logo, but made their own name. Pretty genius. I don’t know if this was the inspiration for the name squirrel fish, but it should be.


[...] There’s already a developer seed of Safari 4 released. Which includes the SquirrelFish JavaScript interpreter (renamed from GlassFish to avoid confusion with Apple’s other Java stuff). SquirrelFish is a [...]

[...] you’ve seen it is not at all slow, I guess with Firefox 3 and newer versions of Safari / Webkit it should get even faster. The point behind this is that if the foundation stands as-is, its just a [...]

Hey Apple team, I beg you to develop an updater for Safari for Windows that just updates the changed bits, i.e. patches the existing install instead of downloading the whole thing again, uninstalling and reinstalling. As a user, it’s one thing keeping me away from consistently using Safari, because when a vulnerability is detected, I can’t continue to use the old version and I am unable to download large files every time.

dicklacara: probably not. The compilation time is pretty minimal, and that would make it so that the bytecode format couldn’t be upgraded in the future, which would make further performance gains harder.

The secret behind it isn’t LNVM, it’s just Forth. It is the know-how from this programming language, which is well known to some people. If you hear about M. Anton Ertl and David Gregg, and about a very fast direct-threaded interpreter (bytecode or not), it is clear: it is Forth know-how. I’m absolutely sure.

Some Forth systems are the fastest threaded-code, and even direct-threaded-code, interpreters (as virtual stack machines) available. Know-how coming from there speeds up SquirrelFish.