Thursday, September 4, 2008

Playing around with the benchmark some more, I noticed some strange things. The more I clicked the "Bench" button, the faster Chrome got. After about 4 runs of the benchmark, performance improved by as much as 2x. Is Chrome using HotSpot-like techniques to count method invocations for inlining?
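If you want to reproduce that observation, the pattern is just to time the same workload back-to-back: a VM that profiles and optimizes hot code (as HotSpot does) tends to get faster on later runs. Here's a minimal sketch, where `work` is a hypothetical stand-in for the real benchmark, not part of it:

```javascript
// Illustrative sketch: time the same workload several times in a row.
// A VM that profiles and optimizes hot code tends to speed up on
// later runs. work() is a hypothetical stand-in for the benchmark.
function work() {
  var sum = 0;
  for (var i = 0; i < 1000000; i++) {
    sum += i % 7;
  }
  return sum;
}

for (var run = 1; run <= 4; run++) {
  var start = Date.now();
  work();
  console.log("run " + run + ": " + (Date.now() - start) + "ms");
}
```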

Secondly, the animation on WebKit/Chrome gets choppy. This isn't because the Javascript isn't drawing the frames. The benchmark draws 100 frames, each in a setTimeout call to avoid slow script warnings. Browsers appear to defer reflecting any mutations to the UI until the Javascript thread yields the CPU back. That is, you don't see the results of a canvas fill() or stroke() until the CPU is yielded. (Opera seems to work differently, but offers a GameCanvas which lets you lock/unlock progressive updates.)
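As a sketch of that frame loop (the names and frame count here are illustrative, not the benchmark's real code), the scheduler is passed in so the snippet runs outside a browser. In the page itself you would pass setTimeout, which is what lets the JS thread yield between frames:

```javascript
// Sketch of the setTimeout-per-frame pattern described above.
// drawFrame and the frame count are illustrative, not the real code.
function makeFrameLoop(frameCount, delayMs, drawFrame, schedule) {
  function step(i) {
    if (i >= frameCount) return;
    drawFrame(i);
    // Yield back to the browser before drawing the next frame;
    // in a real page, schedule would be setTimeout.
    schedule(function () { step(i + 1); }, delayMs);
  }
  return function () { step(0); };
}

// Run it with a synchronous scheduler here just to show the ordering.
var drawn = [];
var runLoop = makeFrameLoop(5, 10,
    function (i) { drawn.push(i); },
    function (fn, ms) { fn(); });
runLoop();
console.log(drawn.join(","));  // prints "0,1,2,3,4"
```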

Now, I would normally expect that the draw results would be copied/blitted/reflected into the Browser's document as soon as the Javascript thread yields. But what if the update is scheduled in the same timer/event queue as other pending setTimeout calls? And what if the next frame's scheduled timeout is before the native canvas update?

Then what would happen is that the Javascript code would get the CPU again, erase the canvas, and draw the next frame. This would account for the choppiness and disappearance of intermediate frames.

But it could also account for the performance differential between TraceMonkey/FF3.1 and WebKit/Chrome, in the sense that TraceMonkey might "pay the cost" of reflecting every frame into the browser's window, which gets counted by my timing code, whereas WebKit/Chrome might get a free ride by skipping the update and avoiding the cost by deferring it.

I'm not sure, and it would be interesting if people who are familiar with the internals of Gecko and WebKit could comment on how canvas updates are handled.

In the meantime, I'll look at ways of ensuring that every frame is drawn, perhaps by increasing the inter-frame timeout interval.

Hey guys, in order to assist in the browser wars, and to let everyone test browser performance, I've built a publicly hosted version of the benchmark I used to test Google Chrome vs Firefox 3.1 and Safari 4.

As an erratum to my previously published benchmarks: I found that my WebKit Nightly batch file wasn't invoking Safari correctly on my Win32 box (it was complaining about a file not found, which I didn't see in the console), so my previous results were NOT with SquirrelFish. Upon rerunning with SquirrelFish, I find it actually does hold its own against Chrome.

If you're one of those poor folks who are running IE, don't try to run the benchmark. :) It might run, but chances are you'll get a slow script warning. Also, on IE it uses Flash for rendering instead of Canvas, so it's not an apples-to-apples comparison.

Tuesday, September 2, 2008

The current browser wars make me feel like I'm watching the Fast and the Furious, and Google's new Chrome browser is like Nitrous Oxide for the web. As soon as I heard that ex-HotSpot guys were working on it, I knew it could be good, as they did a marvelous job with Self (a fully dynamic language), as well as Java, but how good?

To test, I decided to use my own codebase, Chronoscope, for several reasons. First of all, it is not a microbenchmark; it exercises a big swath of the browser's code path. Secondly, it is computationally intensive, so if anything were going to show speed on a "real world" app, this would be it. Finally, I need visualizations to run faster, since I want to use Open Web technologies like Canvas or SVG rather than relying on Flash.

Chronoscope is written in GWT, and to some extent, the GWT compiler may negate some of Chrome's V8 technology, in the sense that GWT "de-classes" many OO polymorphic dispatches into a more functional style of programming, removing as much dynamic dispatch as possible and eliminating prototype lookups and function call overhead through inlining. I don't know whether GWT helps or hurts V8's "hidden classes", but it's possible that if GWT didn't provide such optimizations, the performance differential would be even larger.
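To make the "de-classing" point concrete, here's a hand-written analogue (names are hypothetical, this is not GWT output) of the kind of rewrite described: a prototype method call becomes a plain top-level function, which removes the prototype lookup and gives a compiler an easy inlining candidate.

```javascript
// "OO" style: the call site dispatches through the prototype chain.
function Circle(r) { this.r = r; }
Circle.prototype.area = function () { return Math.PI * this.r * this.r; };

// "De-classed" style: a plain function taking the data directly;
// no dynamic dispatch, no prototype lookup, trivially inlinable.
function circleArea(r) { return Math.PI * r * r; }

var c = new Circle(2);
console.log(c.area());       // same value either way
console.log(circleArea(2));
```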

Despite this, the results are still good. The test consisted of calling the chart's redraw() function 100 times per trial, with 10 trials. The slowest and fastest trials are thrown out, and the mean and standard deviation are calculated from the remaining data.
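That aggregation is simple enough to sketch directly. The trial timings below are made-up numbers, and this uses the sample standard deviation (the post doesn't say which variant was used):

```javascript
// Drop the slowest and fastest trials, then compute the mean and
// sample standard deviation of the rest. Timings here are made up.
function summarize(trials) {
  var sorted = trials.slice().sort(function (a, b) { return a - b; });
  var kept = sorted.slice(1, sorted.length - 1);  // drop min and max
  var mean = 0;
  for (var i = 0; i < kept.length; i++) mean += kept[i];
  mean /= kept.length;
  var variance = 0;
  for (var j = 0; j < kept.length; j++) {
    variance += (kept[j] - mean) * (kept[j] - mean);
  }
  variance /= (kept.length - 1);
  return { mean: mean, stdDev: Math.sqrt(variance) };
}

var stats = summarize([1700, 1650, 1690, 1710, 1680,
                       1695, 1705, 1660, 1900, 1500]);
console.log(stats.mean.toFixed(1) + "ms +/- " + stats.stdDev.toFixed(1) + "ms");
```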

I tested on a Mac Pro 2.66GHz with 6GB of memory, running OS X 10.5. The tests were conducted within a Parallels VM running Windows XP Service Pack 2, given 2 CPUs and 2GB of memory. For each browser, I rebooted the VM from a clean start and ran only the browser under test.

The tests were conducted with the latest version of Safari 3.1 for Windows (run via the WebKit nightly batch file), the Firefox 3.1 nightly with TraceMonkey enabled, and the Chrome beta download.

Browser                      Mean      Standard Deviation   Memory
Firefox 3.1 (TraceMonkey)    3647ms    81ms                 49M
Safari 3.1 (SquirrelFish)    3005ms    385ms                106M
Chrome                       1690ms    190ms                44M

Chrome looks to be twice as fast as the competition on real-world apps. Moreover, its performance per MB of memory is good as well. Even more interesting was the variation in Firefox 3.1, which appears to be related to garbage collection or memory allocation: in some runs it was very fast, but in others it was 50% slower.

As soon as Chrome is released on OS X, it will replace Safari for me, for two reasons. First, the process isolation and speed features. Second, Chrome includes a very nice Firebug-like debugger/inspector that I like better than the Safari equivalent.

So in summary, Chrome rocks! All they were missing from the launch was Vin Diesel.