Saturday, November 19, 2011

Macroassembler ahoy!: more JIT in 9

I'm typing this post in a modified version of TenFourFox 8 with regular expression compilation baked into the JavaScript JIT, and I'm already sad about having to go back to vanilla 8 for comparison testing, because it's making a huge difference on the sites I use. Regexes, for the uninitiated, are concise ways of expressing patterns to match against data and/or extract portions from it. They predate JavaScript by many years (the language most associated with them today is probably Perl), but they are a common part of modern JS applications, and as such their performance is tested by most benchmarks. Unfortunately, the tracer-based regex JIT that Mozilla wrote for Firefox 3.6 was replaced in Firefox 4.0 by YARR (Yet Another Regex Runtime, from WebKit) and its own regex JIT, which had no PowerPC backend. As a result, we never got the benefit of JavaScript regular expression compilation, and every prior version of TenFourFox fell back on the interpreter whenever it encountered a regex.
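For anyone who hasn't met them, here is a minimal sketch in TypeScript showing both halves of that definition: testing whether text matches a pattern, and pulling out a captured portion. The pattern and the helper names `hasMention` and `extractHandle` are hypothetical, purely for illustration; the point is only that this is ordinary JavaScript regex syntax, the kind of expression a regex JIT like YARR turns into machine code instead of interpreting.

```typescript
// Hypothetical example pattern: an "@handle" mention, capturing the handle
// name (\w matches letters, digits and underscore). No "g" flag, so test()
// and exec() carry no lastIndex state between calls.
const mentionPattern: RegExp = /@(\w+)/;

// Matching: does the text contain a mention at all?
function hasMention(text: string): boolean {
  return mentionPattern.test(text);
}

// Extraction: return the captured handle, or null if there is no mention.
function extractHandle(text: string): string | null {
  const m = mentionPattern.exec(text);
  return m?.[1] ?? null;
}

console.log(hasMention("thanks @example!"));    // true
console.log(extractHandle("thanks @example!")); // example
console.log(extractHandle("no mention here"));  // null
```

Every call like `test()` or `exec()` above runs the pattern against the input; without a regex JIT, that matching is done by an interpreter walking the pattern step by step.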

Well, no more. As part of our efforts to attack methodjit (more on this soon), we have finished porting the Nitro macroassembler to PowerPC, building on Ben's initial hard work, and this lets us use YARR JIT. Better yet, we can use YARR JIT alongside our heavily optimized custom tracejit right now, so it's going to be in 9. This also eliminates our dependence on PCRE for maintaining regular expression performance, because we can now use the same code as everyone else.

How much difference does that make? It depends on how much regex work is in your workload, but on the quad G5 I develop on here at Floodgap Orbiting Headquarters, SunSpider drops from 1600ms to ... are you ready? ... 990ms. Yes, kids, we're already at our target of getting under a second on SunSpider, and we haven't even implemented methodjit yet!

It should be noted, however, that nearly all of this gain comes from the part of SunSpider we consistently chugged on, the regexp tests, which took almost 650ms before and now take about 45ms. Likewise, on the V8 benchmark we improve from a score of 627 to 769 purely on the basis of the RegExp test; on Dromaeo, which is a fairly balanced benchmark, the improvement is much more modest, from 110 runs/sec to about 119 runs/sec.
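A quick back-of-the-envelope check, using the approximate figures above, shows that the regexp savings alone account for almost all of the SunSpider drop, and puts the V8 and Dromaeo results in percentage terms:

$$1600\,\text{ms} - (650\,\text{ms} - 45\,\text{ms}) = 995\,\text{ms} \approx 990\,\text{ms}$$

$$\frac{769}{627} \approx 1.23 \;(\text{about } 23\%\ \text{better on V8}), \qquad \frac{119}{110} \approx 1.08 \;(\text{about } 8\%\ \text{better on Dromaeo})$$

The remaining 5ms or so is within the rounding of "almost 650ms" and "about 45ms".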

Thus on many sites you will see little difference, but virtually all sites with significant JavaScript requirements use some regexes and will see some improvement, and some sites like Twitter use gargantuan expressions that are now processed considerably faster (Twitter's fat regex size was, in fact, what caused TenFourFox 5 to break). It definitely improves browser chrome performance, because the browser's own JavaScript code uses a large number of regular expressions, and these can now be cached as ready-to-go machine code. The downside is that, as with any JIT, compiled code only pays off if it is cached, so there will be some additional memory demand (offset by no longer needing to cache PCRE results). So it's a net win, and G3 owners will be delighted to hear that this is not limited to AltiVec -- regular expression compilation will be enabled on all versions, including G3. And the good news for builders is that we were able to hack it to compile correctly on gcc 4.0.1, so no compiler change is currently required.

The news is not as good with methodjit -- we can't even get it to compile simple code, let alone run it, even though we know our macroassembler works (because YARR JIT works). Mozilla suspects we may have unearthed a bug in register allocation, and I have some contacts who can hopefully give us some tips on debugging the problem. I'm still hopeful for the Fx10 timeframe, which is just as well, because there seem to be some regressions with Fx9's Type Inference (which does not affect the tracing JIT), and it would be better to have those shaken out before we try to implement that for PowerPC.

Otherwise, the Fx9 port has been uneventful so far -- I'm about halfway through the patches and there have been no major issues yet, although we haven't tried to build it. There is a lot scheduled for this beta, including not only YARR JIT but also faster AltiVec text processing and AltiVec colour space management (encores from Tobias), so it's taking a little longer than intended. I'm shooting for Thanksgiving weekend, and we can all give thanks for that. :)