Wednesday, November 7, 2012

Falling to baseline? also: wither [sic] ESR 24?

Now that IonMonkey is part of Firefox 18 as the compiler for "long running" JavaScript, Mozilla is looking to replace JaegerMonkey, which is still the baseline compiler (we have a full implementation of JM with type inference in TenFourFox, which long-time readers of the blog will know as "JM+TI"). This is not good news.

To recap history for newer readers, TenFourFox has implemented two JavaScript compiler backends for PowerPC. JavaScript is not an easy thing to compile. Our dearly departed tracejit (TraceMonkey, in 4-9) was a fast compiler because it wasn't really compiling; it was in simple terms just noting what operations got done by the interpreter and playing them back. JaegerMonkey actually does compile JavaScript methods, but it does so by translating elemental opcodes called JSOPs into assembly language; JSOPs act as sort of an intermediate bytecode representation of JavaScript (and in interpreter mode, JSOPs are what the interpreter actually runs). The current implementation pairs JaegerMonkey with type inference, where the compiler tries to guess what actual data types are in play in a script (integers, floating point, etc.) and generates code more specialized to those "inferred" types. This has more latency than TraceMonkey, which low-end users complained about in our early implementations of PowerPC JM+TI, but the outcome is dramatically faster runtime overall, and 17 has made JM+TI even faster than it used to be.
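For readers who want something concrete, here is a toy sketch of what "specializing to inferred types" means. It is not SpiderMonkey code and does not reflect how JM+TI is actually structured; the names (AddSite, compile_add, TypeGuardFailed) are made up for illustration, and Python stands in for generated machine code. The idea is only that a fast path is emitted for the guessed types, guarded by a check, with a fall back to generic code when the guess turns out wrong.

```python
def generic_add(a, b):
    """Slow path: handles numbers and strings, loosely in the spirit of JS '+'."""
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b


class TypeGuardFailed(Exception):
    """Raised when a value does not match the type the fast path assumed."""


def compile_add(inferred):
    """Return a callable specialized to the inferred operand types (hypothetical)."""
    if inferred == ("int", "int"):
        def int_add(a, b):
            # Guard: the specialized code is only valid for plain integers.
            if type(a) is not int or type(b) is not int:
                raise TypeGuardFailed
            return a + b
        return int_add
    # No useful inference: use the generic path.
    return generic_add


class AddSite:
    """One '+' operation in a script, with its currently compiled code."""

    def __init__(self, inferred):
        self.code = compile_add(inferred)
        self.recompiles = 0

    def run(self, a, b):
        try:
            return self.code(a, b)
        except TypeGuardFailed:
            # The guess was wrong: throw the fast path away and redo the work.
            self.recompiles += 1
            self.code = generic_add
            return self.code(a, b)


site = AddSite(("int", "int"))
first = site.run(1, 2)      # stays on the integer fast path
second = site.run(1, "x")   # guard fails, site falls back to generic code
```

The recompiles counter is the part that costs real time in a real engine: a wrong guess means the specialized code has to be discarded and regenerated.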

Still, this doesn't facilitate certain kinds of optimizations that, for example, your typical C compiler can perform; all JM+TI is essentially doing is computing stack depths on the first pass, and then plopping out little packages of assembler code for each operation on the second. There is no attempt, or indeed no easy way, to do much analysis of the generated code or the internalized source representation in this scheme because each JSOP is treated as an atom. So IonMonkey was written as a more traditional compiler to implement these optimizations (using more advanced intermediate representations), but optimizations take time, and the added overhead of IonMonkey does not pay off until you run that highly optimized code a lot. Thus, JaegerMonkey remains the baseline compiler because it is less expensive, and at least for the next few iterations of the unstable branch, will still be our only compiler. Only for major, more intensive applications is IM employed in Firefox 18 and 19. (IM is implemented for ARM and x86 only; there is no SPARC or MIPS version yet, or indeed any big-endian backend.)
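The two-pass scheme described above can be sketched in a few lines. This is a deliberately simplified illustration, not the real methodjit: the opcode names, the STACK_EFFECT table and the pseudo-assembly it prints are all invented for the example, and real JSOPs and real PowerPC output are far messier.

```python
# Toy opcodes with (values popped, values pushed). These are hypothetical,
# not actual SpiderMonkey JSOPs.
STACK_EFFECT = {"PUSH": (0, 1), "ADD": (2, 1), "MUL": (2, 1), "POP": (1, 0)}


def stack_depths(ops):
    """Pass 1: record the operand-stack depth just before each op runs."""
    depths, depth = [], 0
    for name, _ in ops:
        pops, pushes = STACK_EFFECT[name]
        if depth < pops:
            raise ValueError(f"stack underflow at {name}")
        depths.append(depth)
        depth += pushes - pops
    return depths


def emit(ops, depths):
    """Pass 2: emit a fixed template per op, using slot numbers from pass 1."""
    out = []
    for (name, arg), d in zip(ops, depths):
        if name == "PUSH":
            out.append(f"li   s{d}, {arg}")
        elif name == "ADD":
            out.append(f"add  s{d - 2}, s{d - 2}, s{d - 1}")
        elif name == "MUL":
            out.append(f"mul  s{d - 2}, s{d - 2}, s{d - 1}")
        elif name == "POP":
            out.append(f"; drop s{d - 1}")
    return out


# (2 + 3) * 4 as a stack program
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
depths = stack_depths(program)
listing = emit(program, depths)
print("\n".join(listing))
```

Notice that the emitter never looks at more than one op at a time, so it happily generates an add for 2 + 3 instead of folding it to a constant. That is the "each JSOP is treated as an atom" limitation in miniature.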

(As an aside, IonMonkey is also more like a traditional C compiler in that it uses the processor stack directly instead of the internal JavaScript stack, which is independent. Running out of stack space in a 32-bit environment has been a big issue for us in the past, and we're still not entirely ABI compliant with our stack frames even though I revised this significantly for 17. Part of the reason for TenFourFox's higher memory demands, besides cached fragments of code, is that our stack is compiled to be very large to insulate us from crashes, and swapping the stack in and out of memory is a performance killer on RAM-impaired systems. Besides getting JM+TI and IM to play nice together, I am also concerned that a complex and/or recursive IM code fragment could easily run off the end of the 1GB stack that already exists, and we might not be able to squeeze much more out of the addressing range we are limited to.)

To once again move from a hybrid compiler to a "grand unified approach," Mozilla needs to make IM less expensive if they want to use the same code as part of a "cheaper" compiler. Already, combined JM-IM took about a 3-5% haircut in SunSpider, for example, and worse on slower machines where IM's latency becomes a bigger proportion of runtime. This is the idea behind the Baseline Compiler: a profile, if you will, of IonMonkey that cuts back some of the more advanced and computationally complex portions of IM to generate "good enough" code. Google V8 Crankshaft already implements an analogous idea (here we are trying to "out-Chrome Chrome" again, as usual) for small scripts, and TraceMonkey served this purpose in a limited and not directly intentional sense when it existed. The Baseline Compiler, meta-tracked in bug 805241, really does aim to be as minimal as possible; besides implementing none of the more advanced optimizations of IonMonkey, it doesn't even implement type inference, so as to avoid any risk of having to recompile the script if type inference turns out to make incorrect assumptions. However, it is almost guaranteed it will use the IonMonkey backend to generate its code instead of the JaegerMonkey one.
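The trade-off between a cheap compiler and an expensive optimizing one is really just break-even arithmetic. The sketch below uses entirely made-up cost units (they are not measurements from any browser) and hypothetical function names, purely to show why a script has to run many times before the heavier compiler wins.

```python
def total_cost(compile_cost, cost_per_run, runs):
    """Total time spent: compile once, then execute the code 'runs' times."""
    return compile_cost + cost_per_run * runs


def break_even_runs(cheap, optimizing):
    """Smallest run count at which the optimizing tier is strictly cheaper overall.

    Each argument is a (compile_cost, cost_per_run) pair of integers in
    arbitrary units. Returns None if the optimizing tier never wins.
    """
    c_compile, c_run = cheap
    o_compile, o_run = optimizing
    if o_run >= c_run:
        return None  # the generated code is no faster, so it never pays off
    extra_compile = o_compile - c_compile
    saved_per_run = c_run - o_run
    return max(0, extra_compile // saved_per_run + 1)


# Invented numbers: the baseline tier compiles fast but runs slowly,
# the optimizing tier compiles slowly but runs fast.
baseline = (100, 10)
optimizing = (1000, 4)
threshold = break_even_runs(baseline, optimizing)
```

With these invented numbers the two tiers tie at 150 runs and the optimizing tier pulls ahead from run 151 onward. On a slower machine the compile costs grow while short scripts still run only a handful of times, which is why the latency hurts more there.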

This is bad for us because IonMonkey is much more complex to implement than JaegerMonkey was, and JaegerMonkey was already a huge effort between Ben and myself. We put a lot of time into optimizing it well, including Ben's work on special separated G5 and G3/G4 code paths, tightened pieces of code that self-optimize, and decreased compiler overhead. Assuming that the Baseline Compiler replaces JM+TI completely, we will basically be starting from scratch for the second time in as many years, and I don't see a SPARC or MIPS implementation yet that we can crib from. (At least we have ARM as a basis for our own RISCy implementation, but the ARM implementation is little-endian.) This would seriously drag down progress on this port and would count as a trigger to drop source parity if we couldn't get it working; losing tracejit was originally a show-stopper too until we got methodjit off the ground. I'm loath to put significant time into it while the Baseline Compiler is off in the future, because IM in its current implementation is likely to be a lot of work for little gain (and potentially some regression on low-end G3 and G4 systems). But when the Baseline Compiler does emerge, we can expect JaegerMonkey to be completely excised within a couple of releases, just as TraceMonkey was after type inference landed, so I don't know what I'm going to do with this yet. We have the advantage of having learned from our experience developing PowerPC methodjit, but this is a much bigger task.

The situation is made a little more acute because Mozilla has said very little about whether there will be an ESR 24 after ESR 17. Ars Technica's latest browser survey shows that ESR 10 has not been as widely embraced as the howling over version numbers would suggest; it represents just 0.47% of all web users, and that undoubtedly includes our stable branch users. Mozilla committed to ESR 17 originally and they are obviously keeping their promise, but they have said nothing about ESR 24, and numbers like these will undoubtedly be weapons within Mountain View for the ESR's opponents to kill it off after the promised support period for 17 expires. If that turns out to be the case, there may be no stable branch for us to upgrade to, if we live that long.

So, with that cheery thought, Chris found an explanation for the minority of users who complained they could not download .pdfs or certain other files they had formerly viewed with plugins; it looks like our code to disable them is incomplete, and there is already a fix in issue 188. This changes some internal semantics, so I will not implement it in 10.x. I also took the liberty of tweaking our internal locale a bit for the QTE, and implemented bug 752376 for a little more snappiness in the tab bar. Mozilla is trying to determine what they will do with Click-to-play (for plugins) in Firefox 17, which is currently buggy on certain sites, but this is irrelevant to us since we don't ship with plugins enabled anyway. Their plan is to have a release candidate ready somewhere around the 14th, and so will we; I will also build our last 10.x version around the same time, which will have a fix for issue 130, and finally terminate our support for ESR 10. I see our anonymous Tenfourbird builder(s) in the land of the Rising Sun are now issuing 17 betas themselves, so it looks like they will make the jump with us. It will be interesting to see what happens to Thunderbird now that Mozilla has said its development will be coming to a close with this release. Perhaps it's time for a TenFourMonkey after all.

I'd still try to implement features of big importance, but yeah, it would fork off into space at that point. But one good thing is that we wouldn't have lots of patches to catch up on or the problems of new wine^H^H^H^Hcode in old wineskins^H^H^H^H^H^H^H^H^Hcodebases.

Kudos to the Tenfourbird builder(s) for making the jump to version 17. I think the memory footprint is probably bigger than it was for Tenfourbird 10, but it feels much more snappy to me. Search (filter) speed seems to be improved significantly. In limited use, it also seems like it might behave better with Gmail's IMAP servers. I've had minor issues with Tenfourbird 10 showing new messages in a mailbox, but not displaying (loading?) the body of the message until I switch to another mailbox and switch back. So far, I haven't run into this issue while using Tenfourbird 17. Nice!

The memory usage for both TenFourFox and Tenfourbird is indeed probably larger than for 10, mostly because the garbage collector is now incremental instead of "aggro" (so cleanup is more gradual than heavy-handed, allowing more cycles when the platform needs them), and JIT caching is smarter. But both of those translate into performance improvements, along with all the other platform changes between 10 and 17.

Don't refer to ArsTechnica's browser survey, which is based on unreliable data from Net Applications, but rather look at the real stats from StatCounter: http://gs.statcounter.com, in which Firefox has 22% worldwide usage. And in Poland Firefox has 42% usage, according to Gemius: http://www.ranking.pl/en/rankings/web-browsers-groups.html