Sunday, December 29, 2013

So over Christmas weekend I managed to get a debug build of TenFourFox 26 up and running. It basically works, though I figured it would (29, the first Australis release, is what worries me more). So that's the good news.

The bad news is that there are three major problems, one of which is worked around only incompletely, all of them having to do with the graphics stack. Recall that starting with Firefox 12, Mozilla introduced a scheme called Azure to reduce the overhead of graphics drawing by mapping graphics calls more directly into operating system primitives rather than running them through Cairo, the abstracted graphics system Firefox has used for almost everything since 3.0. We support Azure for HTML5 <canvas> elements, and in that employ it works very well, improving overall canvas performance for most shapes by about 60 percent.

Where our implementation falls flat is text and gradients; we need lots of hacks for 10.4 to make this work, and the overhead for these specific elements is significantly greater. (Text filled with gradients is even worse.) Unfortunately, a browser's primary job is to render lots and lots of text, and Mozilla now wants to use Azure to do the rendering for everything (relegating Cairo to a backup engine, and for printing). I have to disable "content Azure" in 26, or the browser renders things about three times slower overall in the debug build (and in an opt build we'd lose AltiVec-accelerated compositing through pixman too). I think we can get away with this for a while since Cairo is still an integral part of the layout stack for now, but if Mozilla finds another printing solution then Cairo becomes expendable. I'm going to try to do more research to figure out if we can speed this up, but remember that the vast majority of supported Macs running Firefox have hardware acceleration and we don't. It's entirely possible that the combination of 10.4 graphics thunks and their hardware-tuned rendering strategy is just too much for Macs limited to software rendering like us.

On top of that, our current Azure implementation exposes bugs in 10.4 CoreGraphics that, although not obviously severe, generate invalid context errors and other such annoyances that suggest we're just wallpapering over more significant problems. I haven't figured out where these errors come from yet (I suspect DrawTargetCG::FillRect), and because of their frequency I can't enable content Azure in TenFourFox until I'm forced to.

The third problem is the most serious, and the one I don't know how to fix definitively. One of 10.6's new system features is blocks, a construct for creating closures in C, C++ and Objective-C that allows applications to better exploit Grand Central Dispatch: you can generate little snippets of code, wrap them up as a block, and pass them around in such a way that they can remember their original state when finally run and even execute in parallel. Blocks require both runtime support from the operating system to handle and execute them, and compiler support to understand the new syntax and generate the closure code. Apple implemented this in Xcode using their fork of gcc, but blocks are really a better first-class citizen in clang, and regular gcc doesn't support them.

Blocks can be added to 10.5 using PLBlocks, which offers a modified gcc (presumably using Apple's patches) and a userland framework for runtime support. Tobias himself already uses this package for Leopard WebKit, so we have good evidence it should work fine for this purpose. Although this framework does not exist for 10.4, it looks like it doesn't require any 10.5 Objective-C features to function, so it could probably be ported. That's not the real problem. The real problem is the compiler: we don't use gcc 4.0.1 or 4.2 anymore, and Apple never ported blocks to a later version (we use 4.6), so we'd have to roll this support ourselves off Apple's patches. Even assuming it works (which is a big if), I really don't want to be maintaining a compiler and an entire tool chain on top of a browser, linker and debugger, especially since it's likely we'll have to force another compiler change in the not-too-distant future (fortunately MacPorts already offers 4.8 and it works fine on 10.4). David Fang is still industriously working away on a PowerPC OS X clang, but I don't know how far along he is with it or if its codegen for blocks will work with the PLBlocks runtime, which we'll still need.

Fortunately, Mozilla uses blocks in a very limited way and only within the OS X widget library as callbacks for graphics calls, since no compiler other than Apple's supports them. For the time being, these callbacks (which so far appear to only apply when hardware acceleration is active) can be partially emulated by spinning the closure out as a static function that can be passed as a function pointer. I say partially, because what this doesn't emulate is the, you know, closure part: being static functions they don't have access to class member properties, and even if they did (or we figure out some Rube Goldbergian bridge class) they have no memory of those values as they were at creation, so it's possible for them to see the wrong value when the callback is triggered. I hacked around this and the app seems to be fine, but that's no guarantee it will continue to be. As Mozilla tries to optimize Firefox more for multi-core systems and Off-Main Thread Compositing looms on the horizon, the use of blocks in Mac-specific code is likely to increase, because it's what Apple wants developers to do and it's relatively straightforward for developers to use. That's going to be a big problem for us if that code becomes an essential part of the application.
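To make the "closure part" concrete, here is a small analogy in Python rather than Objective-C. It is not Mozilla's widget code; the Widget class, its scale attribute and both callback helpers are made up for illustration. The idea is only that a block keeps a copy of the values it used at the moment it was created, while a plain function handed some context reads whatever is there when it finally runs.

```python
from functools import partial


class Widget:
    """Hypothetical stand-in for a widget object with mutable state."""

    def __init__(self, scale):
        self.scale = scale


def static_callback(widget):
    # Like the static-function workaround: no captured state, so it reads
    # whatever value the widget holds when the callback finally fires.
    return widget.scale


def make_block_like_callback(widget):
    # Like a block: copy the value now, at creation time, and remember it.
    scale_at_creation = widget.scale
    return lambda: scale_at_creation


widget = Widget(scale=1.0)
emulated = partial(static_callback, widget)
block_like = make_block_like_callback(widget)

widget.scale = 2.0  # state changes before the callback is triggered

print(emulated())    # 2.0: sees the later value
print(block_like())  # 1.0: still remembers the value at creation
```

If nothing touches the state between creating the callback and running it, the two behave identically, which is presumably why a workaround of this kind can appear to be fine in practice.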

26 will be issued as changesets only and maybe a debug build. The first unstable release will either be 29-aurora, if Australis works, or a new 24 branch if it doesn't. Cross your fingers.

Wednesday, December 25, 2013

TenFourFox 26's JavaScript is finally working again after some breaking changes Mozilla made. However, the graphics stack is now going to need a lot of work. Oh well. The good news is that 27/24.3.0 are on a slightly longer timeframe because of the holidays, so I have until February 4th to catch up.

So far no major bugs in 24 have cropped up; there are some minor ones that will be fixed in 24.3.0.

Sunday, December 15, 2013

Now that I've got some time over the holidays, the Firefox 26 port continues. Don't expect an unstable from this release (I'm not even done with JavaScript, though I'm almost done); it's just to keep things working until 29, when Australis descends like a ravenous harpy from the skies. At that point there will either be a 29-aurora or a hasty retreat.

Six months ago, Ars Technica ran a little article on Butterfly Labs' small Bitcoin miner. (For those of you unfamiliar with Bitcoin ("BTC"), this blog won't do it justice, but Wikipedia has a good overview.) The heart of "mining Bitcoin" is repeated, exponentially scaling computations to verify transactions based around the SHA-256 algorithm, for which participating computing resources receive a share of the fixed total number of Bitcoins themselves. Mining used to be done on regular general computing hardware, but this moved to GPUs as the profitability dropped, and anyone who wants to seriously mine Bitcoin is now on dedicated ASIC hardware that simply computes SHA-256 hashes over and over as fast as possible. There are USB block erupters that generate around 300-400 MH/s (that's million hashes a second); the small BFL BitForce miner in the Ars article is a 5 GH/s miner (five billion per second); and the 28nm ASICs due in early 2014 are now in the 1-2 TH/s (trillion) range.
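For a feel of what "computing SHA-256 hashes over and over" means, here is a toy sketch in Python. It is not what bfgminer or the ASIC actually runs: the header bytes, the mine() helper and the 12-bit difficulty are all made up for illustration, and real Bitcoin hashes an 80-byte block header and compares the result against a far smaller target using a different byte order. The part that does match is the shape of the work: hash twice with SHA-256, check against a target, bump the nonce, repeat.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    # Bitcoin's proof of work applies SHA-256 twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header: bytes, zero_bits: int, max_nonce: int = 1_000_000):
    """Try nonces until the hash has at least `zero_bits` leading zero bits.

    Toy simplification: the digest is read as a big-endian integer.
    Returns (nonce, digest), or None if no nonce below max_nonce works.
    """
    target = 1 << (256 - zero_bits)
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
    return None


result = mine(b"toy block header", zero_bits=12)
if result is not None:
    nonce, digest = result
    print(nonce, digest.hex())
```

Each extra required zero bit doubles the expected number of attempts (about 4,096 for 12 bits), which is why the hardware is rated purely in hashes per second.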

Bitcoin is highly speculative because there is no guarantee it will hold current value or even hold any value -- so please don't even think about BTC exchange unless you're good with obscenely high risk components in your investment portfolio. Furthermore, exchanges routinely fail, taking customer assets with them, and a large holder cashing in their hoard (Satoshi?) is likely to put the exchange price in the toilet. At the time the Ars article came out they were trading for around $130 per BTC, and I figured that a $274 investment to pick one up wasn't much money (to me) to be out if the company folded or the currency faded. I did a couple of tests on the G5 to make sure I could compile the software and it appeared to work (with CPU mining, at a pathetically low yield -- more later), so I pre-ordered one and then promptly forgot about it.

Almost six months passed. On Friday, a nondescript box arrived. It was the miner.

This particular Bitcoin miner connects over USB and appears as a serial port to the Mac. It uses standard FTDI serial drivers, which work just fine with 10.3 and 10.4 PowerPC. It draws about 35W under load, but because it requires a small amount of work from the G5 to keep it busy, I'm estimating its overall power impact at about 50W. Impressively, bfgminer says it's actually averaging around 5.6 GH/s, a nice boost over its stated capacity. The G5 is on 24/7 anyway to allow me to access files and run remote computing jobs, so it's no big deal to let the G5 run the miner also. Some people have assigned a Raspberry Pi to this task, so it's clearly not something that requires a heavy-duty computing controller.

Okay, you don't care about that. You care how much money it's making. Right now, Bitcoin trades at around $850/BTC. Mining difficulty ratchets upward as more hashing power joins the network, which keeps the currency from being depleted early and is part of why the value increased from six months ago (simple supply and demand). At the current difficulty and exchange rate, the machine makes about $2.50/day. Given the slope of the curve, the estimated payback time is on the order of 5-6 months (this is imprecise because the difficulty is not totally predictable), but because of its very low power usage, it still remains fractionally profitable even down to making just 5 cents a day. I'm not going to be supermodels-and-Learjets rich, but it will probably make me at least a couple hundred dollars in overall profit.

However, this doesn't help you much because BFL doesn't make these things anymore, and the block erupters you buy on eBay are more than 10 times slower. A 333 MH/s USB block erupter makes about $0.15/day right now, and if one sells for $50 and the difficulty is always increasing ... well, you do the math. One argument against buying this kind of specialized hardware is that it's more profitable to make the mining devices than it is to do the actual mining. That is quite possible. ;) These little MH/s block erupters are easy to get because they don't really generate much money anymore; the heavy duty rigs take months to preorder and by then the difficulty has gone up even further.
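If you do want to "do the math," here is a rough sketch in Python using the prices and daily yields quoted above. The payback_days() helper is made up for illustration, and the 0.5% per day decline in revenue is purely an assumed figure standing in for rising difficulty, not a measurement; it also ignores power costs and exchange rate swings.

```python
def payback_days(cost, revenue_per_day, daily_decay, max_days=3650):
    """First day on which cumulative revenue covers the cost, else None.

    daily_decay is the assumed fraction by which each day's revenue
    shrinks relative to the previous day (0.0 means flat revenue).
    """
    total = 0.0
    revenue = revenue_per_day
    for day in range(1, max_days + 1):
        total += revenue
        if total >= cost:
            return day
        revenue *= 1.0 - daily_decay
    return None


# The $274 BFL miner at about $2.50/day
print(payback_days(274, 2.50, 0.0))    # 110 days if nothing ever changed
print(payback_days(274, 2.50, 0.005))  # a little over five months with the assumed decay

# A $50 USB block erupter at about $0.15/day
print(payback_days(50, 0.15, 0.0))     # 334 days if nothing ever changed
print(payback_days(50, 0.15, 0.005))   # None: it never pays for itself
```

With a shrinking daily yield the total you can ever earn is capped at roughly the starting yield divided by the decay rate, so under this assumption the erupter tops out around $30 and the purchase price decides everything.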

But since we run web browsers on decade-old computers, we're clearly not here because we're excessively practical, so let's say you either want to just play around with Bitcoin or you got a block erupter or a proper ASIC miner from a generous friend. You'll need a wallet and a pool, for which the Ars Technica article has some suggestions, and then you'll need the software. Linux Power Mac users have it easiest; you can probably find a pre-built package for bfgminer or cgminer, both of which will run almost any hardware Bitcoin mining device. For 10.5 users, there is a port of cgminer that will apparently run on PowerPC with a basic interface. I'd be interested to hear if it works.

For 10.4, the situation is harder. I wasn't able to get cgminer to build at all, though I suspect I'm merely missing some of its prerequisites, and although the most current version of bfgminer (3.8.0) can build with minor changes, it doesn't seem to work. Fortunately, the version of bfgminer (3.1.1) I hacked into building "back then" does work fine with the BFL miner, and as long as you have libusb installed from MacPorts or Fink it will work with most USB block erupters also. It emits a rather alarming number of hardware errors, but I suspect these are spurious because the mining pool I'm working with accepts my shares without comment and the mining pool's estimated GH/s rating matches what bfgminer is reporting. In about a week I should get my first 0.02 BTC payout. So I guess that was worth my $274.

What if you just want to prospect a little with your own hardware, just for laughs? GPU mining is probably impossible on PPC OS X, though it might work on Linux (although I don't think any GPU that shipped with a Power Mac is OpenCL-capable), but you can use either software package to do CPU mining and bfgminer at least does have AltiVec support. The problem is you won't get very far even compared to a "basic ASIC" setup: a 600MHz G3 ekes out a pathetic 0.14 MH/s; a 1.67GHz 7450 G4 struggles to maintain 1.29 MH/s (source; recall that our miner is 5 GH/s == 5000 MH/s). Both are reasonably power-efficient computers, but neither will do their work in 35 or even 50 watts. Speaking of which, using the G5 for this is almost comically inefficient: my quad manages, with altivec_4way and four CPU threads, a comparatively impressive 6.7 MH/s but requires almost 300 watts of power to do it and pegs all four cores, rendering the computer essentially useless for any other purpose.
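To put "comically inefficient" into numbers, here is a quick Python comparison of hashing per watt using only the figures quoted in this post: about 5.6 GH/s for roughly 50W of total impact with the BFL miner, versus 6.7 MH/s for almost 300W on the quad G5. The mh_per_watt() helper is just for illustration, and the wattages are my rough estimates, so treat the result as ballpark.

```python
def mh_per_watt(megahashes_per_second, watts):
    # MH/s divided by watts is the same thing as megahashes per joule.
    return megahashes_per_second / watts


bfl_miner = mh_per_watt(5600.0, 50.0)  # ASIC miner plus the G5 work to feed it
quad_g5 = mh_per_watt(6.7, 300.0)      # CPU mining with altivec_4way, four threads

print(bfl_miner)            # 112.0 MH/s per watt
print(round(quad_g5, 4))    # 0.0223 MH/s per watt
print(bfl_miner / quad_g5)  # the miner comes out around 5,000 times more efficient
```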

So, uh, keep clicking on the Google ads on this blog if you want to financially support this project. ;)

So we'll try. 26 is still in pieces and there will probably not be an unstable release, just a set of changesets for the perverse interested. However, I am pretty certain 26 will work. After that, my thinking is to jump to either 28 or 29, since if Australis doesn't gel, there's no point in expending additional effort. I'll have some time over the holidays to deal with this since I'll be between terms in my Master's degree program.

Meanwhile, 24.2.0 is out. This includes the performance tweaks I discussed (issues 253 and 254); release notes and downloads available at the usual places. This will be the first official release of 24 to the general public, so let's make it a good one. Assuming no obvious issues, 24.2.0 becomes live Monday night Pacific time as usual and we will wave good-bye to TenFourFox 17. You'll also get to see our new Mavericks-inspired Apple swipe, as is my custom. :)

The newest technical concern is Mozilla's renewed push towards Electrolysis, or multi-process Firefox. Most of you are familiar with Chrome's use of a separate process for each tab (or sometimes several tabs), although Internet Explorer 8 actually pioneered this approach some months before. The issue Electrolysis specifically addresses is unique to Firefox: most of the browser is written in JavaScript, but because all JavaScript runs on the main thread, which also services user requests (including from web pages, because accesses to the document are not thread-safe), when a page's script stalls, it also stalls any browser code that is waiting to run, and the entire application must wait for the script to finish or be cancelled. Using a JIT to compile JavaScript only reduces this latency; it does not eliminate it. Compared to this problem, improving browser stability and security is actually a remarkably secondary issue, though separate processes achieve that too.
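Here is a toy model of that stall in Python. It is a simulation with made-up numbers, not anything resembling Mozilla's code: the event list, the "chrome" and "page" labels and the worst_chrome_latency() helper are all hypothetical, and it ignores the cost of the two processes talking to each other.

```python
def worst_chrome_latency(events, separate_content_process):
    """Worst delay (ms) before a browser-UI ("chrome") task starts running.

    events: (arrival_ms, kind, duration_ms) tuples, kind "chrome" or "page".
    With one process, page scripts and browser code share a single queue;
    with a separate content process, each kind gets its own queue.
    """
    free_at = {"chrome": 0, "page": 0}
    worst = 0
    for arrival, kind, duration in sorted(events):
        lane = kind if separate_content_process else "chrome"
        start = max(arrival, free_at[lane])
        free_at[lane] = start + duration
        if kind == "chrome":
            worst = max(worst, start - arrival)
    return worst


events = [
    (0, "page", 3000),   # a page script that grinds away for three seconds
    (100, "chrome", 5),  # the user clicks a toolbar button
    (200, "chrome", 5),  # the user switches tabs
]

print(worst_chrome_latency(events, separate_content_process=False))  # 2900
print(worst_chrome_latency(events, separate_content_process=True))   # 0
```

A faster JIT only shrinks the 3000 in this model; it is moving the page script to its own queue that takes the wait to zero.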

Running webpage scripts in a separate execution context is not a new idea; Mozilla played with an idea called "supersnappy" which would have done this in single-process Firefox, but the plumbing required was very complex and delicate. Electrolysis itself has been around in various forms as far back as 2009, and modern Firefox Android and Firefox OS use it, but the desktop browser does not due to lots of things that don't work. Even now, the browser is only up to the point where basic browsing works; many add-ons won't work at all. (Don't try to enable this on TenFourFox, by the way. If it starts at all, it will likely crash very soon afterwards. I'll explain why in a moment.)

The first iteration will just have a single content process within which all the tabs run, which diminishes the memory overhead while still getting most of the stability benefits and some of the performance benefits. However, a lot of work must be done on add-on compatibility. Most Jetpack-based addons (like the QTE and the MTE) will work fine with Electrolysis Firefox, but older or more low-level addons (like OverbiteFF) may need modification. There are also lots of potential race conditions because you now have two queues of JavaScript which may deal with items in non-deterministic order. Since the browser interface is JavaScript too, they really need to get this right.

The amount of time required to get this plumbing in and fix all the outstanding bugs indicates that this will not become mandatory until after Firefox 31 at the earliest. That's good, because as written, it won't work with 10.4; we don't have the functions needed to spawn processes the way IPC Chromium (which handles multiprocess Firefox) expects, and they are stubbed out with debugging messages in the current version of TenFourFox since it doesn't need them. The underlying work to fix this is issue 66. Even then we may have IPC bugs in 10.4 getting the processes to talk to each other, although I believe Electrolysis is achievable on Tiger overall and will be beneficial to us too ultimately. Fortunately, we don't have to worry about it yet.

Tuesday, December 3, 2013

I got my Williams Star Trek: The Next Generation pinball in, complete with colour dot matrix display, to complement my beloved Stern Sopranos pin. Naturally, I had to thoroughly test it during the 90 day warranty period, because, you know, it's a rough job to play a lot of pinball but someone's gotta do it. The TNG game, for those of you unfamiliar with it, is a pinball classic that is constantly at or near the top in table rankings; it's one of Williams' most inspired offerings from the great Steve Ritchie.

Set for free play, once you've drained your three balls (and this game will gobble them), Counselor Troi will say, "I'm sensing you want to continue" (to "buy" a ball). And really that's the dilemma we're facing, so you tell me.

First some good news for a change. The issue I had with Firefox 26 also cropped up in 24 and Mozilla determined it was a bug and fixed it, which enabled me to get Fx26 compiling further (it still doesn't get through JavaScript, but this is due to code I just have to sit down and write, not a fundamental compiler issue). I've also completed my audit of the Australis widget changes and while some of them are going to be a headache, I'm actually a bit more confident it can be ported to 10.4. I'm still not fully convinced, but some of the 10.5-only code I was worried about was replaced with more portable implementations that also will simply work better.

Still, let's review how far we've come. We are still on a supported branch of Firefox, albeit an ESR branch (ESR24), over three years after Firefox 4 beta 7 ended PowerPC support; we have successfully ported through twenty revisions of Firefox since then. This branch of Firefox will officially receive security updates until at least early 2015, at which point we can still maintain it with backports. We implement in Fx24 most of HTML5 and, arguably, more of CSS3 than anyone else. We also already implement most of ECMAScript 6, the next standard for JavaScript (TC39/"Harmony"), and again, more completely than most other browsers. Maintenance of these three prongs (HTML5, CSS3 and TC39) is necessary to keep us current with web standards, but we are in an excellent position with regard to compliance right now.

Given all that, is it in our best interest to continue to try to get to Firefox 31, especially with all that's coming down the pike (Australis being the most notorious)? Well, that's the part I'm not sure about. I don't want to prematurely abandon keeping current with somewhat-bleeding-edge Firefox, but my experience with doing backpatches on Firefox 22 at the same time I was trying to fix Firefox 24 and maintain Firefox 17 was not a lot of fun, and I don't want us marooned on a version of Firefox stuck between ESRs with no updates at all. So let me lay out the pros and cons:

Pros of staying on 24

No Australis (also a con).

Officially supported security and stability updates, at least until early 2015.

More time to focus on improving the port instead of desperately attempting to keep up with the Mozilla version treadmill because merging changes will be a lot less work.

The IonMonkey JavaScript compiler port will be a lot easier to do. (Remember, we're still on PPCBC, which has much lower latency but generates much less efficient code.) In fact, much of it is written already, so it's just a matter of getting it working.

We can start customizing it more. One perennial request is a proper AppleScript implementation. Another is a built-in user-agent switch. Built-in adblock might also be a thought. And lots of people still want H.264 support.

We can still implement many of those missing TC39/CSS3/HTML5 features. These features are written in high-level code in later versions of Firefox that are not far removed from ESR24, so they can be backported relatively easily.

Pros of trying to get to 31

Australis (also a con). We can't get to 31 without it.

If we get to 31, then we get security patches well into 2016, and could even look to continuing the port past that point.

Although IonMonkey will be harder work to get running, it is likely to be quite a bit faster due to JavaScript improvements along the way which 24 does not have.

Better add-on and theme compatibility.

Bugs in the original TC39/CSS3/HTML5 features we are trying to add to 24 will already be fixed without us trying to track them down.

Cons of staying on 24

No Australis (also a pro, depending on your perspective).

When security support ends from Mozilla, we must provide our own.

Themes and add-ons from versions of Firefox after 25 or 26 or so may have incompatibilities.

We need to find and fix bugs ourselves in the original TC39/CSS3/HTML5 features we are trying to add to 24.

It is highly unlikely further-out-future standards like CSS4 will be easily portable to 24.

Cons of trying to get to 31

Australis (also a pro, depending on your perspective). We can't get to 31 without it.

If we don't get to 31, we're stuck rolling our own security patches on a totally unsupported branch (i.e., whichever was the last version that worked).

Several new technologies are landing, including off-main-thread compositing (even for software rendering, which is our standard operating mode) and generational garbage collection. These might not be portable or even compilable, but they will almost certainly be mandatory.

A large libvpx change is scheduled for Fx28 which might break our AltiVec WebM acceleration.

Graphics and widget changes are very likely to accumulate more interface glitches, some fixable, some less so.

When 10.6 support goes, all supported Macs will be 64-bit in addition to requiring all the interface changes of 10.7+. If Mozilla completely deprecated 32-bit support on OS X, the only Power Macs that could support a true 64-bit build even theoretically are the G5 Macs, but there is no 64-bit Carbon (and we depend heavily on Carbon). Although 10.6 usage is still high, it is virtually certain to have been abandoned by Apple now that 10.9 is out, and Mozilla would most likely drop 10.6 support just prior to ESR31 based on their previous behaviour with 10.4 and 10.5.

Much less available time to work on 24, which would still receive builds.

I'd like to hear your thoughts about this so I can make a decision. I might jump from 26 all the way to 29 to see if Australis will actually build (if it does not, this is a showstopper). But I'm uneasy about 10.6 support being removed as well in the (likely) very near future, and certain elements at Mozilla have been very aggressive about removing legacy code we might have been using. Either way, even if we stick with 24 as the basis for our new "unstable branch," we have to consider the consequences carefully.

Still waiting for build tags on 24.2.0 and then we'll build it and get it out to you; I'm hoping for Friday. And a big thanks to Chris T and the localizers for their hard work on our new 24-specific langpacks (see issues 42 and 61). 24 should be a good launch no matter which way we decide we want to end up.