Sunday, December 21, 2014

Everyone needs a break, so I took a break from my Master's project to experiment with a few ideas I've been ruminating over (it's also a convenient excuse to goof off). We're always looking for more performance from our older machines and sometimes this requires rewriting or changing assumptions of the code where this can be done without a lot of work.

Multiprocessing/multithreading has been one of those areas. Since around Fx22 Mozilla has been using background finalization for JavaScript (so background finalization was in 24 and 31 but not in 17) which is to say that the procedure for deallocation of objects is not done on the main thread; finalization is a separate, though related, process to garbage collection where finalized objects with no remaining references are reclaimed. This, at least on first blush, would seem like an unmitigated win, especially on machines with multiple CPUs -- you just do the work on another core while you're working on something else. But interestingly it has some very odd interactions on my test systems after long uptimes. Some profiling I did when TenFourFox just seems to be sitting there waiting showed that it wasn't TenFourFox (per se) that was twiddling its thumbs; the OS X kernel was stuck in a wait state as well and it seemed to have been kicked off by ... background finalization, waiting on threading management in the kernel, which had temporarily deadlocked. This may be the source of some unexplained seizing up that a few people have complained about and we had no obvious way to reproduce.

We can't do anything about the kernel (this is another reason why I'm very concerned about how Electrolysis will perform on 10.4/10.5), but background finalization can be defeated with a few lines of easily ported code. This doesn't make the browser faster objectively, mind you; it just "spreads the badness around" so that finalization occurs predictably, and in a manner that makes it more likely to complete, which in turn makes garbage collection more likely to complete (which since the fix in 31.3 now can run on a different core and doesn't appear to trigger this specific issue), which in turn reduces the drag on performance. In effect, this rolls this aspect of JavaScript back to Fx17.

The result is more consistent and smoother, even if it's not truly faster -- especially on the G5 where bouncing around in code generally hobbles performance -- and does not seem to affect uniprocessor systems adversely, but I'd like to get a test out where you can play with it and see. You should not expect much difference immediately when you start; in fact, startup time might even be slower, and memory usage will show little if any improvement. This is just to improve the browser's responsiveness in terms of staying reasonably quick after multiple compartments have been allocated and need to be scanned.

Speaking of, I also implemented the reduction on tab undoes (to 4) and window undoes (to 2), which also reduces the number of compartments that must be scanned (you can override this from about:config, but be aware that every tab and tab-undo-state you keep in memory remains active and so the garbage collector must evaluate it), and I threw in one other change that forces substantial additional buffering of video playback just to see how this works out for you lot. Like the change in finalization, this "spreads the badness" by forcing the browser to build up a large backing store of fully decoded video frames before playback (up to 10 seconds' worth depending on available memory). Videos may appear to stall initially, but then can play more smoothly because decoding is now more aggressively buffered as well, not just downloaded video data. This quad shows improvement only in that playback is more consistent, but on my 1GHz iMac G4 many standard-definition videos on YouTube are now noticeably less like a slideshow with audio. It's possible to overdo this setting, so I've settled on a conservative number that seems to work decently for the test machines here (it's baked into the C++ code, sorry -- you can't twiddle this from about:config). The minimum recommendation for video is still a 1.25GHz G4.

The tab undo and buffering changes will be carried into 38, but 38 introduces generational garbage collection (to us) and I have to do some testing to determine if it will react adversely with GGC's system assumptions. Please note that I only built this for G5 and 7450 mostly because I want testing on a good mix of single and multi-CPU systems (7400 users can use the 7450 build if you really want to, but sorry, G3 folks, you'll have to wait for the next scheduled release), but I did the building on my new external solid state drive which reduces build overhead by as much as 30%. Not bad! I'll post some stuff about the RAM disk and SSD build testing I've been playing with in a future entry.

Saturday, December 20, 2014

UPDATE: After digging through codepaths in ntpd, I'm concluding that an outbound connection -- i.e., you connecting to an external time source -- is probably not vulnerable, at least not to the most serious of the flaws below (ctl_putdata() and configure()). By default OS X does not allow inbound connections, which definitely are vulnerable. If you have changed this configuration, however, you should still update, and the sixth flaw in receive() may still apply, although the exploitability of this particular flaw is believed to be highly unlikely. The tools are still available if you want them.

The sec-buzz this time is a package of six, count 'em, six vulnerabilities in OS X's Network Time Protocol daemon, or ntpd (and possibly its associated tools, see below). NTP is one of the oldest Internet protocols still in use, starting with NTPv1 RFCed all the way back in 1985 -- in fact, the very same David L. Mills who wrote the original NTP RFC 958 is still the one maintaining it today, almost twenty years later. NTP is an extremely accurate and somewhat complex means of time synchronization that can keep Internet-connected clock devices and computers tightly coordinated with highly accurate timesources. In theory, NTP can capture time differences as small as 233 picoseconds, or 2-32 seconds; in typical network environments, it can keep accurate time within a few tens of milliseconds (currently the internal NTP server at Floodgap has an average dispersion of about 15ms), and as low as a single millisecond or less on a local network (my NetBSD/cobalt server is showing 0.2ms).

Accurate time is very important, especially for security, and virtually every modern platform offers some manner of time synchronization; almost every Un*x or Un*x-like operating system uses ntpd, the reference implementation (don't start that OpenNTPD crap here, please), including Mac OS X. And no surprises, the version included on every release of OS X is vulnerable up to and including Yosemite which only runs ntp 4.2.6.

The immediate response is, "but wait! wait! I don't run a time server!" No, you don't (probably), but if you have "Set date & time automatically" checked in Date & Time in System Preferences, you're running ntpd. (If you did the change I recommended in an earlier post to switch to intermittent ntpdate in cron, you're not running ntpdbut you are still vulnerable to one of the flaws; read on.) If the remote server sends you a specifically crafted malicious packet, your computer could be commandeered to run a program under the control of the attacking server.

In practice, this attack is going to be limited for most of us for several reasons:

By default the Apple implementation doesn't use cryptographically authenticated time packets (flaws one, two and three).

If you are connecting to a well-maintained, trusted time server (time.apple.com would be an obvious one), the chances of it going bad or packets from it being intercepted and subverted are low to very low in most circumstances. (The risk of interception and subversion is higher if you don't trust the nameserver or the network, such as unencrypted Wi-Fi in a hostile environment, where you could be directed to a malicious server or someone could try to inject packets as fast as possible to arrive before an authoritative response.)

If you're on a Power Mac, unless the attacker knows this and sends a PowerPC-specific sequence, the circulating attacks (to flaws four, five and six) will very probably be designed for x86 and this lowers your effective risk to near zero. At worst ntpd will just crash. (If you're on an Intel Mac, though, this doesn't help you; you're the type of computer an automated scanner would target.)

If you don't even try to synchronize your clock, which is apparently the case for more systems than I would have thought, you aren't vulnerable at all. But this isn't a good idea.

If you are running ntpdate from cron only, and not ntpd, ntpdate is potentially vulnerable to flaw six but none of the others (again, assuming authentication is not enabled). No one is even sure if it's exploitable in its current form, and all of the mitigations above (except the last) still apply, so your risk is even lower in that case.I am in error. ntpdate is not vulnerable to this flaw.

In my situation, I have an internal NTP server which talks to a selection of publicly available stratum-1 and stratum-2 timesources. Since it's facing out on the wild wild Internet, I went ahead and upgraded that to ntp 4.2.8, which has these flaws repaired. All of my internal systems only talk to this internal NTP server which is now protected from the problem, and the only way someone's going to subvert that is by either splicing the wires or installing malicious software on the inside -- and if that happens I've got bigger problems than bad clocks. Just in case, however, I went ahead and built and updated ntpdate on my 10.4 systems so that there's no chance.

After careful consideration, I am hesitant to indiscriminately advise everyone to update ntp on your system -- I've learned from offering a rebuild of bash just how many of you can wreck your computers from Terminal :P -- unless you must connect to a risky network from time to time. Even in that case, however, it might be better just to disable "Set date & time automatically" when you're on those types of connections instead and resynch your clock when you're back on something reasonably trustworthy.

If you really can't avoid that, or you're incredibly paranoid, then here is a build of the relevant utilities for 10.4 PowerPC (it will work on 10.5 also). I had to make a change to ntp_io.c to make it build with Xcode 2.5, which I included if you want to roll your own. It only includes the core ntp tools that have relevant vulnerabilities or dependencies (ntpd ntpdate ntpdc ntpq) and none of the SNTP stuff, which is a pretty sucky way to maintain time anyhow. You will notice there is no x86 or universal build, because the current version of the source code doesn't make it easy to build fat binaries, nor instructions, because I want you to think very carefully about why you're installing this. If you're on an x86 system with a vulnerable timekeeper and no available update for this issue (10.6, say), and you're recurrently or even constantly on a hostile network, then you've got bigger problems than this one -- this is a very small bandaid on a potentially deep wound. For that matter, though, even if you're on a Power Mac, please consider what you're trying to accomplish before you install these replacements. They can replace the current ones on the system directly, or if you're running ntpdate from cron and don't have ntpd running, then just put the new ntpdate somewhere convenient like /usr/local/bin and change your crontab to run it from there.

Wednesday, December 10, 2014

I am a huge fan of the classic Palm OS. Yes, webOS was pretty cool, while it lasted (there's a Pre 2 on my shelf courtesy of the funkatronic Ed Finkler that I barely got to explore before HP killed the Palm line), but the original Palm OS was not unlike what you'd get if you miniaturized the classic Mac OS into a handheld form factor. Under the hood, you had very similar system designs and implementations, and Macs were well supported as development tools. Heck, they were even 68K systems, like the original Macs, and the switch to the ARM-based Palm devices was eerily similar to the Power Macintosh transition. The ARM Palm OS even had its own 68K emulator -- in fact, the normal state of the system was to be running 68K code!

Myself I started with a Palm m505 which I bought in medical school (new shortly after its introduction, a substantial expense to a starving student) for calculations and drug databases, at which it excelled. I also rapidly learned how to program it, using a remarkable port of the Lua programming language called Plua, and when I upgraded to a ARM-based Zire 72 everything easily transferred over to the new unit which is a testament to how well the sync and OS were designed. Both of these units still work, by the way, even though the m505's battery is shot and won't hold a charge anymore. Even after I got an original iPhone in 2007 I still used the Zire 72 for a number of years because it had better pharmacy apps and it could record video (the iPhone couldn't do this until the 3GS).

Parenthetically, however, if I were forced to pick the best Palm OS device ever made I would actually make a non-Palm selection: the AlphaSmart Dana wireless. If you'd ever wondered what a PalmOS laptop would wind up like, that's pretty much what the Dana is, with a wide 560x160 backlit greyscale LCD, a full and luxurious keyboard, two, count 'em, two SD/SDIO slots, up to 16MB of memory, built-in WEP 802.11b, an integrated USB cradle and even a USB port for a printer. It runs on rechargeable batteries (easily replaceable) or even AAs; mine has a nearly new rechargeable battery pack, a 1GB SD card (maximum supported) in one slot and a Palm Bluetooth SDIO card in the other. The keyboard is incredibly good. There are writers who cling to these things even now because they run for ages on a single charge, the built-in editor is quick and distraction-free, and when you've got your stuff typed, you simply plug it into your Mac or PC and "squirt" it into your word processor of choice: it emulates a USB keyboard! To this day my mother uses one for her notes at church because it runs longer than many laptops, it's light and durable, and when she comes home she can merely plug it into her Mac mini and dump everything into Microsoft Word. Its chief flaw -- and this is a big one -- is that the 33MHz Dragonball VZ CPU is an absolute slug. While it's more than adequate for its avowed purpose and very thrifty with power, heavy duty apps just drag on it even if you install an overclocker.

The biggest, baddest Palm of them all is of course the T|X ("TX"), with a 312MHz Intel PXA270 ARM CPU, 32MB of RAM, 128MB of flash memory (remember that earlier Palms kept everything in RAM, so battery failure was catastrophic), built-in Bluetooth and WiFi and a beautiful 320x480 (320x448 effective resolution) colour LCD screen. Although the earlier Tungsten T5 had 256MB of Flash and a 416MHz CPU, it was cursed by God with system bugs that updates did not completely fix, bad memory management, sluggish Bluetooth and no Wi-Fi. The TX can easily be overclocked to 520MHz (don't exceed this, though, or you'll have a nice doorstop for a day or two while the battery discharges!), it can take additional SD card space (there's even a third-party SDHC driver), and it has more advanced networking, an updated OS and an updated version of the Blazer web browser. By the way, hats off to Dmitry Grinberg for making both of those tools free. That's classy.

But I still like my Zire 72 (in the 72s form factor with the more durable silver paint job) better than my T|X for four reasons: first, the camera, which is pretty dire by today's standards but is still very handy; second, the extra side button (more on that in a moment); third, mini-USB is much easier to work with than that stupid Athena connector on the T5 and TX (though you can charge over USB with the Athena); and last but not least, the replaceable battery. The T|X battery is soldered in -- if you need to hard-power-cycle it, you'll either have to wait it out or cut the leads, and you'll need to get out your soldering iron to put in a new one. The Zire's battery can be (carefully) removed and replaced, however, greatly expanding its longevity. While it won't surf the web anywhere near as well as the T|X, no one's realistically using their Palm for that anymore, you can put nice big SD cards in it as well (and overclock it too because the Zire 72 uses exactly the same CPU as the TX, and the same 32MB of RAM, though no flash memory), and it has built-in Bluetooth also. If you really need WiFi, the palmOne 802.11b SDIO card will work, and the 320x320 screen is pretty decent. Right now, my Zire 72, Dana wireless and TX are all out in the commons area with their own charging stations and they each have their own little tasks they specialize in.

Which brings me to the point of this blog post. I am also a big fan of the Philips hue light system, which uses controllable colour LED bulbs for automated illumination. You can run it from a smartphone, but I centrally control mine on the secured internal network using huepl, a command-line tool I wrote for this purpose. (It works on 10.4, of course.) I have two hue base stations set up, one in my office, and one shown here out in the commons.

Last time Mom was over, Dad and I were working outside early while she slept in my guest room. She got up and found out she couldn't turn the lights on because she didn't have a device to talk to the central server. Never leave your mother in the dark. Ever. So for their next visit I decided I need to make a guest light controller.

While Philips makes a wall switch (interestingly using the kinetic energy of pressing its buttons to send a signal to the hue base station), a custom wireless solution would be more optimal and more flexible. However, I don't allow Wi-Fi on my secured internal network because I don't want someone sneaking into it one day -- it's a secured network for a reason to protect all these lovely old machines, not to mention my financial records. There's a shorter range way of getting into a network though:

Yep, that's a Bluetooth access point. Although theoretically another Class 1 device could try to talk to it from outside the house (if they figured out the pairing code), in practice it's going to be limited by most transceivers being substantially lower power. But more importantly, this specific access point, the Belkin F8T030, is particularly useful in this situation because with its default firmware it only offers the older Bluetooth LAN Access Profile, not the Personal Area Network (Bluetooth PAN) profile currently supported by 10.4+ and most current mobile devices. My Android phones and Macintoshes can only see the Belkin if they're right on top of it, and even then, even with the correct pairing code, they can't do anything with it or connect to the internal network through it. Only the Palm OS devices can -- because they speak Bluetooth LAP, not PAN.

You can flash the F8T030 to speak PAN as well, but I actually don't want that, in this application. By the way, the F8T030 runs uCLinux -- you can even telnet into it and get a shell prompt!

So that solves the connectivity end; now I have to get the Palm to talk to the lights. In this case I selected the Zire 72 as the light controller because I have a couple of them as spares lying around and as mentioned I can repair them to some extent. First thing was to configure it and pair it with the Bluetooth access point, which was pretty simple in PalmOS 5. However, rather than teaching the Palm how to speak the hue base station REST API (because I'd have to give it keys and keep those current), I decided to create a couple scripts on the internal interface of the web server and use EudoraWeb to control the lights through those scripts, like so:

This worked, but you can rapidly see some disadvantages. First, there's not a lot of room left to add more options without getting cluttered (admittedly my choice of the Zire 72 made this problem more acute). Second, running it through EudoraWeb means I don't have any real control over the interface. If the Bluetooth connection got wacky (say Mom didn't have the access point in a direct line of sight, or she pointed the Zire 72 in another direction), EudoraWeb would start throwing generic error messages or offering to switch to offline mode, which would likely drive her crazy and leave the Palm in an "indeterminate" state. And she'd still be in the dark.

So that brings us back to Plua, because I could at least put better error handling in. Plua has very simple built-in primitives for things like streams and TCP/IP sockets, so I moved the scripts to the internal gopher server to make the protocol simple and threw together a proof of concept. This first draft had some big buttons you could push like a remote. This seemed like a good idea and worked well ... except that if the Bluetooth connection failed or you turned the Palm off while the light app was running, Plua went haywire and didn't reconnect. (Obviously a bug, but I don't have any way of fixing that in the Plua runtime.) If you restarted the app, it worked and reestablished the Bluetooth link, but that was inconvenient.

Then it dawned on me: since the connection would be reestablished when the app restarted, move the selection interface into the Palm launcher with multiple separate apps for each light profile that just called the appropriate script on the internal gopher server and quit. Plus, I could have as many profiles as fit on the screen and generate them off a template. On the iMac G4 I wrote a script generator and cross-compiled four basic profiles (off, all on, all dim, corner only) in MacPlua and pushed them over to the Zire 72.

Now, whenever you start any of them (large enough to be pressed with a finger), the Bluetooth link is reestablished if it lapsed, and the correct script on the gopher server is invoked:

But wait: we can make it even easier for Mom. Remember that single side button for voice memos that the Zire 72 has? You can change it to anything, so I changed it to ... turn the lights on by calling that light profile app. Mom picks up the Palm, sees the big label on it saying "to turn lights on, press side button," sees the only side button the device has, presses it, the Zire turns on, the app starts, connects, and the lights come on. You can even find it in a dark room because of the charging light on the end table. See it in the photograph above?

It's like the best Mother's Day present I've ever gotten her. Totally. No doubt.

Here's the source code for the template so you can see how cool and easy Plua is. PalmOS in the house yo.

Monday, December 8, 2014

During the continuing Bataan death march rewrite of IonMonkey (done with the assembler and about 60% done with the macroassembler) I discovered a glitch in PPCBC with certain comparison operations that the JIT tests apparently don't cover. I don't believe this has any material effect, but we'll fix it anyway, along with a bug in hyphenation that appears to be another Mozilla goof where they assume every architecture represents boolean variables as bytes (on 32-bit PowerPC the native type is 32-bit word, though the compiler can be configured otherwise). That one may affect Linux too, so we'll push it upstream.

Also, during the 29 timeframe I had planned to reduce browser.sessionstore.max_{windows,tabs}_undo as part of a more aggressive tuning of browser memory usage; I don't know why this never got into the code for 31ESR, but it didn't even though the rest of the garbage collection changes did. My original plan was to set the maximum tab and window undo level to 2 each, but a few of you thought two-tab undo was a bit too low, so the current plan is 4 tab undo and 2 window undo (the default is 10 and 3). This means the browser only keeps four old tabs and two old windows in memory, so old objects get released more often. You can, of course, change the old settings back in about:config, but advise if even these default seem unreasonable to you in the comments.

In other news, the two-month-old POODLE SSL/TLS attack just gets worse and worse. Yeah, you heard me: the POODLE exploit can now affect TLSv1 as well, not just SSLv3. POODLE is a padding oracle attack that takes advantage of a gap in the SSLv3 specification where an attacker can arbitrarily manipulate the unvalidated padding bytes in an SSL record. By causing a user's browser (through JavaScript, typically) to issue multiple carefully crafted SSL requests using a credential or login cookie, a malicious intermediary can slowly (with probability 1/256 for each request revealing one byte of a secret to be decrypted) duplicate that credential or cookie and steal it. Today's attack demonstates the same attack is feasible against as many as 10% of the Internet's servers, even those that implement TLS v1.2 (the current spec) as a minimum, due to a similar glitch where the server software doesn't validate those same padding bytes even though they're supposed to. Affected sites even include some major banks and government agencies!

This sort of activity is likely to push up the timetable for deprecating TLS prior to 1.2 or with non-AEAD ciphers due to the increasing evidence that earlier TLS versions and ciphersuites have multiple potentially exploitable cryptographic weaknesses. Firefox (and of course TenFourFox) has supported TLSv1.2 with a full built-in AEAD ciphersuite since 27, and as of this release (31.3) SSLv3 is disabled as well, so TenFourFox is fully current with modern best encryption practise. The same cannot be said for other browsers on PowerPC OS X which depend on the built-in operating system SSL support (this includes Safari and iCab even if Leopard WebKit is installed); the built-in SSL/TLS support libraries in 10.4 and 10.5 are both vulnerable to BEAST and POODLE-type attacks and do not implement TLSv1.1 or TLSv1.2. In the near future browsers and other applications depending on these libraries will no longer be considered safe for secure data transmission, meaning apps will need to package their own encryption libraries to stay current (which we already do, using Mozilla NSS). That's also bad news for Classilla: although attacks like POODLE are highly impractical against Classilla users, that's mostly a factor of the slower JavaScript and network code and not any strength in its crypto suite. While the improved TLSv1.0 support in 9.3.3 has been mostly glitch-free (one site remains under investigation), TLSv1.1 and TLSv1.2 will require some substantial refactoring, and Classilla has no AEAD ciphers at all.