Sunday, June 3, 2018

Another weekend on the new computer (or, making the Talos II into the world's biggest Power Mac)

Your eyes do not deceive you -- this is QEMU running Tiger with full virtualization on the Talos II. For proof, look at my QEMU command line in the Terminal window. I've just turned my POWER9 into a G4.

Recall from last entry that there was a problem using virtualization to run Power Mac operating systems on the T2 because the necessary KVM module, KVM-PR, doesn't load on bare-metal POWER9 systems (the T2 is PowerNV, so it's bare-metal). That means you'd have to run your Mac operating systems under pure emulation, which eked out something equivalent to a 1GHz G4 in System Profiler but was still a drag. With some minimal tweaks to KVM-PR, I was able to coax Tiger to start up under virtualization, increasing the apparent CPU speed to over 2GHz. Hardly a Quad G5, but that's the fastest Power Mac G4 you'll ever see with the fastest front-side bus on a G4 you'll ever see. Ever. Maximum effort.

The issue on POWER9 is actually a little more complex than I described it (thanks to Paul Mackerras at IBM OzLabs for pointing me in the right direction), so let me give you a little background first. To turn a virtual address into an actual real address, PowerPC and POWER processors prior to POWER9 exclusively used a hash table of page table entries (PTEs or HPTEs, depending on who's writing) to find the correct location in memory. The process in a simplified fashion is thus: given a virtual address, the processor translates it into a key for that block of memory using the segment lookaside buffer (SLB), and then hashes that key and part of the address to narrow it down to two page table entry groups (PTEGs), each containing eight PTEs. The processor then checks those 16 entries for a match. If it's there, it continues, or else it sends a page fault to the operating system to map the memory.
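To make the simplified lookup above a bit more concrete, here is a toy model in Python. It is only a sketch: the page and segment sizes, the table size, the hash function and all of the names (PTE, HashedPageTable, translate) are assumptions for illustration, loosely patterned on the classic scheme, and not the real POWER definitions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

PTES_PER_PTEG = 8     # eight PTEs per group, as described above
PAGE_SHIFT = 12       # assumption: 4 KiB pages
SEGMENT_SHIFT = 28    # assumption: 256 MiB segments


@dataclass
class PTE:
    valid: bool = False
    vsid: int = 0         # the "key" for the segment
    page_index: int = 0   # which page within the segment
    real_page: int = 0    # real page number this entry maps to


class HashedPageTable:
    def __init__(self, num_ptegs: int = 64) -> None:
        # num_ptegs must be a power of two so the hash can be masked.
        self.num_ptegs = num_ptegs
        self.ptegs = [[PTE() for _ in range(PTES_PER_PTEG)]
                      for _ in range(num_ptegs)]

    def _groups(self, vsid: int, page_index: int) -> Tuple[int, int]:
        # Illustrative hash: XOR for the primary group, its complement
        # for the secondary group.
        mask = self.num_ptegs - 1
        primary = (vsid ^ page_index) & mask
        secondary = ~primary & mask
        return primary, secondary

    def insert(self, vsid: int, page_index: int, real_page: int) -> bool:
        for group in self._groups(vsid, page_index):
            for pte in self.ptegs[group]:
                if not pte.valid:
                    pte.valid = True
                    pte.vsid = vsid
                    pte.page_index = page_index
                    pte.real_page = real_page
                    return True
        return False  # both groups full; a real OS would evict something

    def lookup(self, vsid: int, page_index: int) -> Optional[int]:
        # Check all 16 candidate entries (two groups of eight).
        for group in self._groups(vsid, page_index):
            for pte in self.ptegs[group]:
                if (pte.valid and pte.vsid == vsid
                        and pte.page_index == page_index):
                    return pte.real_page
        return None


def translate(slb: Dict[int, int], hpt: HashedPageTable,
              vaddr: int) -> Optional[int]:
    """Return the real address, or None where hardware would fault."""
    vsid = slb.get(vaddr >> SEGMENT_SHIFT)
    if vsid is None:
        return None  # no segment entry for this address
    page_index = (vaddr >> PAGE_SHIFT) & ((1 << (SEGMENT_SHIFT - PAGE_SHIFT)) - 1)
    real_page = hpt.lookup(vsid, page_index)
    if real_page is None:
        return None  # page fault: the OS has to map the page
    return (real_page << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))
```

Here the SLB is just a dictionary from the top bits of the address to a key. Something like translate({1: 0x123}, hpt, vaddr) walks the same steps as the paragraph above and returns None where real hardware would raise a fault.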

The first problem is that the format of HPTEs changed slightly in POWER9, so this needs to be accommodated if the host CPU does lookups of its own (it does in KVM-HV, but this was already converted for the new POWER9 and thus works already).

The bigger problem, though, is that hash tables can be complex to manage and in the worst case could require a lot of searches to map a page. POWER8 and earlier reduce this cost with the translation lookaside buffer (TLB), used to cache a PTE once it's found. However, the POWER9 has another option called the radix MMU. In this scheme (read the patent if you're bored), the SLB entry for that block of memory now has a radix page table pointer, or RPTP. The RPTP in turn points to a chain of hierarchical translation tables ("radix tree") that through a series of cascading lookups build the real address for that page of RAM. This is fast and flexible, and particularly well-suited to discontiguous tracts of address space. However, as an implementation detail, a guest operating system running in user mode (i.e., KVM-PR) on a radix host has limitations on so-called quadrant 3 (memory in the 0xc... range). This isn't a problem for a VM that can execute supervisor instructions (i.e., KVM-HV) because it can just remap as necessary, but KVM-HV can't emulate a G3 or G4 on a POWER9; only KVM-PR can do that.
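For contrast with the hash table, the cascading lookup of a radix tree can be sketched in a few lines of Python. Everything here is an assumption for illustration: the four level widths, the 4 KiB page size, the use of nested dictionaries for the tables and the function names. It ignores quadrants, permissions and everything else the real MMU does.

```python
from typing import Dict, List, Optional, Tuple

LEVEL_BITS = (13, 9, 9, 9)   # assumption: index width at each tree level
PAGE_SHIFT = 12              # assumption: 4 KiB pages


def split_indices(vaddr: int) -> Tuple[List[int], int]:
    """Split an address into one index per tree level plus a page offset."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    rest = vaddr >> PAGE_SHIFT
    indices = []
    for bits in reversed(LEVEL_BITS):
        indices.append(rest & ((1 << bits) - 1))
        rest >>= bits
    indices.reverse()        # top-level index first
    return indices, offset


def map_page(root: Dict, vaddr: int, real_page: int) -> None:
    """Create any missing intermediate tables, then store the leaf entry."""
    indices, _ = split_indices(vaddr)
    node = root
    for idx in indices[:-1]:
        node = node.setdefault(idx, {})
    node[indices[-1]] = real_page


def radix_translate(root: Dict, vaddr: int) -> Optional[int]:
    """Walk the tree level by level; None stands in for a page fault."""
    indices, offset = split_indices(vaddr)
    node = root
    for idx in indices[:-1]:
        node = node.get(idx)
        if node is None:
            return None
    real_page = node.get(indices[-1])
    if real_page is None:
        return None
    return (real_page << PAGE_SHIFT) | offset
```

Intermediate tables only exist for regions that are actually mapped, which is why this layout copes well with sparse, scattered address ranges.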

Fortunately, the POWER9 still can support the HPT and turn the radix MMU off by booting the kernel with disable_radix. That gets around the second problem. As it turns out, the first problem actually isn't a problem for booting OS X on KVM once radix mode is off, assuming you hack the KVM-PR kernel module to handle a couple extra interrupt types and remove the lockout on POWER9. And here we are.(*)
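If you want to confirm that the running kernel was actually booted with disable_radix, one quick check is to look for it on the kernel command line. This is a minimal sketch, assuming a Linux host that exposes /proc/cmdline and that the option was passed as a bare token the way it's written above. The function names are mine.

```python
def radix_disabled(cmdline: str) -> bool:
    """True if the bare token disable_radix appears on the command line."""
    return "disable_radix" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path, "r", encoding="ascii", errors="replace") as fh:
        return fh.read()
```

On the host, radix_disabled(read_cmdline()) should come back True before you bother trying to load a hacked KVM-PR.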

Anyway, you lot will be wanting the Geekbench numbers, won't you? Such a competitive bunch, always demanding to know the score. Let's set two baselines. First, my trusty backup workstation, the 1GHz iMac G4: It's not very fast and it has no L3 cache, which makes it worse, but the arm is great, the form-factor has never been equaled, I love the screen and it fits very well on a desk. That gets a fairly weak 580 Geekbench (Geekbench 2.2 on 10.4, integer 693, floating point 581, memory 500, stream 347). For the second baseline, I'll use my trusty Quad G5, but I left it in Reduced power mode since that's how I normally run it. In Reduced, it gets a decent 1700 Geekbench (1907/2040/1002/1190).

First up, Geekbench with pure emulation (using the TCG JIT):

... aah, forget it. I wasn't going to wait all night for that. How about hacked KVM-PR?

Well, damn, son: 1733 (1849/2343/976/536). That's into the G5's range, at least with math performance, and the G5 did it with four threads while this poor thing only has one (QEMU's Power Mac emulation does not yet support SMP, even with KVM). Again, do remember that the G5 was intentionally being run gimped here: if it were going full blast, it would have blown the World's Baddest Power Mac G4 out of the water. But still, this is a decent showing for the T2 in "Mac mode" given all the other overhead that's going on, and the T2 is doing that while running Firefox with a buttload of tabs and lots of Terminal sessions and I think I was playing a movie or something in the background. I will note for the record that some of the numbers seem a bit suspect; although there may well be a performance delta between image compression and decompression, it shouldn't be this different and it would more likely be in the other direction. Likewise, the poor showing for the standard library memory work might be syscall overhead, which is plausible, but that doesn't explain why a copy is faster than a simple write. Regardless, that's heaps better than the emulated CPU which wouldn't have finished even by the time I went to dinner.
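For the scorekeepers, this little Python sketch turns the numbers quoted above into ratios against either baseline. The scores are copied from this post; the dictionary layout and the function name are just my own scaffolding.

```python
# Geekbench 2.2 scores quoted in this post.
SCORES = {
    "iMac G4 1GHz": {
        "overall": 580, "integer": 693, "floating point": 581,
        "memory": 500, "stream": 347,
    },
    "Quad G5 (Reduced)": {
        "overall": 1700, "integer": 1907, "floating point": 2040,
        "memory": 1002, "stream": 1190,
    },
    "T2 KVM-PR": {
        "overall": 1733, "integer": 1849, "floating point": 2343,
        "memory": 976, "stream": 536,
    },
}


def relative_to(baseline: str, candidate: str) -> dict:
    """Return candidate/baseline for each score, rounded to two places."""
    return {
        name: round(SCORES[candidate][name] / SCORES[baseline][name], 2)
        for name in SCORES[baseline]
    }


if __name__ == "__main__":
    for baseline in ("iMac G4 1GHz", "Quad G5 (Reduced)"):
        print(baseline, relative_to(baseline, "T2 KVM-PR"))
```

Against the G5 in Reduced mode, the virtualized G4 comes out a hair ahead overall and clearly ahead on floating point, while stream sits at under half the G5's score.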

The other nice thing is that KVM-PR-Hacky-McHackface doesn't require any changes to QEMU to work, though the hack is pretty hacky. It is not sufficient to boot Mac OS 9; that causes the kernel module to err out with a failure in memory mapped I/O, which is probably because it actually does need the first problem to be fixed, and similarly I would expect Linux and NetBSD won't be happy either for the same reason (let alone nesting KVM-PR within them, which is allowed and even supported). Also, I/O performance in QEMU regardless of KVM is dismal. Even with my hacked KVM-PR, a raw disk image and rebuilding a stripped down QEMU with -O3 -mcpu=power9, disk and network throughput are quite slow and it's even worse if there are lots of graphics updates occurring simultaneously, such as installing Mac OS X with the on-screen Aqua progress bar. Minimizing such windows helps, but only when you're able to do so, of course. More ominously I'll get occasional soft lockups in the kernel (though everything keeps running), usually if it's doing heavy disk access, and it acts very strangely with stuff that messes with the hardware such as system updates. For that reason I let Software Update run in emulated mode so that if a bug occurred during the installation, it wouldn't completely hose everything and make the disk image unbootable (which did, in fact, happen the first time I tried to upgrade to 10.4.11). Another unrelated annoyance is that QEMU's emulated video card doesn't offer 16:9 resolutions, which is inconvenient on this 1920x1080 display. I could probably hack that in later.

QEMU also has its own bugs, of course; support for running OS 9/OS X is very much still a work in progress. For example, you'll notice there are no screenshots of the T2 running TenFourFox. That's because it can't. I installed the G3 version and tried running it in QEMU+KVM-PR, and TenFourFox crashed with an illegal instruction fault. So I tried starting it in safe mode on the assumption the JIT was making it unsteady, which seemed to work when it gave me the safe mode window, but then when I tried to start the full browser it still crashed with an illegal instruction fault (in a different place). At that point I assumed it was a bug in KVM-PR and tried starting it in pure emulation. This time, TenFourFox crashed the entire emulator (which exited with an illegal instruction fault). I think we can safely conclude that this is a bug in QEMU. I haven't even tried running Classic on it yet; I'm almost afraid to.

Still, this means my T2 is a lot further along at being able to run my Power Mac software. It also means I need to go through and reprogram all my AutoKey remappings to not remap the Command-key combinations when I'm actually in QEMU. That's a pain, but worth it. If enough people are interested in playing with this, I'll go post the diff in a gist on new Microsoft Visual GitHub, but remember it will rock your socks, taint your kernel, (possibly) crash your computer and (definitely) slap yo mama. You'll also need to apply it as a patch to the source code for your current kernel, whatever it is, as I will not post binaries, so you'll have to do it your own bad and irresponsible self. You won't have to wait long, though, as the T2 will build the Linux kernel from scratch and all its relevant modules in about 20 minutes at -j24. Now we're playing with POWER!

What else did I learn this weekend?

If you want disable_radix to stick on bootup, just put it in a grub config and Petitboot will pick it up. This is particularly helpful because I still can't figure out what I should be putting in BOOTKERNFW to enable the Radeon card, so Petitboot still comes up with a blank screen unless I pull the VGA disable jumper.

amdgpu is still glitchy sometimes. Trying to view very large images in Eye of GNOME actually caused graphics corruption and garbled the mouse pointer. The session quickly became unusable and I had to bail out and restart X11. Viewnior substituted nicely and doesn't have this problem. (I also found it with Hugin when I was trying to find something to view the equirectangular panoramas taken with my Ricoh Theta cameras. Eventually I hacked FreePV into building and that substitutes nicely as well. I gave it an entry in /usr/share/applications so that I could directly view images from the GNOME file manager.)

Symlinking xdg-open to open means I can still open most things from the command line with the same OS X command.

I had a dickens of a time getting my Android Pixel XL to talk to the T2. GNOME kept throwing MTP errors when it tried to mount it (yes, the phone was in MTP mode). gphoto2 could list the directories in PTP mode, but couldn't actually transfer anything. Even adb shell would disconnect after just a few commands, and adb pull wouldn't even get past enumerating the file list. Eventually I found some apocryphal note that someone fixed their phone by connecting it over USB 2.0 instead of USB 3.0, so I found a USB 2.0 hub, plugged that in, and plugged the Pixel XL into that. Now it works. (The same cable works fine at USB 3.0 speeds on my MacBook Air with the Pixel, so I am assuming this is an issue with the chipset in the T2. But I also think that's Google's bug, not Raptor's or Fedora's.)

The freely distributable Linux fonts are improving but still suck, so I transferred my entire font folder over AFP from the G5. The OTF and TTF fonts worked immediately, and Fondu easily converted the DFONT fonts, but I also have a lot of old Mac fonts and even some font suitcases that are pure resource forks. GNOME doesn't know how to transfer those. I could have tried copying them to a FAT volume and extracting the resource forks from the hidden directory, but that seemed icky. After some thought, I went to the Terminal on the G5 and did this instead:

The little snippet of Perl there preserves embedded spaces in the filenames. When I ran it, it turned all the font resources into MacBinary, I transferred the resulting files to the T2, and Fondu converted them as well. Now I have my fonts.

Speaking of fonts, I wanted to play with font hinting on an individual font basis and installed Fonts Tweaks for GNOME to do this. I started with my converted Lucida Grande that I use for most of the display theme, and it eliminated the anti-aliasing and made much of the display painfully jaggy. I couldn't for the life of me figure out how to undo it, even after uninstalling Fonts Tweaks, until I found .config/fontconfig/conf.d/ and removed the offending entry.

I was able to get my RadioSHARK to play audio, but it required a little setup first. I first had to compile the shark tool for Linux, which fortunately I had preserved as part of radioSH. That worked with little adjustment, but the new tool kept throwing permissions errors and I didn't really want to run it as root. The problem is that it uses libhid to talk to the HID portion of the RadioSHARK and libhid expects to be able to enumerate any USB device it encounters to see if it's really a HID. Eventually I hit on this udev recipe and added myself to a new usb group:

SUBSYSTEM=="usb", GROUP="usb", MODE="0660"

Now shark can control it, and I can listen in VLC (using a URL like pulse://alsa_input.usb-Griffin_Technology__Inc._RadioSHARK-00.analog-stereo).

Deadpool 2 is pretty funny. Not as good as the first, because there was no way it was going to out-outrageous it, and the gags were a little forced sometimes, but Brolin made a solid Cable and Domino was outstanding. And hey, a Kiwi kid who can act and isn't stereotyped! I even recognized them driving along the Crowsnest Highway (BC 3) near Manning Park before the merge with Trans-Canada 1 in Hope, a beautiful road I have driven many times myself. Plus, the mid-credits scenes made up for all of the movie's holes (he said, carefully avoiding the joke), and be sure to stay put for a musical laugh-out-loud moment at the very end!

Back on the G5 tomorrow for more work on date-time pickers for TenFourFox FPR8. I've abandoned CSS grid again for the time being because my current implementation is actually worse than not trying to render it at all. That's discouraging, but at least it gracefully degrades.

(*) Given that no changes were made to the HPTE format to get KVM-PR to work for OS X, it may not even be necessary to run the kernel with radix mode off. I'll try that this coming weekend at some point.

For whatever this info might be worth… qemu-system-ppc running on my 2016 MacBook Pro (so obviously, slow emulation mode) boots 10.4.11 no problem, and runs TenFourFox (even the 7400 version) also with no crashes or errors. Everything is very slow, but it all works. So the qemu bug you're hitting even in emulation mode must be POWER-related.