There have been a few links to this story of someone buying a system that turned out to have UEFI firmware and also turned out not to boot. Given all the press, it's unsurprising that people would assume that problems they have with UEFI booting are related to Secure Boot, but it's very unlikely that this is the actual problem here. First, nobody's shipping an appropriately signed operating system yet. A hardware vendor that enabled Secure Boot out of the box would be selling a machine that wouldn't boot any OS you could buy. That's a poor way to make money. Second, the system booted a Fedora 17 CD. Fedora 17 isn't signed, so if the firmware booted it then the firmware isn't enforcing Secure Boot. Third, it didn't boot the installed OS. At that point it really sounds like a hardware problem - selling systems that don't run the OS you sold them with is a guaranteed way of getting enough support calls that you'd never make any money on them.

To be fair, Linux compatibility with UEFI systems is still not as good as it is with BIOS systems. Fedora 18 will be using a new UEFI boot process, and so far in our testing it's been significantly more reliable than Fedora 17. There are still some remaining issues that we're aware of and working on, but right now it's hugely more likely that failures to boot Fedora 17 on UEFI systems are down to our bugs rather than to Secure Boot.

The biggest objection that most people have to UEFI Secure Boot as embodied in the Windows 8 certification requirements is the position of Microsoft as the root of trust. But we can go further than that - putting any third party in a position of trust means that there's the risk that your machine will end up trusting code that you don't want it to trust. Can we avoid that?

It turns out that the answer is yes, although the process is perhaps a little more complicated than ideal. The Windows 8 certification requirements insist that (on x86, at least) the key databases be completely modifiable. That means you can delete all the keys provided by your manufacturer, including Microsoft's. However, as far as the spec is concerned, a system without keys is in what's called "Setup Mode", and in this state it'll boot anything - even if it doesn't have a signature.

So, we need to populate the key database. This isn't terribly difficult. James Bottomley has a suite of tools here, including support for generating keys and building an EFI binary that will enrol them into the key databases. So, now you have a bunch of keys and the public half of them is in your platform firmware. What next?

If you've got a system with plug-in graphics hardware, what happens next is that your system no longer has any graphics. The firmware-level drivers for any plug-in hardware also need to be signed, and won't run otherwise. That means no graphics in the firmware. If you're netbooting off a plug-in network card, or booting off a plug-in storage controller, you're going to have similar problems. This is the most awkward part of the entire process. The drivers are all signed with a Microsoft key, so you can't just enrol that without trusting everything else Microsoft have signed. There is a way around this, though. The typical way to use Secure Boot is to provide a list of trusted keys, but you can also provide trusted hashes. Doing this involves reading the device ROM, generating a SHA256 hash of it and then putting that hash in the key database.
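As a sketch of what the hashing step involves (the sysfs path is purely illustrative - the right device address depends on your system, and you may need to write "1" to the rom file before the kernel will let you read it):

```python
import hashlib

def hash_option_rom(path):
    """Return the SHA256 hash of a device ROM image, ready for
    enrolment in the firmware's signature database."""
    with open(path, "rb") as f:
        rom = f.read()
    return hashlib.sha256(rom).hexdigest()

# On Linux, PCI option ROMs are exposed via sysfs. The address below is
# an example - use lspci to find your device:
#   echo 1 > /sys/bus/pci/devices/0000:01:00.0/rom
#   hash_option_rom("/sys/bus/pci/devices/0000:01:00.0/rom")
```

The actual enrolment - wrapping the hash in an EFI_SIGNATURE_LIST and writing it into db - is the part handled by the tooling discussed above.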

This is, obviously, not very practical to do by hand. We intend to provide support for this in Fedora by providing a tool that gets run while the system is in setup mode, reads the ROMs, hashes them and enrols the hashes. We'll probably integrate that into the general key installation tool to make it a one-step procedure. The only remaining problem is that swapping your hardware or updating the firmware will take it back to a broken state, and sadly there's no good answer for that at the moment.

Once you've got a full set of enrolled keys and the hashes of any option ROMs, the only thing to do is to sign your bootloader and kernel. Peter Jones has written a tool to do that, and it's available here. It uses nss and so has support for using any signing hardware that nss supports, which means you can put your private key on a smartcard and sign things using that.

At that point, assuming your machine implements the spec properly, everything that it boots will be code that you've explicitly trusted. But how do you know that your firmware is following the spec and hasn't been backdoored? That's tricky. There's a complete open implementation of UEFI here, but it doesn't include the platform setup code that you'd need to run it on any x86 hardware. In theory it'd be possible to run it on top of Coreboot, but right now that's not implemented. There is a complete port to the Beagleboard hardware, and there are instructions for using it here. The remaining issue is that you need a mechanism for securing access to the system flash, since if you can make arbitrary writes to it an attacker can just modify the key store. On x86 this is managed by the flash controller being switched into a mode where all writes have to be carried out from System Management Mode, and the code that runs there verifies that writes are correctly authenticated. I've no idea how you'd implement that on ARM.

There's still some work to be done in order to permit users to verify the entire stack, but Secure Boot does make it possible for the user to have much greater control over what their system runs. The freedom to make decisions about not only what your computer will run but also what it won't is an important one, and we're doing what we can to make sure that users have that freedom.

I've temporarily got hold of a Retina MacBook Pro. Fedora 17 boots fine on it with the following kernel arguments: nointremap drm_kms_helper.poll=0 video=eDP-1:2880x1800@45e, although you'll want an updated install image. The nointremap requirement is fixed by this patch, and the others are being tracked here. The gmux chip that controls the multiple graphics chipsets has changed - patches here. That just leaves Thunderbolt.

Thunderbolt is, to a first approximation, PCIe tunnelled over a Displayport connector. It's a high-bandwidth protocol for connecting external devices. Things are complicated by it also supporting Displayport over the same connection, which I'm sure will be hilarious when we get yet another incompatible display protocol. But anyway. I picked up a Thunderbolt ethernet adapter and started poking.

Booting with the device connected looked promising - an ethernet device appeared. Pulling it out resulted in the pcie hotplug driver noticing that it was missing and cleaning up, although the b44 ethernet driver sat around looking puzzled for a while. So far so good. Unfortunately plugging it in again didn't work, and rebooting without any Thunderbolt devices connected not only did nothing on hotplug, it also meant that the Thunderbolt controller didn't show up in lspci at all.

It turns out that there are several problems at play here. The first is the missing Thunderbolt controller. The problem here is a piece of logic in Apple's ACPI tables.

This checks if the system claims to be Darwin (the core of OS X). If not, it checks whether two PCIe bridges have been configured. If both are still in an unconfigured state, it cuts the PCIe link to the upstream PCIe link and then sets GP23 to 0. Looking further, GP23 turns out to be a GPIO line. Setting it to 0 appears to cut power to the Thunderbolt controller. In summary, unless your OS claims to be OS X, booting without an attached Thunderbolt device will result in the chipset being powered down so hard that it's entirely invisible to the OS.

Fixing that is as easy as adding Darwin to the list of operating systems Linux claims to be. So, now I could boot without a connected device and still see the controller. Since ACPI seemed to be important here, I started looking at the other ACPI methods associated with the device. The first one that drew attention was Name(_GPE, 0x14). The _GPE method (not to be confused with the _GPE object at the top of the global ACPI namespace) is defined as providing the ACPI general purpose event associated with the device. Unfortunately, it's only defined as doing this for embedded controllers - Apple's use of it on a PCI device is massively non-standard. But, still, easy enough to handle. Intel chipsets map GPE 0x10 to 0x1f to GPIO lines 0-15, so this GPE is associated with GPIO 4. Looking through the ACPI code showed a bunch of other references to GP04, and it turns out that if the chip is powered down (via setting GP23 to 0) then plugging a device in to the port will generate a GPE. This makes sense - a powered down controller isn't going to notice hotplug events itself, so you need some kind of out of band notification mechanism. There's also another ACPI method that flips the polarity of the GPIO line so it doesn't keep generating events for the entire time a device is connected. It didn't take long to come up with a stub driver that would power down the controller if nothing was connected at boot, and then wake it up again on a hotplug event.
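The GPE-to-GPIO arithmetic above is simple enough to spell out. This is the Intel chipset convention described in the text; other chipsets may map things differently:

```python
# Intel chipsets map GPEs 0x10 through 0x1f onto GPIO lines 0 through 15.
GPE_GPIO_BASE = 0x10

def gpe_to_gpio(gpe):
    """Return the GPIO line backing a general purpose event."""
    if not GPE_GPIO_BASE <= gpe <= GPE_GPIO_BASE + 15:
        raise ValueError("GPE 0x%x is not GPIO-backed" % gpe)
    return gpe - GPE_GPIO_BASE

# Apple's Name(_GPE, 0x14) therefore points at GPIO line 4:
print(gpe_to_gpio(0x14))  # 4
```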

Unfortunately that's as far as I've got. I'd been hoping that everything beyond this was just PCIe hotplug, but it seems not. Apple's Thunderbolt driver is rather too large to just be handling ACPI events, and this document on Apple's website makes it pretty clear that there's OS involvement in events and device configuration. Booting with a device connected means that the firmware does the setup, but if you want to support hotplug then the OS needs to know how to do that - and Linux doesn't.

Getting this far involved rather a lot of irritation at Apple for conspiring to do things in a range of non-standard ways, but it turns out that the real villains of the piece are Intel. The Thunderbolt controller in the Apples is an Intel part - the 82524EF, according to Apple. Given Intel's enthusiasm for Linux and their generally high levels of engagement with the Linux development community, it's disappointing to discover that this controller has been shipping for over a year with (a) no Linux driver and (b) no documentation that would let anyone write such a driver. It's not even mentioned on Intel's website. So, thanks Intel. You're awful.

Anyway. This was my attempt to spend a few days doing something more relaxing than secure boot, and all I ended up with was eczema and liver pain. Lesson learned, hardware vendors hate you even more than firmware vendors do.

There's a post here describing SUSE's approach to implementing Secure Boot support. In summary, it's pretty similar to the approach we're taking in Fedora - a first stage shim loader is signed with a key in db, it loads a second stage bootloader (grub 2) that's signed with a key that's in shim, the second stage bootloader loads a signed kernel. The main difference between the approaches is the use of a separate key database in shim, whereas we are currently planning on using a built-in key and the contents of the firmware key database.

The main concern about using a separate key database is that applications are able to modify variable content at runtime. The Secure Boot databases are protected by requiring that any updates be signed, but this is less practical for scenarios where you want the user to be able to modify the database themselves while still protecting them from untrusted applications installing their own keys. SUSE have solved this problem by not setting the runtime access flag on the variable, meaning that it's inaccessible once ExitBootServices() has been called early in the init sequence. Since we only start running any untrusted code after ExitBootServices(), the variable can only be accessed by trusted code.
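The trick hinges on the variable's attribute bits, which are defined in the UEFI spec. A minimal sketch of the logic, with the constant values taken from the spec:

```python
# Variable attribute bits from the UEFI specification.
EFI_VARIABLE_NON_VOLATILE       = 0x00000001
EFI_VARIABLE_BOOTSERVICE_ACCESS = 0x00000002
EFI_VARIABLE_RUNTIME_ACCESS     = 0x00000004

def visible_at_runtime(attributes):
    """A variable can only be read or written after ExitBootServices()
    has been called if the runtime access bit was set at creation."""
    return bool(attributes & EFI_VARIABLE_RUNTIME_ACCESS)

# shim's key database is created without the runtime access bit, so it
# vanishes from the OS's view once ExitBootServices() has been called:
shim_db_attrs = EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS
print(visible_at_runtime(shim_db_attrs))  # False
```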

It's a wonderfully elegant solution. We've been planning on supporting user keys by trusting the contents of db, and the Windows 8 requirements specify that it must be possible for a physically present user to add keys to it. The problem there has been that different vendors offer different UI for this, in some cases even requiring that the keys be in different formats. Using an entirely separate database and offering support for enrolment in the early boot phase means that the UI and formats can be kept consistent, which makes it much easier for users to manage their own keys.

I suspect that we'll adopt this approach in Fedora as well - it doesn't allow anything that our solution wouldn't have allowed, but it does make some of those things easier. Full marks to SUSE on this.

Paolo Bonzini noticed something a little awkward in the Linux kernel support code for Microsoft's HyperV virtualisation environment - specifically, that the magic constant passed through to the hypervisor was "0xB16B00B5", or, in English, "BIG BOOBS". It turns out that this isn't an exception - when the code was originally submitted it also contained "0x0B00B135". That one got removed when the Xen support code was ripped out.

At the most basic level it's just straightforward childish humour, and the use of vaguely-English strings in magic hex constants is hardly uncommon. But it's also specifically male childish humour. Puerile sniggering at breasts contributes to the continuing impression that software development is a boys club where girls aren't welcome. It's especially irritating in this case because Azure may depend on this constant, so changing it will break things.

So, full marks, Microsoft. You've managed to make the kernel more offensive to half the population and you've made it awkward for us to rectify it.

In the past I used to argue that accurate desktop DPI was important, since after all otherwise you could print out a 12 point font and hold up the sheet of paper to your monitor and it would be different. I later gave up on the idea that accurate DPI measurement was generally useful to the desktop, but there's still something seductively attractive about the idea that a 12 point font should be the same everywhere, and someone recently wrote about that. Like many seductively attractive ideas it's actually just trying to drug you and steal your kidneys, so here's why it's a bad plan.

Solutions solve problems. Before we judge whether a solution is worthwhile or not, we need to examine the problem that it's solving. "My 12pt font looks different depending on the DPI of my display" isn't a statement of a problem, it's a statement of fact. How do we turn it into a problem? "My 12pt font looks different depending on which display I'm using and so it's hard to judge what it'll look like on paper" is a problem. Way back in prehistory when Apple launched the original Mac, they defined 72dpi as their desktop standard because the original screen was approximately 72dpi. Write something on a Mac, print it and hold the paper up to the monitor and the text would be the same size. This sounds great! Except that (1) this is informative only if your monitor is the same distance away as your reader will be from the paper, and (2) even in the classic Mac period of the late 80s and early 90s, Mac monitors varied between 66 and 80dpi and so this was already untrue. It turns out that this isn't a real problem. It's arguably useful for designers to be able to scale their screen contents to match a sheet of paper the same distance away, but that's still not how they'll do most of their work. And it's certainly not an argument for enforcing this on the rest of the UI.
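The arithmetic behind that original correspondence is worth spelling out: a point is 1/72 of an inch, so at exactly 72dpi a 12pt glyph occupies 12 pixels, and at any other DPI it doesn't:

```python
def points_to_pixels(points, dpi):
    """One typographic point is 1/72 of an inch."""
    return points * dpi / 72.0

# On the original 72dpi Mac, points and pixels coincided:
print(points_to_pixels(12, 72))  # 12.0
# On an 80dpi Mac monitor of the same era, they already didn't:
print(points_to_pixels(12, 80))  # ~13.3
```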

So "It'll look different on the screen and paper" isn't really a problem, and so that's not what this is a solution for. Let's find a different problem. Here's one - "I designed a UI that works fine on 100DPI displays but is almost unusable on 200DPI displays". This problem is a much better one to solve, because it actually affects real people rather than the dying breed who have to care about what things look like when turned into ink pasted onto bits of dead tree. And it sounds kind of like designing UIs to be resolution independent would be a great solution to this. Instead of drawing a rectangle that's 100 pixels wide, let me draw one that's one inch wide. That way it'll look identical on 100dpi and 200dpi systems, and now I can celebrate with three lines of coke and wake up with $5,000 of coffee table made out of recycled cinema posters or whatever it is that designers do these days. A huge pile of money from Google will be turning up any day now.

Stop. Do not believe this.

Websites have been the new hotness for a while now, so let's apply this to them. Let's imagine a world in which the New York Times produced an electronic version in the form of a website, and let's imagine that people view this website on both desktops and phones. In this world the website indicates that content should be displayed in a 12pt font, both the desktop and phone software stacks render this at an identical physical size, and as such the site looks identical if the desktop's monitor and the phone are the same distance away from me.

The flaw in this should be obvious. If I'm reading something on my phone then the screen is a great deal closer to me than my desktop's monitor usually is. If the fonts are rendered identically on both then the text on my phone will seem unnecessarily large and I won't get to see as much content. I'll end up zooming out and now your UI is smaller than you expected it to be, and if your design philosophy was based on the assumption that the UI would be identical on all displays then there's probably now a bunch of interface that's too small for me to interact with. Congratulations. I hate you.

So "Let's make our fonts the same size everywhere" doesn't solve the problem, because you still need to be aware of how different device types are used differently and ensure that your UI works on all of them. But hey, we could special case that - let's have different device classes and provide different default font sizes for each of them. We'll render in 12pt on desktops and 7pt on phones. Happy now?

Not really, because it still makes this basic assumption that people want their UI to look identical across different machines with different DPI. Some people do buy high-DPI devices because they want their fonts to look nicer, and the approach Apple have taken with the Retina MacBook Pro is clearly designed to cater to that group. But other people buy high-DPI devices because they want to be able to use smaller fonts and still have them be legible, and they'll get annoyed if all their applications just make the UI larger to compensate for their increased DPI. And let's not forget the problem of wildly differing displays on the same hardware. If I have a window displaying a 12pt font on my internal display and then drag that window to an attached projector, what size should that font be? If you say 12pt then I really hope that this is an accurate representation of your life, because I suspect most people have trouble reading a screen of 12pt text from the back of an auditorium.

That covers why I think this approach is wrong. But how about why it's dangerous? Once you start designing with the knowledge that your UI will look the same everywhere, you start enforcing that assumption in your design. 12pt text will look the same everywhere, so there's no need to support other font sizes. And just like that you've set us even further back in terms of accessibility support, because anyone who actually needs to make the text bigger only gets to do so if they also make all of your other UI elements bigger. Full marks. Dismissed.

The only problem "A 12pt font should be the same everywhere" solves is "A 12pt font isn't always the same size". The problem people assume it solves is "It's difficult to design a UI that is appropriate regardless of display DPI", and it really doesn't. Computers aren't sheets of paper, and a real solution to the DPI problem needs to be based on concepts more advanced than one dating back to the invention of the printing press. Address the problem, not your assumption of what the problem is.

Most x86 devices export various bits of system information via SMBIOS, including the system manufacturer, model and firmware version. This makes it possible for the kernel to alter its behaviour depending on the machine it's running on, usually referred to as "DMI quirking". This is a very attractive approach to handling machine-specific bugs - unfortunately it also means that we often end up working around symptoms with no understanding of the underlying issue.
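The kernel does its matching in C against the SMBIOS tables, but the same fields are exported to userspace under /sys/class/dmi/id, so a toy version of the quirking logic looks something like this (the quirk table entries here are invented for illustration):

```python
def read_dmi_field(field, root="/sys/class/dmi/id"):
    """Read one SMBIOS/DMI field as exported by the Linux kernel,
    e.g. read_dmi_field("sys_vendor")."""
    with open("%s/%s" % (root, field)) as f:
        return f.read().strip()

# Each entry matches on one or more DMI fields, in the style of the
# kernel's dmi_system_id tables. These entries are made-up examples.
QUIRKS = [
    ({"sys_vendor": "ExampleCorp", "product_name": "Laptop 1000"},
     "use legacy reboot method"),
]

def find_quirk(dmi, quirks=QUIRKS):
    """Return the first quirk whose fields all match, or None."""
    for fields, quirk in quirks:
        if all(dmi.get(k) == v for k, v in fields.items()):
            return quirk
    return None
```

The fragility the post describes falls straight out of this structure: a machine that misbehaves but isn't in the table silently gets no workaround.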

Almost all x86 hardware is tested with Windows. For the most part vendors don't ship devices that don't work - if you install a stock copy of Windows on a system, you expect it to boot successfully and reboot properly. Basic ACPI functionality should be present and correct, including processor power-saving states. Time should pass at something approximating the real rate. And since stock Windows isn't updated with large numbers of DMI quirk entries, the hardware needs to do this with an operating system that's already shipped.

Which means that if Linux doesn't provide the same level of functionality, we're doing something different to Windows. Sometimes this is because we're doing something fundamentally different with the hardware - the HP NX6125, for example, resets its thermal trip points if the timer is set up in a different way to Windows. In this case we've decided that the additional functionality of doing things the Linux way is worth it, and we'll just blacklist the small set of machines that are broken by it.

But other times there's no need for the difference. For years we were triggering system reboots in a different way to Windows and then adding DMI workarounds for any systems that didn't work. The problem with this approach is that it's basically impossible to guarantee that you've found the full set of broken hardware. It's very easy to add another DMI quirk, but it doesn't solve the problem for anyone who bought a machine, tried Linux and gave up when they found that reboot didn't work. More recently we've added a pile of quirks for Dells - it turns out that in all cases we're working around a bug in their firmware that hangs if we're using VT-d.

We typically don't merge code that fixes a specific example of a problem without at least considering whether there's a wider class of similar problems that could all be solved at once. We should take the same attitude to DMI quirks. Sometimes they're the least harmful way of handling an issue, but most of the time they're just fixing one person's itch and leaving a larger number of people to continue cursing at Linux. There should at least be a demonstration of an attempt to understand the underlying problem before just adding another quirk, and every patch that permits the removal of some DMI checks should be greeted with great cheer.

I spent a bunch of the past week trying to figure out why my UEFI bootloader code was crashing in bizarre ways. It turns out to be a known issue that various people have hit in the past. After a lot of staring (and swearing) I finally got to the point where I could run a debugger against a running UEFI binary under qemu and found that I was entering a function with valid arguments but executing it with invalid ones. After spending a while crying on IRC people suggested that maybe the red zone was getting corrupted, and it turned out that this was the problem.

The red zone is part of the AMD64 System V ABI and consists of the 128 bytes directly below the stack pointer. This region is guaranteed to be left untouched by any interrupt or signal handlers, so it's available for functions to use as scratch space. For the code in question, gcc was spilling the function arguments into the red zone before executing the function body, and something then trod on them. This also explained why I wasn't seeing the problem when I added debug print statements - signal and interrupt handlers won't touch the red zone, but called functions might, so it tends to only be used by leaf functions that don't call any other functions. Adding a print statement meant that the compiler left the red zone alone.

But why was it getting corrupted? UEFI uses the Microsoft AMD64 ABI rather than the System V one, and the Microsoft ABI doesn't define a red zone. UEFI executables built on Linux tend to use System V because that means we can link in static libraries built for Linux, rather than needing an entirely separate toolchain and libraries to build UEFI executables. The problem arises when we run one of these executables and there's a UEFI interrupt handler still running. Take an interrupt, Microsoft ABI interrupt handler gets called and the red zone gets overwritten.

There's a simple fix - we can just tell gcc not to assume that the red zone is there by passing -mno-red-zone. I did that and suddenly everything was stable. Frustrating, but at least it works now.

A couple of people have asked me about the Ubuntu ODM UEFI requirements, specifically the secure boot section. This is aimed at hardware vendors who explicitly want to support Ubuntu, so it's not necessarily the approach Canonical will be taking for installing Ubuntu on average consumer hardware. But it's still worth looking at.

In a nutshell, the requirements for secure boot are:

The system must have an Ubuntu key preinstalled in each of KEK and db

It must be possible to disable secure boot

It must be possible for the end user to reconfigure keys

It's basically the same set of requirements as Microsoft have, except with an Ubuntu key instead of a Microsoft one.

The significant difference between the Ubuntu approach and the Microsoft approach is that there's no indication that Canonical will be offering any kind of signing service. A system carrying only the Ubuntu signing key will conform to these requirements and may be certified by Canonical, but will not boot any OS other than Ubuntu unless the user disables secure boot or imports their own key database. That is, a certified Ubuntu system may be more locked down than a certified Windows 8 system.

(Practically speaking this probably isn't an issue for desktops, because you'll need to carry the Microsoft key in order to validate drivers on any PCI cards. But laptops are unlikely to run external option ROMs, so mobile hardware would be viable with only the Ubuntu key.)

There are two obvious solutions to this:

Canonical could offer a signing service. Expensive and awkward, but obviously achievable. However, this isn't a great solution. The Authenticode format used for secure boot signing only permits a single signature. Anything signed with the Ubuntu key cannot also be signed with any other key. So if, say, Fedora wanted to install on these systems without disabling secure boot first, you'd need to have two sets of install media - one signed with the Ubuntu key for Ubuntu hardware, one signed with the Microsoft key for Windows hardware.

Require that ODMs include the Microsoft key as well as the Ubuntu key. This maintains compatibility with other operating systems.

This kind of problem is why we didn't argue for a Fedora-specific signing key. While it would have avoided a dependence on Microsoft, it would have created an entirely different kind of vendor lock-in.

Carla Schroder wrote a piece for linux.com on secure boot, which gives a reasonable overview of the technology and some of the concerns. It unfortunately then encourages everyone to ignore those legitimate concerns by making the trivially false claim that secure boot is just security theatre.

Security isn't some magic binary state where you're either secure or insecure and you can know which. Right now I'd consider my web server secure. If someone finds an exploitable bug in Apache then that would obviously no longer be the case. I could make it more secure by ensuring that Apache runs in a sufficiently isolated chroot that there's no easy mechanism for anyone to break out, which would be fine up until there's also a kernel exploit that allows someone to escalate their privileges enough to change their process root. So it's entirely possible that someone could right now be breaking into my system through bugs I don't even know exist. Am I secure? I don't know. Does that mean I should just disable all security functionality and have an open root shell bound to a well known port? No. Obviously.

Secure boot depends on the correctness of the implementation and the security of the signing key. If the implementation is flawed or if control of the signing key is lost then it stops providing security, and understanding that is important in order to decide how much trust you place in the technology. But much the same is true of any security technique. Kernel flaws make it possible for an unprivileged user to run with arbitrary privileges. Is user/admin separation security theatre? SSL certificate authorities have leaked keys. Is it security theatre for your bank to insist that you use SSL when logging in?

Secure boot doesn't instantly turn an insecure system into a secure one. It's one more technology that makes it more difficult for attackers to take control of your system. It will be broken and it will be fixed, just like almost any other security technology. If it's security theatre, so is your door lock.

Why is this important? Because if you tell anyone that understands the technology that secure boot adds no security, they'll just assume that you're equally uninformed about everything else you're saying. It's a perfect excuse for them to just ignore discussion of market restrictions and user freedoms. We don't get anywhere by arguing against reality. Facts are important.