There are a few kernel parameters which can be safely added to the Tails boot command line to increase security at little to no cost, some of which improve security quite noticeably. Here I present a few kernel parameters which can improve the security of Tails against kernel exploits, along with their rationale and rough cost in terms of performance, compatibility, or memory footprint. I have been adding these manually each time I boot Tails for around a year on various machines and have never had a problem with any of them. I hope you'll consider using them to harden Tails against kernel exploits. If additional information is needed on any of the options, I will be happy to do more research and provide relevant kernel code snippets if necessary.

slab_nomerge

Disables the merging of slabs of similar sizes. Often some obscure slab will be used in a vulnerable way, allowing an attacker to mess with it more or less arbitrarily. Most slabs are of little use to an attacker even when exploited, so on its own this isn't too big of a deal. Unfortunately, the kernel will merge similar slabs to save a tiny bit of space, and if a vulnerable but useless slab is merged with a safe but useful slab, an attacker can leverage that aliasing to do far more harm than they could have otherwise. In effect, this option reduces kernel attack surface by isolating slabs from each other. The trade-off is a very slight increase in kernel memory utilization. "slabinfo -a" can be used to tell what the memory footprint increase would be on a given system.

slub_debug=FZ

Enables sanity checks (F) and redzoning (Z). Sanity checks are self-explanatory and come with a modest performance impact, but this is unlikely to be significant on an average Tails system. The checks are basic but are still useful both for security and as a debugging measure. Redzoning adds extra areas around slab objects that detect when an object is written past its real size, which can help detect overflows. Its performance impact is negligible.

I did consider adding the P value, which enables poisoning. Poisoning writes an arbitrary value to freed objects, so any modification or reference to that object after being freed or before being initialized will be detected and prevented. This prevents many types of use-after-free vulnerabilities at little performance cost. Unfortunately, the default poison value points into userland and might make exploitation easier on systems without SMAP (i.e. most systems), so I excluded the P. I'll look into it more to see whether the trade-off (increased vulnerability to dereferencing into userland memory in exchange for increased resistance to UAFs) is worth it, but until then I left it out to be safe.

An additional note: any time slub_debug= is put on the kernel command line, slab_nomerge is implied. But having slab_nomerge explicitly declared can help prevent regressions where disabling the debugging features is desired but re-enabling merging is not.
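To make the option letters above concrete, here is a small sketch that decodes a slub_debug= value into the features it turns on. The table only covers the three letters discussed here (F, Z, P), and the function name is made up for illustration; it is not a kernel or Tails interface.

```python
# Option letters discussed in this post only; the kernel accepts more.
SLUB_DEBUG_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning",
}


def describe_slub_debug(param):
    """Return a description for each letter in a 'slub_debug=...' parameter."""
    name, _, value = param.partition("=")
    if name != "slub_debug":
        raise ValueError("not a slub_debug parameter: %r" % param)
    # Anything after a comma restricts the options to named caches; ignore it.
    letters = value.split(",", 1)[0]
    return [SLUB_DEBUG_FLAGS.get(c, "unknown flag %r" % c) for c in letters]


if __name__ == "__main__":
    print(describe_slub_debug("slub_debug=FZ"))
```

With the proposed value, poisoning does not appear in the result, which is the intent while the P trade-off is still undecided.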

vsyscall=none

Virtual syscalls are the obsolete predecessor of vDSO calls. Unfortunately, both vsyscall=native and vsyscall=emulate (the default) have a negative security impact, the latter a little less so. Namely, they provide a fixed target for any attacker who has control of the return instruction pointer, which is increasingly common now that attackers need to resort to ROP and similar attacks which target a process's control flow. The cost is reduced compatibility, but only legacy statically compiled binaries and old versions of glibc use vsyscalls. All software on modern Tails uses the vDSO instead. If for some reason a program does try to use a vsyscall, that process will crash with a memory access violation, but it won't bring the whole system down.
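One way to confirm the option took effect is to look for the vsyscall page in a process's memory map. The sketch below assumes that on x86-64 Linux the page is listed in /proc/<pid>/maps under the name [vsyscall] when it is mapped, and that it is absent with vsyscall=none. The function name is made up for illustration.

```python
def has_vsyscall_mapping(maps_text):
    """Return True if a /proc/<pid>/maps listing contains a [vsyscall] entry."""
    for line in maps_text.splitlines():
        fields = line.split()
        # The mapping name, when present, is the last field on the line.
        if fields and fields[-1] == "[vsyscall]":
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as maps:
            print("vsyscall page mapped:", has_vsyscall_mapping(maps.read()))
    except OSError:
        # Not running on Linux, or /proc is unavailable.
        print("could not read /proc/self/maps")
```

Run on a system booted with vsyscall=none, this would be expected to report that the page is not mapped, under the assumption stated above.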

mce=0

Mostly useful for systems with ECC memory. Setting mce to 0 causes the kernel to panic on any uncorrectable error detected by the machine check exception system. Corrected errors will just be logged. The default is mce=1, which will SIGBUS on many uncorrected errors. Unfortunately, this means malicious processes which try to exploit hardware bugs (such as rowhammer) can try over and over, suffering only a SIGBUS on failure. Setting mce=0 should have no impact on normal operation. Any hardware which regularly triggers a memory-based MCE is unlikely to even boot, and the default is 1 only for the benefit of long-lived servers.

oops=panic

Sets the kernel to fail fast, which is highly desirable from a security perspective (see https://en.wikipedia.org/wiki/Fail-fast for a succinct explanation of the reasoning). Many kernel exploits hit the kernel hard and fail many times before finally hitting the sweet spot and gaining full control over kernel space. A large percentage of the time, these failures result in a kernel oops rather than a kernel panic. Setting oops=panic will trigger a true stop error instead. This may be problematic for machines using very buggy drivers which cause harmless oopses. These systems will simply crash. I think this is very unlikely on a Tails system though.

oops=panic can also be set as a sysctl (kernel.panic_on_oops), which may be preferable because it would also allow a few other panic_on_* features to be enabled, some of which apparently do not have their own kernel parameters, such as panic_on_warn, panic_on_unknown_nmi, and panic_on_io_nmi. There's also panic_on_oom, which might be useful to prevent the system from locking up under high memory pressure and not responding to a yanked-out USB stick, but that's another discussion...

Summary: slab_nomerge slightly increases memory footprint, but this shouldn't matter for Tails because it's not an embedded system. slub_debug=FZ increases memory footprint slightly, and has a moderate performance impact in benchmarks, but is unlikely to have any impact in the real world. Remove the "F" to remove the majority of that perf impact. vsyscall=none breaks very old apps but Tails uses none of these anyway. mce=0 prevents malicious programs from trying to exploit hardware bugs by giving them only one shot at it. oops=panic causes the system to fail-fast, which is desirable from a security perspective. Systems with very buggy drivers may crash with this option set.
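For anyone who wants to check a booted system against the list summarized above, here is a minimal sketch that reports which of the proposed parameters are missing from a kernel command line. It assumes the command line is readable from /proc/cmdline and does a plain whitespace split, so quoted values containing spaces are not handled. The names are made up for illustration.

```python
# Parameters proposed in this post. Each entry must appear verbatim
# as one whitespace-separated word on the command line.
PROPOSED = ["slab_nomerge", "slub_debug=FZ", "vsyscall=none", "mce=0", "oops=panic"]


def missing_parameters(cmdline, wanted=PROPOSED):
    """Return the wanted parameters that do not appear in a kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        # Not running on Linux, or /proc is unavailable.
        cmdline = ""
    missing = missing_parameters(cmdline)
    if missing:
        print("missing:", " ".join(missing))
    else:
        print("all proposed parameters are present")
```

The comparison is deliberately exact, so a command line carrying slub_debug=FZP would be reported as missing slub_debug=FZ instead of silently passing.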

Additional options I am looking into are reboot=cold (may make certain types of cold-boot attacks harder if memory is not removed from the system), acpi=copy_dsdt (may harden the system slightly against buggy BIOSes), and elevator=deadline (might reduce kernel attack surface, with a nice side effect of improving USB and SSD performance). I may post the rationale for them as well if they turn out to be useful security-wise.

History

First of all, thank you for this nicely documented issue. I agree with every point except the last one (oops=panic). On recent or barely supported hardware, oopses happen frequently. I don't think that we should panic on them, since this would prevent users with bleeding-edge computers from using Tails at all.

That's too bad. Do the people who get oopses from bleeding-edge hardware tend to get them immediately, or are they delayed or appear at random intervals? I think a nice, albeit slightly hacky compromise would be to have kernel.panic_on_oops=0 upon boot, and then set kernel.panic_on_oops=1 after the GNOME desktop starts up if the kernel has not oopsed by then. That would opportunistically provide a security improvement on supported hardware.
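A rough sketch of that compromise, to make the idea concrete. It assumes the kernel records a past oops in the "D" taint flag (bit 7 of /proc/sys/kernel/tainted), that the script runs as root some time after the desktop has started, and that writing 1 to /proc/sys/kernel/panic_on_oops is equivalent to setting the sysctl. The function names and the idea of calling this from a session hook are hypothetical, not something Tails does today.

```python
TAINT_DIE = 1 << 7  # assumed: the "D" taint bit, set after an oops or BUG


def should_enable_panic_on_oops(tainted):
    """Return True if the taint value shows no oops so far."""
    return (tainted & TAINT_DIE) == 0


def enable_if_clean(tainted_path="/proc/sys/kernel/tainted",
                    sysctl_path="/proc/sys/kernel/panic_on_oops"):
    """Set panic_on_oops=1 only if the kernel has not oopsed yet.

    Returns True if the sysctl was written, False if it was left alone.
    """
    with open(tainted_path) as f:
        tainted = int(f.read().strip())
    if not should_enable_panic_on_oops(tainted):
        return False
    with open(sysctl_path, "w") as f:
        f.write("1\n")
    return True
```

Other taint bits (for example, an out-of-tree module being loaded) are ignored on purpose, since only a previous oops says anything about whether panicking on the next one would make the machine unusable.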

Anyway I'm in the process of benchmarking the various boot options on a spare laptop using unixbench. I'll post the results for the following boot combinations when it finishes: "toram", "toram slab_nomerge", "toram slub_debug=F", "toram slub_debug=Z", "toram slub_debug=FZ", and "toram slab_nomerge slub_debug=FZ vsyscall=none mce=0 oops=panic". I used toram so the benchmark data would not be skewed by a cheap low-speed flash drive.

Does the Tails testing suite do any benchmarking to detect performance regressions or anything, or does it purely exercise Tails functionality?

Also, is anyone on the Tails team experienced with kernel exploitation? I'm not familiar enough with it myself to say for certain whether or not slab poisoning harms security more than it benefits it, for the reasons outlined in the original post.

On my shiny bleeding-edge work laptop, I get a lot of oopses due to the graphics card, regularly, both during boot and during normal usage.

Does the Tails testing suite do any benchmarking to detect performance regressions or anything, or does it purely exercise Tails functionality?

I don't think that there is any benchmark in the testsuite yet :/

Also, is anyone on the Tails team experienced with kernel exploitation? I'm not familiar enough with it myself to say for certain whether or not slab poisoning harms security more than it benefits it, for the reasons outlined in the original post.

Slab poisoning in the vanilla kernel isn't designed as a hardening mitigation, and will make it easier to get a working exploit (without SMAP/SMEP) since the poison value points to userspace. I wouldn't recommend enabling it.

Yeah that's what I mentioned in the original post. But it does make UAFs harder, even if it was not designed intentionally to do so. I guess with trade-offs such as this, it's better to be conservative and assume that changes would be for the worse. If only the poisoning value could be specified in a sysctl. Well, whenever the overlayfs+AppArmor problem is resolved and grsecurity is added (I'm not holding my breath...), it'll be a moot point.

I finished benchmarking it. Between the default settings and the extra kernel boot parameters, the performance changes are statistically insignificant for the most part. I've also attached a txz of the raw benchmark data. Does this seem like an acceptable perf impact? If so, I've attached a patch to do that. I hope I did the patch correctly.

Thanks! Looks good. I've imported it into a Git branch, and will build + run our automated test suite on it, so we'll see if it breaks anything :)

It's too late for inclusion in our next major release (2.2), so the target is the one after, that is 2.4 (in ~3 months).

Bonus points if someone imports the discussion from this ticket into our design doc, on a Git branch based on the one I'm referencing here: it would be nice if the thinking process that leads to these changes was recorded in Git.

Good news regarding slab poisoning. With kernel 4.6, it will be possible to set the value to zero which clears the memory, so there will no longer be the problem of a poison value pointing into userspace. It may be a while before Tails is using a version of Debian that uses 4.6, but it's something to keep in mind so we can make use of it in a timely fashion.

Page poisoning has traditionally been a kernel debugging feature; it fills freed pages with a special pattern that is easy to spot when looking for things that went wrong. In 4.6, poisoning can be enabled independently of the debugging options, and the "poison" value can be set to zero; this results in pages being simply cleared when they are freed. This behavior, inspired by the grsecurity/PaX patches, reduces the chances of the kernel leaking sensitive data.

Cool! Can you please track that in a new, dedicated ticket? This one will most likely be resolved, thanks to your initial batch of proposals, in Tails 2.4.