Matthew Garrett

Samsung laptop bug is not Linux specific

I bricked a Samsung laptop today. Unlike most of the reported cases of Samsung laptops refusing to boot, I never booted Linux on it - all experimentation was performed under Windows. It seems that the bug we've been seeing is simultaneously simpler in some ways and more complicated in others than we'd previously realised.

So, some background. The original belief was that the samsung-laptop driver was doing something that caused the system to stop working. This driver was coded to a Samsung specification in order to support certain laptop features that weren't accessible via any standardised mechanism. It works by searching a specific area of memory for a Samsung-specific signature. If it finds it, it follows a pointer to a table that contains various magic values that need to be written in order to trigger some system management code that actually performs the requested change. This is unusual in this day and age, but not unique. The problem is that the magic signature is still present on UEFI systems, but attempting to use the data contained in the table causes problems.
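The lookup described above can be sketched in a few lines. This is only an illustration run against an in-memory buffer: the signature bytes, the pointer encoding and the table layout below are all made-up placeholders, not the values from Samsung's specification, and nothing here touches real memory or I/O ports.

```python
import struct

# Hypothetical values for illustration only. The real signature, search
# window and table layout are defined by Samsung's specification.
SIGNATURE = b"EXAMPLE!"   # assumed 8-byte marker
POINTER_FORMAT = "<I"     # assumed: 32-bit little-endian offset stored right after the marker
TABLE_FORMAT = "<HBB"     # assumed: command port, "enable" value, "disable" value


def find_command_table(region: bytes):
    """Scan a memory region for the signature and decode the table it points to.

    Returns a dict of the magic values, or None if the signature is absent
    or the pointer leads outside the region.
    """
    pos = region.find(SIGNATURE)
    if pos < 0:
        return None
    ptr_at = pos + len(SIGNATURE)
    if ptr_at + struct.calcsize(POINTER_FORMAT) > len(region):
        return None
    (table_offset,) = struct.unpack_from(POINTER_FORMAT, region, ptr_at)
    if table_offset + struct.calcsize(TABLE_FORMAT) > len(region):
        return None
    port, enable, disable = struct.unpack_from(TABLE_FORMAT, region, table_offset)
    return {"port": port, "enable": enable, "disable": disable}


def build_fake_region() -> bytes:
    """Build a 256-byte buffer containing the marker, a pointer and a table."""
    region = bytearray(256)
    struct.pack_into(TABLE_FORMAT, region, 128, 0xB2, 0x01, 0x00)  # table at offset 128
    region[16:24] = SIGNATURE                                       # marker at offset 16
    struct.pack_into(POINTER_FORMAT, region, 24, 128)               # pointer to the table
    return bytes(region)
```

Calling find_command_table(build_fake_region()) returns the three placeholder values, and a buffer without the marker returns None. A real driver would then write those values to trigger the system management code, which is the step that goes wrong on UEFI systems.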

We're not quite sure what those problems are yet. Originally we assumed that the magic values we wrote were causing the problem, so the samsung-laptop driver was patched to disable it on UEFI systems. Unfortunately, this doesn't actually fix the problem - it just avoids the easiest way of triggering it. It turns out that it wasn't the writes that caused the problem, it was what happened next. Performing the writes triggered a hardware error of some description. The Linux kernel caught and logged this. In the old days, people would often never see these logs - the system would then be frozen and it would be impossible to access the hard drive, so they never got written to disk. There's code in the kernel to make this easier on UEFI systems. Whenever a severe error is encountered, the kernel copies recent messages to the UEFI variable storage space. They're then available to userspace after a reboot, allowing more accurate diagnostics of what caused the crash.
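For anyone wanting to look at those saved messages after a reboot, here is a minimal sketch. It assumes the kernel exposes them through the pstore filesystem mounted at /sys/fs/pstore with file names beginning "dmesg-"; the mount point and naming may differ on your system, and the function name is my own.

```python
from pathlib import Path


def read_saved_crash_logs(pstore_dir="/sys/fs/pstore"):
    """Return {file name: contents} for saved kernel logs found in pstore_dir.

    Assumption: saved logs appear as regular files whose names start with
    "dmesg-". Returns an empty dict if the directory does not exist.
    """
    root = Path(pstore_dir)
    if not root.is_dir():
        return {}
    logs = {}
    for entry in sorted(root.iterdir()):
        if entry.is_file() and entry.name.startswith("dmesg-"):
            # Crash logs can contain stray bytes, so decode leniently.
            logs[entry.name] = entry.read_text(errors="replace")
    return logs


if __name__ == "__main__":
    for name, text in read_saved_crash_logs().items():
        print(f"== {name} ({len(text)} characters) ==")
        print(text)
```

This only reads. Reading the directory usually needs root.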

That crash dump takes about 10K of UEFI storage space. Microsoft require that Windows 8 systems have at least 64K of storage space available. We only keep one crash dump - if the system crashes again it'll simply overwrite the existing one rather than creating another. This is all completely compatible with the UEFI specification, and Apple actually do something very similar on their hardware. Unfortunately, it turns out that some Samsung laptops will fail to boot if too much of the variable storage space is used. We don't know what "too much" is yet, but writing a bunch of variables from Windows is enough to trigger it. I put some sample code here - it writes out 36 variables each containing a kilobyte of random data. I ran this as an administrator under Windows and then rebooted the system. It never came back.
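To put the numbers above side by side, here is a rough, read-only sketch. It treats "K" as 1024 bytes and measures against the 64K minimum, which is an assumption since the real store on a given machine may be larger. The second function estimates current usage on Linux by summing file sizes under /sys/firmware/efi/efivars; that path and the idea that file size approximates payload size are also assumptions, and the result ignores whatever per-variable overhead the firmware adds. Nothing here writes any variables.

```python
from pathlib import Path

KIB = 1024
MIN_STORE_BYTES = 64 * KIB    # Windows 8 minimum quoted in the post
CRASH_DUMP_BYTES = 10 * KIB   # approximate crash dump size quoted in the post
TEST_WRITE_BYTES = 36 * KIB   # 36 variables, one kilobyte of data each


def fraction_of_minimum_store(payload_bytes, store_bytes=MIN_STORE_BYTES):
    """Fraction of the (assumed) store size that a payload would occupy."""
    return payload_bytes / store_bytes


def efivarfs_payload_bytes(efivars_dir="/sys/firmware/efi/efivars"):
    """Sum the sizes of variable files in efivars_dir, or None if it is absent.

    Read-only: this stats files and never opens them for writing.
    """
    root = Path(efivars_dir)
    if not root.is_dir():
        return None
    return sum(p.stat().st_size for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    print(fraction_of_minimum_store(CRASH_DUMP_BYTES))   # 0.15625
    print(fraction_of_minimum_store(TEST_WRITE_BYTES))   # 0.5625
    print(efivarfs_payload_bytes())
```

So a single crash dump is about 16% of the minimum store, and the 36 test variables are a little over half of it, before counting anything the firmware or OS had already stored.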

This is pretty obviously a firmware bug. Writing UEFI variables is expressly permitted by the specification, and there should never be a situation in which an OS can fill the variable store in such a way that the firmware refuses to boot the system. We've seen similar bugs in Intel's reference code in the past, but they were all fixed early last year. For now the safest thing to do is not to use UEFI on any Samsung laptops. Unfortunately, if you're using Windows, that'll require you to reinstall it from scratch.

Does replacing the laptop's hard disk allow it to boot? (I assume you tested and it doesn't help; I'm just thinking of my old MacBook that was never bricked, but had difficulty booting with certain bit patterns on the internal disk.)

Great writeup as always. Can you comment on the rumors going around that removing the CMOS NVRAM battery will make the board bootable again? Obviously that would mean taking apart the laptop to get at the motherboard, which voids the warranty. But for testing the fix, a developer might be willing to give up their warranty to iterate faster.

https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557/comments/23 on https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 is one example of this.

Why are people buying Samsung when they could have Apple hardware? Do they not realize that (if you must) you can run Windows on a Mac?

Seriously. If it's price you are worried about, I'll gladly sell you a brick of wood for a VERY good price if you are willing to believe that it's an equivalent piece of hardware and the only difference is price. You get what you pay for, people.

OK was that predictable enough? Well I'm sorry about that. But come on, when are people going to learn?

I'm actually running Linux (Mageia 2, to be specific) on a MacBook Pro. I bought it to try triple boot, but soon discovered that Windows runs better virtualized, and OS X is not worth the pain compared with free software.

This doesn't make the bug any less disastrous, but at least story titles will now shift from "Samsung laptops bricked by Linux" to "Samsung laptops bricked by buggy UEFI".

Regarding Windows, I recently discovered that you can actually migrate a Windows 8 install from MBR+BIOS to GPT+UEFI. It's not straightforward, but it's possible (using bootrec and bcdboot). I haven't tried the other way around or with Windows 7, but I think that should be doable as well.

I'd gladly buy a retina macbook pro if their keyboards weren't castrated, and the trackpad didn't suck (and there's also the slight problem with the Windows drivers provided by Apple - I've had macbooks with bootcamp Windows go in BSOD loops because the trackpad driver killed the system during bootup, and the driver was installed in such a way that it tried to load even in safe mode, making recovery even harder).

You can use Paragon's Migrate to UEFI to shift a Windows install from MBR+BIOS to GPT+UEFI. It doesn't officially support Windows 8, and Windows 8 will block its launcher from running, but you can just run the explauncher.exe it installs as Administrator and it'll work (tested on my own install).

Matthew, I had a similar issue some time ago regarding notebooks that got "bricked" by a bug in the proprietary nVidia driver. The issue that time was that the driver somehow managed to overwrite the EDID data of the display with garbage. So the driver failed to load and on the next start the BIOS failed to load as well because the video BIOS couldn't identify the display and stopped at that problem.
So it would be interesting to know what really gets broken when that bug occurs. Does Samsung know what is dead and how to fix it yet?
In my case the bug could be fixed by removing the display and booting with an external monitor. Then reconnect the display and flash the correct data back to the display.
Dunno if something like that would help here. Samsung should probably know. The question is whether they'd tell you.

I broke my MacBook, and I think this is a similar thing, i.e. MBR and the Samsung driver do not work together. Apple uses a lot of Samsung components, so in this case Samsung's problem is also Apple's problem.

...or buy a laptop from any other manufacturer. Seriously, why pay the Apple premium? I don't dispute the fact that they get most of the user experience right, but their hardware is ridiculously expensive and castrated.

That doesn't say much - I've seen Synaptics, Alps and Elantech trackpads used in Lenovo notebooks (some with the stick, some without). I have very little experience with Alps and Elantech, but I've never had any problems with Synaptics. I haven't come across that many MacBooks with Windows under Boot Camp, but I've experienced trackpad-related BSODs with a significant number of them (I don't remember how many, but I've had to delete applemtp.sys 4 times so far to get the machine to boot at all).

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at CoreOS. Member of the Linux Foundation Technical Advisory Board and the Free Software Foundation board of directors. Ex-biologist. @mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer.