What sucks, who sucks and you suck

Sorry, Was That Your Process?

2004-10-08

I’d only tangentially heard of the sinister “Out Of Memory Killer” in the 2.6 Linux kernel series, until a few nights ago when my system experienced its devastating efficiency.

It turns out that, as well as a mechanism for killing “random” processes when the system runs out of memory (remember: it’s not a bug, it’s a feature), 2.6.8 also has a gaping memory leak triggered while burning audio CDs. This is not a happy combination, even when the price of blank CD-Rs is negligible.

You start writing the CD. Around the fourth track, most of your 512MB RAM has been swallowed up outside of userland. The OOM Killer kicks in and zaps the biggest hog it can find in userland: Mozilla. But the CD writing process demands more sacrifices, more bytes on the pyre. So the OOM Killer goes to work with a vengeance, slaying apparently random desktop processes - GIMP sessions with unsaved images, editors with unsent emails, etc. Eventually, it kills something that locks up your entire session, and shortly after that it causes your PC to reboot. (“We had to burn down the entire village to save it.”) You toss the incomplete CD-R in the bin and start again.

Eventually, when your bin fills up with useless silver discs, you get a bit smarter than the dope who invented the OOM Killer and made it the default - you check Red Hat’s Bugzilla system, where you learn all about the million and one CD burning bugs in the “production” kernel update you’re running (e.g. #131251, #132180).

You swear and revert to 2.6.5 or something else less buggy … well, differently buggy. You swear some more, but reflect that at least it only cost you a few CD-Rs, rather than an entire system - like when you almost dragged the PC outside and beat it to death with a lump hammer. You resolve to burn CDs in dummy mode first from now on. Like you used to before you were lulled into believing the process was reliable.

LWN carries as good an explanation of the thinking behind the OOM Killer as I’ve seen and, crucially, a hint on how to disable it. Running out of memory isn’t pretty on any Unix system and you would expect some instability to result (of course, it helps if you don’t run out during simple, common, proven tasks). Funnily enough, I’ve been able to resolve this problem on Solaris before now, without the benefit of an OOM Killer. In fact, I’d rather deal with the instability myself than receive automated help in the form of an algorithm that can’t possibly know which, if any, processes are expendable. I look forward to the day when Linux addresses disk space shortages by randomly deleting files. No wait, I mean I look forward to the day I don’t act as a beta-tester for my OS anymore.
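For anyone wanting to check where their own box stands, here is a rough Python sketch. It assumes the relevant knob is the kernel's overcommit policy, exposed as /proc/sys/vm/overcommit_memory; that is my assumption, not a quote from the LWN piece. The helper names (describe_overcommit, current_overcommit) are made up for illustration, and the script only reads the setting, it changes nothing.

```python
from pathlib import Path

# Assumed location of the overcommit policy knob; it exists on Linux only.
OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")

# Meanings of the three documented values of vm.overcommit_memory.
MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(raw: str) -> str:
    """Turn the raw file contents (e.g. "0\\n") into a readable label."""
    try:
        mode = int(raw.strip())
    except ValueError:
        return "unrecognised value: " + repr(raw)
    return MODES.get(mode, "unknown mode %d" % mode)


def current_overcommit(path: Path = OVERCOMMIT_PATH) -> str:
    """Report the running kernel's policy, or say so if the file is missing."""
    try:
        return describe_overcommit(path.read_text())
    except OSError:
        return "not available on this system"


if __name__ == "__main__":
    print("vm.overcommit_memory:", current_overcommit())
```

Under strict accounting (value 2) the kernel is meant to refuse allocations it cannot back rather than promise memory it doesn't have, which should leave the OOM Killer far less to do. Switching modes needs root (for example `sysctl -w vm.overcommit_memory=2`), and whether it would have saved my CD-Rs from a leak inside the kernel itself is another question.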