Bohrbugs: OpenOffice.org won’t print on Tuesdays

This is one of my favorite bugs of all time: Ubuntu bug #248619, where OpenOffice.org won’t print to Brother printers on Tuesdays (but works on other days of the week).

Read some of the duplicate reports to follow the analysis and developer/user cooperation which isolated the bug.

It’s a great example of a Bohrbug, where the circumstances which trigger the problem can be very difficult to isolate. It’s likely that many such bugs exist in Ubuntu and other software today, but have not yet been isolated, as bug 248619 has been.

We’ve all observed a complex software stack misbehaving in ways we would never expect. It’s just as confusing when things suddenly start working again, for no apparent reason. We start to doubt our senses, or the person who is reporting their observations.

In The Psychology of Computer Programming (chapter 5), Jerry Weinberg presents a case where two identical systems, physically isolated from each other but running the same software, exhibited precisely the same error at the same time. This obviously pointed to a software bug, but after two weeks of searching, the problem could not be replicated and the root cause was not found. The team gave up on finding it, and the system went into production, only to have the bug recur and cause a serious operational outage.

Bugs which occur very rarely may not always be worth investigating, but they are real, and can be explained. When it really matters, we should remember not to disregard them.

That duplicate bug was in the Mercury Program tracking system. Since human life was at risk, our rules said that we could not launch with any unexplained bugs remaining, but project management caved to political pressure and ordered that we declare the bug a “coincidence” and proceed with the launch.

What actually happened was several of us went against management orders to drop the search. We kept searching and found the bug just before John Glenn’s first flight. If it had not been corrected, John Glenn might very well have died because of it.

I figured that after 50 years, we could finally shed the anonymity, so the story would have its full significance. Back then, Mercury was a rare kind of system, but half a century later, life-critical systems are far more common.

And even if it’s only a billion dollars or so that’s at stake, tracking down these “obscure” bugs is still important, I think.

Maybe if we keep publishing these stories, some people won’t have to learn the hard way. That’s why I’m now writing novels that show these consequences in dramatic form. Maybe people will learn these lessons for a “tuition” less than a billion dollars or a human life.

No, it was a bug introduced unintentionally in file(1), a program used to guess the type of a file based on its contents. The file in question contained a date, and when the date included the string “Tue”, this was close enough to confuse the program into thinking it was a different type of file. The type database contained a pattern which was too general.
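To make the failure mode concrete, here is a minimal Python sketch of content-based type guessing with an over-general pattern. It is only an illustration: the rule tables, the type names, and the `guess_type` and `make_job` helpers are all hypothetical, and the patterns are not the real entries from the file(1) type database.

```python
import re

# Hypothetical type database: (pattern, type name), checked in order.
# These rules are made up for illustration; they are NOT real magic(5) entries.
OVERLY_GENERAL_RULES = [
    (re.compile(rb"Tue"), "other-format"),     # too general: matches "Tue" anywhere
    (re.compile(rb"\A%!PS"), "postscript"),
]

# Same database with the first pattern anchored to the start of the file.
TIGHTER_RULES = [
    (re.compile(rb"\ATue "), "other-format"),
    (re.compile(rb"\A%!PS"), "postscript"),
]


def guess_type(data, rules):
    """Return the type name of the first rule whose pattern matches the data."""
    for pattern, type_name in rules:
        if pattern.search(data):
            return type_name
    return "data"


def make_job(day):
    """Build a tiny PostScript-like print job whose header contains a date."""
    header = f"%!PS-Adobe-3.0\n%%CreationDate: ({day} Mar  3 19:47:42 2009)\n"
    return header.encode("ascii")


if __name__ == "__main__":
    for day in ("Mon", "Tue", "Wed"):
        print(day,
              guess_type(make_job(day), OVERLY_GENERAL_RULES),
              guess_type(make_job(day), TIGHTER_RULES))
```

With the over-general table, only the job created on a Tuesday is misidentified, so anything downstream that expects PostScript would reject it on that one day of the week. Anchoring the pattern, as in the second table, makes the day of the week irrelevant.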

As someone who writes software I readily admit to a low tolerance for buggy software – it pisses me off.

In the last six years or so, the most upset I get is while using Open Office. It’s absolutely amazing how many basic usability issues that POS program has. It’s pitiful! It’s just barely competent, IMO.

Makes you want to post nasty rants really giving it to the monkeys responsible, but then you remember it’s open source.

“We start to doubt our senses, or the person who is reporting their observations.”

it’s my experience that a developer’s _first_ instinct is to doubt the person reporting the issue. then again, i work QA at a for-profit software company, so most developers are there for the paycheck rather than a love of the product.