The Chernobyl Design Pattern

In 1994, the world learned that the Intel Pentium chip had a bug. In certain cases it gave the wrong answer when calculating floating-point division. These cases were rare, only 1 in 9 billion divisions, and typically only resulted in errors past the 8th decimal place.

What did Intel do about this? Well, there was denial at first, and then dismissal of the problem as being trivial and unimportant. But eventually they saw the light and offered a no-questions-asked replacement policy for defective processors. No doubt this was expensive for Intel, but this preserved their good name and reputation.

It could have been different. For example, they could have simply kept the bug. They could have preserved that bug in future versions of the Pentium for backwards compatibility, arguing that there was some software out there that may have worked around the original defect, and for Intel to fix the bug now would only break the software that worked around the bug. This is a dangerous line of reasoning. What bug can’t be excused by that argument?

Intel could have further decided to turn their bug into a standard, and get it blessed by a standards development organization and maybe even ISO. “It’s not a bug, it’s a standard”.

But Intel is not Microsoft, so they don’t have quite the audacity to turn a bug into a standard, which is what Microsoft is attempting to do by declaring in Office Open XML (OOXML) that the year 1900 should be treated as a leap year, in contradiction of the Gregorian Calendar, which has been in use for more than 400 years. (Years divisible by 100 are leap years only if they are also divisible by 400.)

By mandating the perpetuation of this bug, we are asking for trouble. Date libraries in modern programming languages such as C, C++, Java, Python, and Ruby all calculate dates correctly according to the Gregorian Calendar. So any interpretation of dates in OOXML files in these languages will be off by one day unless the author of the software adds their own workaround to their code to account for Excel’s bug. Certainly some will make the “correction” properly, at their own expense. But many will not, perhaps because they did not see it deep within the 6,000-page specification.
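To make the off-by-one concrete, here is a minimal Python sketch. It assumes the serial numbering implied by treating 1900 as a leap year: serial 1 is January 1, 1900, and serial 60 is the nonexistent February 29, 1900. The function names are hypothetical, made up for illustration, and are not taken from the specification.

```python
from datetime import date, timedelta
import calendar

# Assumption: serial 1 is 1900-01-01 and serial 60 is the phantom 1900-02-29.
DAY_ZERO = date(1899, 12, 31)
PHANTOM_LEAP_DAY = 60


def naive_serial_to_date(serial):
    """What a correct Gregorian library computes if nobody mentions the bug."""
    return DAY_ZERO + timedelta(days=serial)


def corrected_serial_to_date(serial):
    """Hypothetical workaround: skip the day that never existed."""
    if serial < 1:
        raise ValueError("serial numbers start at 1 in this sketch")
    if serial == PHANTOM_LEAP_DAY:
        raise ValueError("serial 60 stands for 1900-02-29, which is not a real date")
    if serial > PHANTOM_LEAP_DAY:
        serial -= 1
    return DAY_ZERO + timedelta(days=serial)


print(calendar.isleap(1900))         # False: the standard library follows the Gregorian rule
print(naive_serial_to_date(61))      # 1900-03-02, one day late
print(corrected_serial_to_date(61))  # 1900-03-01
```

Every consumer of the format has to remember the `serial > 60` branch, which is exactly the kind of detail that gets missed.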

There is something I call the “Chernobyl Design Pattern”, where you take your worst bug, the ugliest part of your code, the part that is so bad, so radioactive that no one can touch it without getting killed, and you make it private and inaccessible, and then put a new interface around it, essentially entomb it in concrete so that no one can get close to it. In other words, if you can’t fix it, at least contain the damage, prevent it from spreading.

Microsoft has taken another approach here. Instead of containment, they are propagating the bug even further. We need to think beyond Excel and think as well of other applications that work with OOXML data, and other applications that work with those apps and so on, the entire network of data dependencies. The mere existence of this bug in a standard will lead to buggy implementations, poor interoperability, and general chaos around dates. The fallout of this bug should have been contained within the source code of Excel. For this to leak out, into a specification, then a standard and then into other implementations, contradicting both the civil calendar and every other tool that deals with dates, will pollute the entire ecosystem.

To be fair, this is not a bug in Excel. It was an intentional design decision to ensure compatibility with Lotus 1-2-3 files.

The goal of this standard is to ensure that an accurate representation of pre-existing documents is possible in a format that non-MS companies can work with. It is not to create an idealized format free of past mistakes.

As for the Intel bug, it wasn’t backwards compatible with previous versions.

Jonathan, type =WEEKDAY("1/1/1900") into Excel. What do you get? It returns 1, meaning Sunday. Now look at any reputable calendar created since Pope Gregory XIII. What day of the week was January 1st, 1900? The correct answer is Monday. So yes, Excel has a bug, and yes, Microsoft has pushed to include this bug in an International Standard. To say this is not a bug is pure denial.
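Anyone without Excel at hand can check the calendar side of this in Python. The sketch below also includes a hypothetical model, `weekday_from_serial`, which assumes the spreadsheet derives the weekday from the serial number modulo 7; that assumption reproduces the Sunday result quoted above, but it is a guess at the mechanism and not documented behaviour.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def true_weekday(d):
    """Weekday according to the proleptic Gregorian calendar used by datetime."""
    return DAY_NAMES[d.weekday()]


# Hypothetical model: weekday taken from the serial number modulo 7,
# arranged so that serial 1 (meant to be 1900-01-01) comes out as Sunday.
SERIAL_DAY_NAMES = ["Saturday", "Sunday", "Monday", "Tuesday",
                    "Wednesday", "Thursday", "Friday"]


def weekday_from_serial(serial):
    return SERIAL_DAY_NAMES[serial % 7]


print(true_weekday(date(1900, 1, 1)))  # Monday
print(weekday_from_serial(1))          # Sunday
print(true_weekday(date(1900, 3, 1)))  # Thursday
print(weekday_from_serial(61))         # Thursday
```

Under this model the two agree from March 1, 1900 onward (serial 61, assuming serial 60 is the phantom February 29), which would mean the weekday error is confined to the first two months of 1900.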

And why Microsoft believes that all these legacy artefacts must be included in a ‘new’ format is beyond me. It ought to be the task of the converter to handle these idiosyncrasies in the first place. Old versions of the MS Office package can’t read OOXML anyway.

Isn’t the more serious problem the fact that you can’t use a date before 1900 within a formula? It’s pretty bad to include a bug in the spec, but to make it impossible to use dates before 1900 while allowing dates up to 9999 seems strange, to say the least.

I find it amazing that MS thinks that this is acceptable. If someone came to me with the idea of making a bug, something quite clearly wrong, into a standard, I would laugh in their face. Bugs need fixing, or containing. To say it is there to fix a bug in Lotus 1-2-3 is a bigger joke. MS has a bug to fix a bug in someone else’s software. Reject it. Don’t work around it. Then they have to fix their bug and not pass responsibility on to someone else.

@Andres : that’s just a way to prevent us from lingering on the past too much ;)

@johnatan/rob: interesting take on this issue, so, to summarise: Microsoft decided to incorporate a third-party bug into their software as intentional design. I’m not surprised.
I know Microsoft will go to some lengths to keep their corporate consumer base happy (which eclipses the home-user and gamer base combined). See the atrocities they commit in their browsers to keep a semblance of backwards compatibility based on IE6 intranet requirements. It’s a strange position they’re in. On the one hand they need to be innovative on a competitive level; on the other, their largest userbase resides where upgrading IT infrastructure is regarded with suspicion and fear (and severe budget consequences).

I think MS should indeed take this Chernobyl design pattern to heart; it could serve them well in keeping both those sides from tearing them in two.

I find it amusing that this bad decision has been causing software design headaches since 1992, and continues to do so.

Why the Excel team chose to propagate this bug into the data file, rather than re-encode the date, I have no idea. What’s defensible in 640k and 1980s document technology becomes steadily less defensible in the 1990s, and just ridiculous in the 2000s.

I think there are several allusions we can bring out here. One lurks, for example, in “For the love of God, Montresor!” (“Yes, for the love of God.”) And Chernobyl, by dint of removing the humans, became a sanctuary for wildlife, much like the DMZ.