Is your software on fire?

January 26, 2008

The spectacle of Dell laptops on fire in the summer of 2006 due to Sony battery problems has prodded me to think about product failures. There is nothing so attention-getting as a fire in a conference room. Few people who see this sort of failure will forget what they have seen.
Software failures may not be so spectacular, but they can be just as memorable to the people who witness them.
Examples from large-scale software systems: if you were waiting for your baggage in the new Denver airport a few years ago, you may have waited until human intervention delivered your bags, because the bag sorting system failed. And what if you dialed 911 and the call did not go through?

Examples from embedded systems: your cell phone drops a call due to a software glitch in the phone; your hard disk loses track of its position and takes several extra seconds to recover.
Examples from everyday use of an operating system: Windows gets confused while processing interrupts from the web browser, and the browser hangs until you reboot; Outlook misses a beat and an email doesn’t appear on the screen when you expect it to.

If the computer or phone caught fire whenever one of these failures occurred, you can bet that the manufacturer would issue a massive recall, the way Dell did. But nothing burns, so nothing gets recalled. Instead, manufacturers let users keep running a piece of software that “catches fire” regularly. If you’re like most users, you have become accustomed to seeing these fires and dealing with them. But do you like them? Of course not.
What are the consequences? Word of mouth travels quickly, and these failures have created a large population of users who resent having to use devices and software that fail. Resentment leads users to search for a better alternative. This is good for competitors who offer a more reliable solution.

But the whole world of software (and digital devices that depend on software) suffers from a bad image because of these failures. From consumer devices to mission-critical industrial control systems, everyone who has to deal with modern digital devices is gun-shy about failures. And rightfully so.

Is your software on fire?

There are known methods to ensure that the software-dependent devices you make will not catch fire. If you’re not certain what these methods are, or who can implement them for you, you need to find someone who can help. But before you call in a consultant, be sure that you’re willing to pay the price: it takes time and money to make software reliable, just as it does for batteries and laptops. Are you ready to buy in?