Sunday, November 28, 2004

The phenomenal reliability of the systems we trust for banking, communication, and everything else rests on two bedrock principles. One is the universal understanding in the technology world that nothing works right the first time, and maybe not the first 50 times.

When I worked briefly on a product design team at Microsoft, I was sobered to learn that fully one-fourth of the company's typical two-year "product cycle time" was devoted to testing. Programmers spend 18 months designing and debugging a system. Then testers spend the next six months finding the problems they missed. It is no secret that even then, the "final" software from Microsoft, or any other company, is far from perfect.

Today's mature systems work as well as they do only because they are exposed to nonstop, high-stakes torture testing. EBay lists nearly four million new items each day. If a problem affects even a tiny fraction of its users, eBay will be swamped with reports immediately.

Millions of data packets are being routed across the Internet every second. If servers, domain-name directories or other components cannot handle the volume, the problem will become apparent quickly. Years ago, bank or airline computers would often be "down" because of unforeseen problems. Now they're mostly "up," because their flaws have had so long to be exposed.

The second crucial element in making reliable systems is accountability. Users can trust today's systems precisely because they don't have to take them on trust. Some important computer systems run on open-source software, like Linux, in which the code itself can be examined by outsiders.

Virtually all systems provide some sort of confirmation of transactions. You have the slip from the A.T.M., the receipt for your credit card charge, the printout of your e-ticket reservation. If your e-mail message doesn't go through, there is still the copy in your "Sent" folder. This is the technology world's counterpart to the checks-and-balances principle in the United States government. The first concept, robust testing, protects against unintended flaws. The second, accountability, guards against purposeful distortions.