Friday, November 28, 2008

Barkburn

Many years ago, I attended an excellent course on project leadership. They had a term called "barkburn." You know the people who can't see the forest for the trees? People with barkburn can't see the tree because their face is buried in the bark.

The other night, after a .NET SIG meeting, we got into a discussion of testing, and particularly the relationship between unit testing (including Test-Driven Design) and the kind of large-scale architectural and design issues that tend to interest me. And of course, quality issues always bring up the space shuttle, where many software engineering and quality practices originated. In email after the discussion, someone referenced "Unit Testing, TDD and the Shuttle Disaster."

A couple of thoughts on the shuttle….

Unit testing is certainly not a new idea. TDD is just another way of doing it. And it is certainly true that hardware gets unit and component testing as well as integration and system testing. Nothing at all new about this. And writing test specs before development isn’t new either, in software or hardware. Some hardware units in any design and early production run are designated specifically for testing, often testing to destruction. There’s rarely anything new under the sun.

NASA’s history prior to the shuttle included the Saturn V rocket, which was developed on a very accelerated schedule. Part of that schedule acceleration came from combining component and integration tests with the system test: the first test of some components, and of their integration, consisted of flying the first rocket. They called it “all up” testing. And it worked reasonably well; the first manned Saturn V flight was only the third flight of the rocket, and there were no failures in the entire life of the Saturn V. Only one serious problem occurred in any flight, a “pogo” vibration that showed up on Apollo 6 (unmanned) and Apollo 13. Apollo 12’s Saturn V survived a lightning strike during liftoff.

Compare that to the shuttle. Both the Challenger and Columbia failures shared a common root cause: stacking the shuttle side-by-side with its boosters and fuel tank, instead of the usual vertical stack. The Challenger break-up occurred because of asymmetric aerodynamic forces, due to trying to fly sideways as the boosters and fuel tank came apart. While an O-ring failure on a vertical stack would probably also have lost the launch stage, an Apollo-type design would have left the crew vehicle on-axis with the booster thrust changes, and Apollo (like Mercury before it) had an escape rocket to pull the crew vehicle away from a failing booster. The O-ring failure might well have been survivable in a vertical stack.

As for Columbia…. Ice shedding during launch is utterly routine. Ice builds up on the storage tanks for any rocket fueled by liquid oxygen. The foam was there only to protect the shuttle in a side-by-side stack, and it was a piece of that foam, breaking off and striking the wing, that doomed Columbia. In a vertical stack, this wouldn’t have been a failure mode at all; it would have been just another routine launch.

These are design failures, and you can’t unit-test your way out of design flaws. As the Quality Assurance professionals have known for a long time, you can’t test quality into a product. That’s something the software industry tends to forget, on a fairly regular schedule.