Testing, test management and generic techy stuff

Risk based testing 2: Donald Rumsfeld and the Martian that didn’t make it

Donald Rumsfeld has been widely ridiculed for his statement about unknown unknowns. Unfairly, if you ask me. When you move into testing, unknown unknowns are your biggest concern.

In my previous post I wrote about asking the right questions. I’d like to illustrate why that is so important, with a story about Martians.

In 1998, NASA sent out a probe called the Mars Climate Orbiter. It was going to orbit Mars and collect data on the red planet’s climate, preparing for Matt Damon’s future expedition. Unfortunately, unlike Matt Damon, the probe never made it. It crashed into the atmosphere and disintegrated.

What happened?

Now, NASA are pretty meticulous about their testing. When NASA make a mistake of this magnitude, we’re talking about hundreds of millions of wasted dollars. If there’s anything they know they don’t know, they will find out before sending their toy into space.

Of course, there are always some risks you have to mitigate in other ways than testing. If you know about something that could go wrong, and you know you cannot physically or practically test the exact conditions, you can take action to reduce the risks. If you’re at sea in a small rowing boat, you don’t know whether you’ll be hit by a wave and be thrown in the water. Waves are unpredictable. That’s why you wear a life vest. In the same way, NASA’s spacecraft have plenty of systems designed to handle events with uncertain outcomes.

Sometimes these systems fail too. There are calculated risks. At some point we have to draw the line and say that the cost of further mitigation is so high that we will accept the remaining risk. In the rowing boat, the life vest is good enough – you don’t put on a survival suit and sling a bag of emergency flares over your shoulder before you go out to pull in the fishing net.

The tricky part, though, is what you don’t know that you don’t know. That’s what happened to the Mars Climate Orbiter. It wasn’t a deliberately calculated risk.

One of NASA’s subcontractors delivered a subsystem which used imperial units. NASA, like the rest of the scientific world, uses the metric SI units. And so it happened that a series of events occurred during the probe’s flight, leading to a situation during orbit insertion where the measured values from one subsystem and the calculated values from another gave conflicting data. After its nine-month journey from Earth to Mars, the probe, as Wikipedia puts it, “unintentionally deorbited”.

You can be pretty sure that no engineer at NASA at any point during the project said in a risk evaluation meeting, “Well, we don’t know for sure whether all the subsystems use the correct units of measurement”. And it is equally certain that no manager answered, “I understand, but we can’t be bothered to test that before we launch. It will take too much time and we can’t spare the resources”. To NASA, the subsystem’s use of pound-force seconds was an unknown unknown. It was something they didn’t know that they didn’t know.
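
To make that a bit more concrete, here is a minimal Python sketch of one way such an unknown can be turned into a known: never let a bare number cross the boundary between two subsystems, and make every value carry its unit instead. The `Impulse` type, the unit strings and the sample numbers are my own illustration and assumptions, not anything taken from NASA’s or the subcontractor’s actual software.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value that always carries its unit (hypothetical type)."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        """Return the value in SI units, refusing units we do not recognise."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def total_impulse_newton_seconds(readings):
    """Sum a list of Impulse readings, converting each one to SI first."""
    return sum(r.to_newton_seconds() for r in readings)


# Illustrative value only: what a supplier's subsystem might hand over.
supplier_output = Impulse(10.0, "lbf*s")

# If the unit is dropped at the interface, the consumer just sees 10.0
# and silently treats it as newton-seconds.
naive = supplier_output.value

# With the unit attached, the conversion happens explicitly.
correct = supplier_output.to_newton_seconds()

print(naive, correct)  # the second number is about 4.45 times the first
```

The point is not the conversion factor itself. The point is that a test written against this interface forces someone to ask “which unit is this?”, and an unrecognised unit fails loudly instead of passing through as a plausible-looking number.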

You need a combination of creativity, method and experience to find those unknown unknowns. Even then, you’re never going to find them all, but as a skilled tester you should never stop hunting. Even if your unknowns cost significantly less than a space probe.