If we assume that each launch has the same probability of success, these risk calculations are simple to make (see, e.g., these slides). With a binomial likelihood and a Beta prior, the posterior distribution of the probability of success, \(\theta\), is
\[ p(\theta | r, n) = \mathrm{Beta}(\alpha + r, \beta + n - r) \]
where \(r\) is the number of successes, \(n\) is the number of trials, and \(\alpha\) and \(\beta\) are the parameters of the Beta prior. What values should we choose for the prior parameters? I like \(\alpha=\beta=1\) (a uniform prior), though you could probably make a case for anything consistent with \(\alpha+\beta-2=0\). Many people say that risk = probability * consequence. I don't know what the consequences are in this case, and under that definition NASA's chart doesn't make any sense (you could have a low risk with a high probability of failing to launch an inconsequential payload), so I'll stick to the probabilities of launch success and leave worrying about the consequences to others.
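The conjugate update above is a one-liner in code. Here is a minimal Python sketch; the function name `beta_posterior` and its defaults are my own illustrative choices, with \(\alpha=\beta=1\) as the default prior.

```python
def beta_posterior(r, n, alpha=1.0, beta=1.0):
    """Return the (alpha, beta) parameters of the Beta posterior after
    r successes in n launches, starting from a Beta(alpha, beta) prior."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    # Successes add to alpha, failures (n - r) add to beta.
    return alpha + r, beta + n - r

# Three successes, no failures, uniform prior:
print(beta_posterior(3, 3))  # (4.0, 1.0)
```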

Since NASA specifies a number of successes in a row (consecutive), there is already a hint that treating the trials as independent and identically distributed (i.i.d.) is unrealistic. If your expensive rocket blows up, you usually do your best to find out why and fix the cause of failure, so that on the next launch your rocket has a higher probability of success than it did before.

For the sake of easy math, let's assume that you are a perfect rocket launcher: you've had no failures, and thus no reason to change (fix) your rocket. Then the assumption of a constant failure rate across launches may not be all that bad. So naturally you want to know whether you should just launch a bunch of rockets to get your low risk rating from NASA, or launch a few and let NASA review your design, test data, processes and procedures. Assuming no failures, there are three options for getting a low risk rating and, more importantly, the access to provide launch services for the big, expensive payloads that comes with that low rating:

1. Launch 14 successfully.
2. Launch 6 successfully, allow NASA audits of systems engineering in manufacturing and operations, pass a NASA design certification review, allow a NASA audit of quality assurance processes, pass a NASA design certification review of avionics qualification, and pass a NASA design certification review of the launch complex.
3. Launch 3 successfully, allow NASA audits of systems engineering in manufacturing and operations, provide NASA comprehensive acceptance test results, allow a NASA audit of quality assurance processes, pass a series of NASA engineering review boards of avionics subsystems, and pass a NASA engineering review board of the launch complex.

If you opt for only the "design certification reviews" of option 2, rather than the more detailed "engineering review boards" of option 3, then 3 launches gets you a "medium" rather than "low" risk rating. We can back out how much risk NASA believes it is buying with its extra effort on more detailed reviews and audits. If you're not picky about units, we can give the answer straight away: the smallest amount of review is worth 8 successful launches (14 - 6), and the extra level of review is worth an additional 3 successful launches (6 - 3).

We can look at how our uncertainty in the success rate shrinks after different numbers of launches.
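As a sketch of what that uncertainty looks like with no failures and the uniform prior, the Python below computes the posterior standard deviation and a one-sided 90% lower credible bound on \(\theta\). The 90% level and the helper names are arbitrary choices for illustration; the closed-form bound relies on the all-success posterior being Beta\((n+1, 1)\), whose CDF is \(\theta^{n+1}\).

```python
import math

def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def lower_bound_all_successes(n, level=0.9):
    """One-sided lower credible bound on theta after n successes in n launches.

    With a uniform prior the posterior is Beta(n + 1, 1), whose CDF is
    theta**(n + 1), so P(theta >= L) = level gives L = (1 - level)**(1 / (n + 1)).
    """
    return (1 - level) ** (1 / (n + 1))

for n in (0, 3, 6, 14):
    a, b = 1 + n, 1  # posterior parameters with alpha = beta = 1 and no failures
    print(f"n={n:2d}  sd={beta_sd(a, b):.3f}  "
          f"90% lower bound={lower_bound_all_successes(n):.3f}")
```

Both numbers tighten as successes accumulate, but the bound climbs slowly: each extra launch only raises the exponent by one.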

Unfortunately, decision makers don't usually care about our uncertainty in a parameter. A more interesting question may be how 8, 3, or any number of successes should change our estimate of the probability of success on the next flight. "Probability of success" is something that decision makers often think they want to know. That's easy enough for our simple problem: we just calculate the expected value of \(\theta\) under each of those posterior distributions, which is \((\alpha + r)/(\alpha + \beta + n)\), or \((n+1)/(n+2)\) for \(\alpha=\beta=1\) and no failures.
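A minimal sketch of that calculation, using exact fractions so the numbers are easy to check by hand (the helper name `posterior_mean` is hypothetical):

```python
from fractions import Fraction

def posterior_mean(r, n, alpha=1, beta=1):
    """Expected theta under the Beta(alpha + r, beta + n - r) posterior."""
    return Fraction(alpha + r, alpha + beta + n)

for n in (3, 6, 14):
    p = posterior_mean(n, n)  # n successes, no failures
    print(n, p, float(p))
# 3 4/5 0.8
# 6 7/8 0.875
# 14 15/16 0.9375
```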

The standard calls out 14 flights, or a 0.95 probability of success at a 0.5 confidence level, to achieve "low risk" with no audits or reviews. With a one-sided interval (i.e. requiring the posterior median of \(\theta\) to be at least 0.95), that is actually achieved at 13 successes, but I guess NASA doesn't like "lucky" 13, so they bumped it up to 14. Going from 3 successes to 6 successes increases the expected probability of success from 0.8 to 0.875 (+0.075), and going from 6 to 14 increases it from 0.875 to 0.9375 (+0.0625). So NASA thinks that its most stringent level of review increases the probability of a launch services provider's success by about 0.14 (from 0.8 to 0.9375, roughly 14 percentage points).
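The 13-versus-14 claim can be checked directly. Assuming the uniform prior and reading "0.95 at a 0.5 confidence level" as a posterior median of at least 0.95, this sketch searches for the smallest number of consecutive successes that meets it (the function names are hypothetical):

```python
def posterior_median_all_successes(n):
    """Median of Beta(n + 1, 1), found by solving theta**(n + 1) = 0.5."""
    return 0.5 ** (1 / (n + 1))

def launches_needed(target=0.95):
    """Smallest n whose all-success posterior median reaches the target."""
    n = 0
    while posterior_median_all_successes(n) < target:
        n += 1
    return n

print(launches_needed())  # 13
```

At 12 successes the median is just under 0.95, and at 13 it is just over, which is why 14 looks like a rounding-up.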

According to this article, the engineers on Apollo had a "three nines" (0.999) reliability standard for components. With that kind of standard, and assuming independent components that must all work, it takes only about 50 components before your overall system success rate drops to around 0.95, and about 100 before you have only "one nine" (0.9) for the whole system. I wonder how many nines the ULA folks had to demonstrate for their recent reviews?
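The component arithmetic is just a product of reliabilities. This sketch assumes independent components in series, so the system works only if every component does; the function name is illustrative:

```python
def system_reliability(component_reliability, n_components):
    # Independent components in series: the system works only if all of them do.
    return component_reliability ** n_components

for n in (50, 100):
    print(n, round(system_reliability(0.999, n), 3))
# 50 0.951
# 100 0.905
```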

1 comment:

This sounds similar to the process outlined in the NASA document (launch a few times and let us audit you): SpaceX officials want the company to get its military certification in 2014. It will need to launch each version of a rocket successfully three times before it can receive the Defense Department’s approval, Shotwell said.

Air Force Major Eric Badger, a Pentagon spokesman, said the service is “making great progress with SpaceX.”