"Almost every paper on formal verification starts with the observation that software complexity is increasing, that this leads to errors, and that this is a problem for mission and safety critical software. We agree, as do most."

My first thought: No way. How can you eliminate every single error? Some of the more complex software packages must have at least a zillion lines of code.

The NICTA people are well aware of the complexity. That's why they're focusing on small embedded system software to start with.

Like cars

Every time my father calls with a computer problem, he points out that it's criminal how we accept buggy software: "It'll be a cold day in Hell before I'd buy a car like that."

Fast-forward to a recent phone call from my dad. I didn't know whether to laugh or cry. His car was recalled for a software update.

NICTA's quest

NICTA researchers mentioned:

"It does not have to be that way. Your company is commissioning a new vending software.... You write down in a contract precisely what the software is supposed to do.

And then--it does. Always. And the developers can prove it to you--with an actual mathematical machine-checked proof."

Sounds too good to be true, doesn't it? Still, academics rarely make unsubstantiated claims. So, what do the NICTA team members have up their collective sleeves? I contacted NICTA, and Dr. June Andronick volunteered to explain what they had learned.

Kassner: How did you first get interested in code verification? And is it as daunting as it seems?
Andronick: I come from a math background. I started writing code with the mindset where you think about why your program should behave as you expect. For instance, "Why exactly should it terminate?"

I found formal code verification a fascinating way of combining the two worlds...writing precisely what you expect from the program...and then proving that it does.

Some of this is intuitive: Programmers usually have the gut feeling of why their program does what they want. Code verification just formalises this reasoning, and has it machine-checked, leaving no place for doubt.

It may sound daunting, but it is actually a lot of fun and addictive. It's like a game between you and the proof tool -- you trying to convince it why you think something is true.

Kassner: You used something called the seL4 microkernel to test your theories. Why did you select this kernel?
Andronick: The goal of the L4.verified project, led by Professor Gerwin Klein, was to formally verify seL4, a new microkernel designed by Dr. Kevin Elphinstone. It is the first step towards the long-term vision of our group, headed by Professor Gernot Heiser, to produce truly trustworthy software for which you can provide strong guarantees about its security, safety, and reliability.

The choice to tackle the kernel first is driven by one main reason: It is the most critical part of the system, residing between the hardware and the rest of the software. It has full access to all resources and controls what other software can access.

Any behavior of the system will rely on all aspects of kernel functionality. So any guarantee about the system will have to start with the kernel being functionally correct.

Because there is no protection against faults occurring within the kernel, any bug there can potentially cause arbitrary damage. The microkernel concept comes from reducing the amount of such critical code to a minimum, shrinking the "trusted computing base".

The result of the project was a formal proof of seL4's functional correctness. This means that, under the assumptions of the proof, the kernel can never crash or otherwise behave unexpectedly.

This was the first formal proof of functional correctness of a general purpose operating system kernel, and more generally of any real-world application of significant size.

Kassner: 8700 lines of C and 600 lines of assembler seem like a lot of code to check. Is that what the following slide depicts? Could you please explain what we are looking at?

Andronick: This picture shows seL4's so-called function call graph. Each C function in the kernel is a dot. If function A calls function B, there is an arrow from A to B.

The graph shows that seL4 is highly complex software with strongly interconnected parts. This is typical for performance-optimized microkernels.

Well-engineered application-level software would have groups or islands of strongly related dots connected by a small number of arrows bridging between the islands.
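The structure described here can be sketched with a toy example (all function names are invented for illustration): store the call graph as adjacency sets, and treat the "islands" as connected components when the arrows are read in both directions.

```python
# Toy call graph: each key is a function, each set holds its callees.
# Names are made up; this is not seL4's real call graph.
calls = {
    "sched_pick": {"queue_head"}, "queue_head": set(),  # scheduler island
    "ipc_send": {"copy_msg"}, "copy_msg": set(),        # IPC island
    "syscall": {"sched_pick", "ipc_send"},              # bridging function
}

def islands(graph):
    """Connected components of the call graph, ignoring arrow direction."""
    undirected = {f: set() for f in graph}
    for f, callees in graph.items():
        for g in callees:
            undirected[f].add(g)
            undirected[g].add(f)
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            seen.add(node)
            stack.extend(undirected[node] - comp)
        comps.append(comp)
    return comps

# Without the bridge there are two islands; the bridge fuses them into one.
core = {k: v for k, v in calls.items() if k != "syscall"}
print(len(islands(core)), len(islands(calls)))  # 2 1
```

In a heavily interconnected graph like seL4's, almost every function ends up in one big component, which is exactly the "strongly interconnected" picture Andronick describes.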

Kassner: You use the terms formal verification and functional correctness. Could you describe what they each mean and what roles they play?
Andronick: Formal verification refers to the application of mathematical proof techniques to establish properties about programs. It can cover not just all lines of code or all decisions in a program, but all possible behaviours for all possible inputs.

This exhaustive coverage of all possible scenarios is what differentiates it from testing, which can only find bugs, not prove the absence of bugs.
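A finite-domain illustration of that difference (not NICTA's actual method, which reasons symbolically over unbounded inputs): testing samples a few inputs, while verification-style checking covers every input in the domain. All names below are invented for the sketch.

```python
def clamp_to_byte(x: int) -> int:
    """Toy implementation under scrutiny."""
    return max(0, min(255, x))

def meets_spec(x: int) -> bool:
    """Spec: result lies in [0, 255]; in-range inputs pass through unchanged."""
    y = clamp_to_byte(x)
    return 0 <= y <= 255 and (y == x if 0 <= x <= 255 else True)

# Testing: a handful of sampled inputs. Can find bugs, proves nothing.
samples_ok = all(meets_spec(x) for x in [-1, 0, 128, 255, 999])

# Exhaustive check: every input in a bounded domain. For unbounded
# domains, only a symbolic proof achieves this kind of coverage.
all_ok = all(meets_spec(x) for x in range(-1000, 2000))

print(samples_ok, all_ok)  # True True
```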

Functional correctness means that the kernel behaves as expected in its specification. This property is stronger and more precise than what automated techniques like model checking or static analysis can achieve.

We don't only analyse specific aspects of the kernel, such as safe execution, but also provide a full specification and proof for the kernel's precise behaviour.

The approach we use is interactive theorem proving. It requires human intervention and creativity to construct and guide the proof, but has the advantage that it is not constrained to specific properties or finite, feasible state spaces.

Kassner: I now understand what you are looking for. Could you give a brief overview of how you try to find problems?
Andronick: A problem is detected when the code does not behave as the specification prescribes.

The process starts by writing a formal specification of what the kernel is supposed to do. For instance, you can require that a sorting function returns a list that is sorted and has the same elements as the input list. The code then describes how this functionality is implemented, say by choosing a particular sorting algorithm.

Then you need to prove that the result of the function will always satisfy the specification requirement. Again, the key differentiator from testing is that we reason about all possible inputs. If the specification does not hold for some inputs, the proof will reveal it. The bug gets fixed and the proof attempt resumes. When the proof is finished, you know that there are no implementation bugs left.
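A minimal sketch of this spec-versus-implementation split in Python (the real seL4 work used the Isabelle/HOL proof assistant, not enumeration): the specification states *what* must hold, the code states *how*, and here we brute-force the check over all short lists, which conveys the all-inputs idea without being an actual proof.

```python
from itertools import product

def insertion_sort(xs):
    """The implementation: *how* the sorting is done."""
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

def satisfies_spec(xs):
    """The specification: output is sorted and has the same elements."""
    ys = insertion_sort(xs)
    is_sorted = all(a <= b for a, b in zip(ys, ys[1:]))
    same_elems = sorted(xs) == sorted(ys)
    return is_sorted and same_elems

# Check the spec for every list of length <= 4 over a small alphabet --
# exhaustive for this finite domain, unlike sampled testing.
ok = all(satisfies_spec(list(xs))
         for n in range(5)
         for xs in product([0, 1, 2], repeat=n))
print(ok)  # True
```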

Kassner: What is the next step for the verification process? Can this procedure be used to test computer-operating systems we are used to?
Andronick: The exact same approach won't scale to systems comprising a million lines of code, such as modern operating systems. But we have a plan for large complex systems.

The first thing to note is that formal verification is not cheap. We spent significant effort on the verification of about 10,000 lines of code. It still appears cost-effective and more affordable than other methods that achieve lower degrees of trustworthiness.

The main message is that such thorough methods really make sense in critical systems (medical, automotive, defense, etc.).

Our approach for large critical systems comes from the observation that in such systems, not all of the code is critical. For example, medical devices have large user interfaces and airplanes have entertainment software (Prof. Heiser wrote a relevant blog illustrating the need for a verified kernel using in-flight entertainment systems as an example).

The key idea of our current work is to isolate the non-critical parts from the critical ones. This is done by using seL4 and its access control. This enables us to prove that a bug in the non-critical parts cannot prevent the critical parts from behaving correctly. This way we can concentrate the verification effort on the critical parts of the system.
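A toy sketch of that isolation idea, with invented names and ordinary Python objects standing in for seL4's kernel-enforced capabilities: a component can only invoke resources it has been granted a capability for, so a buggy non-critical component simply has no handle on critical state.

```python
class Resource:
    """A named piece of state a component might want to modify."""
    def __init__(self, name):
        self.name, self.state = name, None
    def write(self, value):
        self.state = value

class Capability:
    """The only way to reach a resource: an explicitly granted handle."""
    def __init__(self, resource):
        self._resource = resource
    def invoke(self, value):
        self._resource.write(value)

critical = Resource("flight-control")       # invented example resources
noncritical = Resource("entertainment")

# The entertainment component is granted a capability to its own
# resource only; it holds no reference at all to the critical one.
entertainment_caps = {"screen": Capability(noncritical)}

entertainment_caps["screen"].invoke("movie")   # allowed
# entertainment_caps["flight"] would raise KeyError: no capability,
# no access -- misbehaviour is confined by construction.
print(noncritical.state, critical.state)  # movie None
```

The design point is that access is decided by what handles were granted up front, not by runtime checks sprinkled through the non-critical code, which is why the verification effort can concentrate on the critical parts.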

In other words, we show that formal verification of functional correctness is practically achievable for OS microkernels, and we're now working on using this trustworthy foundation to give strong guarantees about large critical systems.

Kassner: Funny you should mention entertainment software for airlines. While on a recent red-eye to Amsterdam, the cabin attendants were running up and down the aisles, apologizing. The entertainment system was down. "It should be up soon," they said. "The captain is rebooting a computer."

Final thoughts

My burning question: "What else does that computer control?" I was assured that everything was under control.

About Michael Kassner

Information is my field...Writing is my passion...Coupling the two is my mission.


look at programming by contract, the formal specification notation "Z" (or, as they say in euro-land, "Zed"), and PSP, the Personal Software Process from CMU. These are three "simple" paths that will improve your software quality now.

As soon as we have defined a machine that is as powerful as a Turing machine, complete enough to solve problems algorithmically, the system becomes complex enough that the problems it is supposed to solve are no longer provably correct in general. If they were, the system would be incomplete and limited to a few ranges of problems and solutions, or it would not find any solution in finite time on a finite machine. It is even sometimes impossible to predict that the result will be incorrect. (There's a mathematical proof of this...)
One common example: it is most often impossible to determine whether a program will terminate if it contains loops with conditional breaks, because it is almost impossible to determine that the break condition will ever be satisfied, except in very few cases (such as counted loops where the loop counter is guaranteed strictly *never* to be modified outside of the surrounding loop control: this is a frequent case in almost all software, but it does not cover many other cases, less frequent, but still present in almost all software).
So to solve this issue, one has to find a definition of what is considered a "correct" result, in order to reduce the search space to a model with lower complexity, hiding details that are unsolvable. In much software, the solution exposed in the implementation is based not on a proof, which is impossible to assert, but on a heuristic that produces an approximately correct solution in finite time (independently of its optimisation). So instead of proving that the result you get is correct, you can just prove that the software correctly implements the heuristic (frequently without even knowing the level of accuracy, except for some subclasses of the problems the heuristic can solve, which are far below expectations).
So what can a program verifier do? Basically, it will look for uncertainties: missing assertions in the problem description or in its environment. And in places where it is impossible to determine that a computation will terminate, the software can implement a "watchdog" that limits the computation to some "reasonable" count of passes through the atomic check, in order to abort the process prematurely and return "no solution in a reasonable time; the problem is too generic to be currently solvable with enough accuracy without a better heuristic implementation".
The art of programming is not really in implementing algorithms, but in designing the heuristics and proving that they can solve a useful subset of the exposed problems. With a better heuristic you may be able to solve more problems, but not the full set of problems initially expected; in other words, you simplify the problem, i.e., you modify it! The good question is whether a solution found with this heuristic is also a correct solution for the unmodified problem... And here again, this last question is not solvable in a reasonable time for the generic case!
Of course, mathematics can help, but mathematics has already proven the existence of apparently "simple" problems with an unknown number of solutions... (We call these problems "conjectures" because it is also impossible to adopt them as axioms without being sure that the augmented reasoning model is provably not self-contradicting; as long as no proof has been found, the problem remains open, and for some statements it is provably impossible both to demonstrate the conjecture and to show that the system augmented with the axiom is consistent.)
Some examples: look up "NP-complete" problems... (Does this algorithm terminate and find the solution, or prove that it has no solution, in polynomial time instead of much worse combinatorial time?)
But we can sometimes prove that some problems are equivalent to already well-known NP-complete problems (Karp's classic list named 21 of them, and many more have been found since; their formulation is deceptively simple compared to the heavy implications for their solvability).
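The "watchdog" idea in the comment above can be sketched concretely. The Collatz iteration is a handy stand-in: whether it always reaches 1 is a famous open conjecture, so bounding the number of passes and giving up explicitly is exactly the move described.

```python
def collatz_reaches_one(n: int, max_steps: int = 1000):
    """Return True if n reaches 1 within the step budget,
    or None when the watchdog fires (no answer in reasonable time).
    Termination without the budget is unproven in general."""
    for _ in range(max_steps):
        if n == 1:
            return True
        n = 3 * n + 1 if n % 2 else n // 2
    return None  # watchdog: give up rather than loop forever

print(collatz_reaches_one(27))       # True (27 reaches 1 in 111 steps)
print(collatz_reaches_one(27, 10))   # None (budget too small, aborted)
```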

Hi Michael. Factoring in the human has always been a part of software functions. Code is built, designs are analysed, and the software is used mainly by humans. So, as I see it, the main problem with software stability is the human. And until such things as security software and OS software can be built without human intervention, there will always be bugs. Logical computer software, able to think on its own and make the corrections? Possible?

is the name of the book. I can't remember the author; it was 2008. It was about the costs of insecure software.
But my argument is that there is a difference between not meeting requirements, and bugs.
Bugs are things like buffer overflows, where stuff goes into places that it shouldn't and causes havoc.
Requirements - my best remembered one is the American software for accepting dates that would accept any two digits into the day/month/year __/__/__ fields. A proper requirement would have specified that 01-31 were valid values for day, depending on values chosen for month, etc. I always check a date prompt now to see if it will accept 99/99/99.
When I am teaching about test specifications, the law is that it must not only do what it is supposed to do, but must not do what it is not supposed to.
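A hedged sketch of that date requirement in Python: validate the day against the chosen month and year instead of accepting any two digits, so 99/99/99 is rejected. The mapping of two-digit years into the 2000s is an assumption for illustration only.

```python
import calendar

def valid_date(dd: str, mm: str, yy: str, century: int = 2000) -> bool:
    """Accept a dd/mm/yy entry only if it names a real calendar date."""
    if not (dd.isdigit() and mm.isdigit() and yy.isdigit()):
        return False
    day, month, year = int(dd), int(mm), century + int(yy)
    if not 1 <= month <= 12:
        return False
    # monthrange gives (weekday of day 1, number of days in the month),
    # which handles 30- vs 31-day months and leap years.
    return 1 <= day <= calendar.monthrange(year, month)[1]

print(valid_date("99", "99", "99"))  # False
print(valid_date("29", "02", "24"))  # True  (2024 is a leap year)
print(valid_date("31", "04", "10"))  # False (April has 30 days)
```

This is the "must not do what it is not supposed to" half of the requirement: the field rejects every impossible combination, not just the ones a tester happens to try.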
Regards,

Interesting discussion. Looks like 2 main issues: cost, and suitability for the "real world".
Re cost: There's a (back-of-the-envelope style) discussion in the seL4 paper (http://ertos.nicta.com.au/publications/papers/Klein_EHACDEEKNSTW_09.abstract). In a nutshell, the overall cost for a repeat exercise (i.e. doing something similar again, on top of what was learnt and using the tools etc developed, which is the fair comparison) is no more than twice that of traditional approaches (industry-standard QA for software that isn't highly critical). Compared to high-assurance regimes (for safety- or security-critical systems) it's actually *cheaper*. (And I don't accept at all that open-source is cheaper, the cost is just diluted and unaccounted.)
As far as suitability for the "real world" is concerned, Tony's argument seems to come down to "it doesn't solve all the world's problems (yet), so it's no good". Funny line of thinking.
Fact is that whatever critical system you build, if it's running on a faulty OS (and it must be assumed faulty unless proven otherwise) then whatever you're trying to do on top can go wrong, as it is at the mercy of that OS. What the seL4 verification has done is take the kernel out of the set of faulty parts. (Actually not completely yet: there are still uncomfortable limitations in seL4, which the team is quite open about and working on removing. These include the assumption that the compiler generates correct code, the fact that the initialization code is not yet verified, and a few others.) And it's for that reason that seL4 *will* be used in critical real-world applications, in the not too distant future.
Re the "nothing new" comment: no, this isn't VDM. And yes, people tried this in the 1970s, including verifying operating systems. They failed. NICTA succeeded. That's what you call progress.
And progress has happened in other areas as well, e.g. WRT requirements capture.
Does seL4 solve all problems of software dependability? Of course not. But there is no way to solve these *without* something like seL4. It's a first step, and it achieves massively more than has ever been achieved before.
Gernot

After all, computer operations aren't exactly - ahem - exact, are they?
So I wonder how one can predict an operation execution that results from a processing error?
Interesting stuff, as always, Michael.

1. The undecidability of most interesting properties of programs in Turing-complete languages is due to the fact that we cannot build a universal decider for said properties. The project discussed in the original post concerns a single program. This is crucial. We can (and did) prove a lot of interesting facts about individual programs without contradicting Goedel's incompleteness result.
2. Referring to NP-completeness is bound to mislead the uninitiated. Checking the correctness proofs is in P (i.e., the checker runs in time polynomial in the size of the proofs presented); creating the proofs is not known to be in NP (importantly: in the size of the claim to be proved).
Disclaimer: I'm involved in the seL4 verification project.

Go to the business types and argue for doing it right so you can save money later when right is going to become wrong. They won't be interested, not even a little bit.
Their promotion or bonus, or fiscal report or share price is based on getting something that can be sold as acceptable out by the end of next week, next month some other fool can deal with the fallout.
Usually that fool is the geek who was arguing against the short-term, give-me-mine-now approach, if they don't get blamed and sacked for it.

Domino effects
House of cards
Object Oriented Programming
Rube Goldberg machines
Today's cars
Each of these is a complex system with interdependencies. They require that each part does its job well enough, or the entire system fails. Perhaps we need to look at systems as massively parallel redundant systems rather than a series of calls, dependencies, and loops.

which have existed for decades have not taken off to any noticeable extent in the general software industry.
Requirements analysis is the intractable problem in our industry. Way back when, programmers were so close to the machine that there was room for all sorts of mismatch between design and implementation; that is nowhere near as much the case at the levels of abstraction we generally work at now.
It is VDM in that it's math, and using math is the problem. Fred from bookkeeping is not going to be able to relate it to that email software change request he sent last week.
Ever....
Oh, and Fred's software and his need to change are critical...
Probably not life threatening, but still critical....

...is why not? Why are computer operations not exact? Do we have the right to expect a machine we build to work as it was designed and programmed?
If this is too much to ask, I am in the wrong business. I have built machines that exceed six-sigma uptime (regardless of how you figure it) on a routine basis. I expect it. Machines which do not meet this are repaired or replaced. This includes my router, servers, desktops, and yes, even my car.
It comes down to ownership. If I own the machine and the software on it, I can and will require this. If I don't own it and it doesn't meet my requirements, I send it back.
I do not need help making errors; I make plenty myself. I need my tools and machines to be ready when I am and to help me make my work better.

Could you tell me if it's possible to check a program with a negative list?
Say, describe ten kinds of malicious behaviors, then check whether a program does any of these? In other words, do you think it potentially could root out rogue apps?

The entire thing missed me vertically by an astronomical unit.
I mean we are talking world wide blank look here....
And my boss wants to know what this enpy thing is about, he can't find any mention of it in the spec.
:D

Basically using mathematics to see the problems of a computer program is like trying to use an x-ray machine to see the solution of a crossword puzzle.
Or like trying to cut butter with an arc, I'd imagine.

would be if they could "lower the gain" to have their method find rogue code, say, in apps.
An automated and fast yet effective mapper to scan for certain kinds of malicious actions in the code of these very small programs would be very interesting to a lot of people with a lot of money.

That's the tipping-over point where the producer of the thingy can start thinking profit; it's got naff all to do with correct and everything to do with acceptable.
We get buggy software and iffy machines because we accept them in return for a lower cost...

That's a fact of optimizing against diminishing returns.
Making a computer processor that makes no erroneous operations is likely going to be prohibitively expensive and/or slow.
The right way to handle it (if the instruction set to be handled isn't hardened against errors) is to have triplexed processing, taking advantage of the unlikelihood of the same random error happening in two out of three processors simultaneously. Of course, that's expensive too.
One way around it is to use a vast array of inexpensive processors, like someone once did with a busload of those faulty original Pentium chips; put enough of them together in an array and you get the error margin down lower than that of any normal, not-very-fault-prone CPU of the next generation up, and with a power advantage too.
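The triplexed-processing idea can be sketched as majority voting: run the same computation on three notionally independent units and let agreement mask a single fault. The `faulty` unit below is an invented stand-in for a misbehaving processor.

```python
from collections import Counter

def tmr(unit_a, unit_b, unit_c, x):
    """Triple modular redundancy: return the majority result of three
    independent computations, masking a single faulty unit."""
    results = [unit_a(x), unit_b(x), unit_c(x)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: more than one unit faulty?")
    return value

good = lambda x: x * x          # two healthy "processors"
faulty = lambda x: x * x + 1    # simulated single-unit fault

print(tmr(good, good, faulty, 12))  # 144 -- the fault is outvoted
```

The obvious cost trade-off from the comment holds here too: three units (plus a voter) for one answer, bought in exchange for tolerating any single random fault.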

it often disproves the viability of very promising avenues of thought.
For example, a lot of linguists have been caught up in Chomsky's generative ideas about language, that the surface expressions (infinite in number) are generated on the basis of innate rules (finite in number)... but game theory supposedly has very rude things to say about that.
I don't know if the OP of this subthread is correct, but it seems to say that maths is seeing itself not applying to the case at hand. Which is funny, at least to me.
Regardless of whether it's true.

If someone writes a program that does something, like a game perhaps, but really it's a keylogging, screengrabbing piece of malware with trojan capacities... how would a scanner find that? It's not an unpatched program, and it hasn't been tampered with... it was written to be malicious.
I doubt MS' scanner can read through everything the app does, and produce the intent behind it, but it sounds like this verification scanner could.

I see your point about planned obsolescence and agree, but the race to bigger, better, and faster takes care of it for us. For example, our router is not IPv6 capable, but version 2 will be when it comes out of test early next year. It will also have many other capabilities that no one has, at least not in one 15cm x 25cm x 6cm, 12W-consuming box that mounts on a wall and is administered by a web page.
When we release this gem, we have literally 15 other projects in the wings. We won't lose market because we don't narrow it to one. We raise the bar in one market and move on.

and probably hit and lock up the market before you release.
Don't confuse technical excellence with business excellence; unless excellence IS the product, you'll go bust, and quick.
Even worse in terms of mass production: imagine if someone had been dumb enough to mass-market a product that would last a lifetime, in terms of meeting the perceived need, never mind actually doing so. Planned obsolescence is a necessity for a viable business.

I think I love you. :D
"What drives me nuts is when people with no technical knowlege are allowed to make decisions about a technical product to the exclusion of technical staff. "
Exactly.. that's what I've been saying in this thread. The business model needs to change. I told Tony before, if one company does it, others will (should) follow. Those who don't should go out of business, by way of customers not buying their shoddy products. ;)

an explosion in the job market. To get a similar effect, it would have been pretty much the same rationale. Don't think it would have affected me, as I predate both techs, but without someone like MS and Intel, I'd still be fixing minor coding errors by poking an extra hole in a piece of paper tape. The tipping point was when techs became more expensive than computer time, easily solved by creating more techs, sometimes real ones...

I've worked for people who got it wrong and tanked, and I've worked for people who got it right and are ridiculously successful.
The thing is, while we might have a passion for writing the best code we can, they have a passion for selling the best code they can afford us to write and making a nice juicy profit.
They aren't wrong either; ready is a really fuzzy concept on occasion...
If Bill had waited for bug-free Windows, a lot of us would be seriously out of a job...

What drives me nuts is when people with no technical knowledge are allowed to make decisions about a technical product to the exclusion of technical staff.
Consider:
CEO/President/Big Cheese Education: MBA from Harvard Business
Accountant Education: Accounting and statistics from a community college
Sales and Marketing Education: Liberal arts/Underwater basket weaving
Software engineer Education: Master's degree in engineering from MIT
Who makes the deadlines, issues the specifications, promises things to the client, and decides the product is ready to ship? Yeah, these people actually did run a company, and I worked under the MIT grad. We quit for the reasons you described and the company tanked 3 months later.
Learning from this lesson, everyone in my organisation is a tech. The 51-year-old grandmother who runs the office is about to get her A+. Our priorities are clear, and everyone knows how to apply the brakes if something isn't right. We guarantee our product to have 99.99966% (six-sigma) scheduled uptime, and there have been no failures to date.
Bottom line: bug free secure software can be done. Make the main loop simple and flawless, then only commit modules when they are ready. Finally, ship when you have the feature set complete.

customers are prepared to accept. Not just talking quality in terms of code base here, though that's a big bugbear of mine, but product quality. Once you get in with a customer, particularly with software, changing away from your stuff becomes a significant cost, and the alternative is probably just as bad, but in a slightly different way.
I'm a pragmatically reined-in perfectionist by nature; every bug I could fix but am not allowed to drives me barking mad, every time I go near it, which is way too often for my comfort.

We don't want $6000 laptops, even if their error ratios are 0%. Especially because it doesn't cost much to make the software resistant to the errors, and even more especially because the hardware error rate is already insignificant compared to the error rate of the software.
And with the software it's the same thing, often the error rate is insignificant, not causing the uptime to fall significantly.
And yes, sometimes the easiest way to find the bug is by having a million people try it out on a million different machines.
I mean, FOSS lives on that; the only projects without a major beta out seem to be dead projects.

It's always a trade off. What confidence level are you looking at? For an internet router your confidence level is that packets go where they're supposed to and is expressed in percentage.
95% you can buy a no-name router from Taiwan for $10. 99% you buy a commercial-grade Linksys at Walmart for $100. 99.99% you buy a Cisco or Juniper for $1000 or more. 99.99966% you come talk to me, and the cost is about the same as a good used car.
My point is that there is a trade off, and it follows a very predictable curve. We want 100%, but we settle with what we can afford. I consider myself an exception to this rule, as do most people I am sure. The difference is that I will not subsidise shoddy work, nor will I pay more than I can afford. This attitude has raised the bar and changed the methods. The difference is in the implementation and utilisation rather than the brand on the case... the software makes a huge impact.

Are the solution (not sure if they are the kind used in NASA projects), but the Intel CISC chips wind up with too much overhead and errors, in terms of complexity in programming the stack. RISC chips aim to minimize those errors in instruction execution (as far as I understand it, but I am not a programmer), so wouldn't that help as well, at the processor level?

CPUs making errors. The ratio of errors is known to the manufacturers, and they adapt, trying to minimize the effects of the errors by way of firmware and redundancies and whatnot.
But someone at Intel or AMD can tell you whether I've understood it wrong or not.
Now the human brain, that's a thing with an enormous tolerance for misfires. A CPU has fewer misfires, but less tolerance too, since it trusts itself more.

The infamous Pentium error was due to bad data in a lookup table. Whether this is firmware, hardware, or software kind of depends on your point of view. I will submit that it wasn't software, since it was hard-coded on the chip.
My reason for bringing this up is that hardware issues are just another type of error we can see in a computer. We do not tolerate these, but we accept software that ships with several thousand bugs, some of them crippling. I see this as a double standard.
We need to apply the same criteria to software issues that we do for hardware. The reason is that as computers become more integral in our lives, it is more and more likely that these digital helpers will take some part in a life or death decision of ours. I am far from flawless, but my tools make me more than I am.

Besides normal amount of hardware errors, equipment that is outside of our protective atmosphere is subjected to a cornucopia of nasty stuff, all of which can induce a charge ranging in intensity from insignificant to enough to blow chips off the board. There is a bias toward the low level events, with the probability curve looking like a slope.
While charged particles are a constant issue, it was a big surprise to find out that non-ionizing radiation striking ferrous materials produces an incredible amount of ionizing radiation, which is lethal to computer and bio-critter alike. The trouble is that if it comes from a system-board fastener, the resulting radiation source is very close to the area where it can do the most harm.

But the difference between good and great software is how the software handles the hardware error. Critical failure, soft failure, reset, error acceptance, and error correction are all options, but only one keeps everything running smoothly.

I didn't mean to say that people should use the flawed pentium, nor that they should accept such flaws, but to say that by lashing the flawed CPUs together some lab somewhere was able to make a very precise and very powerful machine.
It was a proof of concept, which most likely is part of the basis for today's cloud adventures.
But my point was that all CPUs have a certain ratio of erroneous operations (as far as I have come to understand), probably far less than the six-sigma requirement, but existing nonetheless. If you take into account that CPUs perform millions of operations per second, this means that errors do occur, every hour, maybe every second.
In any CPU.

At least that's what I've understood.
As BBoyd points out, there's always radiation, even down here in the atmosphere. Cosmic and/or manmade.
Noise in the power feed perhaps.
Electromagnetic pollution from nearby wiring or equipment, certainly.
But I've also come to understand that errors occur by themselves too... electrons quantum-mechanically jumping across the boundaries in the processors perhaps. There are no certainties with subatomic particles, only stochastic tendencies approaching certainty (never getting there).
But I am sure there are CPU architecture specialists out there who can give an actual answer to this, rather than my patently inexpert musings :D

Landing the craft on the asteroid, without all the extra fuel in the plan, with a craft never designed for a landing.
Reprogramming to replace the sun location sensor with other inputs.
Makes me want to go off to the Space Coast or Houston.

Space is unforgiving. Between the ionizing radiation and temperature variance, computers have it rough. ECC memory and parity schemes are a good start, but more can be done. A voting system and non-identical parallel architectures work a treat... but at an extreme cost. You think hammers and toilet seats are expensive in space? You should price a man-rated DPS system!

but would I knowingly implement one of those old Pentiums with the math error, thinking it's all right? Ummm... NO!
When these came out I had 2 CAD workstations with them. Considering some of the industrial control and aerospace projects we were running at the time we could not afford this kind of error. We donated them to a local community college and bought replacements.
Likewise in software and hardware as a system. I expect errors, but not in any measurable quantity.
The reason I quoted six-sigma in the original thread (the statistical term, not the management-book philosophy) is because the metric actually means something: I want my failures to fall beyond six standard deviations in a normal distribution curve, which is as close to certainty as statistics will give you.
Put into real-world numbers, I expect to have 99.99966% uptime, where uptime is defined as being free of unscheduled downtime and does not count routine booting or shutting down. Putting this into perspective, that allows a little under two minutes of downtime per year, and we did it with our router appliances.
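For the record, the six-sigma availability figure translates into yearly downtime like this:

```python
# 99.99966% availability leaves 0.00034% of the year as allowable downtime.
seconds_per_year = 365 * 24 * 60 * 60          # 31,536,000 seconds
availability = 0.9999966
allowed_downtime = seconds_per_year * (1 - availability)
print(round(allowed_downtime, 1))  # ~107.2 seconds per year
```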

Fault tolerance is critical in an environment where a stray proton can set bit errors.
Just as buying ECC memory may make sense, your OS can provide processor fault tolerance at the expense of overhead. If you design OSes for spacecraft, you would probably require redundant fault tolerance at every level.
I'm sure the team of programmers for the NEAR Shoemaker spacecraft wished that they had formally verified the OS after it lost its mind mid-flight.
http://klabs.org/richcontent/Reports/Failure_Reports/NEAR_Rendezvous_Burn.pdf
Thirteen months late and a lot of programming hours later, the cost may have been justified.