How correct is correct enough?

First of all, welcome back to programming! It was a bit of a shock for me to see that, since the last programming article, I’ve written a sequence of four posts about Doctor Who and Buffy, which is an imbalance I never intended. Well, Doctor Who will end after two more episodes, so hang in there, programmers!

Now, then.

The response to We can’t afford to write safe software was very interesting. As usual (I am pleased and proud to say) the comments here on The Reinvigorated Programmer article itself were, almost without exception, insightful and informative. But over on Reddit things were not so good.

I’ve found this kind of absolutism much more common on Reddit than here, with Hacker News somewhere in between. It’s a bit disturbing given that only a Sith thinks in absolutes (to which Anakin should have replied: “Are you absolutely sure?”). Sometimes in my more cynical moments, I wonder what proportion of Reddit comments are written by people who have actually read the article.

Anyway, this comment did have the merit that it set me thinking. How much safety do we in fact need? And, more importantly, what do we have to sacrifice in order to achieve it? And can we afford those sacrifices?

Of course it’s nice to have zero bugs. But we all know that in reality, (A) it’s not going to happen, and (B) it’s not truly our main concern. I quote from Jon Bentley’s book More Programming Pearls: Confessions of a Coder [amazon.com, amazon.co.uk], page 67:

I once stated to Bill Wulf of Tartan Laboratories that “if a program doesn’t work, it doesn’t matter how fast it runs” as an undebatable fact. He raised the example of a document formatter that we both used. Although the program was significantly faster than its predecessor, it could sometimes seem excruciatingly slow: it took several hours to compile a book. Wulf won our verbal battle with this argument: “Like all large systems, that program today has ten documented, but minor, bugs. Next month, it will have ten different small, known bugs. If you could magically either remove the ten current bugs or speed up the program by a factor of ten, which would you pick?”

In these enlightened days, of course, typesetting speed is not really an issue: 95% of my non-plain-text writing is done either on OpenOffice or the WordPress editor, both of which essentially do typesetting in real time. But the principle is still good: for “document formatter”, read “web browser”; for “several hours to compile a book”, substitute “several seconds to display a page”, and for “ten bugs” read “14303 bugs”. In practice, speed is often more important than correctness.

And of course we don’t really need Jon Bentley’s anecdotes to prove this: all of us who are programmers make the correctness-vs-speed-vs-functionality call constantly: at any given moment, I can work on eliminating bugs from my code, or improving its speed, or adding new functionality. If StoneCypher is right then we should always, as a no-brainer, pick the first of these, only working on performance and new functionality when all known bugs have been eliminated; but we all know that in practice we spend more time on adding new functionality than on bug-fixing.

Needless to say, the trade-off point is impossible to determine algorithmically, and must be determined by taste, experience, judgement and often by commercial pressures. The appropriate choice of when to work on correctness and when to work on performance or functionality is also different depending on the application domain. It hardly needs saying that avionics systems need to be correct, always: they merit the use of formal methods in writing and proving the code, as well as batteries of tests to improve confidence in its correctness. Partly that’s because the consequences of bugs are so severe; it may also be because the functional requirements in that domain are well demarcated and understood, so there is relatively little pressure to invest time in adding new functionality. By contrast, there seems to be a silent agreement in the world of web browsers that bugs are OK, really — even dramatic ones that crash the browser — and that what we all really want is more new features.

(I remember back when Netscape was still standard, a colleague trying to sell me on the then new pre-1.0 Mozilla by telling me that its crash-recovery was excellent. It didn’t strike me as a good omen when a program’s best feature is its crash-recovery but, what do you know: here we are in 2010, and crash recovery is still important in browsers. Recently, Google Chrome went through a series of updates that left it crashing on me maybe once a day for a fortnight or so; I stuck with it anyway, rather than reverting to the much more stable Firefox, simply because it’s twice as fast.)

So, OK — we all seem to more or less agree that avionics software needs to be correct; but web browsers need to be fast and featureful, and if bugginess is the price we have to pay, then so be it. But what about software in the middle? What kinds of programs fall on which side of the line?

You’d have thought that operating systems would fall firmly on the must-be-correct side of the line; but the generation that’s grown up with Windows versions that need rebooting several times a day, and which considers an O/S reinstall to be a fairly routine procedure, seems to have been taught that it ain’t so. Maybe I should have said “brainwashed” instead of “taught”. To me, that state of affairs is a travesty, but I guess the world voted with its pocket.

Anyway, I hope I’ve gone some way towards convincing you that “all bugs are unacceptable” is unrealistic fundamentalism. Next time, we’ll look at some back-of-the-envelope calculations that can help us to make sensible decisions on where to invest time, and which bugs we can and should ignore.

15 responses to “How correct is correct enough?”

A better title could have been “We can’t afford to write perfect software”. Also, quality is usually associated with correctness, but I prefer the viewpoint of modern issue management practice where all kinds of things – bugs, RFEs, optimizations – are just different kinds of “issue”. Then if you don’t have time or resources to close 100% of these issues as Resolved Fixed, it’s just normal project/product management to assign priorities and severities, and draw the line for the next release.

The “formal methods” that you mention in that blog are not even something remarkably difficult or formal; real Formal Methods are stuff like Z or Petri Nets, which were in high hype when I was doing my CS – I’ve had to suffer some Z classes with a professor who was himself researching the subject and very convinced that such methods were the future of software engineering. Guess what, these are basically abandoned today – even in critical areas like avionics or pacemakers. The whole idea – mathematical methods allowing you to translate a program to a theorem and prove it to be correct – was obviously flawed, first because it makes software construction 10-100X more difficult and expensive. Second because it doesn’t scale: around 1990 I think it was still viable in some niches, but today, any realistic piece of software is big enough that its full formal specification would be so huge that it’s impossible to avoid mistakes in the formal spec itself! [Only extremely niche tasks, like academic research of typesystems, still resort to such methods.]

On the other hand, I don’t much like the current trend of TDD, as in, forget any up-front design or formalism [or even languages enforcing compile-time error detection… but let’s not return to that debate] and have a huge test suite as your single line of defense. It’s an extremist position (in the opposite corner from heavyweight Formal Methods), except that writing unit test code is much easier, so people get all warm and fuzzy when their 100KLOC system has 1MLOC of tests and 90% code coverage.

Perhaps we should follow Buddha and Aristotle’s advice: The Virtue is in the Middle?

“First of all, welcome back to programming! It was a bit of a shock for me to see that, since the last programming article, I’ve written a sequence of four posts about Doctor Who and Buffy, which is an imbalance I never intended. Well, Doctor Who will end after two more episodes, so hang in there, programmers!”

That said, is there any chance of a programming-only RSS feed? (Asking you to do separate blogs is a bit much, and complaining that the sushi photos mean your blog always makes me hungry is just pedantic.)

I love your software musings; don’t really care about the good Doctor and Buffy. (Yes, yes, I know my geek cred just took a serious blow.)

It can be frustrating to be on the other side. There’s a compiler I use which is really, really bad when it comes to reliability.

With small programs, it works fine. But as a project gets larger and larger, you reach this invisible limit where you cannot add more functions or the linker dies. If you feed it modules in the wrong order, it dies. If you use certain language features under an impossibly complex set of conditions, they cause the compiler to emit invalid object files which cause the linker to silently produce no output at all.

In fact, the linker in and of itself is a huge reason not to use the language at all. The author defends it on the basis that “it’s the fastest linker there is.” It’s also deeply unreliable.

I’d gladly take a slower linker and compiler in exchange for confidence that I’m not going to hit a glass ceiling three years into a major project. I’m frankly terrified of it.

Regarding reddit comments… well, I’m starting to think that 90% of reddit’s userbase doesn’t actually write software for a living, but just to enjoy their free time or for the sake of knowledge.
Nice way of spending time, using every formalism and methodology we’ve learnt in our CS classes, but radically not fitting with “real” production programming.

Let’s face it, knowing all the “good” and “safe” methodologies of writing software is just not needed for today’s projects. Today’s business need is fast response (in terms of features and release dates); if something doesn’t work, well, we’ll make a service pack later. And bugs stack up. Complexity grows. Cost reduction is often paid for by reducing test time and forcing the dev team to use some obscure “magical” framework which already does the tricks (and consequently spend hours/days trying to integrate the one useful feature that was left out of it).

Being a programmer today is freaking difficult. I remember some projects from the end of the 90’s… those were still great times for being a programmer. We had time, we had solid tools to work with (VI and a good ANSI-C compiler were enough to really “do” something). And quality/performance was really high (not using too much third-party stuff used to keep the project fast).

Now taking Java as an example: just to use a single class, you often need to use (and distribute) tons of libs. If you’re doomed to J2EE you have to face all kinds of interop issues with the container’s own sets of libs, and malfunctions due to incompatibility between versions are becoming a real issue/pain.

The problem with the pragmatic approach to correctness is security (security is the hard part of making correct software, while correctness is the hard part of making software). We can tolerate the occasional browser crash when we are just browsing, but browsers (together with consumer-grade operating systems and a network that was not originally intended to be open) are now a component in the global consumer banking network.

Perhaps it was a mistake to make it so, but my recollection of how it happened is that it was mostly by one reasonable and pragmatic decision at a time, salted with the occasional reckless/thoughtless doozy. It will probably take the banking equivalent of a blowout-preventer failure for the banks, let alone the majority of the populace, to give it much serious (as in ‘we need to do something about this’) thought.

I am not saying the pragmatic approach is wrong (I do not see any alternative), I am saying the tradeoffs are not obvious, and are highly context-sensitive, so your original assumptions can be invalidated by subsequent events.

@ARaybould: Open or closed, banks always coped with fallible software. Banks approach security and stability like they usually do with anything: as accountable, insurable risk. In my country (and in most countries), banks are held accountable for most kinds of computer fraud; if I’m using an online-bank site and my password gets stolen by a keylogger and a hacker uses the password to withdraw all my money, the bank will have to refund me even if it was my fault to have that virus in my machine.

But the banks do some maths: the savings from having online banking services, against the average $$$ they spend every year refunding clients who got robbed by hackers (plus other issues like bad PR, customer education, continuous investment in security, etc.). They see that the benefit beats the cost by far – just think of the thousands of employees and brick-and-mortar branches they avoid, when millions of transactions are processed online. So, it’s just a normal business decision to accept the security risks and costs. And this is not even new in the online business. “Real-world” bank branches are robbed; even armored trucks transporting cash are robbed; not to mention internal fraud committed by bank employees. Banks wouldn’t exist at all if they insisted on 100% safe operations. Software doesn’t change ANYTHING in this reality, it’s just another kind of risk to manage.

[Of course, software developed for banks is usually higher-than-average in the scale of security; banks will happily pay for better coding and testing of security-critical systems. But even on internal development that they can control, there is no perfect-software-only fundamentalism, notably due to integration and time-to-market factors.]

Great point, ARaybould: security does change the game here, and never more so than in the case of the Web, which was envisaged as a read-only system for reading text documents, and has mutated into, well, our world, really. I guess I am not alone in using GMail for my email, which is critical for me both in my work and the rest of my life; needless to say I write a lot using web-based tools (i.e. WordPress); I buy lots of stuff online, mostly from Amazon; I use PayPal, which amounts to an online bank. All of this is far, far, far outside the design parameters of the World Wide Web as originally conceived, and it definitely raises the game as to how reliable browsers need to be. Or at least it should raise the bar, but there’s not as much evidence as I’d like to show that it actually has.

… and thanks to Osvaldo Pinali Doederlein for another important insight: that security failures in banks are not necessarily catastrophic and may just be part of life, as paying out on a house fire is part of life for an insurance company.

@Osvaldo Pinali Doederlein: You have a point, but we should consider whether the risk assessments are accurate, and merely extrapolating the past is not a reliable guide to the future. The recent history of finance is littered with the consequences of failed risk-benefit analysis. As to whether banks, rather than customers, will always bear the costs of fraud committed through security breaches, check the comments on ‘liability shift’ here: chip-and-PIN is broken

First, avionics is not such a stagnant field. It can be made as stagnant as you like by eliminating good ideas, but the same is true in any field. Good data visualization, for example, is more immediately valuable to pilots than to scientists.

Second, the formal methods, besides being extremely expensive to apply, don’t provide near as much benefit as grant applications would have you believe. Bugs are more likely to be in the algorithms, and out of reach of the “formal methods”, than in the coding.

Avionics software quality comes mostly from good old fashioned attention-to-detail engineering. That comes from good old fashioned engineers doing the coding, rather than namby-pamby computer science graduates. The same goes for CPU designs, which are built to even higher standards, despite the most stringent performance demands of all.

We find ourselves coding around library bugs all the damn time. We run into a compiler code-generation bug once in a while (though it’s been years, for me). How long has it been since you even heard of an Intel or AMD instruction that did the wrong thing, never mind had a program break because of one? Remember that FDIV was in 1994. Some people coding today weren’t even born then.

@Nathan: Even if self-serving, I agree that putting real engineers to code is the best thing a manager can do for quality. This, plus realistic schedules, well-managed requisites, and other items that should be common sense.

On the CPU bugs though, you’re a bit off: CPU bugs are much more common than most people realize; there is no single CPU that ships without any bugs. CPUs even have “service packs” (aka Steppings). For critical systems, you are advised to not buy the initial batches of any next-gen CPU. Recently in 2007 there was another highly visible issue, the TLB bug from AMD’s Phenom. But less-severe bugs are much more common. For a recent sample, see for example http://download.intel.com/design/processor/specupdt/320836.pdf – go to the “Errata” section, which is over 45 *pages* long; I didn’t count but there’s easily more than 100 known bugs in this particular chip.

Since people are using avionics as an example, it might make sense to look at airplanes. To start with, they are far from perfect, but they are designed to fly even when essentially broken. That’s why you can glide to a landing, assuming you can find some place to land, without power. That’s why pilots are trained in situational alertness and classical navigation techniques. That’s why flights can be dispatched despite the aircraft having deferred maintenance items, which is aviation speak for broken parts.

Failures are seen as the result of a chain of events. Just as death certificates, at least in the US, tend to list three levels of cause of death, aircraft failures rarely have one single point of cause. Read a flying magazine or blog to get a feeling for the way aviation people think about risk. The goal is to allow failures, but to control their propagation.

That’s been the trend in software as well. Having been used to old fashioned time sharing systems that firewalled user processes, I found moving to PCs a big step backwards with the entire machine crashing whenever one program failed. Now there are firewalls within firewalls with the system layered from the kernel out. That’s a big trend in modern browsers, allowing one page to crash without taking down the entire browser.
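As a rough illustration of that kind of firewalling, here is a minimal Python sketch that runs each unit of work in its own operating-system process, so that one of them crashing does not take the supervisor down with it. The names `run_isolated`, `render_all` and the `PAGES` dictionary are made up for this example, and the "pages" are just Python snippets; real browsers use far more elaborate sandboxing than this.

```python
import subprocess
import sys
from typing import Dict, Optional, Tuple


def run_isolated(snippet: str, timeout: float = 10.0) -> Tuple[bool, Optional[int]]:
    """Run a Python snippet in a separate process.

    Returns (ok, exit_code). exit_code is None if the child had to be
    killed because it exceeded the timeout.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,  # keep the child's output out of our own
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, None
    return result.returncode == 0, result.returncode


# Hypothetical workload: the second "page" dies abruptly with exit code 3.
PAGES = {
    "good": "print('rendered')",
    "bad": "import os; os._exit(3)",
}


def render_all(pages: Dict[str, str]) -> Dict[str, Tuple[bool, Optional[int]]]:
    """Render every page in isolation; a failure is recorded, not propagated."""
    return {name: run_isolated(code) for name, code in pages.items()}


if __name__ == "__main__":
    for name, (ok, code) in render_all(PAGES).items():
        print(name, "ok" if ok else "crashed", code)
```

The supervisor only ever sees an exit status, so the failure is allowed to happen but its propagation is controlled, which is the same idea as the aviation example above.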

I’m not saying we shouldn’t be trying to write better code, but we cannot expect perfection. Our systems need to work in the real world.

—-

Actually, a lot of the package problem flows from our desire to improve our code. Sure, I can write a trash hack that does the 5% of DBM_File pretty quickly, but it will surely break when stressed. The alternative is to use someone else’s more reliable code that does a lot more than I might want, but has some level of robustness. Before I know it I am spending 50% of my time reading library specifications, testing them and swearing.

@Nathan, I beg to differ with your comments on formal methods. FMs are much less expensive to apply than they were in the 1990s, thanks to advances in automated reasoning. Also, algorithms are by no means out of reach of FMs – if you can specify what change of state the algorithm is meant to achieve, then you can use FMs to prove that your algorithm achieves that change. As you point out, CPU bugs are thankfully rare now – one of the reasons for this is that FMs are used extensively in CPU design. Windows device driver crashes are also rare now – in part because any WHQL-certified device driver has been passed by the Microsoft Driver Verifier, which is built around FMs.