Posted
by
timothyon Friday May 16, 2014 @12:18PM
from the looking-deeper dept.

davecb (6526) writes "At Guido von Rossum's urging, Mike Bland has a look at detecting and fixing the "goto fail" bug at ACM Queue. He finds the same underlying problem in both in the Apple and Heartbleed bugs, and explains how to not suffer it again."
An excerpt: "WHY DIDN'T A TEST CATCH IT?
Several articles have attempted to explain why the Apple SSL vulnerability made it past whatever tests, tools, and processes Apple may have had in place, but these explanations are not sound, especially given the above demonstration to the contrary in working code. The ultimate responsibility for the failure to detect this vulnerability prior to release lies not with any individual programmer but with the culture in which the code was produced. Let's review a sample of the most prominent explanations and specify why they fall short.
Adam Langley's oft-quoted blog post13 discusses the exact technical ramifications of the bug but pulls back on asserting that automated testing would have caught it: "A test case could have caught this, but it's difficult because it's so deep into the handshake. One needs to write a completely separate TLS stack, with lots of options for sending invalid handshakes.""

Dude.. chill, it is an actual apple, as in a fruit -- it is a saying. I didn't read the headline your way at all.

Actually, you are wrong. If you read the article, you'll see its main focus is on the "goto fail" bug and what the author perceives as the development shortcomings that allowed it to happen in the first place. The focus is pretty Apple-centric, mainly because he's using the "goto fail" bug as the primary evidence to support his central tenet. However I did not get the impression the author was anti-Apple.

Heartbleed is only mentioned as an afterthought because (as the article mentions) it became public knowl

It is a saying, but it is a very poor author that would use it in this context. When discussing a problem with a company called 'Apple', using a saying that also uses the word "apple" as an object containing a problem would only be used by the most incompetent of writers if they were not trying to make a play on words.

They are comparing the test methods that might have cought the Apple SSL "goto fail" bug vs the Heartbleed openssl bug (which was unchecked memory access). How do we know there isn't another SSL bug in either? That's right, we don't. And we won't until testing (automated or otherwise) gets better in both places. Ideally we would have a way to find (and fix) lots of worms in both.

And we won't until testing (automated or otherwise) gets better in both places.

I'm skeptical of testing (automated or otherwise), and I think point in TFS is well-taken: testing that would have caught this bug would have involved creating tests that virtually duplicated the system under test.

While some code is susceptible to test-driven development and thorough testing, and that should be done where-ever possible, the resources required to test some code effectively double the total effort required, and maintaining the tests becomes a huge headache. I've worked in heavily-tested environments and spent a significant fraction of my time "fixing" tests that weren't actually failing, but which due to changes in interfaces and design had become out-of-date or inappropriate.

That's not to say that testing can't be done better, but it's clearly a hard problem, and I've yet to see it done well for the kind of code I've worked on over the past 20 years (mostly algorithmic stuff, where the "right" answer is often only properly computable by the algorithm that is supposed to be under test, although there are constraints on correct solutions that can be applied.)

So I'm arguing that a culture of professionalism, that implements best-practices including coding standards and code reviews (possibly automated) that check for simple things like open if statements and unchecked memory access would be lower cost and at least as effective as heavier-weight testing.

This is a static-analysis vs dynamic-analysis argument, and while I certainly agree that dynamic analysis is necessary, both these bugs would have been caught with fairly simple-minded static analyzers checking against well-known coding standards from a decade ago.

It's no so much that -- if your coupling is loose enough, you should be able to test the API of any component in your system. But you have to come up with that test. Programmers often have blind spots for things where "no one would ever do that." It might be OK for programmers to come up with basic tests that exercise the API and make sure it's at least marginally functioning as designed, but you also really need to throw some guys at the code who just like to break things. Paid software houses don't even d

Another good practice is adding bounds checking when building in debug mode (for example in "assert" statements).When you develop your code, you use the bounds-checked version, and when in production, the checks are removed.

Also, you seem to forget that OpenSSL is full of legacy code, and legacy code is a not 2 times harder to unit-test but a hundred times !You need to cut the code into little pieces, but the goal of OpenSSL was to write the fastest code.

I seem to remember seeing an article on the NASA coding practice, and they do exactly what the summary suggests: every important feature is implemented twice, with two different algorithms, and they are tested against each other to ensure they produce the same result. They also do formal code reviews of every check-in (no matter how minor), and any bug found is treated as a process problem (i.e. how can we fix the process that allowed this bug in), rather than just a software problem.

PS: this is pretty obvious while unit-testing but I'll make it clear to avoid any confusion... the real implementation of SSLHashSHA1.update() and SSLHashSHA1.final() would not be called in this unit test, as that'd be outside of the scope of it.

Of course, it's obvious today that a test for behavior on inconsistent requests should have been done in OpenSSL. As well as a test for each failure cause should have been done by Apple. And next week, when an off-by-one bug bites us on an integer overflow in libfoobar, people will say testing for that condition should have been trivial.

So, yes, some conditions can be found with fuzzers. Of course, fuzzers work in an erratic way, and not all bugs can be triggered by them. But maybe fuzzing our code (more im

I don't know about the headline, but the other day I ran into a bug on OSX (some commandline tools developed problems with UTF-8). "No problem," I thought, "I'll just report it." I had to create an account to report a bug, which was annoying, but then when I got to the bug reporting website, I found this error message [picpaste.com]. "LOL" I thought, "but ok, I'll email them." I told them their bug website was having trouble, and they emailed back and said, "please report that through our bug reporting tool."

It's an attempt to get more views, I think. I know I clicked the link when I saw it in my RSS feed because I thought, "Holy crap, they found another glaring security whole in Apple products?" Then it's somebody analyzing others' analyses.

For the same reason new viruses will always defeat anti-virus software: Each virus is tested against existing anti-virus programs and only released into the wild when it has defeated all of them.

Not all of them. Malware writers go for the biggest targets for the least effort, and often just test against (and even have active intervention against) the best known/most used free AV programs (Microsoft, Avast, AVG, etc.), since a major volume of users use free solutions, and the major commercial (Symantec, Trend, Kaspersky, F-Secure, McAffee, etc.). It is actually a good bet that if you go with a lesser known commercial AV product you gain a significant protection advantage.

Sadly, it's a shame that people put much faith in AV programs given their effectiveness (http://arstechnica.com/security/2014/05/antivurus-pioneer-symantec-declares-av-dead-and-doomed-to-failure/). I think author R.R. Martin has it right (https://www.youtube.com/watch?v=X5REM-3nWHg), keep separate machine for different purposes - one for serious work and one for messing around with. It doesn't feel like a good idea to use one machine for everything.

Sadly, it's a shame that people put much faith in AV programs given their effectiveness (http://arstechnica.com/security/2014/05/antivurus-pioneer-symantec-declares-av-dead-and-doomed-to-failure/). I think author R.R. Martin has it right (https://www.youtube.com/watch?v=X5REM-3nWHg), keep separate machine for different purposes - one for serious work and one for messing around with. It doesn't feel like a good idea to use one machine for everything.

Symantec is mixing up stuff here to try to position themselves for the new hot profitable APT market. For one; the context of this quote about AV being dead was a WSJ interview with the CEO where he said it in the context if Symantec being able to increase their profit, as AV has become quite cheap and APT is getting all the nice profit margin - it was not said in a context of user need, but in a context of Symantec profit need.

Then they mix up some statistics about targeted advanced hacker attacks (APT),

Possible, but even assuming this, the main issue is that AV in general is considered a relevant safety measure when perhaps it should not be. The assumption by itself can lead to a false sense of security. Frankly, I'd rather run multiple VMs on a machine at the very least - MS Windows for games and Debian for serious work. I don't do serious work on a Windows machine or on any Apple device for that matter - I'd rather my OSs and apps be open source and subject to comminity scrutiny.

Actually, a static checker did find the OpenSSL bug, but nobody used Frama-C to check OpenSSL. Any parametric fuzzing would have caught the OpenSSL bug as well: give it construction of the packet and say, "Vary the data in this fixed length field, vary data and size of this variable-length field." Such tests only account for what types of data come through the program, and may cause strange behavior.

Test-driven development would also have caught Heartbleed. Similar to fuzzing, TDD would produce valid a

Did you actually read the article? The particular apple bug would have been easily caught by testing. In fact, the handshake code was copy-pasted six times in the code, and only one of the copies had the bug... if the developer had thought about about testing at all, that code would have been factored into one function, and even just by doing so the bug would have been less likely.

After reading TFA, I'm not sure I like the suggested approach to the "fix" in the code by replacing the two if blocks with a common method where you pass in all sorts of parameters.

Yes duplicate code is bad, I agree that's a "code smell" (one of the worst coding terms every to be invented BTW).

But just as odiferous to me, is a method with a billion arguments like the combined extracted method has. Sure duplicate if statements smell bad, but the replacement is worse and also harder to comprehend.

If you're selling that you coulda/woulda caught all X for X that haven't happened yet, you're selling snake oil. The reality is that this computer stuff is a little harder than it looks to do properly, and if all you have to offer is marketing bullshit and a History of Art degree, maybe you should leave it to the professionals, and push for budget to do things correctly rather than just do them.

If you're selling that you coulda/woulda caught all X for X that haven't happened yet, you're selling snake oil. The reality is that this computer stuff is a little harder than it looks to do properly, and if all you have to offer is marketing bullshit and a History of Art degree, maybe you should leave it to the professionals, and push for budget to do things correctly rather than just do them.

I've often said that you don't fix a software bug until you've fixed the process that allowed the bug to be created.

One of the things that struck me about the goto fail bug was that it was specifically engineered out of coding best practices in the '90's.

Any reasonable coding standard from that time forbade if's without braces for precisely this reason. And yeah, that's a "no true Scotsman" kind of argument (if a coding standard didn't contain such a clause it was not by my definition "reasonable") but the point still holds: software developers at the time were aware of the risk of open if statements causing exactly this kind of failure, because we had observed them in the wild, and designed coding standards to reduce their occurrence.

So to be very specific about what kind of processes and culture would have prevented this bug: a reasonable coding standard and code reviews would have caught it (much of the code review process can be automated these days), and a culture of professionalism is required to implement and maintain such things.

The canonical attribute of professionals is that we worry at least as much about failure as success. We know that failures will happen, and work to reduce them to the bare minimum while still producing working systems under budget and on time (it follows from this that we also care about scheduling and estimation.)

Amateurs look at things like coding standards and reviews and say, "Well what are the odds of that happening! I'm so good it won't ever affect my code!"

Professionals say, "The history of my field shows that certain vulnerabilities are common, and I am human and fallible, so I will put in place simple, lightweight processes to avoid serious failures even when they have low probability, because in a world where millions of lines of code are written every day, a million-to-one bug is written by someone, somewhere with each turn of the Earth, and I'd rather that it wasn't written by me."

It's very difficult to convince amateurs of this, of course, so inculcating professional culture and values is vital.

Have you read the BSD style(9) man page? It specifically recommends omitting unnecessary braces. In an organization which follows this style guide, not only would the lack of braces not be flagged in a code review but if there were braces around a single-statement 'if' clause the reviewers might require that they be removed. Now, given that OSX is derived from BSD...

Use a space after keywords (if, while, for, return, switch). No braces are used for control statements with zero or only a single statement

Probably a lack of code reviews, or if they had code reviews not enough reviewers were being pedantic about style. I can actually understand this, because when I give reviews I do get pushback when pointing out violations of the local coding standards. I suspect a lot of people do code reviews only superficially and avoid any review of design/optimization/style specifically to avoid the stress of arguing about it.

As you have proven, standards are not enough.Modern languages implement this stuff in the tools and compiler and language spec. For example in Go, code is formatted automatically, showing the problem.

Dead code warnings would also have prevented this. The struct hashOut is given a value but not used. Tools can detect that sort of error. Even compilers can be built that error out on this. If you are willing to go to newer languages.

I've often said that you don't fix a software bug until you've fixed the process that allowed the bug to be created. The above quote is of a similar sentiment.

Sounds great! Now just show me a program (more complex than "Hello World") with no bugs.

Yes, culture can play a large role in the frequency and severity of bugs released into production code. But humans make mistakes, simple as that. No amount of code reviews or test suites or BS like "pair programming" can ever get around that basic fact.

Yes, culture can play a large role in the frequency and severity of bugs released into production code. But humans make mistakes, simple as that. No amount of code reviews or test suites or BS like "pair programming" can ever get around that basic fact.

Uh, you have gone perpedicular to the problem here. Code reviews and test suites are things we have because of that basic fact.

I've often said that you don't fix a software bug until you've fixed the process that allowed the bug to be created. The above quote is of a similar sentiment.
Sounds great! Now just show me a program (more complex than "Hello World") with no bugs.

Of course, it has been occasionally pointed out that the canonical "Hello World" program (from the "C Bible") actually has a bug. Granted, it's not one that you're ever likely to observe in the wild, and good luck writing malware to exploit it. But most programmers, even expert C programmers, can't spot it despite being trivially obvious when pointed out. This is actually a fairly nice example of how difficult it can be to write bug-free software, and I'd wonder if it was done intentionally in that book

Missing return from main, undefined behavior after the printf. What do I win?

More seriously, I think it's an overstatement that "even expert C programmers can't spot this". It jumped out at me right when I looked at it. I do a lot of C++, but not much C. I had to do some research to make sure it really was undefined behavior in C when I noticed it, but it did jump right out at me, and I wouldn't have let someone off for thi

i will go you one further and say that the more open style of braces would have shown the bug quite clearlyif ((err = SSLHashSHA1.update( &hashCtx, &signedParams)) != 0){ goto fail; goto fail; if ((err = SSLHashSHA1.final( &hashCtx, &hashOut)) != 0) { goto fail; }}// if you always put a brace on the line after the evaluation of an if, and tab in, the nesting will be obvious.

It's not clear to me how using brackets would have helped. The code would have failed even if there were brackets around one or the other or both of the offending statements. And it's not clear if the additional brackets would have increased the likelihood that the mistake would have been noticed.

Brackets around the entire IF would have caused it to work as expected, the second `goto fail;' would simply never get used. Brackets around only the first would have caused the second to look a lot more out of place, at least as much more as two identical lines on identical indents could. Where is the indent regime on this issue anyway? Isn't that even more of a style atrocity than foregoing brackets? Nothing like intentionally making your code look like assembly spit from a debugger to make it easy to

I've been in this field for 20+ years now, and I don't necessarily (in fact, I usually don't) agree with whatever the current trend is (which is probably why my karma is negative). One underlying trend, has been to make software something that can be made by anyone - to remove the requirement of having a special mind that is able to think through algorithms and code. This has generally been accomplished through process, and abstraction. Process - if we can describe a method well enough, then anyone should be able to follow it to it's logical conclusion. Abstraction - we keep adding layers upon layers in an effort to simplify and streamline that which is a complex thing (lots of numbers in sequence to control a microprocessor and it's accompanying hardware). You can probably tell that I'm not a great fan of either - though I'm really really trying to not be a negative type, and to go with the flow more. But I can't help my fundamental feelings that there is just no substitute for a smart individual with a gift of understanding the logic of code. I'm always against process because it takes the gift that i was given and neutralizes it. Personal feelings aside, I just don't think that all the process in the world is ever going to get ahead of the curve that is the battle between perfectly functional software and bugs.

If you make brilliant code that only you can understand, sorry to be harsh but you aren't that brilliant. We definitely need to value people who can generate and perfect algorithms, but do you think anyone would remember/value the Pythagorean Theorem if it was 40 steps long? No, he thought of a (then brilliant) way to do it simply and easily so that one only needs to understand basic math to pull it off. This is what we need more of; a single elegant algorithm that is so short it is hard to misuse is better than 1,000 algorithms that are all so hard to understand that only the author knows exactly how it works and will be forgotten as soon as the particular language or application fades into the past.

I kinda agree... I mean, I never go out of my way to artificially complicate my code. I'm not one of those people that uses macros just for the sake of showing how clever I can make them. The problem is that - to use your example - how many people do we have now that actually learn how to derive the Pythagorean Theorem? How do we build on that? The gifted people that COULD build on it, can't - because they're sandboxed into a process, or a higher level abstraction. My kids - they were in awe of the Sa

There's a false dichotomy here. He said that only *some* are qualified enough to create solutions to complex problems. You are saying his claim is that only *one* can understand, implying that the problem can't possibly be too hard, and that any hard code to follow is just because the developer is terrible at coding.

As a counter to your example of the Pythagorean Theorem, what about post-graduate math and science? There are tons of things which would make 40 steps seem easy by comparison. Should society

Turning on all warnings and forcing them to errors certainly would have caught the bug in Apple's SSL code. Anyone who just lets warnings fly by in C code is an idiot. Even if the warning is mildly silly, getting it out of the way lets the important warnings stand out. Sensible warnings from C compilers are the very reason we don't use lint anymore. Even then you still have to watch out, because some warnings won't appear at low optimization levels, and I recall hearing that there are a few obscure warnings not turned on by -Wall.

Also, it could have possibly been introduced by a bad merge. One of the things that putting braces on every if/for/while/etc. does is give merges more context to keep from fucking up, or at least a chance to cause brace mismatch.

As for Heartbleed, just the fact that the code wouldn't work with a compile time option to use the system malloc instead of a custom one should have been enough to raise some red flags. Because rolling your own code to do something "more efficiently" than the system libraries never introduces new problems, right?

Turning on all warnings and forcing them to errors certainly would have caught the bug in Apple's SSL code. Anyone who just lets warnings fly by in C code is an idiot. Even if the warning is mildly silly, getting it out of the way lets the important warnings stand out. Sensible warnings from C compilers are the very reason we don't use lint anymore. Even then you still have to watch out, because some warnings won't appear at low optimization levels, and I recall hearing that there are a few obscure warnings not turned on by -Wall.

Static analysis has not proven to be especially helpful in finding bugs in SQLite. Static analysis has found a few bugs in SQLite, but those are the exceptions. More bugs have been introduced into SQLite while trying to get it to compile without warnings than have been found by static analysis.

The bolded part has been my experience unfortunately. Static analysis is nearly useless.

An appropriate test for something like an SSL stack is a separate test harness that "fuzzes" the stack by exploring large random combinations of values, some with known good certificates and others with randomly generated (and thus broken) ones. These days one can spin up thousands of VMs, run

We have some lovely elements coming together right here on the slashdot blurb:

1. Stupid pun instead of a descriptive title
2. Full caps in the article excerpt
3. Trying to bring up coding "culture"
4. Assertion that it totally could have been caught beforehand, but they aren't sure exactly how.

This is clearly the automatic resolution of a merge conflict by the versioning control software. These are such a nightmare to debug and happen all the time. Developers rarely check their entire change visually post merge. Though this can be found using static analysis that force coding standards (such as forcing the use of brackets or proper indentation for the lexical scope). Though the bugs from automatic conflict resolution can only be really improved through better versioning software. These are without question the worst and most frustrating bugs.

The article doesn't give the revision history of how that code got there. Who put it there, when did they put it there, and what did the code look like before and after they put it there. We need the names!

It's remarkable how many organizations don't enable aggressive compiler warnings (or worse, ignore or disable them). One of the best practices I've learned is to turn on every warning that you possibly can and use the option that treats all warnings as compiler errors. The code from Apple may have been properly unit tested. However, if this was the result of a bad automated merge, unit tests are often not repeated on the resulting code base headed for system test. The GCC "-Wunreachable-code" option would h

The article contains the same flaw that people who rabidly declare unit tests as a panacea. The article basically shows that after discovery of a bug, a unit test can retroactively be constructed that would have caught the bug, therefore it's inexcusable that the bug got released, ignoring the fact that is hindsight. Unit tests are not without their utility certainly, but practically speaking you will not be able to construct unit tests that catches every single possible scenario. This is tricky enough f

The article contains the same flaw that people who rabidly declare unit tests as a panacea. The article basically shows that after discovery of a bug, a unit test can retroactively be constructed that would have caught the bug, therefore it's inexcusable that the bug got released, ignoring the fact that is hindsight.

The function was supposed to check various ways a key exchange message could be screwed up. The minimum set of unit tests appropriate for such a function are pretty clear -- feed it messages t

Yeah, the "culture" is "hurry up and get it done so you can get on to the next thing because if something takes more than an hour to do it's not worth doing" and it exists in every single software development organization on planet Earth. Until these things actually start costing real money to people with real power, this will continue.

Ok, writing "goto fail;" twice in a row is a bug. But it's not the real bug. This code was checking whether a connection was safe, and executed a "goto fail;" statement if one of the checks failed. It also executed one "goto fail;" by accident, skipping one of the checks. But one would think that a statement "goto fail;" would make the connecction fail! In that case, sure, there was a bug, but the bug should have led to all connections failing, which would have made it obvious to spot (because the code wouldn't have worked, ever).

So the real problem is a programming style where executing a statement "goto fail;" doesn't actually fail! If a function returns 0 for success / non-zero for failure like this one, it should have been obvious to add an "assert (err != 0)" to the failure case following the fail: label. And that would have _immediately_ caught the problem. There should be _one_ statement in the success case "return 0;" and _one_ statement in the failure case "assert (err != 0); return err;".