Vegas, baby! Iron Chef Black Hat

Back when we lived in San Francisco in the 1990s, we were huge fans of Fuji TV’s Iron Chef, then shown with subtitles on a local cable station. When local chef Ron Siegel repeated his winning lobster menu at Charles Nob Hill, word leaked out on the Iron Chef mailing list and we managed to get seats … wow! And I’ll never forget the time that Bobby Flay, in his exuberance, jumped up on the cutting board; so of course when I was at Caesars I had to have lunch at his Mesa Grill.

Iron Chef is also a good lens for looking at Black Hat from the perspective of the consulting I’m doing for San Francisco-based startup Coverity. It gives a completely different picture of the conference than the political, front-page news of Vegas Baby! Black Hat, glitter, and pwnies. It’s just as interesting, though, thanks in no small part to Fortify’s Iron Chef Black Hat.

For those not familiar with the esoteric space of static analysis tools, Coverity and Fortify are two of the major players. Coverity is strong in reliability and the embedded segment; Fortify is strong in security and the financial segment. There are other important players as well, including Ounce, Armorize, Veracode, Klocwork, and Microsoft (whose PREfix and PREfast, both originally architected by me, still set the bar in a lot of people’s minds). Fortify and Coverity are natural competitors: both Bay Area startups on a path to go public, with very different styles and strengths.

I’ve been consulting for Coverity investigating the possibility of breaking into the security space. Fortify is the clear market leader in “static analysis for security” but their support in the developer community seems very tenuous. Iron Chef Black Hat is pitched at the penetration tester community, where Fortify has quite a few partners. From a competitive standpoint, what can we learn?

Brian Chess, chief scientist at Fortify Software, and Jacob West, who manages Fortify Software’s Security Research Group, tell CNET’s Robert Vamosi that one team will use static analysis while the other will use fuzzing. Chess confirmed that Charlie Miller and Jacob Honoroff will be on the fuzzing team, and Sean Fay and Geoff Morrison from Fortify will make up the static analysis team.

If there were a “fantasy league” for security competitions, Charlie and Jake would be ultra-hot commodities: along with their Independent Security Evaluators colleague Mark Daniel, they won Tipping Point’s pwn2own contest in April. On the other hand, in this contest they were using open-source tools and going up against a million-dollar commercial tool set — run by the experts. With reputation at stake on both sides, the pressure was intense …

A panel of three security experts acted as the judges, and voted two to one for the fuzzing team. “I’m amazed at how well the static analysis team did,” said Mozilla’s Window Snyder, who cast the deciding vote. “But the fuzzing team just did a better job.”

Indeed. Amazingly, both teams found exploitable vulnerabilities within an hour. The static analysis team found a remotely exploitable vulnerability that required a server admin to upload an MP3. It would probably be difficult to exploit in practice without some significant social engineering, but still, not bad at all for 60 minutes.

The fuzzing team, by contrast, found a different vulnerability, one that was exploitable without any user action — the most dangerous kind. They used a variety of tools for this, including Sulley and GPF (both of which found the vulnerability) and ProxyFuzz, as well as valgrind and a homebrew monitoring tool. Of course there are some excellent commercial tools out there as well; it’s still remarkably impressive how good the open-source offerings are in this area.
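
To make the mechanics concrete, here’s a toy mutation fuzzer in Python. It is not how Sulley or GPF work internally, just a minimal sketch of the core loop they share: mutate a seed input, run the target, and record unhandled faults. The `parse_header` target and its `ID` format are invented for illustration.

```python
import random

def mutate(seed: bytes, n_flips: int = 1) -> bytes:
    """Randomly overwrite a byte (or a few) of the seed input."""
    data = bytearray(seed)
    for _ in range(n_flips):
        data[random.randrange(len(data))] = random.randrange(256)
    return bytes(data)

def parse_header(data: bytes) -> int:
    """Toy target: a parser that blindly trusts its own length field."""
    if len(data) < 4 or data[:2] != b"ID":
        raise ValueError("not our format")      # gracefully rejected input
    length = data[2]
    return data[3:3 + length][0]                # IndexError if length is 0

def fuzz(seed: bytes, iterations: int = 20000) -> list[bytes]:
    """Mutate, run, and keep any input that triggers an unhandled fault."""
    crashes = []
    for _ in range(iterations):
        case = mutate(seed)
        try:
            parse_header(case)
        except ValueError:
            pass                                # handled: not interesting
        except IndexError:
            crashes.append(case)                # unhandled: candidate bug
    return crashes

random.seed(1)                                  # reproducible run
found = fuzz(b"ID\x01\xff")
print(f"found {len(found)} crashing inputs")
```

The monitoring side (the valgrind role) is what turns “it crashed” into “here’s the faulting input and state”; this sketch just keeps the crashing cases.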

“Fuzzing has been very successful for us and found lots of vulnerabilities. My experience with static analysis has been that there are so many false positives that it can be difficult to get any real value out of it. I was impressed that these guys were able to identify what appears to be a significant issue in such a short period of time using static analysis tools, and it made me reconsider whether it was time to take another look at these tools.”

Window is one of the few people who consistently gets me to look at static analysis in a different way, so when she says stuff like this, I listen. With fuzzing when you get a hit, you know you’ve got a vulnerability that’s exploitable at least for a denial-of-service and you’ve also got the information you need to explore the exploitability. On the other hand, fuzzing’s limited by test cases, so static analysis potentially provides a valuable complement. Static analysis typically has just a tiny percentage of warnings corresponding to real vulnerabilities: a lot are false positives, a lot of the remaining ones are defects but not vulnerabilities, and a lot of the vulnerabilities aren’t exploitable. At Iron Chef Black Hat, the experts were able to cut through the noise and zero in on a real vulnerability very quickly.

Of course, people using tools they’ve helped develop (as both teams did here) unsurprisingly get results that are a lot better than anybody else does.* Still, it’s an indication that with the right expertise, these tools are powerful enough to be very useful — it’s now a matter of making them more broadly accessible, to real people not just static analysis whizzes. And an important point not to overlook: if Fortify can do this, then most likely that’s the state of the art more broadly. So it seems to me that Window’s ahead of the curve as usual, and a lot of people will be taking another look at static analysis tools as complements to fuzzing.

The seven original Iron Chefs — and Kaga

Iron Chef Black Hat is a fantastic idea, tapping into the same competitiveness as pwn2own and the capture-the-flag/defend-the-flag contests, so it’s great to see it become an annual tradition. Black Hat and Fortify still have a ways to go, though, to reach the high bar that CanSecWest and Tipping Point set with pwn2own. First of all, it’s a real wasted opportunity that there’s nothing on Fortify’s blog: it would be great to know more about what defects the teams found, how they chose to focus on those sections of the code, and so on. That silence in turn leads to a lack of blog discussion and press attention: I couldn’t find any coverage other than Tim’s and Michael’s.

And there’s a subtle point about the way Fortify presents the event as being basically about them, not about the participants. The Redmond Developer News article is an interview with Brian Chess of Fortify that doesn’t even mention the winners; the Dark Reading post originally misdescribed the winners as Fortify employees.** Compare and contrast with the way Tipping Point prominently featured the winners of pwn2own.

Despite these imperfections, there’s clearly a huge amount of value as well as fun in Iron Chef Black Hat. Still, I would be remiss in my duties as a strategist if I didn’t highlight some of the key takeaways for Coverity — or Armorize, Klocwork, Ounce, Veracode, anybody else competing with Fortify.

Even with the huge amount of press attention Black Hat got this year, Fortify doesn’t seem to have been able to capitalize on it. On top of that, the results certainly could be interpreted as “when used by experts, Fortify’s expensive static analysis tools are not as effective at finding vulnerabilities as open source fuzzing tools.” Fortify has other strengths as a company, but as market leader in the “static analysis for security” space they are looking potentially vulnerable.

Looking forward, the big battleground is likely to be “tainting” defects in Java, PHP, .NET, and JavaScript (as well as C/C++), which aren’t amenable to fuzzing in the same way that buffer overruns are. There’s clearly a lot of “upside potential” for static analysis here; when we first implemented some simple taint analysis in PREfix in 2001, we immediately found an exploitable vulnerability in a Windows system-level component that (whew) hadn’t shipped yet. The key challenges are making the analysis powerful enough to find vulnerabilities while keeping the noise at a low enough level that the tools are usable.
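
For readers who haven’t seen a taint analysis up close, here’s a minimal sketch of static taint propagation over a made-up three-address mini-language (my illustration, bearing no resemblance to PREfix’s actual implementation): values from sources stay tainted through pass-throughs until a sanitizer intervenes, and a tainted value reaching a sink is a finding.

```python
# Statements are (dst, op, args) triples; all the names below are invented.
SOURCES = {"read_request"}      # functions returning attacker-controlled data
SINKS = {"exec_query"}          # functions that must never see tainted data
SANITIZERS = {"escape_sql"}     # functions whose result is considered clean

def find_taint_flows(program):
    """Propagate taint from sources through pass-throughs to sinks."""
    tainted = set()
    findings = []
    for dst, op, args in program:
        if op in SOURCES:
            tainted.add(dst)                    # fresh taint enters here
        elif op in SANITIZERS:
            tainted.discard(dst)                # sanitized result is clean
        elif op in SINKS:
            findings.extend((op, a) for a in args if a in tainted)
        elif any(a in tainted for a in args):
            tainted.add(dst)                    # pass-through: taint flows on
    return findings

program = [
    ("user", "read_request", []),       # user = read_request()   <- source
    ("name", "strip", ["user"]),        # name = strip(user)      <- pass-through
    ("safe", "escape_sql", ["name"]),   # safe = escape_sql(name) <- sanitizer
    (None,   "exec_query", ["name"]),   # exec_query(name)        <- tainted sink!
    (None,   "exec_query", ["safe"]),   # exec_query(safe)        <- clean
]
print(find_taint_flows(program))  # -> [('exec_query', 'name')]
```

Real engines do this interprocedurally, path-sensitively, and over a real IR, which is exactly where the power-versus-noise tradeoff gets hard.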

So it turns out there is a lot to learn from Iron Chef Black Hat. Props to Fortify for trying it, and to Black Hat for supporting them, and hopefully it’ll continue to improve over time. There’s huge potential here — imagine a couple of teams each for static analysis and fuzzing, using different tools; or showdowns in Java, .NET, and PHP as well as C/C++.

And if that’s too much to fit into an hour-long panel, hey, maybe Fortify can rent Mesa Grill and turn it into a party!

jon

* My experience bears this out: I was consistently able to get good results on tools that I had written months or years before anybody else could. “It’s almost like whoever designed the tool knew just how I think!”

** The author corrected this immediately after Charlie and I pointed it out to him; still, I see this as Fortify’s responsibility to a large extent: what created the misimpression in the first place? And why didn’t they follow up and ask for a correction themselves?

Thanks to the reviewers of the previous draft for the corrections and other feedback!

Comments

I like how you separated market segments:
1) Security in financial systems
2) Reliability in embedded systems

There is even more disparity, but I like the overview of Fortify vs. Coverity you made.

Everything had a beginning. Coverity Prevent was xg++. Fortify SCA seems to be based on similar concepts as PREfix/PREfast and PMD: style checkers that build an AST model of a CFG and then run either an automated DFA or CFA on it depending on the software weakness, followed by static taint propagation if the DFA locates sources, sinks, and pass-throughs via an entry point. Often the only work the tool user has to do is verify that the automated DFA made correct assumptions, and apply a criticality of sorts depending on where and how the data gets tainted.
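
As a concrete (if drastically simplified) illustration of the AST-based approach described above, here’s a toy checker built on Python’s `ast` module — my sketch, not Fortify’s or PMD’s implementation. It is flow-insensitive and handles only direct assignments, which real engines obviously go far beyond; `input` and `eval` stand in for the configured sources and sinks.

```python
import ast

SOURCE_FUNCS = {"input"}        # calls whose results are attacker-controlled
SINK_FUNCS = {"eval", "exec"}   # calls that must not receive tainted data

def check(source_code: str):
    """Flag variables assigned from a source that later reach a sink call."""
    tree = ast.parse(source_code)
    tainted = set()
    findings = []
    for node in ast.walk(tree):
        # x = input(...) marks x as tainted
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            fn = node.value.func
            if isinstance(fn, ast.Name) and fn.id in SOURCE_FUNCS:
                for tgt in node.targets:
                    if isinstance(tgt, ast.Name):
                        tainted.add(tgt.id)
        # sink(x) with a tainted x is a finding
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SINK_FUNCS:
                for arg in node.args:
                    if isinstance(arg, ast.Name) and arg.id in tainted:
                        findings.append((node.func.id, arg.id, node.lineno))
    return findings

snippet = "cmd = input()\neval(cmd)\n"
print(check(snippet))  # -> [('eval', 'cmd', 2)]
```

Verifying that the automation made correct assumptions — the user’s job Andre describes — corresponds here to triaging each `(sink, variable, line)` tuple.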

SecurityReview was Undeveloper Studio, which in turn was based on slint (not to be confused with splint), which was obviously based on lint (another style checker that uses lexical analysis instead of an AST). StyleCop was PreSharp, which I would assume is based on PREfix/PREfast.

I don’t know much about Ounce, but I know that they did purchase AspectCheck from Aspect Security, which was a [Java-only?] bytecode analyzer similar to FindBugs or FxCop. Microsoft seems to be moving the security functionality of the bytecode injector FxCop to CAT.NET, but the public has only seen the free component of CAT.NET called XSSDetect.

Finally, of course, we must not forget Armorize, Checkmarx, Klocwork, and GrammaTech — who all have various products that are certainly worth mention here.

How well does static analysis work for buffer overflows? Well, there are a lot of one-off cases. How about web application injections? Well, it depends on where the injections are coming from, but it should be able to handle any of these rather easily (especially the doom-and-gloom HTTP layer).

There are a multitude of problems that can happen in the code that would prevent a tool from getting the information it needs, or getting it correctly. We’re all still working out these problems.

I have heard that Ounce has planned support for Spring Security’s use of Dependency Injection (DI), which basically uses XML configuration files to abstract a layer of access to the underlying components’ public interfaces. DI is based on the Plugin pattern from OO (although I think both were invented by Fowler and not the Gang of Four), and it has a follow-on pattern called Inversion of Control (IoC). These are some of the things that trip up the tools for web application secure code review, but there are many other cases. Some tools, such as Checkmarx, will support these one-off cases if you customize the tool to do it, but others like Fortify SCA leave you hanging…
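
A tiny Python stand-in (my own illustration, not Spring or any tool’s test case) shows why configuration-driven wiring is hard on static tools: the concrete receiver of a call is named in data, not code, so a purely syntactic call graph can’t tell the injectable handler from the safe one without also modeling the config file.

```python
import json

# Stands in for a Spring-style beans.xml: the binding lives in data.
CONFIG = json.loads('{"handler": "SqlHandler"}')

class SqlHandler:
    def handle(self, data: str) -> str:
        # String-built query: an injectable sink
        return f"SELECT * FROM t WHERE c = '{data}'"

class SafeHandler:
    def handle(self, data: str) -> str:
        # Parameterized query: the safe implementation
        return "SELECT * FROM t WHERE c = ?"

def dispatch(data: str) -> str:
    # The call target below is chosen at runtime from CONFIG, so a tool
    # scanning only the code can't see which handle() this reaches.
    handler_cls = globals()[CONFIG["handler"]]
    return handler_cls().handle(data)

print(dispatch("x' OR '1'='1"))
```

A tool that understands the configuration (as Ounce reportedly plans to for Spring) can resolve the indirection and follow the taint; one that doesn’t must either give up or flag both implementations.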

Often, you want to combine the assets of multiple tool types. A little bit of unit testing or integration testing (or integration unit testing) can reveal needed information for inspection, and vice versa. Functional testing can reveal some information and has been at the forefront of tools like HP DevInspect and IBM AppScan DE. HP DevInspect finally has some sort of legit static taint propagation, which is more than can be said for the other hybrid analysis tools on the market, such as Fortify PTA (previously Tracer).

What PTA does is still quite neat, though. By injecting aspects into the bytecode, PTA can determine the code coverage of a functional test, such as one from a web application security scanner. Other code coverage and comprehension tools have various purposes in any sort of pen-testing process. Of course, code coverage has been around forever, too. If you know gcov (GCC) or lcov (Linux), maybe I should remind you that even these came from tcov, which has been around since I first developed C code on a Sun workstation in the late 1980s, and certainly it wasn’t the first either.
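
For anyone who hasn’t poked at coverage tooling, here’s a bare-bones line-coverage tracer in Python using `sys.settrace`. This is only a sketch of the idea; gcov/tcov instrument at compile time rather than tracing at runtime, and PTA injects aspects into bytecode.

```python
import sys

def trace_coverage(fn, *args):
    """Run fn(*args) and return the set of its line numbers that executed."""
    covered = set()
    def tracer(frame, event, arg):
        # Record 'line' events, but only for the function under test
        if event == "line" and frame.f_code is fn.__code__:
            covered.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)      # always uninstall the tracer
    return covered

def classify(n):
    if n < 0:
        return "negative"       # not reached by the run below
    return "non-negative"

hit = trace_coverage(classify, 5)
base = classify.__code__.co_firstlineno
print(sorted(line - base for line in hit))  # -> [1, 3]: offset 2 never ran
```

The uncovered offset is exactly the kind of gap a coverage-guided pen test (or fuzzer) would then try to reach with a new input.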

For example, with fuzz testing, Charlie Miller has spoken on occasion about Jared DeMott’s EFS and Pedram Amini’s PaiMei tools, and about how he then applies static analysis techniques, such as using satisfiability solvers, to prove even more coverage and get deeper results. It’s no wonder that fuzz testing gets good results when it combines the techniques of static analysis! These technologies are complementary. True dynamic analysis is white-box: your tests (or tools) get the information, even if they have to go to the binary/bytecode layer to get it.

There are tools that guide an expert, but are still considered fully-manual (instead of partially-automated like Fortify SCA, Klocwork K7, Ounce, Checkmarx CxSuite, GrammaTech CodeSonar, Coverity Prevent / Extend, Armorize, etc). Tools such as GrammaTech CodeSurfer and SciTools Understand are in use by the best and brightest such as Dowd, Schuh, et al (ISS nerds). Klocwork has an engine in K7 that will help with manual review, and they also provide an architectural analysis mode that will take UML and look for security patterns. It’s even possible to extract UML from C++, native Java, or almost any OO language using RTTI plugins for IDA Pro. If you’ve ever thought of using cscope, CScout, or doxygen to help with secure code review, then you can probably guess how useful this sort of information can be to the overall analysis.

In reality, all of this is intermixed technology. It all serves a purpose one way or another. It takes serious knowledge to know how to apply it. This is one reason why people say, “a fool with a tool is still a fool”.

It’s my opinion that the next “big thing” might indeed be people and not tools. People who know what to do and how to use and apply the tools. People who can write AOP and DI code to fix compile-time issues. People who can integrate security unit tests into continuous integration. I don’t think the tools will be able to do these things by themselves.

Sure, there are going to be some classically awesome situations where DevInspect and SecureObjects just “work”, but not all the time.

What I predict will certainly fail is WASS+WAF, or the even better idea of dynamic taint propagation. It’s too complex and it has no clear path. It’s also very language-specific and nowhere near as mature as static taint propagation — plus static taint propagation can be done behind the scenes — not in a production, working environment. This stuff will stay in the lab for quite some time.

OWASP ESAPI and Secure Software Contract Annexes are definitely the easiest routes to solving these problems when embarking on new projects. For legacy code, I really think that AOP, DI, and smart refactoring (following some good inspection and testing) are the tried-and-true methods that will provide fixes at the code layer. Refactoring is so much easier when you have generated code, stubs, and tests. Building this software-factory automation layer into your projects (and across projects) will go a long way towards increasing the quality, reliability, or security of any code you produce today or tomorrow.

Andre, thanks so much for the incredible comment — one of the most detailed short roundups of the tool space I’ve ever seen as well as important historical perspective. A couple of things worth highlighting:

Often, you want to combine the assets of multiple tool types.

This is a vital point, and one that’s too often overlooked both by customers who (very naturally) want a “one size fits all” solution, and by tool vendors who forget that the information they’re producing may well be useful for other tasks and tools as well. Scott Page’s book The Difference looks at cognitive diversity and problem solving, and while he’s focusing on people, a similar framework applies to tools, and to the combination of tools and people. Even just looking at static analysis, intersecting the results of two different high-noise analyses can potentially lead to significant noise reductions. When you start considering other approaches as well, it’s clear that there’s huge potential leverage here. Patrice Godefroid of Microsoft Research, who’s doing some of the most interesting work out there, describes his approach as “combining program analysis, testing, model checking and theorem proving”.
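
A quick back-of-envelope sketch of the noise-reduction point, with invented numbers and the strong (explicitly assumed) premise that the two tools’ false positives are nearly independent while their true positives overlap heavily:

```python
def precision(tp: float, fp: float) -> float:
    """Fraction of reported warnings that are real defects."""
    return tp / (tp + fp)

# Hypothetical: a codebase with 50 real defects, scanned by two noisy tools.
real = 50
fp_a, recall_a = 450, 0.80      # Tool A: 450 false alarms, finds 80% of defects
fp_b, recall_b = 500, 0.75      # Tool B: 500 false alarms, finds 75%

tp_a = real * recall_a
print(f"Tool A alone: precision {precision(tp_a, fp_a):.0%}")   # -> 8%

# Intersection: a warning survives only if both tools flag it. Real defects
# survive at roughly recall_a * recall_b; independent false positives rarely
# coincide (assume only 5% of A's false alarms also appear in B's output).
tp_both = real * recall_a * recall_b
fp_both = fp_a * 0.05
print(f"Intersection: precision {precision(tp_both, fp_both):.0%}")  # -> 57%
```

The independence assumption is the whole game: two analyses built on the same engine share their blind spots and their noise, and intersecting them buys little.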

It’s my opinion that the next “big thing” might indeed be people and not tools.

Learning, at both the individual and organizational level, is another important aspect of “people and not just tools”. If the tools are usable enough to be deployed pervasively, then over time developers become steadily more aware of the issues … and make fewer mistakes. And the ability to track these metrics along with others, and then do root-cause analysis to understand why some groups significantly outperform others, lets the organization steadily improve itself.

What I predict will certainly fail is WASS+WAF, or the even better idea of dynamic taint propagation. It’s too complex and it has no clear path. It’s also very language-specific and nowhere near as mature as static taint propagation — plus static taint propagation can be done behind the scenes — not in a production, working environment. This stuff will stay in the lab for quite some time.

Hard to know (although I do agree that the WAF hype is way overblown). Static analysis is language- or byte-code-specific as well; and while it’s a lot more mature, some of the most intriguing ideas still haven’t made it out of the lab: David Wagner et al.’s type qualifiers, for example, or Rob DeLine and Manuel Fahndrich’s Fugue work from Microsoft Research. The time’s clearly right for a breakthrough; it seems to me that it’s likely to come from the combination of static and dynamic analysis, as well as other techniques, and in that context some of this stuff may well be ready to escape from the lab.

We shall see … exciting times in any case. What’s fascinating from a strategy perspective is that all of these tools have their strengths — none is a complete replacement for any of the others. What’s also fascinating, and disappointing, is that despite the steady improvement in tools and methodologies, the overall security experience for users isn’t getting any better.

Such were the war games at West Point last month, when a team of cadets spent four days struggling around the clock to establish a computer network and keep it operating while hackers from the National Security Agency in Maryland tried to infiltrate it with methods that an enemy might use. The N.S.A. made the cadets’ task more difficult by planting viruses on some of the equipment, just as real-world hackers have done on millions of computers around the world.

The competition was a final exam of sorts for a senior elective class. The cadets, who were computer science and information technology majors, competed against teams from the Navy, Air Force, Coast Guard and Merchant Marine as well as the Naval Postgraduate Academy and the Air Force Institute of Technology. Each team was judged on how well it subdued the threats from the N.S.A.