Why CGC Matters to Me

In 2008 I started as a new assistant professor at CMU. I sat down, thought hard about what I had learned from graduate school, and tried to figure out what to do next. My advisor in graduate school was Dawn Song, one of the top scholars in computer security. She would go on to win a MacArthur “Genius” Award in 2010. She’s a hard act to follow. I was constantly reminded of this because, by some weird twist of fate, I was given her office when she moved from CMU to Berkeley.

The research vision I came up with is the same one I have today:

Automatically check the world’s software for exploitable bugs.

To me, the two most important words are “automatically” and “exploitable”. “Automatically” because we produce software far faster than humans could check it manually (and manual analysis is unfortunately far too common in practice). “Exploitable” because I didn’t want to find just any bugs, but those that could be used by attackers to break into systems.

Think about it: if we could develop computers that could automatically find vulnerabilities, then the good guys could fix them first. I say “we” here because I’m just one of many in research, and even more in industry, who want and work to make progress towards this goal.

So why am I excited about the DARPA Cyber Grand Challenge (CGC) today? CGC gives the world, for the first time, an objective competition to measure how well different approaches to the problem work on a level playing field. We can test, we can compare against others, and if we miss something — some important detail others figured out — we’ll know. And just to incentivize “not missing important things”, there is $3,750,000 in prize money for the best automated systems.

I wish there had been something like CGC in 2009, when I first started as a professor. Back then, I was thinking about automated analysis and I worked on a paper called “Automated Exploit Generation”. This paper was one small step towards the vision. My students and I presented a few techniques for taking an off-the-shelf binary program, automatically finding bugs, and then trying to automatically generate working exploits for the bugs we found.

This paper built on a lot of existing work in the area to produce something very cool, but it was still very much a prototype. We didn’t explore every technique, and even some key known techniques, like fuzzing, were omitted so we could focus on one variable at a time. As many pointed out, our paper was not the first attempt at having a computer generate an exploit, and it certainly wasn’t going to replace humans at exploiting bugs in modern software.

The biggest limitation, though, was that it was just not practical for many programs. I’m not the first to say this, but I’ll be the first to agree! There were a huge number of limitations. It didn’t work on Chrome, Adobe Acrobat, or larger programs in general. Even on small programs it sometimes choked. We automated only basic exploitation techniques that modern OSes can easily defend against. Today in CGC we’re far ahead of that early work. Some may raise legitimate concerns that CGC challenges are not “real” programs, that we’re still not on Chrome, and that program X or vulnerability Y is important but not tested. All true, but not the point of CGC to me.

The point of CGC to me is to answer the lingering question: are our techniques competitive against the best alternative approaches? We can’t know this without a firm, fair set of tests.

What we needed was a competition like CGC: a competition where we could test against other tools on the same benchmarks and the same machines. A competition motivated by $2,000,000 for the best automated analysis, with $1,750,000 more for the two runners-up. A competition where we could shine a light on the limitations of current tools for all to see, so that we as a community have a landmark against which to measure future improvement.

My two students and I founded ForAllSecure based on their AEG research at CMU. We will get our chance to measure our techniques against others in CGC in a little over a week. And we’ve grown from the initial 3 to 9 employees as a strong Pittsburgh startup and CMU spinoff, where all developers have experience hacking as part of PPP (a competition hacking team). I feel very fortunate to work with a team that is extremely talented and that shares the desire to build the world’s best CGC system and, ultimately, to build tools for checking the world’s software for exploitable bugs.

We’re also fortunate that we are competing with some really gifted teams. Shellphish is primarily from UCSB, and its members are the developers of the open source Angr. Grammatech is a spinoff from Tom Reps’s research group in program analysis. DeepRed is from Raytheon SI, one of the go-to government contractors for cybersecurity solutions. And, of course, Dawn Song, my advisor and mentor. (Yikes! Competing against your advisor!) Even those teams with which I’m less familiar, like CSDS, have done an amazing job working on the cutting edge of automated security. ForAllSecure did great in the qualifying event a year ago, but the finals is a whole new ballgame and all the teams will pose stiff competition.

As the proverb goes, “The best time to plant a tree is 20 years ago. The next best time is today.”

I’m excited about CGC because it plants the right tree. Maybe in 20 years we’ll beat the best humans. As a useful historical analogy, chess computers were available in the 1960s (maybe even earlier), but it took until the 1997 victory of Deep Blue over Kasparov for computers to be widely seen as competitive against the best humans. Chess took over 30 years, and it’s orders of magnitude easier than computer security!

CGC has limitations. We won’t get challenges as complex as modern browsers. There will be vulnerabilities that humans could find that we will miss (just watch DEFCON CTF afterward and I’m sure some will come up). The automated systems will be far from perfect.

But CGC — a competition where tools compete on an equal playing field — is the right idea to encourage continual progress. It gives the community a new way to incrementally grind down the problem of finding and fixing vulnerabilities at computer speeds. On August 4, 2016, CGC will run the first public tournament that records exactly where automated tools stand.

One thought on “Why CGC Matters to Me”

As the team captain for team CSDS, I have to agree with much of what David posted here. I believe we need automation to solve modern cyber security problems. Programmers make mistakes, and those mistakes put vulnerabilities into software. If we can build tools that can analyze software and harden it, then this helps. We will never solve all of the problems, or patch all of the bugs. But what if we could patch 70% of the vulnerabilities in 70% of the software that is out there? That would be a great win. I haven’t done the math, but I believe even 20% would be a win.

I was surprised and impressed that the score differential between the first-place and last-place teams was within 15%. Collectively the teams found and patched many possible vulnerabilities. We are all still analyzing the results and seeing where we could do better. After the event we ran JIMA against the 83 CFE challenges as posted by DARPA. JIMA successfully patched all but one of the TYPE 1 vulnerabilities provided in the reference POVs posted by DARPA, each patch taking less than a minute (several in just a few seconds). We did not try to patch the TYPE 2 vulnerabilities in this case. Our problem is that this is a generic patch, and the execution overhead was much higher than what we were seeing in our previous experiments. Mayhem had better performance on its patched systems.

I think there are many ways to solve these problems, and we as a community will be moving this research forward.