

I work at Red Hat on GCC, the GNU Compiler Collection. For the next major release of GCC, GCC 10, I’ve been implementing a new -fanalyzer option: A static analysis pass to identify various problems at compile-time, rather than at runtime.

My thinking here is that it’s best to catch problems as early as possible, while the code is being written, by using the compiler itself as part of the compile-edit-debug cycle, rather than having static analysis as an extra tool “on the side” (perhaps proprietary). Hence, it seems worthwhile to have a static analyzer built into the compiler that can see exactly the same code as the compiler sees, because it is the compiler.

Static analysis is, of course, a huge problem to tackle. For this release, I’ve focused on the kinds of problems seen in C code (in particular, double-free bugs), but with a view toward creating a framework that we can expand on in subsequent releases, when we can add more checks and support languages other than C.

My hope is that the analyzer provides a decent amount of extra checking while not being too expensive. I’ve aimed for -fanalyzer to “merely” double the compile time as a reasonable trade-off for the extra checks. I haven’t succeeded yet, as you’ll see below, but I’m working on it.

Right now the code is in GCC’s master branch for GCC 10 and can be tried out on Compiler Explorer, aka godbolt.org. It works well for small and medium-sized examples, but there are bugs that mean it’s not ready for production use. I’m working hard on fixing things in the hope that the feature will be meaningfully usable for C code by the time of GCC 10’s release (likely in April).

The analyzer’s output for a simple double-free shows that GCC has learned some new tricks. First, diagnostics can now carry Common Weakness Enumeration (CWE) identifiers: the double-free diagnostic is tagged with CWE-415. This tag hopefully makes the output clearer and more precise, and gives you something simple to type into search engines. So far, only diagnostics from -fanalyzer have been tagged with CWE identifiers.

If you’re using GCC 10 with a suitable terminal (e.g., a recent gnome-terminal), the CWE identifier is a clickable hyperlink that takes you to a description of the problem. Speaking of hyperlinks: for many releases, GCC has printed the option controlling a warning whenever it emits that warning. As of GCC 10, that option text is a clickable hyperlink (again, assuming a sufficiently capable terminal) that should take you to the documentation for that option. This applies to any warning, not just the ones relating to the analyzer.

Second, GCC diagnostics can now have a chain of events associated with them, describing a path through the code that triggers the problem. For a double-free with no control flow between the two calls to free, the path has just two events, and the second event refers to the first event in its description.
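As a minimal sketch of a double-free with no control flow, assume a function like the following (the name test is hypothetical, chosen for illustration). Compiling it with -fanalyzer should report -Wanalyzer-double-free, with one event per call to free:

```c
#include <stdlib.h>

/* Hypothetical example: frees the same pointer twice with no
   branching in between.  Passing a non-null heap pointer at run time
   is undefined behavior; the function is meant to be compiled with
   -fanalyzer rather than called. */
void test(void *ptr)
{
  free(ptr);  /* first event: ptr is freed here */
  free(ptr);  /* second event: ptr is freed again (CWE-415) */
}
```

Something like gcc -fanalyzer -c on this file should be enough to trigger the diagnostic; no main function is needed.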

Here’s a more involved example. Can you see the issue in the following code? (Hint: It’s not a double-free this time):
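As a hedged sketch of the kind of code in question: the names outer and middle come from the description below, while inner, the buffer size, and the return value of outer are assumptions made for illustration.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf env;

/* Hypothetical innermost function: jumps straight back to the
   setjmp call in outer(). */
static void inner(void)
{
  longjmp(env, 1);
}

/* Allocates a buffer, calls inner(), and intends to free the buffer
   afterwards. */
static void middle(void)
{
  void *ptr = malloc(1024);
  inner();
  free(ptr);  /* cleanup point */
}

/* Returns 0 if middle() returned normally, or 1 if control came back
   here via longjmp. */
int outer(void)
{
  if (setjmp(env) == 0)
    {
      middle();
      return 0;
    }
  return 1;
}
```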

The analyzer’s report for this code is rather verbose, though perhaps it needs to be to convey what’s going on, given the use of setjmp and longjmp. I hope the description is reasonably clear: there’s a memory leak that occurs when the call to longjmp unwinds the stack back to outer, past the cleanup point in middle, without invoking the cleanup.

If you don’t like the ASCII-art rendering of the path, you can view the events as separate “note” diagnostics with -fdiagnostics-path-format=separate-events, or turn them off altogether with -fdiagnostics-path-format=none. There’s also a JSON output format.

All of the new diagnostics have a -Wanalyzer-SOMETHING name: We’ve already seen -Wanalyzer-double-free and -Wanalyzer-malloc-leak above. These diagnostics are all enabled when -fanalyzer is enabled, but they can be selectively disabled via the -Wno-analyzer-SOMETHING variants (e.g., via pragmas).
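As a sketch of the pragma route, the following uses GCC’s diagnostic pragmas to silence -Wanalyzer-malloc-leak around one function. The helper stash_copy is hypothetical, and whether a given GCC build honors the pragma for every analyzer diagnostic is an assumption worth checking:

```c
#include <stdlib.h>
#include <string.h>

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wanalyzer-malloc-leak"

/* Hypothetical helper: copies msg into a heap buffer that is
   deliberately never freed.  Returns the length of the copy, or -1
   if the allocation fails. */
int stash_copy(const char *msg)
{
  size_t len = strlen(msg);
  char *copy = malloc(len + 1);
  if (!copy)
    return -1;
  memcpy(copy, msg, len + 1);
  return (int) strlen(copy);  /* copy goes out of scope here: a leak */
}

#pragma GCC diagnostic pop
```

The push/pop pair keeps the suppression local, so leaks elsewhere in the same file are still reported.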

What are the new warnings?

As well as double-free detection, there are checks for malloc and fopen leaks:
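For instance, here is a minimal sketch of an fopen leak of the kind that -Wanalyzer-file-leak is intended to catch. The helper first_byte is hypothetical; the bug is the early return that skips fclose:

```c
#include <stdio.h>

/* Hypothetical helper: returns the first byte of the file at path,
   or -1 if the file cannot be opened or is empty. */
int first_byte(const char *path)
{
  FILE *f = fopen(path, "rb");
  if (!f)
    return -1;

  int c = fgetc(f);
  if (c == EOF)
    return -1;  /* bug: returns without fclose(f), leaking the stream */

  fclose(f);
  return c;
}
```

A malloc leak has the same shape: an allocation, followed by some path out of the function on which the matching free is never reached.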

What’s left to do?

As it stands, the checker works well on small- and medium-sized examples, but there are two problem areas I’m running into as I scale it up to real-world C code. First, there are bugs in my state-management code. Within the checker are classes for describing program state in an abstract way. The checker explores the program, building a directed graph of (point, state) pairs with logic for simplifying state and merging state at control flow join-points.

In theory, if the state gets too complicated, the checker is meant to go into a least-defined state, but there are bugs with this approach that lead to the number of states at a given point exploding, which then leads to the checker running slowly, eventually hitting a safety limit, and not fully exploring the program. To fix this, I’ve been rewriting the guts of the state-management code. I hope to land the rewrite in “master” next week.
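To make the idea of merging state at join points concrete, here is a deliberately tiny sketch. It is not GCC’s implementation: the enum, the function names, and the single-pointer model are all assumptions for illustration. Two states that agree are kept as-is, and anything else collapses to a least-defined state:

```c
#include <stdbool.h>

/* Hypothetical abstract states for a single pointer; the real
   analyzer's state classes are far richer than this. */
enum ptr_state { PTR_START, PTR_ALLOCATED, PTR_FREED, PTR_UNKNOWN };

/* Merge two states at a control-flow join point: identical states
   are kept, anything else falls back to the least-defined state. */
enum ptr_state merge_states(enum ptr_state a, enum ptr_state b)
{
  return a == b ? a : PTR_UNKNOWN;
}

/* Transition for a call to free().  Sets *double_free to true when
   the pointer was already known to be freed. */
enum ptr_state on_free(enum ptr_state s, bool *double_free)
{
  *double_free = (s == PTR_FREED);
  return s == PTR_UNKNOWN ? PTR_UNKNOWN : PTR_FREED;
}
```

Falling back to PTR_UNKNOWN loses precision, but it bounds the number of distinct states that can exist at any one program point, which is what keeps the exploration from blowing up.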

Second, even if we do fully explore the program, the paths through the code generated by -fanalyzer are sometimes ludicrously verbose. The worst I’ve seen is a 110-event path for the use of uninitialized data reported when compiling GCC itself. I think this one was a false positive, but clearly it’s unreasonable to expect users to wade through something like that.

The analyzer tries to find the shortest feasible path through the (point, state) graph, generates a chain of events from it, and then tries to simplify the chain. Effectively, it’s applying a series of peephole optimizations to the chain of events to come up with a minimal chain that expresses the problem.
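The shortest-path step can be sketched as a breadth-first search over a small graph. This is only an illustration: the struct, the fixed node limit, and the function name are assumptions, and the sketch ignores feasibility checking and event simplification entirely:

```c
#define MAX_NODES 16

/* Hypothetical graph of (point, state) nodes: edge[i][j] != 0 means
   node i can reach node j in one step. */
struct graph {
  int num_nodes;
  unsigned char edge[MAX_NODES][MAX_NODES];
};

/* Breadth-first search from src to dst (both assumed to be valid node
   indices).  Writes the node sequence into path and returns its
   length, or 0 if dst is unreachable. */
int shortest_path(const struct graph *g, int src, int dst,
                  int path[MAX_NODES])
{
  int prev[MAX_NODES];
  int queue[MAX_NODES];
  int head = 0, tail = 0;

  for (int i = 0; i < MAX_NODES; i++)
    prev[i] = -1;
  prev[src] = src;
  queue[tail++] = src;

  /* Each node is enqueued at most once, so the queue cannot overflow. */
  while (head < tail) {
    int n = queue[head++];
    if (n == dst)
      break;
    for (int m = 0; m < g->num_nodes; m++)
      if (g->edge[n][m] && prev[m] == -1) {
        prev[m] = n;
        queue[tail++] = m;
      }
  }
  if (prev[dst] == -1)
    return 0;

  /* Walk predecessors back from dst, then reverse into path. */
  int rev[MAX_NODES];
  int len = 0;
  for (int n = dst; ; n = prev[n]) {
    rev[len++] = n;
    if (n == src)
      break;
  }
  for (int i = 0; i < len; i++)
    path[i] = rev[len - 1 - i];
  return len;
}
```

Each node on the returned path would then be turned into an event, and the simplification passes would prune that list.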

I recently implemented a way of filtering irrelevant control-flow edges from the path, which ought to help, and I’m working on a similar patch to eliminate redundant interprocedural edges.

To give a concrete example, I tried the analyzer on a real bug (albeit one from fifteen years ago)—CVE-2005-1689, a double-free vulnerability in krb5 1.4.1. It correctly identifies the bug with no false positives, but the output is currently 170 lines of stderr. Rather than showing the output inline here, you can see it at this link.

Initially, the output was 1187 lines of stderr. I fixed various bugs and implemented more simplifications to get it down to 170 lines. Part of the problem is that the free is done through a krb5_xfree macro, and the path-printing code shows how each macro is expanded every time an event occurs within a macro. Perhaps the output should only show each macro expansion once per diagnostic. Also, the first few events in each diagnostic are interprocedural logic that’s not really relevant to the user (I’m working on a fix for that). With these changes, the output should be considerably shorter.

Perhaps a better interface might write out a separate HTML file, one per warning, and emit a “note” giving the location of the additional information?

I want to give the end-user enough information to act on a warning, but without overwhelming them. Are there better ways of presenting this? Let me know in the comments.

Trying it out

GCC 10 will be in Fedora 32, which should be out in a couple of months.

For simple code examples, you can play around with the new GCC online at godbolt.org (select gcc “trunk” and add -fanalyzer to the compiler options).
