Macs, Modularity and More

Thoughts on LLVM and Clang

I've been a fan of LLVM for a while, and of Clang specifically. However, I didn't have the chance to investigate either of them in depth until fairly recently, and I thought it worth writing up what I've been doing.

LLVM is a lot more than just a framework for compilers; it's more like a generic assembly language which maintains strong typing and logical variables throughout the code path, and is only turned into hardware-specific machine code at the end. It's also used to build up such programs dynamically and execute them on the fly; Apple uses LLVM to optimise OpenGL effects (falling back to an interpreted path where hardware acceleration isn't available).
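To make the "typed assembly language" idea a little more concrete, here is a small sketch: a trivial C function, with a hand-written approximation of the LLVM IR it corresponds to shown in the comment. The function name add_ints is made up for illustration, and the IR is an approximation rather than captured compiler output; what Clang actually emits varies with version and optimisation level.

```c
/* A trivial function to illustrate what typed, register-based IR looks like.
 *
 * Hand-written approximation of the corresponding LLVM IR (not captured
 * compiler output; real output differs between versions and flags):
 *
 *   define i32 @add_ints(i32 %a, i32 %b) {
 *     %sum = add nsw i32 %a, %b    ; every value carries a type (i32)
 *     ret i32 %sum                 ; %sum is a logical variable, not a
 *   }                              ; hardware register
 *
 * The names stay symbolic and typed until the backend lowers them to
 * machine code for a particular target.
 */
int add_ints(int a, int b) {
    return a + b;
}
```

If you want to see the real thing for your own code, clang -S -emit-llvm writes the IR out as text instead of producing an object file.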

Clang is a compiler built on top of LLVM (and thus able to take advantage of all of LLVM's low-level performance optimisations, as well as its own) that compiles C, Objective-C and (with 2.0) C++ programs. It's embeddable, in that IDEs can host it in situ (instead of having to call an external compiler and parse the results), which means that it can interact with an IDE in a much more proactive way than before (or at least, without the IDE having to re-implement the parser multiple times).

One of the things this buys you is free analysis of the source code. Given that it has full information about the source, including its defines and include paths, it is possible to find call sites that are questionable. This includes simple lint checks, like flagging an assignment inside an if condition; but it can do considerably more complicated analysis as well, such as following loops and the states of variables across iterations (e.g. on the first iteration of the loop you nullify a variable; on the second, the code may go down a different call path that still uses it).
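As a rough illustration of the loop case, here is a deliberately flawed function alongside a corrected one. Both function names are invented for this sketch, and it is not taken from the ZFS code; it just shows the shape of defect that a path-sensitive analyzer is designed to trace, where nothing is wrong on the first pass through the loop but the second pass trips over state left behind by the first.

```c
#include <stddef.h>

/* INTENTIONALLY FLAWED, for illustration only.
 * Adds *value to a running total `count` times. The pointer is cleared at
 * the end of the first iteration, so any second iteration dereferences NULL.
 * Only safe to call with count <= 1. */
int sum_repeated_unsafe(const int *value, int count) {
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += *value;  /* fine when i == 0, NULL dereference when i == 1 */
        value = NULL;     /* the variable is nullified on the first pass */
    }
    return total;
}

/* Corrected version: the pointer is checked once up front and never
 * modified inside the loop, so every iteration sees the same valid state. */
int sum_repeated_safe(const int *value, int count) {
    int total = 0;
    if (value == NULL) {
        return 0;
    }
    for (int i = 0; i < count; i++) {
        total += *value;
    }
    return total;
}
```

An ordinary compile accepts both functions without complaint, because each line is fine when looked at on its own. Running the analyzer over the file (for example with clang --analyze) is what walks the two iterations in sequence and connects the assignment on one pass to the dereference on the next.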

There's an entry on the LLVM blog about amazing feats of Clang error recovery, and it's well worth a read. Not only that, but in practice it works even better than that entry would lead you to hope; it really is solid stuff.

Apple have taken it further with the integration into Xcode; not only does it parse the Clang-driven error messages, but when it's explaining how it arrived at a conclusion, Xcode will narrate the code path with call arrows in order to indicate the potential problem. I was able to use this to generate a whole heap of fixes for the Mac ZFS port, pretty much all driven from the results of running the Clang static analyzer (they've got a good screenshot of Xcode there; but interestingly, you can also view the results in a browser).

The other aspect of LLVM is that it's fast; faster than GCC, anyway. You can run LLVM in two modes: as a backend to GCC (so it presents itself to the tools as GCC, but uses the LLVM internals) or as the standalone Clang compiler. I was able to build the entire Mac ZFS codebase with standalone LLVM (that is, Clang) in 22s, with the LLVM+GCC and plain GCC combinations each taking 29s. Not too shabby, considering that it's only a single command line switch away.

There's also an upcoming LLVM 2.0, which will have a revamped debugger called LLDB. This uses the same parser as the Clang compiler does; so when the debugger starts up, it's able to give you information based on your specific object types, and to evaluate watch expressions written as real C code, rather than the subset that is supported by GDB.

At this point, the GCC compiler is considered strictly legacy on the Mac, and FreeBSD is compiling with LLVM already. And with the new libcxx library, it won't be long before it's compiling all the C++ code as well. The only question that remains is: how long will it take others to migrate to LLVM?