dehydra

I feel strange working on GCC-specific stuff and then discussing it on planet mozilla as mozilla work. However, without GCC, Dehydra and Treehydra would not be half as awesome, or perhaps even feasible. The power of open source is that it allows us to leverage the entire open source ecosystem to achieve specific goals. When open source projects combine their efforts, not even the biggest software companies can compete, because cross-project goals would otherwise be incredibly expensive and unpleasant to pursue.

Occasionally, it is very frustrating to see people treat open source software as immutable and independent black boxes. In my personal experience, the browser and the compiler are viewed as finished products and therefore it is OK to bitch and complain about them. That’s frustrating because the same users could be channeling that energy in a more positive way by reporting bugs, contributing code/documentation, etc.

Sometimes these rants result in rather comical conclusions: Ingo’s rant is priceless. My perspective on this:

what have Linux kernel devs done to help GCC help them?

<flame>Sparse is a dead end. Writing compiler code in C is silly, writing analysis code in C is sillier (and frustrating and limiting). Taking a crappy parser and bolting a crappy compiler backend onto it will result in a bigger pile of crap. Given how smart kernel devs are, they sure like wasting their time on crappy solutions in crappy languages.</flame>

Wouldn’t it be cool if instead of complaining these talented people wrote a GCC plugin to do what they want?

GCC Plugin Progress

I finally landed the massively boring and annoying GTY patch. I can barely believe that the patch went in so smoothly without excess complaining from GCC devs. From GCC’s perspective it’s merely a cosmetic cleanup that affects a large number of headers. For us it enables Treehydra to be generated via Dehydra with little manual effort. It basically makes Treehydra possible without patching GCC. I have another 3-4 patches that need to land before trunk GCC can run the hydras out of the box. Those are mainly localized bugfixes and cleanups so I fully expect them to go in and for GCC 4.5 to rock my world.
Once GCC 4.5 ships, analyzing code will be a trivial matter of apt-getting (or equivalent) the hydras and specifying the analysis flags on the GCC command line!

The nice folks at the FSF allowed GCC to have plugins. In a couple of GCC releases (4.5 if we are lucky), Dehydra will work with distribution GCCs. Of course the API is yet to be decided on, but we have been coordinating with authors of other GCC plugin efforts to ensure that the final API meets reasonable needs.

In the future, enabling static analysis checks will involve little more than specifying --with-static-checking in your Mozilla build!

JSHydra

The other breakthrough news is that Joshua Cranmer has been working on hooking up a *hydra style API to the Spidermonkey parser. This resulted in JSHydra. Ability to look into JavaScript has been sorely missing from our stack, so this is extremely exciting.

Some time ago, Igor mentioned that there is code in SpiderMonkey that pleads with the programmer that, from a certain point in a function, code must flow through a label (i.e. a finalizer block). Treehydra made it possible to turn that weak plea into an error message when static checking is enabled. See the bug for more details. My favourite static analyses are all about turning informal “guarantees” into angry compiler complaints.

This is my first static analysis that landed in the mozilla-central tree. It’s also the simplest one and may be a decent starting point for solving similar problems. It’d be cool to see this particular feature utilized outside of SpiderMonkey. Unlike human-powered code inspection, it excels at finding accidental early returns covered up by macros.

I have been planning to release Pork 1.0 for a while now. The tools work great, even if all the love is going to the GCC-based toolchain. However, after hearing grumpy comments from a certain coworker about the ugliness of the oink build system it dawned on me that it’s rather mean to release such a mess and call it 1.0.

So I think I’ll release Pork 0.9 in the current state, so I can focus on near term GCC toolchain work. Pork in the current form means oink stack + my refactoring tools + changes to elsa and other libs to support C/C++ refactoring needs.

This will be followed up by Pork 1.0, which will involve changes to the build system to get rid of oink (we only use the oink build system and rarely use the oink API). To put this another way: I don’t expect any functionality changes between 0.9 and 1.0 other than an improved build system to make it easier to get started with writing new tools.

Pork – Future

I am pretty happy with Pork as it is. I think we’ve taken Elsa as far as it’ll let us go. The only realistic improvement on the Pork side may be to have Dehydra generate a JS binding to Elsa’s extensive AST to make rewriting stuff easier. However, I’m not sure that’s worth the effort, or that a C++ AST will reflect into JavaScript as well as GCC GIMPLE does.

Preprocessing

On the other hand, something needs to be done about the main ingredient that makes Pork tick: MCPP. MCPP does a lovely job of annotating what the C preprocessor is doing, but configuring GCC to use a foreign preprocessor is a giant hassle and making sure it works correctly is troublesome. At the GCC summit, Tom gave me an idea on how similar functionality can be added to GCC directly by extending the include backtrace with macro expansions. Not only would such integration simplify Pork setup and increase Pork’s operating speed, but it is also a clean way to expose preprocessor constructs to the AST presented in De/Treehydra. It should allow for more preprocessor awareness directly in the analysis stage of refactoring instead of only in the final rewriting stage as is currently done. As a side effect, GCC would gain better error messages too.

So while this isn’t going to affect Pork directly, it will simplify the lives of Pork users while opening new analysis frontiers. Even though I hate working on preprocessor stuff, I think this work will need to happen sometime in the near future.

Dehydra 0.9 has been out for a while, and I had planned to release 1.0 soon after unless major flaws were discovered in the API. The situation changed at the GCC summit. The fact that the FSF reversed their stance on GCC plugins means that we should be concentrating on getting the plugin stuff reviewed.

So in the near term I’m forward porting the plugin stuff to trunk GCC, then I’ll generalize the plugin API to suit at least one other GCC plugin user that we met with at the summit. The downside is that I don’t want to release Dehydra 1.0 and immediately break the plugin API. The upside is that the new API should be more general and more minimalistic and will likely be close to what will eventually become an official plugin API.

Summary: In my mind Dehydra and Pork are 1.0 quality, but I want to future-proof them a little bit before calling them 1.0.

Our presentation on Treehydra and Dehydra GCC plugins was received well at the summit.

The big news is that FSF is working on license changes to allow GPL-only GCC plugins. I’m looking forward to having our work be compatible with future GCC without any patching.

In a few minutes we’ll be having a meeting with users of other plugin frameworks to have an initial discussion on a common API. I’m working on forward porting my patches, so they can start getting reviewed ahead of license changes.

After writing a ton of docs and working through other Dehydra 0.9 blockers, I decided to cool off by doing some actual analyses. Before I get to that, I’d like to say that the last big task is to set up a buildbot for Dehydra on Linux/OSX. Thanks to yet another awesome contribution from Vlad, that’s mostly done.

So I got working on GC-safety static analysis. Originally we tried to define a complete spec before writing a single line of code. That turned out to be a bad idea and resulted in a spec full of bugs. This time we are defining the analysis incrementally and, as a surprise reward, it already caught a bug.

Pushing and Popping Our Way

SpiderMonkey has a lot of complex code applying Push/Pop-like operations to variables in a function-local manner. Examples of functions that this analysis would look at are JS_PUSH_TEMP_ROOT/JS_POP_TEMP_ROOT and JS_LOCK/JS_UNLOCK. See bug for more. Essentially, this will help with “code must flow through here” comments on “out:” goto labels that inhabit the SpiderMonkey source.

This is an example of control-flow-sensitive analysis. It is impossible without the compiler-like view of the code that Treehydra provides. It also helps to have a scalable algorithm to iterate the CFG. Luckily, David Mandelin wrote such a beast by implementing ESP for his outparam analysis. David factored out the ESP analysis and made it available for reuse. See esp_lock.js in the test suite for an example of how to write control-flow-sensitive analyses. locks_valid*.cc and locks_bad*.cc illustrate the code patterns that can be scanned for.
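To give a feel for what "control-flow-sensitive" means here, below is a toy sketch. It is not ESP and it is not esp_lock.js: it swaps in a much simpler worklist pass that tracks the possible push depth on entry to each basic block. The CFG shape (blocks with `ops` and `succs`), the function name `checkBalanced`, and the depth cap are all made up for illustration, and it runs under plain Node rather than inside Treehydra.

```javascript
// Toy push/pop balance check over a hand-written CFG.
// Assumption: each block is { ops: ["push" | "pop", ...], succs: [blockId, ...] }.
function checkBalanced(cfg, entry, exit) {
  const MAX_DEPTH = 8; // cap so loops that keep pushing still terminate
  const depthsAt = new Map([[entry, new Set([0])]]);
  const work = [[entry, 0]];
  const errors = new Set();

  while (work.length > 0) {
    const [id, depthIn] = work.pop();
    let depth = depthIn;
    let ok = true;
    for (const op of cfg[id].ops) {
      if (op === "push") depth++;
      else if (op === "pop") depth--;
      if (depth < 0) {
        errors.add("pop without push in " + id);
        ok = false;
        break;
      }
      if (depth > MAX_DEPTH) {
        errors.add("unbounded push in " + id);
        ok = false;
        break;
      }
    }
    if (!ok) continue;

    // Leaving the function with something still pushed is the bug we want.
    if (id === exit && depth !== 0) {
      errors.add("exit reached with depth " + depth);
    }

    // Propagate each (block, depth) pair at most once.
    for (const next of cfg[id].succs) {
      if (!depthsAt.has(next)) depthsAt.set(next, new Set());
      const known = depthsAt.get(next);
      if (!known.has(depth)) {
        known.add(depth);
        work.push([next, depth]);
      }
    }
  }
  return [...errors].sort();
}

// Every path goes through "out", which pops.
const goodCfg = {
  entry: { ops: ["push"], succs: ["body", "out"] },
  body: { ops: [], succs: ["out"] },
  out: { ops: ["pop"], succs: ["exit"] },
  exit: { ops: [], succs: [] },
};

// "body" returns early and skips the pop in "out".
const badCfg = {
  entry: { ops: ["push"], succs: ["body", "out"] },
  body: { ops: [], succs: ["exit"] },
  out: { ops: ["pop"], succs: ["exit"] },
  exit: { ops: [], succs: [] },
};

console.log(JSON.stringify(checkBalanced(goodCfg, "entry", "exit"))); // []
console.log(JSON.stringify(checkBalanced(badCfg, "entry", "exit"))); // ["exit reached with depth 1"]
```

The second CFG is the "accidental early return" case: a grep-style scan sees one push and one pop and is happy, while walking the paths shows that one of them never reaches the pop.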

So if you know of any further push/pop patterns in the rest of Moz that can be checked in this manner, leave a comment.

PS. This is yet another account of Treehydra rocking the static analysis world. Exposing the slightly scary, but awesome GCC guts via JavaScript allows one to perform precise static analyses in a civilized manner. What could be more fun?

Docs, tutorials and more docs. Currently, the plan is to put more documentation on MDC and have it also serve as a webpage. Any dehydra/treehydra guides or API doc contributions are welcome. For now, if you need help, feel free to ask on the mailing list or #mmgc on irc.mozilla.org.

Verify, document and maintain the OSX port. Vlad Sukhoy did a lot of heavy lifting to make this happen; now we need to cement his achievement by setting up a buildbot.

Spread the word! I would like to see other large projects such as KDE, OpenOffice, etc. adopt application-specific static analysis in the form of *hydra. I am interested in seeing people use *hydra to scan code for security vulnerabilities. Ok, so this isn’t really needed to release Dehydra 0.9, but I am impatient!

RIP: Oink Dehydra

Between GCC Dehydra and Treehydra, there is nothing that pork Dehydra could do better, so I finally removed Dehydra from Pork. From now on Pork’s purpose is large-scale C/C++ refactoring. For everything else one should use Dehydra.

After a few weeks of mind-numbing work on treehydra guts, I finally have something exciting to talk about!

We will be presenting Dehydra at the GCC Developer’s Summit in lovely Ottawa. The GCC version of Dehydra exceeded all of my expectations, so it will be exciting to meet awesome GCC hackers who lay the groundwork to make this possible. Got suggestions for other venues to present Dehydra?

Packaging Help Needed

I feel that the Dehydra concept is getting mature enough for a 1.0 release. The recently baked GCC 4.3 means I’ll be able to distribute a 4.3-specific plugin patch (currently it’s against trunk, aka 4.4-to-be). Now I need README, LICENSE, configure files, etc.

I will need help with packaging dehydra + patched gcc into .dpkg and .rpm files. Leave a comment, email me/static analysis list or poke me in #mmgc on irc.mozilla.org if you can help with packaging.

Logo/Mascot Wanted

Since every serious project has a cool mascot, it would be cool to get one for Dehydra. I’d be curious to see what people think could symbolize a code scanning monster that makes grep feel inadequate. I have a feeling a cartoon version of a giant Heavy Metal Duck might be it, but I haven’t made up my mind yet.

Treehydra What?

Treehydra is a work-in-progress name for the low-level equivalent of Dehydra. Currently it is built as a separate GCC plugin. I haven’t yet made up my mind on whether Treehydra will end up extending Dehydra or stay a separate tool. Since treehydra needs dehydra for bootstrap, they’ll stay separate for now.

Last week I managed to run treehydra to completion on my mozilla checkout and walk the resulting AST in JS correctly. Now comes the fun part of making it do useful tricks.

process_type is called every time GCC hits a class declaration or a template is instantiated (also for enums and unions, but those get ignored with the .kind check). Then input_end is called when GCC is done processing the file. this.aux_base_name is the input filename.

I hooked up this script to the mozilla build by adding the following to .mozconfig:
export CXX=$HOME/gcc/bin/g++
export CXXFLAGS="-fplugin=$HOME/work/gccplugin/gcc_dehydra.so -fplugin-arg=$HOME/work/gccplugin/test/count_classes.js"
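The count_classes.js script itself is not reproduced above, so here is a rough sketch of what a script of that shape might look like. It is a guess, not the original: the kind strings ("class", "struct"), the `name` property, de-duplicating by name, and the `makeClassCounter` wrapper are all assumptions. It is written to run under plain Node with a small stand-in driver; in a real Dehydra script process_type and input_end would be top-level functions called by the plugin, and the filename would come from this.aux_base_name.

```javascript
// Hypothetical sketch, not the author's count_classes.js.
// Assumptions: type objects carry a string `kind` and a `name`,
// and the class-like kinds are "class" and "struct".
function makeClassCounter(report) {
  const seen = new Set();
  return {
    // Called for each class declaration or template instantiation.
    process_type(t) {
      // The .kind check: enums and unions are ignored.
      if (t.kind !== "class" && t.kind !== "struct") return;
      seen.add(t.name); // count each name once (a design choice of this sketch)
    },
    // Called once the compiler is done with the file.
    // auxBaseName stands in for this.aux_base_name.
    input_end(auxBaseName) {
      report(auxBaseName + ": " + seen.size + " classes");
      return seen.size;
    },
  };
}

// Stand-in driver that feeds the callbacks a few made-up types.
const lines = [];
const counter = makeClassCounter((msg) => lines.push(msg));
[
  { kind: "class", name: "nsFoo" },
  { kind: "enum", name: "Color" },
  { kind: "struct", name: "Bar" },
  { kind: "union", name: "U" },
  { kind: "class", name: "nsFoo" }, // seen twice, counted once
].forEach((t) => counter.process_type(t));

const total = counter.input_end("nsFoo.cpp");
console.log(lines[0]); // nsFoo.cpp: 2 classes
```

Passing the reporting function in keeps the sketch testable outside GCC; a real script would use whatever output routine the plugin provides.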