Rewriting Tools for Mozilla 2: Moving Forward as Planned

In the Beginning There Was a Void

Approximately a year ago, Brendan discussed with me the crazy possibility of rewriting most of the Mozilla code automatically to modernize the codebase. The benefits would be huge: Gecko would use the C++ standard library to improve code readability and reduce size, XPCOM would be ripped out of the core to improve performance and decrease footprint, and so on.

It seemed like a good idea, but in reality no other giant C++ project had attempted this before, so we were not sure how realistic it was. I spent a year in a lonely corner of Mozilla trying to turn the idea into reality.

Brendan & Graydon pointed me to elsa, the C++ parser that supposedly could parse Mozilla. However, it turned out that it was only able to parse an old version of Mozilla and rejected the new source. One of the elsa maintainers even tried to convince us that it was not designed for source-to-source transformations and wouldn’t work that way.

After I patched up elsa and started devising ways to use it for source rewriting, I ran into more pain. After a few false starts, I realized that C++ in Mozilla is actually a mix of CPP and C++, and one cannot rewrite C++ without dealing with the mess that is macro expansion. MCPP was pointed out to me as a good starting point for hacking on a preprocessor, so I designed an inline log for macro expansion. To my surprise the maintainer of MCPP, Kiyoshi MATSUI, volunteered to implement the spec and thus saved me from a world of pain. (For which I am eternally grateful, as I can’t imagine a more depressing pastime than working on the root of all evil: the C preprocessor.)

In parallel with Kiyoshi’s work I modified elkhound & elsa to make the C++ parser a lot more suitable for source transformations. I learned about LR & GLR parsing and confirmed my suspicion that I don’t want to write parser generators for a living.

Happy Conclusion

All this work finally got us what we discussed last September: a framework for doing lots of boring code rewrites.

The first big Moz2 task is switching from reference counting to garbage collection. Today, garburator produced a gigantic patch for a subset of the content/ module, and all of the affected files compiled. Hopefully next week I’ll have a multi-megabyte patch for the whole of Mozilla that compiles and possibly even runs.
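For a sense of the shape of such a patch, here is a hypothetical, much-simplified hunk illustrating the kind of mechanical change involved (the class and member names are invented; the real tool has to deal with nsCOMPtr members, stack helpers, and many more patterns):

```diff
 class nsExampleContent
 {
-  nsCOMPtr<nsIContent> mChild;   // refcounted strong reference
+  nsIContent* mChild;            // raw pointer, lifetime owned by the GC
 };

 void nsExampleContent::SetChild(nsIContent* aChild)
 {
-  mChild = aChild;               // implicit Release + AddRef
+  mChild = aChild;               // plain store, no refcount traffic
 }
```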

This entry was posted on Friday, October 12th, 2007 at 3:31 pm and is filed under DeCOMtamination, dehydra, garburator.

I remember reading and hacking a bit (as part of a DEC-2060 port of Unix tools) the original John Reiser CPP. It was assembly-style C with lots of raw pointers into a giant buffer, sliding things around during macro expansion as overflow threatened, all coded and indented in an even-uglier-than-usual style for the time; Unix code outside of the kernel and dmr’s C compiler was not always pretty. My eyes bled for a week ;-).

Robert: it’ll be QA’d by mozilla2 devs and (I hope this is coming on line soon) the usual (and growing over time) testing infrastructure, cloned from the CVS trunk: mochitest, reftest, the latest and greatest leak tests including sayrer’s brute-force leak detector, etc.

Then on to building the Mozilla 2 effort to include more and more of the community as 1.9 / Firefox 3 wraps up. After a few alphas, we’ll have shaken out any issues.

But I do not expect lurking badness in a generated patch, if the analysis that generated the patch is sound and valid. That analysis development is the real process to QA here.

So Taras has been focusing on tools in order to get Mozilla 2 to a smaller, faster, easier to hack codebase, and he and Benjamin are making builds that can be tested. Help welcome.

I’ve been poking around Elkhound and Elsa trying to learn how to write a parser using it. It’s fun, and occasionally “fun as in having a root canal”.

I’m working on a small Lua parser mostly to get my feet wet, but the ultimate goal is JavaScript. I’m no “l33t k0d3r” by far, but I’m a fast learner, heh. I have this thing where I want to build a successor to MXR/Bonsai, with multiple-VCS support (pluggable, essentially), syntax highlighting and semantic parsing.

One thing I noticed was that trying to clone either of the Elkhound and Elsa repositories from hg.m.o did not work. I had to download using one of the zip links to get all the files. Something seems weird in that regard. (“hg pull -r tip” after cloning basically says “you already have the tip” even though I don’t have all the files visible on hgweb.)

@monk.e.boy: The benefit over textual, regex-based replacement is simple: because the tool *understands* what it’s doing, it can handle the zillions of /small/ variations that, in the regex case, each require you to add yet another special case. And if some case really is complex enough to break it, it will give you a syntax error instead of happily outputting garbage, or worse, creating code that compiles but crashes at runtime.

Hm, I believe the OpenOffice.org codebase also contains a lot of legacy code which is there because C++ and its libraries didn’t offer everything they offer now. They’d probably be interested in such a framework.