For make benefit of glorious tubes

DXR gets more correct, less case-sensitive

DXR, Mozilla’s evolving codebase search engine, has been taking patches at a furious rate these last two months. A great deal of work has gone into a UI refit, still in progress, which will improve discoverability, consistency, and power. Meanwhile, we have kept pushing more immediately enjoyable enhancements into production.

Cleaning Out The Pipes

One of these is a complete rewrite of our HTML generation pipeline. DXR pulls metadata about code from a number of disparate sources: syntax coloring from Pygments, locations of functions and macros from clang, links to Bugzilla bugs from a glorified regex. It then encodes those as begin-end pairs of text offsets, which it stitches together to make the final markup. However, the stitching was previously handled by a teetering state machine, stuffed into a single monolithic function with zero test coverage, replete with terrible mystery. As it turned out, it had been generating grossly invalid markup for some time. Fortunately, modern browsers are equally replete with terrible mystery and managed to make some semblance of sense out of things like </a></a></a>.

But now that’s gone away. The rewrite brings…

Correct markup

Support for line-spanning regions, as for multi-line comments or strings

Support for Windows line-endings (of which we did have a few in mozilla-central)

Full test coverage

And, perhaps most importantly in the long term, it modernizes our plugin contract by supporting annotation regions which overlap. This lets us enjoy truly decoupled plugins which no longer have to care if they’re used with others that emit overlapping regions. We can add plugins that support more languages and more types of analysis without having to worry about whether they’ll play nicely with the existing ecosystem. It also makes development of plugins outside the DXR codebase more practical.
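To make the overlapping-regions idea concrete, here is a minimal sketch of one way to turn begin-end offset pairs into well-formed markup. It is not DXR's actual implementation: the function name stitch, the (start, end, css_class) tuple shape, and the choice to wrap everything in span elements are all assumptions made for illustration. The technique is simply to cut the text at every region boundary and wrap each resulting segment in all the regions that cover it, so no tag ever has to close out of order.

```python
from html import escape


def stitch(text, regions):
    """Render text as HTML, given (start, end, css_class) regions.

    Regions may overlap arbitrarily. Hypothetical helper for illustration;
    the tuple shape and the use of <span> are assumptions, not DXR's API.
    """
    # Collect every offset at which some region begins or ends.
    cuts = {0, len(text)}
    for start, end, _ in regions:
        cuts.update((start, end))
    cuts = sorted(c for c in cuts if 0 <= c <= len(text))

    pieces = []
    for left, right in zip(cuts, cuts[1:]):
        # Regions that cover this whole segment, in the order they were given.
        active = [cls for start, end, cls in regions
                  if start <= left and right <= end]
        piece = escape(text[left:right])
        # Wrap innermost-first so the first region ends up outermost.
        for cls in reversed(active):
            piece = '<span class="{}">{}</span>'.format(cls, piece)
        pieces.append(piece)
    return ''.join(pieces)


if __name__ == '__main__':
    # Two regions that overlap on "cd": neither plugin needs to know about the other.
    print(stitch('abcdef', [(0, 4, 'x'), (2, 6, 'y')]))
```

The output is more verbose than a hand-tuned stitcher would produce, since a region that crosses a boundary gets closed and reopened, but every open tag is guaranteed a matching close.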

Other Improvements

Other user-visible improvements include…

Case-insensitive searching for plain text. This is now the default.

Exposing values of constants using tooltips

Results now show in alphabetical order by path rather than in random order, so you can rule out entire directory trees more easily.

Searching for Layers.cpp:45 takes you straight to that line of the file.

Lexing .h files as C++ rather than C means we now highlight all those pesky C++ keywords.

We now syntax-color preprocessor directives in JS.

We’ve introduced override and overridden queries.

No more “l” in line-number anchors means no more mistaking them for “1”.

Fixed an off-by-one in line annotation position.

No longer consider uninitialized struct or class members to be var refs.

Support non-UTF-8 encodings of source files.

Distinguish identically named functions in different anonymous namespaces.
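As an illustration of the Layers.cpp:45 shortcut in the list above, here is a rough sketch of how a query of that shape could be recognized. This is not the code that shipped: the pattern, the function names parse_file_line and line_anchor, and the bare-number anchor format are assumptions (the last one inferred only from the note that the "l" prefix is gone).

```python
import re

# A query made of a path fragment, a colon, and a run of digits.
# Assumption: the path fragment contains no whitespace and no other colons.
FILE_LINE = re.compile(r'^(?P<path>[^\s:]+):(?P<line>\d+)$')


def parse_file_line(query):
    """Return (path_fragment, line_number), or None if the query is not of that form."""
    match = FILE_LINE.match(query.strip())
    if match is None:
        return None
    return match.group('path'), int(match.group('line'))


def line_anchor(line):
    """Build a URL fragment for a line. Hypothetical format: just the number."""
    return '#{}'.format(line)


if __name__ == '__main__':
    parsed = parse_file_line('Layers.cpp:45')
    if parsed is not None:
        path, line = parsed
        print(path, line_anchor(line))
```

Anything that does not match, such as a plain Layers.cpp or a qualified name like foo::bar, returns None and can fall through to the ordinary text search.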

Thanks to James Abbatiello for lots of analysis improvements, Nick Cameron for the handy line-number search and syntax coloring, jonasac for several great fixes, and Schalk Neethling for a huge amount of work toward getting the UI refit out the door. If you’d like to join the DXR hacking community, we’ve got a nice ramp-up paved out for you and some easy bugs tagged.

New UI Teasers

As for the upcoming UI refit, there’s plenty in store:

A natural integration of the now fairly disjoint browse and search modes

Easy discoverability of all 26-or-so search filters: no more figuring them out through hearsay or by spelunking through the code

No more unpredictability of interface elements like the Advanced Search panel

First-class support for multiple trees, to be followed by more actual trees

A real query parser. You can express quotation marks without resorting to regexes, and you can use quoted strings as arguments to filters.
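To give a feel for what quoted strings as filter arguments might look like, here is a small tokenizer sketch. It is only a guess at the shape of such a parser, not the one in the refit: the filter names in FILTERS are just examples, the backslash-escape convention for quotation marks is an assumption, and parse_query is a made-up name.

```python
import re

# Illustrative subset of filter names; the real list is much longer.
FILTERS = ('function', 'path', 'regexp')

TERM = re.compile(r'''
    (?:(?P<filter>%s):)?                  # optional "name:" prefix for a known filter
    (?:
        "(?P<quoted>(?:[^"\\]|\\.)*)"     # double-quoted string with backslash escapes
      | (?P<bare>[^\s"]+)                 # or a bare run of non-space characters
    )
''' % '|'.join(FILTERS), re.VERBOSE)


def parse_query(query):
    """Split a query into (filter_name, value) pairs; plain terms get the name 'text'."""
    terms = []
    for match in TERM.finditer(query):
        if match.group('quoted') is not None:
            # Drop the escaping backslashes: \" becomes ", \\ becomes \.
            value = re.sub(r'\\(.)', r'\1', match.group('quoted'))
        else:
            value = match.group('bare')
        terms.append((match.group('filter') or 'text', value))
    return terms


if __name__ == '__main__':
    for term in parse_query('path:"my dir/file.cpp" function:foo "hello world"'):
        print(term)
```

Restricting the prefix to known filter names keeps something like foo::bar from being misread as a filter called foo.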

Erik Rose coordinates the impact of 108 spring-loaded buttons at Mozilla, venting a byproduct of static analysis, search, and pattern-finding software. His past selves have done realtime fuzzy matching against the corpus of U.S. voters at Votizen, caused the Django community's tests to run in funny orders, written a book about Zope and Plone, and released a bevy of eclectic Python libraries. When not speaking or coding, Erik retreats to his glacier-carved fortress in the wilds of North Carolina, where he discusses formal language theory with his dog, Max.