Method

I imagined we would recurse in exhaustive breadth and exhausting depth. Instead, we recursed only on the most interesting items, and refined a checklist of starting points:

What was the bug?

What went wrong in the developer's thinking that caused the bug to be introduced?

What made the bug exploitable?

What caused us to use especially dangerous features of C++?

Could a new abstraction make it possible to do this both quickly and safely?

What caused the bug to persist? Could we have caught this earlier with improved regression tests, fuzz testing, dynamic analysis, or static analysis?

Luke and I made trees for all ten bugs, at first on paper and later using EtherPad. Then I extracted and categorized what I thought were the most useful lessons and recommendations.

Recommendations for introducing fewer bugs

Casts

Create centralized, type-restricted cast functions. This protects you when you change the representation of one of the types. It also protects against mistakes that cause the input type to be incorrect.

Sentinel values

Use tagged unions instead.

Use a typed wrapper (a struct containing a single value). When assigning from the underlying numeric type, convert using one of two functions: one that checks for special values, and one that explicitly does not.

Audit existing code paths to ensure they cannot generate the special value.
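The original advice targets C++, but the shape of the typed-wrapper pattern can be sketched in any language. Below is an illustrative JavaScript transcription; SENTINEL, Checked, and Unchecked are hypothetical names, not anything from the actual codebase.

```javascript
// Sketch of the typed-wrapper pattern, transposed to JavaScript for
// illustration. The idea: every conversion from the raw numeric type goes
// through one of two explicitly named functions.
var SENTINEL = -1;  // hypothetical "special value" the raw type reserves

// Conversion that checks for the special value.
function Checked(n) {
  if (n === SENTINEL) {
    throw new Error("sentinel value reached a path that must not produce it");
  }
  return { raw: n };
}

// Conversion that explicitly does not check, for the few call sites
// that genuinely mean to handle the sentinel.
function Unchecked(n) {
  return { raw: n };
}
```

Because every assignment goes through one of two named functions, auditing "can this path generate the special value?" reduces to searching for Unchecked call sites.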

I should ask Blake and Andreas for help with testing compartments and wrappers.

I should ask Gary to run jsfunfuzz in xpcshell, where I can test both same-origin and different-origin compartments, and thus get more interesting wrappers.

We should give JS OOM fuzzing another shot.

Next steps

I'm curious if others have additional ideas for what could have prevented the ten bugs we looked at. For example, someone like Jeff Walden, who loves to write exhaustive regression tests, might have ideas that Luke and I did not consider.

I'd also like to do this kind of analysis with other developers on bugs they have fixed.

Lithium is great at reducing testcases with simple structures, such as scripts generated by jsfunfuzz. Scripts from web pages are harder to reduce, since removing a line frequently introduces a syntax error. But with a few extra tricks, Lithium can be effective against real-world scripts. For example, when Google Maps triggered a JavaScript Engine assertion, I was able to reduce the 40KB of Google Maps code to five lines.

Making Firefox crash quickly

On Mac OS X, crashes are surprisingly slow: it takes the OS crash reporter about 40 seconds to generate a crash log for Firefox. I don't know a general way to bypass the OS crash reporter, but there are two easy cases. First, for crashes that are easy to anticipate at the code level, such as null dereferences, adding a conditional exit(3) should do the trick.

Second, as of Mac OS X 10.5, fatal assertions are treated as crashes. To make the OS treat fatal assertions as exits rather than crashes, edit the relevant assertion-failure function (JS_Assert or NanoAssertFail) to call "exit(3);" rather than "abort();". To make your debug build pick up this change, run "make -C js/src" from the objdir.

Finding the scripts

An initial run of Lithium should make it clear which external <script> tags are involved in triggering the bug. Convert them to inline scripts so they're no longer loaded over the Web.

You may find that one script calls document.write to include another script. Add logging code at the top of the script to see which additional scripts are being included:
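A sketch of such a shim: wrap document.write so each injected chunk is recorded before it takes effect. The minimal stand-in document object below exists only so the sketch runs outside a browser; in Firefox you would wrap the real document.write and log with dump().

```javascript
// Stand-in for the browser's document object, so this sketch is self-contained.
var document = { written: [], write: function (s) { this.written.push(s); } };

var logged = [];
var originalWrite = document.write;
document.write = function (s) {
  logged.push(s);                    // see what the page tries to inject
  originalWrite.call(document, s);   // then let the write happen as usual
};

document.write('<script src="inner.js"><\/script>');
```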

__noSuchMethod__

You can use SpiderMonkey's nonstandard __noSuchMethod__ feature to turn "no such method" errors into no-ops. This helps Lithium reduce object-oriented scripts by allowing it to remove entire methods even before their callers have been removed.
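In SpiderMonkey the hook is a one-liner on Object.prototype. Since __noSuchMethod__ is nonstandard and SpiderMonkey-only, the sketch below also shows a Proxy-based approximation of the same idea that runs in any modern engine.

```javascript
// SpiderMonkey's nonstandard hook (shown for reference):
//   Object.prototype.__noSuchMethod__ = function (id, args) { /* no-op */ };
//
// Portable approximation: wrap an object so calls to missing methods become
// no-ops, letting a reducer delete a method before deleting its callers.
function tolerant(obj) {
  return new Proxy(obj, {
    get: function (target, prop) {
      if (prop in target) return target[prop];
      return function () {};  // unknown method: do nothing, return undefined
    }
  });
}

var o = tolerant({ real: function () { return 42; } });
o.real();           // still works
o.deletedMethod();  // no-op instead of a TypeError
```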

Pretty-printing JavaScript

You can use jsbeautifier.org or the decompiler built into SpiderMonkey to transform the script into a form that is friendlier to Lithium.

To trigger SpiderMonkey's decompiler, wrap the entire script in an anonymous function and use dump (in the browser) or print (in the shell).

The decompiler has two modes: toString creates one line per statement, while uneval creates one line per function declaration. You'll probably want to run Lithium at least once for each mode, since toString makes it easy to eliminate unnecessary expression-statements while uneval makes it easy to eliminate unnecessary function declarations.
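A sketch of the wrapping trick. Note that modern engines return the source verbatim from Function.prototype.toString rather than decompiling it, but the workflow is the same.

```javascript
// Sketch: wrap the script in an anonymous function, then print its source.
// In the SpiderMonkey shell you would pass this to print(); in the browser,
// to dump(). uneval(wrapped), SpiderMonkey-only, gave the other mode:
// one line per function declaration instead of one line per statement.
var wrapped = function () {
  var a = 1;
  var b = 2;
  return a + b;
};

var prettyPrinted = wrapped.toString();  // toString mode: one line per statement
```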

Moving to the shell

As soon as the script seems like it isn't too entangled with the browser DOM, try to eliminate the remaining references to the browser-specific "window" and "document" objects. This should allow you to reproduce the bug in the standalone SpiderMonkey shell, which starts much faster than Firefox (milliseconds rather than seconds).

Note that to reproduce JIT bugs in the shell, you need to use the "-j" switch.

Finishing touches

Lithium may have left empty "if" or "for" blocks, which can almost always be removed. To make the remaining code as simple as possible, try replacing variables with their values and inlining functions. If the code is object-oriented or uses call/apply, this might require a little thinking, but it's usually straightforward.

I gave my new fuzzer a break from testing TraceMonkey by asking it to look for differences between SpiderMonkey and JavaScriptCore. I have listed them below, with SpiderMonkey output above JavaScriptCore output.

I have no idea how many of these are bugs (in SpiderMonkey or JavaScriptCore) and how many are ambiguous in the spec (intentionally or unintentionally).

Early error reporting

SpiderMonkey reports some errors at compile time that JavaScriptCore only reports at run time, if the code is actually hit. The difference is most obvious (and most likely to cause compatibility problems) if the code is skipped.

Making JavaScript faster is important for the future of computer security. Faster scripts will allow computationally intensive applications to move to the Web. As messy as the Web's security model is, it beats the most popular alternative, which is to give hundreds of native applications access to your files. Faster scripts will also allow large parts of Firefox to be written in JavaScript, a memory-safe programming language, rather than C++, a statically typed footgun.

Mozilla's ambitious TraceMonkey project adds a just-in-time compiler to Firefox's JavaScript engine, making many scripts 3 to 30 times faster. TraceMonkey takes a non-traditional approach to JIT compilation: instead of compiling a function at a time, it compiles only a path (such as the body of a loop) at a time. This makes it possible to optimize the native code based on the actual type of each variable, which is important for dynamic languages like JavaScript.

My existing JavaScript fuzzer, jsfunfuzz, found a decent number of crash and assertion bugs in early versions of TraceMonkey. I made several changes to jsfunfuzz to help it generate code to test the JIT infrastructure heavily. For example, it now generates mixed-type arrays in order to test how the JIT deals with unexpected type changes.

Andreas Gal commented that each fuzz-generated testcase saved him nearly a day of debugging: otherwise, he'd probably have to tease a testcase out of a misbehaving complex web page. Encouraged by his comment, I looked for additional ways to help the TraceMonkey team.

JIT correctness

Last month, I wrote a new fuzzer designed to find correctness bugs. It runs a randomly-generated script in two JavaScript engines (in this case, SpiderMonkey with and without the JIT) and complains if the output is different.
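The core of such a harness can be sketched as follows. The real fuzzer invoked two SpiderMonkey shell configurations (with and without the JIT); here the two "engines" are stand-in functions, with eval playing the role of an engine.

```javascript
// Differential-testing sketch: run the same program through two evaluators
// and report any divergence, treating thrown errors as output too.
function compareEngines(program, engineA, engineB) {
  function run(engine) {
    try { return "value: " + String(engine(program)); }
    catch (e) { return "error: " + e.name; }
  }
  var a = run(engineA);
  var b = run(engineB);
  return a === b ? null : { a: a, b: b };  // null means the engines agree
}

var same = compareEngines("1 + 1", eval, eval);                      // agreement
var diff = compareEngines("1 + 1", eval, function () { return 3; }); // mismatch
```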

It quickly found 13 bugs where the JIT caused JavaScript code to produce incorrect results. These bugs range from obvious to obscure to evil.

It even found two security bugs that jsfunfuzz had missed. One was a crash that involved a combination of language features that jsfunfuzz doesn't test heavily. The other was an uninitialized-memory-read bug, which caused the output to be random when it should have been consistent. jsfunfuzz missed the bug because it ignores most output, but the new fuzzer interpreted it as a difference between non-JIT and JIT output and brought the bug to my attention.

JIT speed

I set up the new fuzzer to compare the time needed to execute scripts and complain whenever enabling the JIT made a script run more slowly. It measures speed by letting the script run for 500ms and reporting the number of loop iterations completed in that time.
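That measurement can be sketched as a fixed time budget with an iteration counter. The 500ms budget is from the text; the function and workload below are illustrative.

```javascript
// Sketch: measure speed as "iterations completed within a time budget",
// so a slow configuration simply reports a small count instead of
// stalling the harness on a fixed amount of work.
function iterationsWithin(budgetMs, body) {
  var count = 0;
  var deadline = Date.now() + budgetMs;
  while (Date.now() < deadline) {
    body();
    count++;
  }
  return count;
}

// Run the same workload under each configuration and compare the counts.
var workload = function () { Math.sqrt(Math.PI); };
var iterations = iterationsWithin(50, workload);  // 50ms keeps the demo quick
```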

It has also found 10 cases where the JIT makes scripts about 10% slower. Most of these minor slowdowns are due to "trace aborts", where a piece of JavaScript is not converted to native code and stays in the interpreter. Some trace aborts are due to bugs, while others are design decisions or cases for which conversion to native code simply hasn't been implemented yet.

There is some disagreement over which trace aborts are most likely to affect real web pages. I asked members of Mozilla's QA team to scan the web in a way that can answer this question.

Interpreter speed

Mostly for fun, I also looked to see which code the JIT speeds up the most. Here's a simplified version of its answer:

Assertions

The JavaScript engine team has documented many of their assumptions as assertions in the code. Many of these assertions make it easier to spot dangerous bugs, because the script generated by the fuzzer doesn't have to be clever enough to actually cause a crash, only strange enough to violate an assumption. This is similar to my experience with other parts of Gecko that use assertions well.

Other JavaScript engine assertions make it easier to find severe performance bugs. Without these assertions, I'd only find these bugs when I measure speed directly, which requires drastically slowing down the tests.

I should be able to find some performance bugs by looking at which aborts and side exits are taken. This strategy would make some performance bugs (such as repeatedly taking a side exit) easier to spot.

DOM 2 does not allow nodes to be moved between documents -- in fact, it requires that implementations throw an error when code tries to do so. But for years, Gecko has not enforced this rule.

It's a bit embarrassing that Internet Explorer gets this right and we get it wrong. Someone might think Gecko is trying to embrace and extend the DOM.

Soon, Gecko will start enforcing the rule on trunk. But bringing Gecko in line with this aspect of the DOM spec risks breaking Gecko-specific code, such as code in extensions and bookmarklets written for Firefox. For example, my Search Keys extension used to create some nodes in the chrome document, and some in the foreground tab, before putting them in the tab that had just loaded. Search Keys 0.8 creates all elements in the correct document.

I also updated the following bookmarklets to create nodes in the correct document and/or use importNode when copying nodes between documents:

These bookmarklets previously only worked in browsers that violated the DOM spec by allowing nodes to be moved between documents without a call to importNode or adoptNode. Maybe some of them work in IE now.

If you use those bookmarklets, you should grab the new versions so they won't break when you update to next week's trunk build or to Firefox 3.
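The spec-compliant pattern behind those fixes can be sketched as follows. Doc is a tiny stand-in capturing only ownership semantics, so the sketch runs anywhere; in a real page, importNode and adoptNode are methods on actual Document objects.

```javascript
// Sketch of the two spec-compliant ways to get a node into another document.
function Doc(name) { this.name = name; }
Doc.prototype.importNode = function (node, deep) {
  // Returns a copy owned by this document; the original node is untouched.
  return { tag: node.tag, ownerDocument: this };
};
Doc.prototype.adoptNode = function (node) {
  // Transfers ownership: the same node, now owned by this document.
  node.ownerDocument = this;
  return node;
};

var docA = new Doc("A"), docB = new Doc("B");
var node = { tag: "div", ownerDocument: docA };

var copy = docB.importNode(node, true);  // copy belongs to docB
var moved = docB.adoptNode(node);        // node itself now belongs to docB
```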

A desperate web developer emailed me asking how to make a bookmarklet that does something with the selected text, where the selected text is usually in a textarea.

He had tried using window.getSelection().toString(), but that doesn't work, because window.getSelection() is implemented in terms of DOM Ranges, and it doesn't make much sense to have a DOM Range inside a textarea.

Here are some of the methods I tried:

Determine focus by tracking onfocus and onblur events. This works for web pages, but not for bookmarklets.
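For form controls the selection is reachable directly through the element's selectionStart and selectionEnd properties, which sidesteps window.getSelection() entirely. The selectedText helper and the mock element below are illustrative; in a bookmarklet you would pass the focused element (e.g. document.activeElement).

```javascript
// Sketch: read the selection of a textarea or text input from the element
// itself, since window.getSelection() doesn't cover form controls.
function selectedText(el) {
  return el.value.substring(el.selectionStart, el.selectionEnd);
}

// A mock element stands in for the focused textarea in this demo:
var textarea = { value: "hello world", selectionStart: 6, selectionEnd: 11 };
selectedText(textarea);  // "world"
```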

Hidden referrer column for history

Extensions can now access the referrer information for pages stored in the browser history. This feature can be used to provide alternate history views and other useful functionality. For example, my How'd I Get Here? extension uses this feature.

API for prioritizing HTTP connections

The Mozilla networking library now supports the prioritization of connections to a specific server. (See nsISupportsPriority.)

API for managing user and UA stylesheets

Extensions can now register stylesheet URIs as additional user and UA stylesheets. This means extensions no longer have to try to edit userContent.css to add styling (say, for XBL binding attachment) to web pages. This makes it easier to implement extensions like Flashblock. For details on using this API, see Using the Stylesheet Service.

Site-specific user style sheet rules

Firefox now supports site-specific user style sheet rules. While advanced users can edit userContent.css to use this feature directly, an extension could also take advantage of this feature using the API for managing user style sheets above.

Dynamic Overlays

Loading of XUL overlays after the document has been displayed is now supported. (See nsIDOMXULDocument.)

Translucent Windows (Windows/Linux)

On Windows and Linux, XUL windows with a transparent background are now supported. This allows whatever is below the window to shine through the window background.

New Preferences Bindings

These new bindings make it easier to create preferences windows for extensions. The new preferences windows support instant-apply behavior, which is enabled by default on Mac and Linux.

API for implementing new command-line switches

XTF Support

The eXtensible Tag Framework allows adding support for new namespaces to Mozilla using XPCOM components (written in JavaScript or C++). For example, the Mozilla XForms Project uses XTF to add support for the XForms namespace. See the XTF Home Page.

Rich list box

Access to nsIEditor of textboxes

Firefox now has a supported method for getting the nsIEditor of textboxes and textareas, making it easier to implement features such as spell checking for web forms. For more information, see bug 303727 or nsIDOMNSEditableElement.

Extensions written using JavaScript now use XPCNativeWrappers by default, making it easier to write extensions that manipulate web content without introducing security holes.

Extensions can now specify that they are compatible only with specific versions of Firefox (e.g. Firefox 1.5.0.2). Most extensions that work with Firefox 1.5 should set their maxVersion to 1.5.0.*, indicating that they will work with future security releases (unless one of those releases contains API changes, which is unlikely and would cause it to be numbered e.g. Firefox 1.5.1). See this page for more information.