How Many Eyeballs Tame Complexity

It's one thing to observe in the large that the bazaar style
greatly accelerates debugging and code evolution. It's another to
understand exactly how and why it does so at the micro-level of
day-to-day developer and tester behavior. In this section (written
three years after the original paper, using insights by developers
who read it and re-examined their own behavior) we'll take a
hard look at the actual mechanisms. Non-technically inclined
readers can safely skip to the next section.

One key to understanding is to realize exactly why it is that
the kind of bug report non–source-aware users normally turn in
tends not to be very useful. Non–source-aware users tend to report
only surface symptoms; they take their environment for granted, so
they (a) omit critical background data, and (b) seldom include a
reliable recipe for reproducing the bug.

The underlying problem here is a mismatch between the tester's
and the developer's mental models of the program; the tester, on the
outside looking in, and the developer on the inside looking out.
In closed-source development they're both stuck in these roles, and
tend to talk past each other and find each other deeply frustrating.

Open-source development breaks this bind, making it far easier
for tester and developer to develop a shared representation grounded
in the actual source code and to communicate effectively about it.
Practically, there is a huge difference in leverage for the developer
between the kind of bug report that just reports externally-visible
symptoms and the kind that hooks directly to the developer's
source-code–based mental representation of the program.

Most bugs, most of the time, are easily nailed given even an
incomplete but suggestive characterization of their error conditions
at source-code level. When someone among your beta-testers can point
out, "there's a boundary problem in line nnn", or even just "under
conditions X, Y, and Z, this variable rolls over", a quick look at the
offending code often suffices to pin down the exact mode of failure
and generate a fix.

Thus, source-code awareness by both parties greatly enhances
both good communication and the synergy between what a beta-tester
reports and what the core developer(s) know. In turn, this means that
the core developers' time tends to be well conserved, even with many
collaborators.

Another characteristic of the open-source method that conserves
developer time is the communication structure of typical open-source
projects. Above I used the term "core developer"; this reflects a
distinction between the project core (typically quite small; a single
core developer is common, and one to three is typical) and the project
halo of beta-testers and available contributors (which often numbers
in the hundreds).

The fundamental problem that traditional software-development
organization addresses is Brook's Law: ``Adding more programmers to a
late project makes it later.'' More generally, Brooks's Law predicts
that the complexity and communication costs of a project rise with the
square of the number of developers, while work done only rises
linearly.

Brooks's Law is founded on experience that bugs tend strongly to
cluster at the interfaces between code written by different people,
and that communications/coordination overhead on a project tends to
rise with the number of interfaces between human beings. Thus,
problems scale with the number of communications paths between
developers, which scales as the square of the humber of developers
(more precisely, according to the formula N*(N - 1)/2 where N is the
number of developers).

The Brooks's Law analysis (and the resulting fear of large
numbers in development groups) rests on a hidden assummption: that the
communications structure of the project is necessarily a complete
graph, that everybody talks to everybody else. But on open-source
projects, the halo developers work on what are in effect separable
parallel subtasks and interact with each other very little; code
changes and bug reports stream through the core group, and only
within that small core group do we pay the full
Brooksian overhead. [SU]

There are are still more reasons that source-code–level bug
reporting tends to be very efficient. They center around the fact
that a single error can often have multiple possible symptoms,
manifesting differently depending on details of the user's usage
pattern and environment. Such errors tend to be exactly the sort of
complex and subtle bugs (such as dynamic-memory-management errors or
nondeterministic interrupt-window artifacts) that are hardest to
reproduce at will or to pin down by static analysis, and which do the
most to create long-term problems in software.

A tester who sends in a tentative source-code–level
characterization of such a multi-symptom bug (e.g. "It looks to me
like there's a window in the signal handling near line 1250" or "Where
are you zeroing that buffer?") may give a developer, otherwise too
close to the code to see it, the critical clue to a half-dozen
disparate symptoms. In cases like this, it may be hard or even
impossible to know which externally-visible misbehaviour was caused by
precisely which bug—but with frequent releases, it's unnecessary to
know. Other collaborators will be likely to find out quickly whether
their bug has been fixed or not. In many cases, source-level bug
reports will cause misbehaviours to drop out without ever having been
attributed to any specific fix.

Complex multi-symptom errors also tend to have multiple trace
paths from surface symptoms back to the actual bug. Which of the
trace paths a given developer or tester can chase may depend on
subtleties of that person's environment, and may well change in a
not obviously deterministic way over time. In effect, each developer
and tester samples a semi-random set of the program's state space when
looking for the etiology of a symptom. The more subtle and complex the
bug, the less likely that skill will be able to guarantee the
relevance of that sample.

For simple and easily reproducible bugs, then, the accent will
be on the "semi" rather than the "random"; debugging skill and
intimacy with the code and its architecture will matter a lot. But
for complex bugs, the accent will be on the "random". Under these
circumstances many people running traces will be much more effective
than a few people running traces sequentially—even if the few have
a much higher average skill level.

This effect will be greatly amplified if the difficulty of
following trace paths from different surface symptoms back to a bug
varies significantly in a way that can't be predicted by looking at
the symptoms. A single developer sampling those paths sequentially
will be as likely to pick a difficult trace path on the first try as
an easy one. On the other hand, suppose many people are trying trace
paths in parallel while doing rapid releases. Then it is likely one
of them will find the easiest path immediately, and nail the bug in a
much shorter time. The project maintainer will see that, ship a new
release, and the other people running traces on the same bug will be
able to stop before having spent too much time on their more difficult
traces [RJ].