Avoiding Downstream Eddies in Free Software

Greg defined the "Linux ecosystem" as a series of interconnected
projects, primarily the Linux kernel, GCC, X.org, binutils, glibc, and the
Linux man pages. Though a GNU/Linux distribution includes much more
software, though parts of this ecosystem are available on other free
Unix-like systems (including the *BSDs and Open Solaris), and though you
can remove X.org on many embedded devices, this is the minimal group that
characterizes all GNU/Linux-based systems. Thus it's the important and useful
infrastructure common to anything readily identifiable as a Linux-based
system. Any distinguishing flavor of Linuxness comes from this combination
of projects.

Who Contributes to the Linux Ecosystem?

The Linux Foundation published a report earlier this year about
contributions to the Linux kernel (written by Greg, Jon Corbet from LWN,
and Amanda McPherson from the Linux Foundation). The study demonstrates
the tremendous rate of both change in and contribution to the Linux
kernel.

Greg had updated numbers in his keynote. The Linux kernel has grown by
99324 patches in the past three years, from 2.6.15 to 2.6.27 -- that's
almost a hundred changes every day. The contribution-tracking heuristics
credit Red Hat with 11846 patches, Novell with 7222, Mandriva with 237,
and Gentoo with 229.
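
As a rough check on those figures, here is a minimal sketch of the
arithmetic. It assumes "three years" means 1095 days (the exact release
dates may give a slightly different rate) and that each company's count
is a share of the 99324 total; the function and variable names are mine,
not anything from the report.

```python
# Figures quoted from the keynote numbers above.
TOTAL_PATCHES = 99324

# Assumption: "the past three years" is treated as 3 * 365 days.
DAYS = 3 * 365

PATCHES_BY_COMPANY = {
    "Red Hat": 11846,
    "Novell": 7222,
    "Mandriva": 237,
    "Gentoo": 229,
}


def patches_per_day(total, days):
    """Average number of patches merged per day over the period."""
    return total / days


def share_percent(count, total):
    """A contributor's patch count as a percentage of all patches."""
    return 100.0 * count / total


if __name__ == "__main__":
    print(f"patches per day: {patches_per_day(TOTAL_PATCHES, DAYS):.1f}")
    for company, count in PATCHES_BY_COMPANY.items():
        print(f"{company}: {share_percent(count, TOTAL_PATCHES):.2f}%")
```

Under those assumptions the rate lands a little over ninety patches a
day, consistent with "almost a hundred," and Red Hat's credited share
comes to roughly twelve percent of the total.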

The largest group of contributors to the Linux kernel is people not
aligned with any particular business or company. They produced 17% of all
tracked patches. 8.3% of all patches came from people with unknown
alignment. Together, that means about a quarter of all contributions to the
Linux kernel apparently come from amateurs.

37% of the contributions to GCC (by the same metrics) are from
amateurs.

Red Hat produces 26.8% of the patches to X.org, with 18.8% coming from
users of unknown affiliation, 12% from Intel, and 2.1% from the NSA. Greg
noted that tracking X.org patches is difficult because lead developer Keith
Packard commits changes from several separate machines, which makes his
contributions hard to collate. (Keith works for Intel.)

Is It the Size of the Contribution, Or...?

The statistics are interesting, but it's difficult to draw meaningful
conclusions from them for two reasons. First, there are too many questions
about contributor affiliations to treat the data as definitive. Though O'Reilly
pays me in part to be a subject matter expert on F/OSS, my role here is as
an editor and writer, not a programmer. If I submit a patch to a free
software project inspired by my work duties, should O'Reilly get corporate
credit for my work? If I submit a patch outside of work, who should receive
credit? What would happen if another company hired me to continue working
on a free software project I participate in as a hobby?

Second, Canonical CTO Matt Zimmerman disagrees with the report's statistical
conclusions. Red Hat is a large, well-established, and profitable
company which can afford to hire and fund many developers. If Red Hat
weren't producing as many patches, the community might rightly
question the company's commitment to free software and the ecosystem. Yet
I'm not sure it's possible to produce a meaningful metric of how much any
existing company should contribute.

Users Who Don't Provide Feedback Are Useless

Greg made a throwaway comment containing a point too important to get
lost in arguments over statistics and sampling. Contributing to the
health of common infrastructure is a primary duty of downstream
parties.

Red Hat, Canonical, Slackware, Novell, Debian, Mandriva, IBM, Google,
Dell, HP, Montavista, and Gentoo all benefit from the timely,
well-maintained, and featureful development from thousands of upstream
projects. In return, these groups make the work of these projects available
to millions of eager users. More users tend to mean more bug reports, more
feature requests, and, above all, more feedback -- which is the primary
benefit to upstream developers. Most of all, we want answers to simple
questions. Does it work? Is it useful? What more can we do to delight
you?

The best possibility is to receive a patch containing documentation,
well-designed and well-implemented code, and appropriate tests for a bug or
new feature. If it applies cleanly, builds and passes tests on the relevant
systems, and fits with the project's goals, even better.

Downstream Pseudo-forks

Yet even just knowing that a recent commit caused a test failure -- and
getting debugging information from the relevant system -- is valuable.
Canonical's kernel team may not have the expertise to diagnose and debug
errors in the kernel's SCSI subsystem related to the use of a particular
flag in combination with a new chipset present in the latest revision of a
hard drive controller. (Few people do.) Yet with the greater userbase of
Ubuntu, it's likely that Canonical's kernel team may receive such a bug
report while the Linux kernel developers may not.

The process only works when Canonical's kernel team reports that bug and
all appropriate debugging information upstream. Submitting patches upstream
would be great, but a well-produced bug report from experienced developers
and troubleshooters is likely sufficient information for upstream to find
and fix the bug.

Not all bugs are worth reporting upstream, of course. It's difficult to
fix unreproducible bugs, or bugs without debugging information. As well,
distributors rarely distribute upstream's most recent code, so some fixes
may be as simple as backporting patches from newer versions.

When information flows only one way, the result is a fork in everything
but name. Bugs get reported to the distribution's tracker, patches get
applied to the distribution's version, and users use the distribution's
packaged version while believing (based on the name) that they're using
upstream's version. Though bugs and feature requests often get reported to
the distribution, upstream may get blamed for the problems.

Sadly, the accepted wisdom of the Perl community, at least, is to build
and install a custom version of Perl alongside the distribution's version.
Users who know this have to maintain multiple Perl installations, while
users who do not know this yet have to suffer potential downstream
misconfigurations. It's not clear what value distributions provide in these
scenarios.

Most free software licenses allow this, and there are few legal
mechanisms to prevent such behavior (though Mozilla's trademark dustup with
Debian is an interesting potential counterexample). Even so, the
pragmatic arguments for maintaining regular contact with upstream are
strong. It's better for the users. It pushes the responsibility for making
good decisions to the most experienced people. It preserves the feedback
cycle which is so important to successful community-driven development.

Sustainable Upstream Development

Distributions provide valuable services in packaging, distribution, and
service -- this is especially true when integrating thousands of upstream
projects into a coherent, unified whole. It's no wonder that far fewer
users follow any project's own releases than follow the packaged versions
available for upgrade every few months.

Even as upstream often depends on downstream to make software available
to millions of users, downstream depends on upstream to produce
high-quality software which millions of users need, want, and use. Unlike
middlemen who seemingly exist to get a cut of the markup difference between
wholesale and retail prices, the separation of upstream and downstream
often provides tangible advantages for all groups concerned.

Yet that separation cannot be complete, nor can the flow of information
be unidirectional. Without users, a project need not exist. Without
feedback from users, a project might as well not exist. Without credit, or
distribution, or a steady stream of new developer interest, a project may
well wither and vanish.

Concentrating on the amount of contribution from downstream to
upstream misses a much more important point. The number of patch
contributions upstream doesn't matter. The number of potential
contributions of any kind which remain in downstream eddies
matters. That number should be zero.