Advogato's Number: The Economics of Software Complexity

This week, Advogato takes a look at a familiar feature of the software
landscape, complexity. Why does it happen? Specifically, what are the
economic forces that drive it? How does free software differ from
proprietary in this regard?

Advogato has no formal training in economics, but has long been
interested in the approach, perhaps due to the influence of a beloved
uncle who
was economic advisor to a high-powered family business. It's always
seemed to me that you can use classic economics concepts to analyze
the practice of software, but few economists have dared to tread there
(a wave of the paw to Michael Masnick for pointing out an exception:
Alan
McAdams).

Most of us are familiar with the progression: a 1.0 release which is
fairly simple, but usually fairly sparse in functionality. If the
project is a failure, the 1.0 release is the last. Otherwise, even
though it's a success, the authors are never happy with it, and put
out a succession of new releases. And, predictably, each release is
larger and more complex than the previous.

This pattern is so routine that most of us take it for granted, both
in the proprietary and free software worlds. But does it have to be
this way? After all, when an author of a book finishes writing it, for
the most part it's done. Why is software so different? Why can't we
just start with version 6.0 and skip the hassle of all the preceding
versions?

I think the concept of investment sheds some light on the
question. The cost of undertaking a new software project is large, and
much more so for a version 6.0 than for a version 1.0. In fact,
complexity is probably the single best predictor of the cost of a
software project. Further, software projects are inherently very
risky. A fair number of them fail outright, and a much larger fraction
deliver, but disappoint. Thus, an initial investment in a lower
complexity project is a much lower risk. And since the success of
earlier versions is a fairly good predictor of the success of future
versions (second
system syndrome notwithstanding), the larger investment required
is a more reasonable risk.

But why are the new versions always more complex? Shouldn't it be
possible to just make them better without necessarily
increasing the complexity? In theory, this sounds nice, but in
practice there always seems to be a reason why more complexity is needed.

Fred Brooks identified "accidental complexity" as one source in his
essay, "No Silver Bullet". I personally prefer the term "needless
complexity" to emphasize the idea that choices made by the authors do
have some effect on the complexity.

However, not all complexity is needless. Many of the problems solved
by software today are inherently fairly complex. In particular,
integrating with other programs is a major source of complexity.
You want programs to integrate; otherwise, you're much more
likely to face a situation where things Just Don't Work. And, as these
programs also grow in complexity themselves, the cost of the
integration goes up. If you don't track the changes, you face bit rot.

Even in the area of integration, there are some choices that can make
it easier or harder, such as paying attention early on to good
standards. Bad standards (almost by definition) are one of the main
sources of complexity.

Modern applications have generally moved from being command line based
to GUI, which is quite a bit more complex. What we see here is
conservation of misery: things get easier for users, but harder for
the developers.

So, coming back to economics, some complexity is necessary to deliver
software that meets the needs and desires of users, but some of the
complexity is needless. Yet, we see both types in abundance. If the
latter increases costs so much, why is it not rooted out at every
turn?

In the proprietary software world, I think the major reason is to
raise the barrier to competition. Since complexity is the major factor
in cost, by raising the complexity required to implement a certain set
of features, you make it much harder for your competitors. Since you
don't have control over the internal complexity of their software, you
just make the interfaces complex.

Ideally, you minimize the cost for yourself by adding the complexity
incrementally, i.e., treating the work you've already done as free. Even
though the total complexity of the next version is large, the relative
cost is quite a bit lower. This is a strong economic force in favor of
cruft.

Free software is better at resisting complexity when not needed, and
there are few better examples than the sockets API for networking. It
basically hasn't changed much since Bill Joy first implemented it in
BSD about 20 years ago. Yet, that simple API is what interfaces
virtually all applications to the Internet.
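
To make the point concrete, here is a minimal sketch of a TCP client
written against that API. It is only an illustration: the address, port,
and one-line request are placeholders, and error handling is pared to the
bone. The handful of calls it uses (socket, connect, write, read, close)
are essentially the same ones that have been there since the BSD days.

    /* Minimal TCP client using the classic BSD sockets calls.
       Address, port, and request are illustrative placeholders. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);      /* create a TCP socket */
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(80);                     /* illustrative port */
        inet_pton(AF_INET, "93.184.216.34", &addr.sin_addr); /* illustrative address */

        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("connect");
            return 1;
        }

        const char *req = "HEAD / HTTP/1.0\r\n\r\n";
        write(fd, req, strlen(req));                   /* send a trivial request */

        char buf[512];
        ssize_t n = read(fd, buf, sizeof buf - 1);     /* read the beginning of the reply */
        if (n > 0) {
            buf[n] = '\0';
            printf("%s", buf);
        }

        close(fd);
        return 0;
    }

Virtually every networked application still comes down to some
variation on those few calls.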

Not that there haven't been attempts to ratchet it up a version.
Winsock 1.0 was a fairly straightforward adaptation of the sockets API
for the Windows platform. So naturally there's now a Winsock 2.0 that
includes all kinds of really complex stuff for quality of service and
so on.

The sockets API has also gained competition from consortia. The
well-loved Open Group has been pushing XTI for some time now. As far
as I know, it still isn't implemented for Linux, and if it weren't for
its coverage in W. Richard Stevens' UNIX Network Programming books, it
would probably be completely unknown.

For proprietary standards, there's usually a carrot and stick
approach: if you want to use this cool new feature, you have to put up
with this whole new API. But the free software world is pretty good at
adapting something that already works.

Even in the proprietary world, the carrot has to be sweeter than the
stick is sharp. How many people use Group 4 fax machines? Better yet,
how many people use them over ISDN connections? On the other hand,
companies such as Adobe are very skilled at taking a technology that
is simple enough to be implemented by anybody (the original
PostScript) and adding features (color, CJK font support, searching,
links) to make it a very successful standard. Even though PDF is an
open standard with relatively little intellectual property protection
(the LZW patent being the notable exception), it is difficult for other people to
handle the entire beast. Thus, today Adobe dominates the PDF
marketplace with their Acrobat products.

Within free software, increasing the barrier to competition is not a
motivation for added complexity, but the issue of incremental
investment certainly is. A classic case is autoconf and make. This is
a system with a lot more complexity than is really needed, but it's
not hard to see how it got there. Instead of rethinking the build
process from scratch, the designers of autoconf probably said to themselves, "we've
already got a make tool, why reinvent the wheel?" So the incremental
complexity may have been lower, even though this had very negative
consequences for total complexity.

Standards bodies are also very bad about treating the complexity of
existing standards as zero. It is very inexpensive (for the writers of
the standard) to include a whole new specification by reference, even
though it might be extremely painful for implementors. SVG is an
extreme example of this source of complexity. The SVG specification
itself is not all that complex, but it includes by reference XML,
XMLns, XPath, XLink, XPointer, CSS2, XSLT, DOM, JavaScript, sRGB, ICC,
Panose, PNG, JPEG, gzip, and probably one or two others I missed.
Right, SMIL Animation.

The IETF, in its foresight, avoided this kind of problem by requiring
standards to be based on working, interoperable implementations. In
particular, they generally require two independent implementations,
which makes it much harder to sweep incremental complexity under the
rug by leveraging existing integration work. Thus, in the IETF, the
cost of complexity matches much more closely what it would be in the
real world. In Advogato's opinion, standards bodies such as the World
Wide Web Consortium would do well to learn from this wisdom.

Most of the arguments I've put forth here seem just like common sense
to me. However, I haven't seen them clearly articulated anywhere else,
so I'll post them here and see what happens.

One could look at it this way: the issue is not so much why version 6.0 is so complex, but why the complexity increases. The
answer: because version 1.0 was so simple. Just like other engineering disciplines, you start with a proof-of-concept, then build a
prototype, and finally produce the real thing. The proof-of-concept is version 0.0, the prototype is 1.0, and from there it's a slow
progression to a bona fide final product.

But you DO get to the final product, or else you've just wasted a wad
of cash.

In engineering disciplines (at least from my years slogging around in
chemical production facilities), you usually have a very well defined
final goal, and vary very little from it during the construction
phase; then when it's done, it's done (sans some maintenance).

I've never had the opportunity to design a production facility to
produce HDA, and then decide to produce benzene, then car wax, then
Topps baseball cards... all from the same unit.

One might argue that there is a difference between software and my
chemical plant example. However, is there really a difference? Both
were assembled to do a job (the unit makes goo, and the program pushes
electrons), but one doesn't change much after it's put into the
ground. Why should the other?

Does software need to grow more complex? In some cases it is necessary
in order to improve end-user functionality, as Advogato suggests
(keeping the parallel comparison, the push in the chemical industry to
move from continuous to batch processing to meet more dynamic demand).
However, in many cases (as I learned when mucking with SVG and playing
with parts of Excel 2000/W2K), I feel people have forgotten the KISS rule.

What I would like to see is more integration and cooperation between
programs to perform a task, instead of having one big MegaApp hogging
resources and boggling my mind...

I can think of two things contributing to this problem. The first is a
broad cultural issue: everyone thinks of computers as somehow embodying
scientific growth, and thus as not confined to any individual problem
domain. So writing software is considered an ongoing progression,
rather than the simple satisfaction of a set of goals.

But I think there's another issue, which is the tradeoff between
generality and simplicity. At the core of most technical flamewars is a
small disagreement over where on the spectrum of generality and
simplicity a program should sit, and much of the time spent with a customer
or in a design meeting goes to delineating exactly what level of abstraction
to tackle a problem at. To a large extent I think that the progress
through versions is a sort of gradual creep towards generality, even
within a well defined problem domain. Everyone recognizes that general
solutions are "better" in a completely abstract sense because they are
capable of handling more problems with fewer specialized cases, but
overgeneralization quite simply kills a program before it ever
sees the light of day.

So when you release 1.0, you have frequently specialized it a
lot in order to make it on your budget, whether that budget is
money or time or just your own interest in solving a local problem. You
know it doesn't generally solve the whole problem domain. Even
if by some miracle it does, it's probably specialized internally, and
there's some factoring and tidying you can do. That's why you go back at
it: to make it into the more perfect, abstract, elegant, efficient, minimal
piece of code you set out to make. It doesn't always get smaller, but
sometimes it does. I've had it happen that 2.0 is half the size and
twice the speed of 1.0, because of a key insight we found after
1.0 came out.

I also feel the author's comments here about books are not quite right:
authors do release second and third editions, and they do change things
in between. This happens most often with technical volumes, reference
works, etc. A new encyclopedia comes out now and then. A new medical
reference. A new textbook. Even my programming books are Nth edition for
N > 1. Fiction occasionally gets re-released with new chapters too, a
bit of a tidy-up, more author's notes, an epilogue, a sequel, etc.
Likewise musicians and even film makers re-release improvements:
remixes, live recordings, extra studio sessions, director's cuts, etc.