Oracle Blog

Views on software from Bryan Cantrill's deck chair

On Dreaming in Code

As I noted previously, I recently gave a Tech Talk at Google on DTrace. When I gave the talk, I was in the middle of reading Scott Rosenberg's Dreaming in Code, and (for whatever poorly thought-out reason) I elected to use my dissatisfaction with the book as an entrée into DTrace. I think that my thinking here was to use what I view to be Rosenberg's limited understanding of software as a segue into the more general difficulty of understanding running software -- with that of course being the problem that DTrace was designed to solve.

However tortured that path of reasoning may sound now, it was much worse in the actual presentation -- and in terms of Dreaming in Code, my introduction comes across as a kind of gangland slaying of Rosenberg's work. When I saw the video, I just cringed, noted with relief that at least the butchery was finished by five minutes in, and hoped that viewers would remember the meat of the presentation rather than the sloppy hors d'oeuvres.

I was stupidly naive, of course -- for as soon as so much as one blogging observer noted the connection between my presentation and the book, Google-alerted egosurfing would no doubt take over and I'd be completely busted. And that is more or less exactly what happened, with Rosenberg himself calling me out on my introduction, complaining (rightly) that I had slagged his book without providing much substance. Rosenberg was particularly perplexed because he felt that he and I are concluding essentially the same thing. But this is not quite the case: the conclusion (if not mantra) of his book is that "software is hard", while the point I was making in the Google talk is not so much that developing software is hard, but rather that software itself is wholly different -- that it is (uniquely) a confluence between information and machine. (And, in particular, that software poses unique observability challenges.)

This sounds like a really pedantic difference -- especially given that Rosenberg (in both book and blog entry) does make some attempt to show the uniqueness of software -- but the difference is significant: instead of beginning with software's information/machine duality, and from there exploring the difficulty of developing it, Rosenberg's work has at its core the difficulty itself. That is, Rosenberg uses the difficulty of developing software as the lens to understand all of software. And amazingly enough, if you insist on looking at the world through a bucket of shit, the world starts to look an awful lot like a bucket of shit: by the end of the book, Rosenberg has left the lay reader with the sense that we are careering towards some sort of software heat-death, after which meetings-about-meetings and stickies-on-whiteboards will prevent all future progress.

It's not clear if this is the course that Rosenberg intended to set at the outset; in his Author's Note, he gives us his initial bearings:

Why is good software so hard to make? Since no one seems to have a definitive answer even now, at the start of the twenty-first century, fifty years deep into the computer era, I offer by way of exploration the tale of the making of one piece of software.

The problem is that the "tale" that he tells is that of the OSAF's Chandler, a project plagued by metastasized confusion. In fact, the project is such a wreck that it gives rise to a natural question: did Rosenberg pick a doomed project because he was convinced at the outset that developing software was impossible, and he wanted to be sure to write about a project that wouldn't hang the jury? Or did his views of the impossibility of developing software come about as a result of his being trapped on such a reeking vessel? On the one hand, it seems unimaginable that Rosenberg would deliberately book his passage on the Mobro 4000, but on the other, it's hard to imagine how a careful search would have yielded a better candidate to show just how bad software development can get: PC-era zillionaire with too many ideas funds on-the-beach programmers to change the world with software that has no revenue model. Yikes -- call me when you ship something...

By the middle of the book it is clear that the Chandler garbage barge is adrift and listing, and Rosenberg, who badly wants the fate of Chandler to be a natural consequence of software and not merely a tale of bad ideas and poor execution, begins to mark time by trying to find the most general possible reasons for this failure. And it is this quest that led to my claim in the Google video that he was "hoodwinked by every long-known crank" in software, a claim that Rosenberg objects to, noting that in the video I provided just one example (Alan Kay). But this claim I stand by as made -- and I further claim that the most important contribution of Dreaming in Code may be its unparalleled Tour de Crank, including not just the Crank Holy Trinity of Minsky/Kurzweil/Joy, but many of the lesser known crazy relations that we in computer science have carefully kept locked in the cellar. Now, to be fair, Rosenberg often dismisses them (and Kapor put himself squarely in the anti-Kurzweil camp with his 2029 bet), but he dismisses them not nearly often enough or swiftly enough -- and by just presenting them, he grants many of them more than their due in terms of legitimacy.

So how would a change in Rosenberg's perspective have yielded a different work? For one, Rosenberg misses what is perhaps the most profound ramification of the information/machine duality: that software -- unlike everything else that we build -- can achieve a timeless and absolute perfection. To be fair, Rosenberg comes within sight of this truth -- but only for a moment: on page 336 he opens a section with "When software is, one way or another, done, it can have astonishing longevity." But just as quickly as hopes are raised, they are dashed to the ground again; he follows up that promising sentence with "Though it may take forever to build a good program, a good program will sometimes last almost as long." Damn -- so close, yet so far away. If "almost as long" were replaced with "in perpetuity", he might have been struck by the larger fallacies in his own reasoning around the intractable fallibility of software.

And software's ability to achieve perfection is indeed striking, for it makes software more like math than like traditional engineering domains -- after all, long after the Lighthouse at Alexandria crumbled, Euclid's greatest common divisor algorithm is showing no signs of wearing out. Why is this important? Because once software achieves something approximating perfection (and a surprising amount of it does), it sediments into the information infrastructure: the abstractions defined by the software become the bedrock that future generations may build upon. In this way, each generation of software engineer operates at a higher level of abstraction than the one that came before it, exerting less effort to do more.

These sedimenting abstractions also (perhaps paradoxically) allow new dimensions of innovation deeper in the stack: with the abstractions defined and the constraints established, one can innovate underneath them -- provided one can do so in a way that is absolutely reliable (after all, this is to be bedrock), and with a sufficient improvement in terms of economics to merit the risk. Innovating in such a way poses hard problems that demand a much more disciplined and creative engineer than the software sludge that typified the PC revolution, so if Rosenberg had wanted to find software engineers who know how to deliver rock-solid, highly-innovative software, he should have gone to those software engineers who provide the software bedrock: the databases, the operating systems, the virtual machines -- and increasingly the software that is made available as a service. There he still would have found problems to be sure (software still is, after all, hard), but he would have come away with a much more nuanced (and more accurate) view of both the state-of-the-art of software development -- and of the future of software itself.

Many cranks have many ideas. Indeed, that's often the point: they have some number of ideas that come to fruition, which they use as an aura of plausibility for their many ideas that do not necessarily come to fruition. Now, one man's crank may be another man's visionary -- but I for one am a bit of a reality addict, and there are only so many trips to Pretendland that I can personally stomach before I start asking tough questions. But if these cranks are your visionaries, kindly accept my apologies...

Let me just add that -- while yes, I do use standing searches to keep up with what the blogosphere is saying about my work, and I consider that responsible authorial behavior, not egotism -- it wasn't "Google-alerted egosurfing" that led me to stumble on the video. The blogger who posted about it is a former colleague of mine from Salon, and I read her blog to keep up with her, not to seek stray mentions of my book.

(A longer response is certainly merited here, but that will have to wait for another day.)

Also, Scott: the alert-assisted surfing was not meant as a slight -- I do it myself, and so do most other technologists who give a damn about their work. How did you think I found your blog entry so quickly, anyway? ;)

Solaris of course has lots of bigger, more complicated examples. Now on the one hand, one wants to refrain from pointing to thousands of lines of code and saying that there are no bugs therein, but on the other, there are many subsystems that have been in place and in heavy use for years without defect or modification. At the risk of being egocentric, the cyclic subsystem (which is executed at least 100 times per second on every Solaris system) had its last substantial fix over six years ago, and its last fix of any flavor over three years ago.

Modesty (and the lack, of course, of a proof of its correctness) prevents me from calling the cyclic subsystem perfect -- but such unknown defects as remain are damn few, and we can say that they must be the result of highly unusual (or at least heretofore unseen) circumstances.

A non-Solaris example -- and one that I've been known to use as the canonical example of the persistence of software -- is Super Mario Kart. This is a game that was developed (to its completion) fifteen years ago for the Super Nintendo console. Source code, to the best of my knowledge, is not publicly available and may indeed be lost -- but the binaries persist and (if my coworkers are any indication) remain in active use. Given the longevity of, say, Homer's Odyssey, there is reason to believe that Super Mario Kart will survive in perpetuity -- that thousands of years from now, twenty-somethings somewhere will be using the software exactly as it is used today. Is this perfection? Perhaps not -- but it also might not be distinguishable from perfection...

I find the following two comments very telling:
[directly taken from referenced source file]

1067  * On an SS2, this implementation of hrt2ts() takes 1.7 usec, versus about
1068  * 35 usec for software division -- about 20 times faster.

[anybody around here who still uses his SS2?]

The second comment is a few lines down (tv2hrt):
1131  * If this ever becomes performance critical (ha!), we can borrow the
1132  * code from ts2hrt(), above, to multiply tv_sec by 1,000,000 and the
1133  * straightforward (x << 10) - (x << 5) + (x << 3) to multiply tv_usec by
1134  * 1,000. For now, we'll opt for readability (besides, the compiler does
1135  * a passable job of optimizing constant multiplication into shifts and adds).

Very interestingly, they code these shifts by hand in ts2hrt() ...

... well, it could be that multiplication is faster than shifting and adding a few values on a current architecture [Niagara, anyone?] ...

If that modulus code doesn't make it into the next edition of Hacker's Delight it will be a grievous oversight. I think Hank Warren discusses the factorization trick for De Bruijn multipliers in the section about fast ways to compute the number of leading or trailing zeros in an integer, but I've never seen such a clear explanation of the different bit-shifting techniques as Jeff wrote in hrt2ts(). So is the next version of the Solaris C compiler going to include an integer factorization algorithm tucked away in the optimizer for large constant multiplications, or will there still be room for human cleverness?

I have many words to describe hrt2ts(), and most of them are flattering, but "perfect" is definitely not one of them.

Optimizations like that are great when they give a 2000% performance benefit, but not so great when the benefit is two orders of magnitude smaller. At some point the time/space tradeoff kicks in and the straight C div/mod operator version is better.

On my hardware (Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz) hrt2ts() runs in 15ns while the equivalent function using C div/mod operators runs in 41ns. That's a 63% saving, with a larger code size and pages of commentary required to explain to a maintainer what's going on in the code.

You only see a benefit as big as that if you're calling this function a billion times in a tight loop (like I am to measure its performance). Without loop unrolling hrt2ts() slows down to 36ns per call while the C div/mod operators slow down to 51ns per call (making hrt2ts() only 28% better).

I wouldn't be surprised if hrt2ts() is still faster on a modern 64-bit architecture, but I'm guessing it's not even 20% faster.

I was (obviously) extremely impressed with that tech talk and with you and Sun/Solaris in general. I was a little shocked that you began that talk with the bashing you speak of, and wondered if it was because you were a young man that you were so rash. But then it seemed to me as if you were venting (in jest) because of all the headaches DTrace brought. Also, you started to swear a lot, so it was all in good fun. Now I come to your weblog to find you expressing yourself clearly on the matter and offering your apology for being so negative. You, sir, are awesome; your blog, bookmarked.