A reader mentioned that “It’s not about lines of code.” And he was right: Brevity alone is an unreliable way to judge a program. The problem is—quite simply—the existence of Golf. In any Turing Equivalent language, it is possible to construct programs that are very small and very difficult to understand.

Brevity is not our goal, but it is a side-effect of what happens when we write software well. These are my thoughts about the relationship between program size and readability. It’s my take on how you can look at one short program and dismiss it as Golf, and look at another and praise it as being succinct and expressive.

what really matters

So what really matters? What is it about well-written software that makes it succinct and elegant? Let’s look at an absurdly trivial snippet of code and talk about five different ways we could express its intent:

Twenty-six symbols, but fewer arrays. Comparing them, is the last one less readable than the first? Less “intuitive”?

Let’s get that word out of the way immediately. “Intuitive” is not some wonderful principle that guides us to creating great software. It means no more and no less than “familiar.” There’s value in doing the same things the same ways, but only when it’s the right thing to do in the first place. Learn your tools.

So which of these trivial examples is better? And what does that tell us about program size? Let’s look at the examples and what they tell us. The first example tells us that there is no relationship between the various expressions, none. The second, where we have ‘abstracted out’ the addtimestwo method, tells us that it is no coïncidence that the five expressions used the same formula: they are the same thing.

If that calculation was a piece of business logic, we might want to ensure that if you change one, the others all change as well. In the second example, they are on separate lines, suggesting that although the calculation is shared, the individual assignments are still decoupled from one another. It suggests to me that you might want to remove one, or move it elsewhere.

In some languages, we might have moved all of the assignments into their own block. By grouping them, we have indicated very strongly that the five assignments belong together. In an imperative language like Ruby, this signals that there is something ‘atomic’ about this, and that you don’t want anything else to happen until all five assignments have been mode.

The fourth and fifth examples are special. The fourth example says that the inputs to these calculations is already arranged in a particular structure. The fifth example says that they are arranged in a different structure, but we will use a particular unfold—zip—to get them into the form we want to do the calculating.

detour ahead

This is exactly where expressive languages shine, and simultaneously it is exactly where you can go badly wrong in a quest to achieve size for size’s sake. This is a trivial example, but please extrapolate the principle upwards to lambdas, methods, objects, hierarchies, modules, frameworks, and applications.

The Seasoned Schemer is the sequel to the phenomenal book The Little Schemer, a book that teaches recursion and first-class functions. The Seasoned Schemer dives into first-class functions and teaches you how to express relationships by composing and combining functions. It’s a powerful abstraction that every programmer should have in their toolbox.

The fourth and fifth examples are shorter than the first and second, but they are much worse if what they imply—a relationship between the inputs to the calculations—is contrived solely to shorten the code.

On the other hand, if that relationship actually does exist, then they are far, far superior to the other examples. If there is a relationship there, the most important thing, the clearest thing, is to express the relationship, to make the code we write clearly communicate the semantics of the data it manipulates.

Many people feel that regardless of the expressive power of their language, if it supports procedure calls, they’re good to go: they can abstract things into procedures, and bundle procedures into libraries, and with good design and decoupling and muttering yaddda yadda yadda while waving their hands just right, the result will be manageable, readable, abstracted code.

Ok, I’m pulling their legs. It isn’t wrong, it’s absolutely correct that you can get an enormous win out of abstracting procedures, just like example two. But although that is good, having other ways to organize code is better. Example two adds some overhead: you have to read addtimestwo and control-click the label to jump to the implementation (alert to IDE lovers: I do this kind of thing in Ruby as well, please don’t assume that the rest of us are scrabbling with Notepad.exe).

This abstraction adds overhead. “Abstracting” the common operation has made it more difficult to read, not less difficult to read. People for who consider meta-programming some sort of Black Magic often make this exact point: The mechanism for removing duplication adds complexity itself. One view is that the overall effect is only a win if the complexity added is small compared to the duplication removed.

Another view, the one I am promoting here, is that it isn’t about removing symbols, it’s about communicating something about the underlying relationships. Even if your language is so crufty that example two is longer than example one, it’s the right thing to do if the calculations are the same for a deep reason, if you are trying to communicate that they will always be the same.

That being said, my experience is that when a relationship exists, the code that clearly expresses it usually winds up being shorter than code that does not express it.

I said usually. And here is the pitch for more expressive languages. Languages make expressing certain things easier than others. Each language has its strengths and weaknesses in these areas. Have you noticed how many blog posts have been written explaining Monads?

The problem with languages is that if you are diligent, you will carefully organize your code to express all of the relationships that your language supports. So in Pascal, you factor out all of the common functions and procedures. And when you are done, you have maximally “compressed” your code. Not in the sense of creating an unreadable mess, but you have made your code as small and as elegant as you think is possible and prudent.

But there are two problems. Pick one:

You do know other languages, and you recognize the other relationships that Pascal does not express well. So you Greenspun OO into your code base, and you’re hacking away at first-class functions. Now you can express these other relationships, but your code base is extremely idiosyncratic, and in many cases the code is longer and more convoluted because you have to do weird backflips to make the language try to pretend to be Java or Factor.

You don’t know any other languages, so you think that you have done your best. You have no idea why other programmers are mumbling about Objects, or First-Class Functions, these weird things merely make the code an unreadable mess. You don’t see the underlying relationships those techniques express because you are a Pascal programmer.

Either way, Pascal places limits on how well you can express the underlying relationships between things in your programs. The difference between the two situations is the difference between blissfully moving dirt because “that’s what we do, we move dirt,” and swearing under your breath all day as you fight a losing battle against dirt taking over your construction site.

An “X programmer”, for any value of X, is a weak player. You have to cross-train to be a decent athlete these days. Programmers need to be fluent in multiple languages with fundamentally different “character” before they can make truly informed design decisions.

This is so wrong in so many ways, but let’s start with the obvious bug. If are a Junior Blub Programmer, you believe that this algorithm expresses the underlying relationship properly, so you just fix the bug. If you are a Senior Blub Programmer, you don’t just fix the bug, but you add meaningful variable names so that wrong code looks wrong:

Ah, but if you are a Blub Architect, you realize that asking programmers to use meaningful variable names to keep out of trouble won’t work. Why, if that’s all it took, we could use dynamically typed languages like Javascript! No, we must put the compiler to work so that it is impossible to make this mistake. In fact, that’s exactly how Ada works: There is a special type for “the index of a foo,” and it is different from the type for “the index of a bar,” so the buggy code wouldn’t even have compiled. You can do this with C++ as well, and let’s pretend you can do it with Blub:

The Junior, the Senior, and the Architect are all thinking within the limitations of Blub. They still want to write the same program, they don’t see that the index variable, the test for the end of the loop, the nesting of loops, and incrementing an index have nothing to do with the basic relationship you are trying to express.

Instead of asking themselves why they have index variables, (or better still, why they have nested loops), the Blub programmers are asking themselves how to minimize the problems index variables create.

People tend to view code bases much the way construction workers view dirt: they want great big machines that can move the dirt this way and that. There’s conservation of dirt at work: you can’t compress dirt, not much, so their solution set consists of various ways of shoveling the dirt around…

Index variables are dirt, and being a better Blub programmer is an exercise in learning more sophisticated ways to handle dirt. The Java Programmer has been liberated from this kind of thinking. She writes:

The authors of Beautiful Code share their thinking and problem solving process with us. And that’s why this book transcends so many other books about programming: thinking and problem solving approaches are universal. This isn’t a book about programs, it’s a book about programming, and with every chapter you read you will become a better programmer.

She is not concerned with ways to prevent mistakes with index variables, because Java gives her a tool to make them go away. While the Blub programmers are arguing about how to manage dirt, she is writing code that expresses the relationship she wishes to communicate.

Her code is shorter, yes. And much better. And the reason it is much better, the desirable property of her code, is that it communicates the actual thing going on and does not communicate a bunch of other things that are irrelevant.

Those extra moving parts, the yellow code, the accidental complexity makes the code harder to read. It adds opportunities for errors. It makes it harder to maintain. And all because it contains dirt, it contains things that do not express some fundamental relationship in the data.

Shorter code is more readable when it is shorter by dint of expressing the underlying relationship, without irrelevant details.

Alas, while everyone agrees with this statement, the tyranny of programming in Blub is that the Blub programmer does not know that his code contain irrelevant details: he thinks his code does express all of the necessary ideas in his algorithm, he considers shorter code “cryptic.”

thank god, his typewriter ribbon is running dry

Now we can see that although each language provides abstraction mechanisms, and lets you build new abstractions with the mechanism, new kinds of abstractions give us new ways to express relationships. These things can be abused, of course, but nothing can save you from this: If you don’t let your Architect play with Domain-Specific Languages, what is to stop them from configuring everything in your application with XML?

The goal is readable code that expresses the underlying relationships. When we are programming in our personal Blub, the only relationships we see are the ones our language affords. Thus, we are always tempted to think that are programs already are as readable as the underlying problem domain allows, and that efforts to make them shorter are circus tricks that make the code less readable.

And the only way to break out of Blub is to cross-train, to really learn new languages. Otherwise, how will you know if your code could be shorter in a good way?

Shorter code is better if it expresses underlying relationships, if it is shorter because it does not contain irrelevant cruft, if it does not contain workarounds and hacks to deal with limitations in the programming language. Of course, shorter code is not always better. If it is “shorter by coïncidence,” by exploiting language tricks that do not reflect underlying relationships in your programs, that is bad.

The spoiling occurred when you decided to frame the problem in terms of 10-15 scalars rather than a 2x5 matrix. The solution requires 2 primitive operations; no 13 commas, no 8 braces, no auxiliary functions, no seperate line.

The spoiling occurred when you decided to frame the problem in terms of 10-15 scalars rather than a 2x5 matrix. The solution requires 2 primitive operations; no 13 commas, no 8 braces, no auxiliary functions, no seperate line

This is a terrific sixth example showing a different relationship yet again.

And trust me, there are seventh, eighth, and many more. My point is that the right form is driven by the right underlying relationship. Treating these as a matrix only makes sense if there is a semantic relationship that is also modeled as a matrix.

If there is, that's the way to go. If there isn't, then using a matrix increases the semantic distance between code and model.

Even if they were in C++ instead of Blub, what on earth is wrong with ++i? That's not only perfectly good C++ code, it's BETTER C++ code. If you get in the habit of using i++ when a perfectly good ++i will do, you're also in the habit of making needless temporary value copies that just get thrown away half the time. It doesn't matter much with ints and so forth, but it tends to get uglier with complex types!

The only time that the post-increment i++ is "better" is when you really did need the previous value:

I posit that the first example makes for the best code. Why? It takes, BY FAR, the least amount of energy to read it and understand it. Is it concise, elegant, and DRY? No. Is it short? No. But since we all know code is read far more often than it is written, I much prefer the path of least resistance.

Purists would argue that we're creating an unnecessary variable on the stack in the second example AND increasing the code's length. I would argue very strongly that adding the extra (properly named) variable doesn't change the functionality, but reduces the energy needed to understand this code tremendously.

This isn't necessarily a language argument, but I would like to mention that Perl, C, and Ruby programmers tend to have an insatiable desire to reduce code length and increase "elegance." When wielded improperly, this REDUCES understandability (often under the guise of "expressiveness").

"The fourth and fifth examples are shorter, but they are much worse if what they imply—a relationship between the inputs to the calculations—is contrived solely to shorten the code. On the other hand, if that relationship actually does exist, then they are far, far superior to the other examples. If there is a relationship there, the most important thing, the clearest thing, is to express the relationship, to make the code we write clearly communicate the semantics of the data it manipulates."

..what the criteria for the existence of a relationship?

To me, the relationship does exist.. you put it in code. The question is really whether it is a relationship that is convenient in the sense that it makes it easy to tell a story about how you get to a solution.

I guess it's a bit of a philosophical point, but to me the best small programs are ones which help you see a simple path to a solution. I agree that's close to but not quite the same as size.

In most calculus textbooks, there are proofs for the included theorems. For many of these proofs there are shorter ways of expressing them, but the version used in the textbooks is generally known to be the easiest version to understand. Obviously there are longer versions as well.

If your looking for examples of simplicity and elegance, these proofs are a great place to start. They are 'code', they're rigorous, and they're neither the longest nor shortest version. They are elegance personified. A rare find, that is commonly available :-)

I know it's missing the point of the example(s), but I cannot resist: You're calculating the sum of the products of each element of a vector with each of the others. Using distributivity, that's the same as calculating the sum of the elements for each vector, and then multiplying those three values. In Haskell:

frubbish = sum foos * sum bars * sum bashes

This version is linear in the maximum length of the vectors. The original version is cubic in the maximum length of the vectors.

Moral: It doesn't really matter (within reason) which language, which variable names, or which loop-style one uses. It matters much more that you understand the algorithm on a higher abstraction level, and are able to think of better alternatives at that level. Of course, a terse but still readable notation can help with that, but in the end, translating that algorithm to whatever feature the language you use offers isn't that hard.

And in the second case, it's easier for me to read the first one, where no unneeded abstractions are introduced, than the "boolean authorized" version. Of course, depending on the context, the second version might be better - if e.g. "authorized" is actually a valid abstraction, conceptually or de facto (used more than once).

At my local climbing gym, they put up a 5.11 that traversed to the right, went up, and traversed back to the left.

Being tall, I found a way to dead point straight up, avoiding the traverses. The route setter saw me doing this, and got out the ladder. Fifteen minutes later, he had adjusted the holds so that the short cut was nearly impossible…

The original version of the Blub vs. Java example had the expression frubbish += foo[i] * bar[k] * bash[j]; in the loops. Thanks to an obvious property of arithmetic, the entire algorithm can be optimized so that its run time is O3n instead of On cubed.

Alas, that optimization renders the bugginess discussion moot. It still has extra moving parts in Blub that are not present in Java, and it is still far more elegant in languages like Python, but I think the argument is far more compelling with nested loops.

Thus, I have changed the example code so that we cannot take advantage of the special relationship between multiplication and summation.

Wow, I didn't know that difference in C++ between i++ and ++i, sadly it just increases my feeling that C++ has become so convoluted that it is nearly unusable.

Even without the haskell solution (by far the prettiest one), in most languages you could generate (using yield style iteration in those that support it, using object style iteration in others) an iterator that would generate all the triples of elements needed, then call the function on that :

which is likely to be useful in other places as well. It has the further advantage that you can change the order of generation of the triples - something that can help debugging if the result should be independent of the order but changing the order changes the result.