Summary
Part I - Beauty and The Beast. I begin my exploration of ugliness in code with some background and an overview of what most people consider to be crappy code.

90% of everything is crap. - Sturgeon’s Law (one of many variants)

When applied to software, Sturgeon’s Law is hopelessly optimistic. - Savoia’s Corollary to Sturgeon’s Law

O’Reilly just released a book called: Beautiful Code – Leading Programmers Explain How They Think. And, while I am probably not worthy of having my words and code share the same book spine as the work of CS legends such as Brian Kernighan or Jon Bentley, I am quite proud to have contributed a chapter entitled “Beautiful Tests.”

You should definitely get a copy of Beautiful Code; it’s 500+ pages of great insights into the thinking process of some great programmers (plus me). And, on top of that, all the royalties from the book will go to Amnesty International.

But this blog is not about beautiful code; it’s about its evil twin: nasty, ugly, convoluted, and hard-to-understand code. It’s usually the kind of code that other people have written – and you inherit because they left the company. It’s the kind of code that elicits a common reaction when it shows up on your IDE’s doorstep: “Crap!” – pardon my French. Or: “Merde!” – pardon my English. It’s the kind of code that, in some parts of the world, gets you fired – not the lame HR firings we have in the US, but the ones that involve a firing squad and bullets, and rescue calls to Amnesty International.

Hell is other people. - J. P. Sartre

Hell is other people’s code. - T-Shirt seen recently in Mountain View, CA

While I was writing my chapter of Beautiful Code, and reviewing some of the other chapters, I was forced to think beyond code functionality and efficiency. I had to think about beauty in code. And thinking about what makes a particular piece of code beautiful also made me think about the complementary question: What makes a particular piece of code crappy?

In the last few months, this question has led to heated conversations with colleagues, friends in the business, and random software engineers sitting next to me at the local coffee house. It would be naïve to expect programmers to unanimously agree on anything, but on the subject of crappy code most people I talked with agreed on a few things:

First. There’s a lot of ugly code out there. Most of the underlying code in the software that runs our world is at the opposite end of beautiful. It's important to note that bad code does not necessarily imply crappy, useless, software applications. There’s a big difference between software and code. Software is the finished working product; it’s what the end user sees and experiences. Code is the guts of software; it’s what the programmer sees and experiences. It’s not only possible to have very useful software that’s built on crappy code – it’s the norm.

Second. Most programmers have a pretty high tolerance for working with ugly code – as long as they were the ones who wrote it. That’s either because they think their code is beautiful – in the same way every parent thinks their children are beautiful, or because it’s ugly in a way that they are familiar with. But their tolerance level drops dramatically when they have to deal with someone else’s code: Crap is other programmers’ code.

Third. Most software organizations don’t seem to care much about how bad the underlying code is – as long as the software does what it’s supposed to and the people who wrote it stick around to maintain it. If they did care they would have better measures to prevent ugly code from contaminating key components of their precious applications. A quick glance at the code in most software will show you that that’s not the case. There’s crappy code everywhere.

At some point or other, however, some key developers decide to leave the company – often because they can no longer stand to work with the mess of code they have created. That’s when bad code rears its ugly head and has to be dealt with. That’s when the search for victims to inherit the crappy code base begins.

Which leads us to the big question:

Most people agree that there is a lot of ugly code out there, but most developers are not able to see their own code as bad. Can we come up with an objective and easy-to-calculate measure of code crappiness, based on unassailable research and experimentation, that everyone could agree on?

The answer – unless you are hopelessly optimistic and painfully naïve – is a definite no.

But we can come up with a somewhat-objective measure of code crappiness based on previous research, some hard-earned experience, and, yes, some personal beliefs and preferences.

But this blog entry is already a bit too long, so you’ll have to wait a couple of days for that. In the meantime, you might want to think about how you would go about measuring code crappiness.

Summary of Part I (for people with really bad short-term memory)

Writing for the book Beautiful Code made me think about the opposite of beautiful code – ugly code. By talking to a bunch of software people I have come to the conclusion that most developers associate crappy code with other people’s code – especially other people’s code that they have inherited and will have to maintain and work with. I have also come to the conclusion that – as long as the software works – most software organizations don’t really care about how bad their own code is … until the programmers who created it are no longer around. And by then it’s too late.

Preview of Part II (for people who are wondering if it’s worth coming back for more abuse)

By now, I have probably offended, insulted, or antagonized most of the readers – and this is only Part I. In Part II things are going to get even more fun because I will introduce a new software metric: the C.R.A.P. index. Yes, you heard me right. This is just what the software industry needs: another metric.

C.R.A.P. stands for Change Risk Analysis and Predictions. As the (mildly offensive, but hopefully memorable) name implies, the C.R.A.P. index is designed to analyze and predict the risk and effort associated with changing (maintaining and enhancing) an existing body of code – particularly (but not necessarily) by developers other than the original developers. It should be fun; I hope to see you back for Part II.

The article is quite interesting... I just can't wait for Part II... just curious if it is somehow possible to get a metric for "crappy" code!

I think it is somehow curious how the software world seems to strike a balance between ever more "rigorous" mathematical approaches to productivity measurement (such as the metric that you mention, for example) and approaches that make the creation at least of the first working prototypes of a software easier by relaxing the mere coding act (look at how many entries are present on this website on static type checking against run-time dynamic type checking for example...)

Anyway, I'd like to add my 2 cents on the "crappy" code theme:

I'd like to put emphasis on how much of a role the human factor plays here. In my personal experience, while (surprisingly or not?) most programmers agree on the quality of a piece of code when they first look at it (it is great! it is crap!), almost no one is open to recognizing that, when he has written less-than-excellent code (not necessarily crap, in reality...), a rewrite or at least a refactoring and restructuring would be a good thing. I think that this kind of maturity (the kind of "martial arts" humbleness, as someone said in a book on programming, BTW) is something that you get with the years, once you start feeling that your (now demonstrated) qualifications are not called into question by accepting constructive criticism (quite the opposite!)...

To me, it happened at the end of a project at the customer site, when I had to go through the "hand-over" phase of what I had created to my successor (one of the employees at the customer site). Despite what "lonely hard-coders!!" might think, this was in reality a great experience: the 5-day hand-over phase (review of the code, restructuring of the classes, reformatting of the code, and so on...) was FUN EVEN FOR ME, the original programmer (admittedly, coding the stuff in the first place was even more fun, but the feeling at the end of the process of having moved something from "working" with reasonable quality to "working" with beautiful structure was really satisfying!!)

As you say, crap comes in many shapes and fragrances. A flavor that I find particularly offensive is code that does nothing. Code that solves a problem that needn't be solved, that wastes CPU cycles and developer brain cycles, occupies pages of listings and is unnecessary. This ranges from the brain dead (iterate across a HashMap and null every reference), through the moronic (listen for an event, do a bunch of computations with that event, then discard the result) and ends up in brain dead land again (have 75 methods that each contain essentially the same series of boiler plate code)
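To make the "code that does nothing" flavor concrete, here is a minimal sketch in Python (a dict stands in for the HashMap). All function names are hypothetical and invented for illustration; none of this is taken from a real code base.

```python
# Hypothetical examples of code that does nothing useful.
# A Python dict stands in for the HashMap mentioned above.

def discard_cache_the_long_way(cache: dict) -> None:
    """Sets every value to None, then empties the dict anyway."""
    for key in list(cache):
        cache[key] = None  # work that nothing can ever observe
    cache.clear()


def discard_cache(cache: dict) -> None:
    """Same observable result in a single call."""
    cache.clear()


def handle_event_and_discard(event: dict) -> None:
    """Receives an event, computes a result, and throws it away."""
    total = sum(event.get("values", []))
    doubled = total * 2  # never returned, stored, or logged
```

Both `discard_cache_the_long_way` and `discard_cache` leave the dict empty; the first just spends extra CPU cycles and reader attention getting there.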

How can we turn crap into foie gras? Aside from the obvious "rm -rf", the most useful guide I have seen is the superb Working Effectively with Legacy Code by Michael Feathers. It helped me see how I could take what was essentially a flaming turd of a code base and begin making classes testable. This is the most useful coding book I have by far.

One basic problem is that reading other people's code requires a bit of empathy -- the ability to understand the thoughts of the programmer who wrote it. Writing good code also requires empathy -- the ability to imagine what someone else reading your code will be thinking as he reads it. (For example, when the reader sees the new identifier you just defined, will its name trigger the appropriate concept in his mind?)

I suspect that programmers on the average unfortunately have less empathy for other people than average. That's why so many of us have sub-par social skills (for example, in competition with salesmen in a singles bar) and therefore feel more confident working with machines than with people.

Thus, the crappy code problem is inevitable. The best we can hope for is to create a culture of refactoring. Then, as each successive reader figures out the solution to a puzzle the programmer unintentionally created, she can improve the code to make its intentions more explicit.

> Thus, the crappy code problem is inevitable. The best we
> can hope for is to create a culture of refactoring. Then,
> as each successive reader figures out the solution to a
> puzzle the programmer unintentionally created, she can
> improve the code to make its intentions more explicit.

I do this as much as possible, even though it incurs a bit of personal risk, but I've run into a number of cases where the code was so bad that no one knew what to do with it.

I recently reviewed some old code that was written in COBOL. It took me a couple days to untangle the mess of GOTOs to figure out what exactly the code was doing. Once I was done, it seemed to have a number of bugs in it. It's so convoluted and overcomplicated it should really just be rewritten from scratch. The problem is that there is no documentation of what the requirements were or what changes to those requirements were. Even if we went through and documented exactly what the code does, no one is really sure if that's what it should be doing. We suspect that it is not doing what it should.

On a different note, I agree that good code requires empathy on the part of the reader and the writer. But the bulk of the bad code I see is not a result of a lack of empathy but a lack of ability. That ability is to take a problem and distill it down into a simple form that can then be translated into code. There are a number of ways to approach this so I don't want to imply there is one right way but I see a lot of code that just meanders around in a semi-random manner until it gets to the point.

I mentioned above that someone who writes good code will take care to choose identifiers which trigger the correct concept in the mind of the reader. Such identifiers will tend to be lengthy. This approach will most appeal to people who read fast and type quickly, but who perhaps have weak short-term memories.

A programmer with a bit of dyslexia who finds reading laborious and who, of necessity, has developed a powerful memory will, in contrast, likely prefer very short variable names to reduce his reading (and typing) burden. He will be happy to figure out what a variable is used for by seeing how it is used, and will be able to remember it long enough to get his job done.

Code written by either of the above types will be viewed as CRAP code when read by a programmer of the other type.

Since I, personally, fall into the category of "fast reader / poor memory" -- I can state with authority that the first approach is good, whereas the approach of the "poor reader / good memory" programmer is bad.

Of course, anything is bad when taken to extremes. For example, even I will admit that it is much better to name a loop's index variable "i" rather than, say, "theIndexVariableOfMyLoop". That's because the identifier "i" is _always_ used as a loop index variable; keeping this in mind does not tax my short-term memory. But, usually, the longer and more descriptive identifiers work better for me, and therefore in most situations they are objectively superior.
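As a small illustration of the two styles, here is the same calculation written twice in Python. The function and its names are made up for this example; only the loop index keeps the conventional "i" in both versions.

```python
# Two hypothetical versions of the same function: terse names versus
# descriptive names. Both compute a balance after compound growth.

def f(a, r, n):
    """Terse style: the reader infers meaning from usage."""
    t = a
    for i in range(n):
        t += t * r
    return t


def balance_after_compounding(starting_balance, rate_per_period, period_count):
    """Descriptive style: each name is meant to trigger the concept."""
    balance = starting_balance
    for i in range(period_count):  # "i" stays short in both styles
        balance += balance * rate_per_period
    return balance
```

The two functions behave identically; which one reads as CRAP depends on which kind of reader is looking at it.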

> ...I agree that good code requires empathy on the part of the reader and the writer.
> But the bulk of the bad code I see is not a result of a lack of empathy but a lack of ability.
> That ability is to take a problem and distill it down into a simple form that can then be translated into code.
>
> I recently reviewed some old code that was written in COBOL. It took me a couple days
> to untangle the mess of GOTOs to figure out what exactly the code was doing. Once I was done,
> it seemed to have a number of bugs in it. It's so convoluted and overcomplicated
> it should really just be rewritten from scratch. The problem is that there is no documentation
> of what the requirements were or what changes to those requirements were.
> Even if we went through and documented exactly what the code does, no one is really sure
> if that's what it should be doing. We suspect that it is not doing what it should.

That is a much deeper level of badness. What I usually try to do in such a situation is to rewrite the code to continue doing exactly what it currently does, but to make this much more explicit and modifiable. One can then think about creating a test case that exposes the supposed bug -- and then ask someone whether it should be changed. It is ideal if one can execute this testcase on the old codebase -- then one gets credit for discovering and fixing a bug!

Often times it turns out that, given the way business is done, this case just doesn't happen to occur (or the bug would have been discovered long ago). Typically, the re-written code will be more robust should business rules change in such a way so as later to permit that combination.
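The approach described above (rewrite the code so it keeps doing exactly what it does now, and pin that behavior with test cases) might be sketched like this in Python. Both discount functions and the list of cases are hypothetical stand-ins, not the COBOL program under discussion.

```python
# A minimal sketch of pinning existing behavior before a rewrite.
# All names and rules here are invented for illustration.

def legacy_discount(quantity, is_member):
    """Stand-in for tangled legacy logic whose intent is undocumented."""
    d = 0
    if quantity > 10:
        d = 5
        if is_member:
            d = d + 5
    else:
        if is_member:
            d = 5
        if quantity > 10:  # can never be true on this branch
            d = 15
    return d


def rewritten_discount(quantity, is_member):
    """Rewrite intended to preserve behavior while making it explicit."""
    bulk_discount = 5 if quantity > 10 else 0
    member_discount = 5 if is_member else 0
    return bulk_discount + member_discount


# Inputs chosen around the boundary the legacy code branches on.
CASES = [(q, m) for q in (0, 1, 10, 11, 100) for m in (False, True)]


def characterize(func, cases):
    """Records what a function actually returns for each case."""
    return {case: func(*case) for case in cases}


def behavior_matches():
    """True when the rewrite reproduces the legacy output on every case."""
    return characterize(legacy_discount, CASES) == characterize(rewritten_discount, CASES)
```

Once `behavior_matches()` holds, a suspected bug can be written up as one more case and shown to someone who knows the business rules, and the same case can be run against the old code.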

Code is a design artifact - a blueprint for software production. As such, it will have to elevate itself to the level it has achieved in other engineering disciplines. Once we reach that point, we'll be able to compare bridges with software products. As for the exact measure of the code goodness, I doubt it is needed. An experienced eye spots bad code really quickly. And, yes, unfortunately, there's too much of it around. Once we have been building software for several thousand years (as we do bridges now) and society starts requiring software engineers to get certified prior to delivering production code (much like engineers projecting bridges are) things will hopefully get better.

> That is a much deeper level of badness. What I usually
> try to do in such a situation is to rewrite the code to
> continue doing exactly what it currently does, but to make
> this much more explicit and modifiable. One can then
> think about creating a test case that exposes the supposed
> bug -- and then ask someone whether it should be changed.
> It is ideal if one can execute this testcase on the old
> codebase -- then one gets credit for discovering and
> fixing a bug!

This is basically the only way out that we could think of, other than starting with fresh requirements.

> Often times it turns out that, given the way business is
> done, this case just doesn't happen to occur (or the bug
> would have been discovered long ago). Typically, the
> re-written code will be more robust should business rules
> change in such a way so as later to permit that
> combination.

This code is basically reporting on data (it's more than that but it's plenty of explanation for this discussion) and the users are going to have a hard time knowing that they weren't informed of what they weren't informed of. At some point something that fell through the cracks may have gone 'critical' but no one made the connection.

> Code is a design artifact - a blueprint for software
> production. As such, it will have to elevate itself to the
> level it has achieved in other engineering disciplines.
> Once we reach that point, we'll be able to compare bridges
> with software products.
> As for the exact measure of the code goodness, I doubt it
> is needed. An experienced eye spots bad code really
> quickly. And, yes, unfortunately, there's too much of it
> around.
> Once we have been building software for several thousand
> years (as we do bridges now) and society starts requiring
> software engineers to get certified prior to delivering
> production code (much like engineers projecting bridges
> are) things will hopefully get better.

I think we are still a couple of steps away from that. There isn't even a clear distinction between a computer scientist and a software engineer and a programmer similar to the difference between a physicist and say a mechanical engineer and a mechanic/machinist. Until it's clear what skills and knowledge are required to be a software engineer, it's not going to be feasible to certify people as such.

Writing non-crappy code is an art. That's why there are many great programmers who disagree among themselves as to what constitutes good code.

There are some general principles that everyone agrees on, like the DRY principle. A lot is left to interpretation.

Authoritarian tactics and one size fits all approaches are not going to produce happy results, because they seek to achieve those results through brute force, rather than through positively inspiring people to write better code.

I like to insist on a reasonable level of code quality; however, I think having a tool block "improper" check-ins is a good way to alienate your team. One possible exception could be when you do it at the very start of the project and all of your team agrees. Then if/when the new guy joins the project, the line is, "It's always been like that."

> Code is a design artifact - a blueprint for software
> production. As such, it will have to elevate itself to the
> level it has achieved in other engineering disciplines.
> Once we reach that point, we'll be able to compare bridges
> with software products.
> As for the exact measure of the code goodness, I doubt it
> is needed. An experienced eye spots bad code really
> quickly. And, yes, unfortunately, there's too much of it
> around.
> Once we have been building software for several thousand
> years (as we do bridges now) and society starts requiring
> software engineers to get certified prior to delivering
> production code (much like engineers projecting bridges
> are) things will hopefully get better.

Certification neither guarantees nor enforces well-written code. For civil engineering things are different because, for example, a badly built bridge can cause death, but a badly written spreadsheet application cannot (unless it is used where it's not supposed to be used). People who want to write good code will do so, regardless of whether they are certified or not. If someone is a certified Java developer, does that mean he/she will write good Java code? I don't think so. There are just too many factors that determine the quality of code, and I think a lot of it is in the development process, where certain things just have to be enforced. Having something like C.R.A.P. might be a good idea.

I notice that my estimation of code's "goodness" goes up as I become more familiar with the context of what it does. In other words, with few exceptions, I dislike any code when I first look at it. As I become more familiar with the problem it is meant to solve, I appreciate more the approach the author took.

Code tends to become uglier over time as multiple authors, each with his/her own approach, modify it. Sometimes it can get ugly as unusual use cases come up that break the design of the program. In which case, the developer has to balance design, time line, and cost in developing a solution.

Just to be clear, I think there are different kinds of ugliness. There's ugly coming from not using a proper level of abstraction when it would help to make code more concise and then there's ugly that is more superficial, e.g. static C functions that are never called or variables that aren't used.

If I remember correctly, IEEE is working on a standard for Software Engineering Certification. Periodically I consider trying to pursue it, but there's not a lot of incentive - other than personal satisfaction - to do it.

Certification may be helpful in specialized areas of development, such as medical devices, but I am leery of its benefit for business applications.

An interesting point about engineering certification is that civil engineers tend to have the highest certification level, I believe somewhere above 80%, whereas electrical engineers tend to be significantly lower, below 40%. (These figures are coming from a hazy recollection of an article in either IEEE's Computer magazine or ACM's Communications over the last year.) What this tells me is that within established engineering disciplines, there is a large difference in the perceived value of certification.