Sunday, February 10, 2008

Portrait of a N00b

The older I grow, the less important the comma becomes. Let the reader catch his own breath. — Elizabeth Clarkson Zwart

This is how I used to comment my code, twenty years ago (Note: dramatization):

/**
 * By the time we get to this point in the function,
 * our structure is set up properly and we've created
 * a buffer large enough to handle the input plus some
 * overflow space. I'm not sure if the overflow space
 * is strictly necessary, but it can't hurt. Next we
 * have to update the counter to account for the fact
 * that the caller has read a value without consuming
 * it. I considered putting the counter-increment on
 * the shoulders of the caller, but since it meant every
 * caller had to do it, I figured it made more sense to
 * just move it here. We can revisit the decision down
 * the road if we find some callers that need the option
 * of incrementing it themselves.
 */
counter++;  // increment the consumed-value counter

/**
 * Now we've got to start traversing the buffer, but we
 * need an extra index to do it; otherwise we'll wind up
 * at the end of the function without having any idea
 * what the initial value was. I considered calling this
 * variable 'ref', since in some sense we're going to be
 * treating it as a reference, but eventually I decided
 * it makes more sense to use 'pos'; I'm definitely open
 * to discussion on it, though.
 */
char* pos = buffer;  // start our traversal

/**
 * NEXT, we...
 */

Does this style look at all familiar? It should! This is, to put it as impolitely as possible, n00b-style. (Incidentally, if u dont no wat a n00b iz, u r 1.)

This is how junior programmers write code. If you've read Malcolm Gladwell's remarkable and eye-opening book The Tipping Point, you'll notice a striking similarity to the real-life 2-year-old Emily he describes in Chapter Three, who tells herself stories after her parents leave her room. Here's a short excerpt from one of her stories:

Tomorrow when we wake up from bed, first me and Daddy and Mommy, you, eat breakfast eat breakfast like we usually do, and then we're going to play and then soon as Daddy comes, Carl's going to come over, and then we're going to play a little while. And then Carl and Emily are both going down to the car with somebody, and we're going to ride to nursery school [whispered], and then when we get there, we're all going to get out of the car...

Gladwell's account of Emily is fascinating, as she's allegedly a completely normal 2-year-old; they all do this when Mommy and Daddy aren't around.

Gladwell explains:

Sometimes these stories were what linguists call temporal narratives. She would create a story to try to integrate events, actions, and feelings into one structure — a process that is a critical part of a child's mental development.

If you look back at the comments in my hypothetical code from 20 years ago, you'll see that I was doing exactly what Emily does: making up a temporal narrative in an attempt to carve out a mental picture of the computation for myself. These stories I told myself were a critical part of my mental development as a programmer. I was a child trying to make sense of a big, scary new world.

Most programmers go through this phase. It's perfectly normal.

In contrast, here's what my code tends to look like today:

Update, Nov 14 2011: I did a terrible job of making my point with this code. I deliberately chose some of the most freakish code I've ever written, because I wanted it to look ugly and scary. I'm trying to show here what "typical" veteran code looks like to a junior programmer. This code serves as a *caricature* for illustration purposes. You're supposed to be put off by it. If I had been trying to show you what modern art looks like to the uninitiated, I would have shown you a graffitied subway station wall that someone had just vomited on. This is the coding equivalent.

If you *insist* on missing my point entirely and arguing about whether this function is "good code" or not, then I assure you: this code is horrific. It's a Lisp port of a Java port of some old C code. *Both* ports intentionally stay as faithful to the original as possible, line-by-line in the most un-idiomatic code imaginable. Why? To make it easy to propagate bug fixes in the original to both ports. So it's ugly for a legitimate reason. But it's still frigging ugly.

If I'd seen this code 20 years ago I'd have been appalled. The lines of code are all crammed together! Some of them aren't even commented! If I'd been given the task of maintaining this code, I'd have been screaming "rewrite!"

I probably write more Java and JavaScript these days, but I picked an Emacs-Lisp function I wrote recently to highlight how alien my code today would have looked to me twenty years ago.

To be fair, this function is actually a port of some Java code from Mozilla Rhino's JavaScript parser, which in turn is a port of some C code from SpiderMonkey's parser, which in turn was probably borrowed and modified from some other compiler. Compiler code tends to have some of the purest lineage around, tracing back to the assembly-language code written for the first compilers 40 or 50 years ago. Which means it's going to be a bit on the ugly side compared to "ordinary" code.

But when I write code in other languages these days, even in Java, it looks a lot more like this Emacs Lisp fragment than like the n00b code I was writing 20 years ago. It's denser: there's less whitespace and far less commenting. Most of the commenting is in the form of doc-comments for automated API-doc extraction. On the whole, my code today is much more compressed.

In the old days, seeing too much code at once quite frankly exceeded my complexity threshold, and when I had to work with it I'd typically try to rewrite it or at least comment it heavily. Today, however, I just slog through it without complaining (much). When I have a specific goal in mind and a complicated piece of code to write, I spend my time making it happen rather than telling myself stories about it.

A decade of experience makes you a teenager

After going through their 2-year-old phase, programmers eventually have to go through a stupid-teenager phase. All this month I've been hearing sad but unsurprising news stories about teenagers getting stuck on big rocks, being killed falling off cliffs, or dying of exposure. I'm actually lucky the same didn't happen to me when I was a teenager. It's just a bad time for us. Even though teenagers are old enough to understand the warnings, they have this feeling of invincibility that gets them into trouble and often mortal peril.

The programming equivalent happens around us all the time too. Junior programmers with five to ten years of experience under their belts (still n00bs in their own way) attempt to build giant systems and eventually find themselves stuck on the cliff waiting for a helicopter bailout, telling themselves "my next system rewrite will be better!" Or they fall off the cliff – i.e., the project gets canceled, people get laid off, maybe the company goes under.

Yes, I've gone through that phase too. And let's face it: even seasoned programmers need a little optimism and a little bravery in order to tackle real challenges. Even as an experienced programmer, you should expect to fail at projects occasionally or you're probably not trying hard enough. Once again, this is all perfectly normal.

That being said, as a hiring manager or company owner you should keep in mind that "5 to 10 years of experience" on a resume does not translate to "experienced"; it means "crazy invincible-feeling teenager with a 50/50 shot at writing a pile of crap that he or she and his or her team can't handle, and they'll eventually, possibly repeatedly, try to rewrite it all." It's just how things are: programmers can't escape being teenagers at some point.

Building compression tolerance

Hopefully the scene I've painted so far helps you understand why sometimes you look at code and you just hate it immediately. If you're a n00b, you'll look at experienced code and say it's impenetrable, undisciplined crap written by someone who never learned the essentials of modern software engineering. If you're a veteran, you'll look at n00b code and say it's over-commented, ornamental fluff that an intern could have written in a single night of heavy drinking.

The sticking point is compression-tolerance. As you write code through your career, especially if it's code spanning very different languages and problem domains, your tolerance for code compression increases. It's no different from the progression from reading children's books with giant text to increasingly complex novels with smaller text and bigger words. (This progression eventually leads to Finnegans Wake, if you're curious.)

The question is, what do you do when the two groups (vets and n00bs) need to share code?

I've heard (and even made) the argument that you should write for the lowest common denominator of programmers. If you write code that newer programmers can't understand, then you're hurting everyone's productivity and chances for success, or so the argument goes.

However, I can now finally also see things from the veteran point of view. A programmer with a high tolerance for compression is actually hindered by a screenful of storytelling. Why? Because in order to understand a code base you need to be able to pack as much of it as possible into your head. If it's a complicated algorithm, a veteran programmer wants to see the whole thing on the screen, which means reducing the number of blank lines and inline comments – especially comments that simply reiterate what the code is doing. This is exactly the opposite of what a n00b programmer wants. n00bs want to focus on one statement or expression at a time, moving all the code around it out of view so they can concentrate, fer cryin' out loud.

So it's a problem.

Should a team write for the least common denominator? And if so, exactly how compressed should they make the code? I think the question may be unanswerable. It's like asking for a single format for all books, from children's books to epic novels. Each team is going to have its own average preference. I suspect it's a good idea to encourage people to move their stories into design documents and leave them out of the code, since a junior programmer forced to work in a compressed code base may well grow up faster.

As for me, at this point in my career I would rather puzzle through a small, dense, complex piece of code than a massive system with thousands of files containing mostly comments and whitespace. To some people this trait undoubtedly flags me as a cranky old dinosaur. Since this is likely the majority of programmers out there, maybe I am a cranky old dinosaur. Rawr.

Metadata Madness

Everyone knows that comments are metadata: information about the data (in this case, the data being your source code.) But people often forget that comments aren't just a kind of metadata. Comments and metadata are the same thing!

Metadata is any kind of description or model of something else. The comments in your code are just a natural-language description of the computation. What makes metadata meta-data is that it's not strictly necessary. If I have a dog with some pedigree paperwork, and I lose the paperwork, I still have a perfectly valid dog.

You already know the comments you write have no bearing on the runtime operation of your code. The compiler just throws them away. And we've established that one hallmark of a n00b programmer is commenting to excess: in a sense, modeling every single step of the computation in painstaking detail, just like Emily modeled her ideal Friday by walking through every step and reassuring her 2-year-old self that she really did understand how it was going to work.

Well, we also know that static types are just metadata. They're a specialized kind of comment targeted at two kinds of readers: programmers and compilers. Static types tell a story about the computation, presumably to help both reader groups understand the intent of the program. But the static types can be thrown away at runtime, because in the end they're just stylized comments. They're like pedigree paperwork: it might make a certain insecure personality type happier about their dog, but the dog certainly doesn't care.
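The "stylized comments" point is easy to demonstrate in a dynamic language. Python, for instance (a hypothetical add function here, purely for illustration), will happily carry type annotations around as inert metadata on the function object without ever consulting them:

```python
def add(x: int, y: int) -> int:
    """The annotations are pedigree paperwork: recorded, never enforced."""
    return x + y

# The "comments" survive only as metadata on the function object...
print(add.__annotations__)   # {'x': <class 'int'>, 'y': <class 'int'>, 'return': <class 'int'>}

# ...and the runtime never checks them: passing strings works fine.
print(add("n0", "0b"))       # n00b
```

The dog doesn't care about the paperwork; neither does the interpreter.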

If static types are comments, then I think we can conclude that people who rely too much on static types, people who really love the static modeling process, are n00bs.

Hee hee.

Seriously, though: I'm not actually bashing on static-typing here; I'm bashing on the over-application of it. Junior programmers overuse static typing in the exact same way, and for the same reasons, as they overuse comments.

I'll elaborate by first drawing a parallel to data modeling, which is another kind of "static typing". If you've been working in a field that uses relational databases heavily, you'll probably have noticed that there's a certain personality type that's drawn to relational data modeling as a career unto itself. They're usually the logical modelers, not the physical modelers. They may have begun their careers as programmers, but they find they really love data modeling; it's like a calling for them.

If you know the kind of person I'm talking about, you'll doubtless also have noticed they're always getting in your way. They band together and form Database Cabals and Schema Councils and other obstructive bureaucracies in the name of safety. And they spend a lot of time fighting with the engineers trying to get stuff done, especially at the fringes: teams that are not working directly with the schema associated with the main revenue stream for the company, but are out trying to solve tangential problems and just happen, by misfortune, to be homed in the same databases.

I've been in surprisingly many situations at different companies where I had a fringe team that was being held up by data modelers who were overly concerned about data integrity when the real business need was flexibility, which is sort of the opposite of strong data modeling. When you need flexible storage, name/value pairs can get you a long, long, LONG way. (I have a whole blog planned on this topic, in fact. It's one of my favorite vapor-blogs at the moment.)
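A minimal sketch of what I mean (the entities and attributes here are made up): instead of a rigid schema where every new attribute means a migration and a Schema Council meeting, you store (entity, name, value) triples and fold them into records as needed:

```python
# Hypothetical name/value rows: one three-column table handles any
# attribute a fringe team dreams up, with no schema change required.
rows = [
    ("user:42", "email", "someone@example.com"),
    ("user:42", "theme", "dark"),
    ("user:43", "email", "other@example.com"),
]

# Fold the flat triples into per-entity records.
records = {}
for entity, name, value in rows:
    records.setdefault(entity, {})[name] = value

print(records["user:42"]["theme"])   # dark
```

The trade-off is exactly the one in the text: you give up integrity constraints the database could have enforced, in exchange for flexibility.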

It's obviously important to do some amount of data modeling. What's not so obvious is when to stop. It's like commenting your code: newer programmers just don't know when to quit. When you're a little insecure, adding comments and metadata is a great security blanket that makes you feel busy when you've in fact stopped making forward progress and are just reiterating (or perhaps teaching yourself) what's already been accomplished.

Hardcore logical data modelers often suffer from an affliction called metadata addiction. Metadata modeling is seductive. It lets you take things at a leisurely pace. You don't have to be faced with too much complexity at once, because everything has to go in a new box before you'll look at it. To be sure, having some metadata (be it a data model, or static types, or comments) is important for human communication and to some extent for performance tuning. But a surprising percentage of people in our industry take it too far, and make describing an activity more important than the activity itself.

The metadata-addiction phenomenon applies equally to coders. Code is data, and data is code. The two are inextricably linked. The data in your genes is code. The floor plans for your house are code. The two concepts are actually indistinguishable, linked at a fundamental level by the idea of an Interpreter, which sits at the very heart of Computer Science. Metadata, on the other hand, is more like the kidney of Computer Science. In practice you can lose half of it and hardly notice.

Creeping bureaucracy

I think that by far the biggest reason that C++ and Java are the predominant industry languages today, as opposed to dynamic languages like Perl/Python/Ruby or academic languages like Modula-3/SML/Haskell, is that C++ and Java cater to both secure and insecure programmers.

You can write C++ like straight C code if you like, using buffers and pointers and nary a user-defined type to be found. Or you can spend weeks agonizing over template metaprogramming with your peers, trying to force the type system to do something it's just not powerful enough to express. Guess which group gets more actual work done? My bet would be the C coders. C++ helps them iron things out in sticky situations (e.g. data structures) where you need a little more structure around the public API, but for the most part they're just moving data around and running algorithms, rather than trying to coerce their error-handling system to catch programmatic errors. It's fun to try to make a bulletproof model, but their peers are making them look bad by actually deploying systems. In practice, trying to make an error-proof system is way more work than it's worth.

Similarly, you can write Java code more or less like straight C, and a lot of seasoned programmers do. It's a little nicer than C because it has object-orientation built in, but that's fairly orthogonal to the static type system. You don't need static types for OOP: in fact OOP was born and proven in dynamic languages like Smalltalk and Lisp long before it was picked up by the static-type camps. The important elements of OOP are syntax (and even that's optional) and an object model implemented in the runtime.

So you can write Java code that's object-oriented but C-like using arrays, vectors, linked lists, hashtables, and a minimal sprinkling of classes. Or you can spend years creating mountains of class hierarchies and volumes of UML in a heroic effort to tell people stories about all the great code you're going to write someday.

Perl, Python and Ruby fail to attract many Java and C++ programmers because, well, they force you to get stuff done. It's not very easy to drag your heels and dicker with class modeling in dynamic languages, although I suppose some people still manage. By and large these languages (like C) force you to face the computation head-on. That makes them really unpopular with metadata-addicted n00bs. It's funny, but I used to get really pissed off at Larry Wall for calling Java programmers "babies". It turns out the situation is a little more complicated than that... but only a little.

And Haskell, OCaml and their ilk are part of a 45-year-old static-typing movement within academia to try to force people to model everything. Programmers hate that. These languages will never, ever enjoy any substantial commercial success, for the exact same reason the Semantic Web is a failure. You can't force people to provide metadata for everything they do. They'll hate you.

One very real technical problem with the forced-modeling approaches is that static type systems are often "wrong". It may be hard to imagine, because by a certain definition they can't be "wrong": the code (or data) is programmatically checked to conform to whatever constraints are imposed by the type system. So the code or data always matches the type model. But the type system is "wrong" whenever it cannot match the intended computational model. Every time you want to use multiple inheritance or mixins in Java's type system, Java is "wrong", because it can't do what you want. You have to take the most natural design and corrupt it to fit Java's view of the world.

An important theoretical idea behind type systems is "soundness". Researchers love to go on about whether a type system is "sound" or not, and "unsound" type systems are considered bad. C++ and Java have "unsound" type systems. What researchers fail to realize is that until they can come up with a type system that is never "wrong" in the sense I described earlier, they will continue to frustrate their users, and their languages will be abandoned for more flexible ones. (And, Scala folks, it can't just be possible to express things like property lists – it has to be trivial.)

To date, the more "sound" a type system is, the more often it's wrong when you try to use it. This is half the reason that C++ and Java are so successful: they let you stop using the type system whenever it gets in your way.

The other half of their success stems from the ability to create user-defined static types. Not, mind you, because they're helpful in creating solidly-engineered systems. They are, sure. But the reason C++ and Java (particularly Java) have been so successful is that their type systems form a "let's not get any work done" playground for n00bs to spend time modeling things and telling themselves stories.

Java has been overrun by metadata-addicted n00bs. You can't go to a bookstore or visit a forum or (at some companies) even go to the bathroom without hearing from them. You can't actually model everything; it's formally impossible and pragmatically a dead-end. But they try. And they tell their peers (just like our metadata-addicted logical data modelers) that you have to model everything or you're a Bad Citizen.

This gets them stuck on cliffs again and again, and because they're teenagers they don't understand what they did wrong. Static type models have weight and inertia. They take time to create, time to maintain, time to change, and time to work around when they're wrong. They're just comments, nothing more. All metadata is equivalent in the sense of being tangential documentation. And static type models get directly in the way of flexibility, rapid development, and system-extensibility.

I've deleted several thousand words about the evolution of Apache Struts and WebWork, an example framework I chose to illustrate my point. Rather than waste a bunch of time with it, I'll just give you a quote from one of the Struts developers in "The Evolution of Struts 2":

...the Struts 1 code base didn't lend itself to drastic improvements, and its feature set was rather limited, particularly lacking in features such as Ajax, rapid development, and extensibility.

Struts 1 was thrown away for WebWork, which was itself in the process of throwing away its version 1 (for similar reasons) in favor of version 2 (which has all the same problems).

Some of those several thousand words were devoted to JUnit 4, which has comically (almost tragically) locked on, n00b-style, to the idea that Java 5 annotations, being another form of metadata, are the answer to mankind's centuries of struggle. They've moved all their code out of the method bodies and into the annotations sections. It's truly the most absurd overuse of metadata I've ever seen. But there isn't space to cover it here; I encourage you to go goggle at it.

There are die-hard Java folks out there who are practically gasping to inject the opinion, right here, that "rapid development" is a byproduct of static typing, via IDEs that can traverse the model.

Why, then, was Struts considered by its own developers to be a failure of rapid development? The answer, my dear die-hard Java fans, is that a sufficiently large model can outweigh its own benefits. Even an IDE can't make things go faster when you have ten thousand classes in your system. Development slows because you're being buried in metadata! Sure, the IDE can help you navigate around it, but once you've created an ocean, even the best boats in the world take a long time to move around it.

There are hundreds of open-source and proprietary Java frameworks out there that were designed by code-teenagers and are in perpetual trouble. I've often complained that the problem is Java, and while I think the Java language (which I've come to realize is disturbingly Pascal-like) is partly to blame, I think the bigger problem is cultural: it's hard to restrain metadata addiction once it begins creeping into a project, a team, or an organization.

Java programmers, and logical data modelers, and other metadata-addicted developers, are burying us with their "comments" in the form of models within their static type system. Just like I did when I was a n00b. But they're doing it with the best of intentions, and they're young and eager and energetic, and they stand on street corners and hand you leaflets about how great it is to model everything.

Seasoned programmers ignore them and just get it done.

Solutions and takeaways

Software engineering is hard to get right. One person's pretty data model looks like metadata-addiction to another person.

I think we can learn some lessons from code-commenting: don't try to model everything! You need to step back and let the code speak for itself.

For instance, as just one random illustrative example, you might need to return 2 values from a function in Java (a language with no direct support for multiple return values). Should you model it as a MyFunctionCallResult class with named ValueOne and ValueTwo fields (presumably with actual names appropriate to the problem at hand)? Or should you just return a 2-element array (possibly of mixed types) and have the caller unpack it?

I think the general answer to this is: when in doubt, don't model it. Just get the code written, make forward progress. Don't let yourself get bogged down with the details of modeling a helper class that you're creating for documentation purposes.
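To make the contrast concrete (sketched here in Python purely for brevity; the function and class names are invented), here is the modeled version next to the just-get-it-done version of a hypothetical divide-with-remainder:

```python
class DivisionResult:
    """The 'modeled' approach: a helper class that exists mostly as documentation."""
    def __init__(self, quotient, remainder):
        self.quotient = quotient
        self.remainder = remainder

def divide_modeled(a, b):
    return DivisionResult(a // b, a % b)

def divide_plain(a, b):
    return a // b, a % b   # just return both values and move on

# Same information either way; one took a class definition to say it.
result = divide_modeled(17, 5)
print(result.quotient, result.remainder)   # 3 2

q, r = divide_plain(17, 5)
print(q, r)                                # 3 2
```

Neither is wrong; the question is whether the extra naming is pulling its weight at this particular call site.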

If it's a public-facing API, take a lesson from doc-comments (which should be present even in seasoned code), and do model it. Just don't go overboard with it. Your users don't want to see page after page of diagrams just to make a call to your service.

Lastly, if you're revisiting your code down the road and you find a spot that's always confusing you, or isn't performing well, consider adding some extra static types to clarify it (for you and for your compiler). Just keep in mind that it's a trade-off: you're introducing clarifying metadata at the cost of maintenance, upkeep, flexibility, testability and extensibility. Don't go too wild with it.

That way the cliff you build will stay small enough for you to climb down without a helicopter rescue.

Postscript

I'm leaving comments on, at least until "click-my-link" spam starts to surface. I'm curious to know how this entry goes over. This was an especially difficult entry to write. I did a lot of editing on it, and left out a lot as a result. I feel like I may not have made my points as clearly as I'd like. And I'm sure I haven't convinced the metadata-addicted that they have a problem, although at least now they know someone out there thinks they have a problem, which is a start.

116 Comments:

Well, the typing issue is a thorny one. I think as a general rule, the reddit types are too eager to denounce it immediately and tell us all that Ruby is the answer, or whatever it is this week, and the Java purists and academics make the opposite argument.

As someone from a dynamic background, I didn't use a language with any strong typing until I had been programming for 2 years. I'm not a heavy commenter of code (I'm not into redundancy), but I love static typing....until I don't.

A few years of not catching an error until a specific code-path is executed (and let's be honest, I don't test as much as I should) gets really old. I love it when the compiler says "you're a moron" before I even have to think about writing a test.

I think expecting tests for every situation just isn't realistic. I enjoy this about C as well, in as much as the compiler can do this, so maybe it's a compiler thing more than a static thing.

I'm pulling a Yegge. Point is, use both. Use the dynamic side enough to find the static side ridiculous, and use the static side enough to find the dynamic side occasionally too eager to not be helpful.

I think this is the point where you'll program with an appropriate level of safety. There are things you can do in a dynamic language to give you that safety, and there are things you can do in a static language to not be stupid about. Keeping perspective on these moments is what professional, accumulated knowledge gives you.

Good post. I've had some doozy arguments about excessive commenting and class/interface defining myself. You covered some aspects of the issue I hadn't thought of.

There's a parallel with processes: managers develop corporate processes to try to allow not-so-smart/experienced people to accomplish the same tasks that the talented people can do. It rarely if ever works because doing so takes away the understanding and flexibility that made the initial accomplishment so worthwhile.

In both cases there's the mistaken belief that the important thing is to allow repetition, reuse, as if we were still on a Henry Ford assembly line instead of the modern world where doing things well, and quickly, is usually 10x as valuable as doing them a 2nd time.

I'm not really smart enough to debate with the bigwigs, but I can categorically agree with the idea that over modelling costs a lot for sometimes negligible gain.

Since shifting focus from enterprise systems in Java to much more rapid projects in Ruby over the last 18 months I'm noticing daily just how much more we get done, even though our team sizes are way smaller.

And also, lets not forget that it's very possible to over model in a dynamic language as well.

I do worry though that the tradeoff is yet to be felt - there are a lot of people "just doing it", but is maintenance of their rapidly developed Ruby/Python program going to come back and bite them some time down the line?

Ugh. There are some good points in there, but that code is not nearly as good as you think it is. Of course you don't have to write paragraph-long comments, but have you ever heard of "extract method"?

Since it seems that forced code review hasn't cured you of writing only for yourself, I think the only hope of getting beyond the adolescent phase is either pair-programming or teaching.

I've thought about some of this a lot, particularly the experienced programmer vs. the novice programmer bit. I work on an open-source project, where I'm an experienced and trained programmer but the majority of contributions come from people who are not.

My experience is that it's up to trained programmers to devise the "way it's going to be" and review and correct the novice programmers on that way. The novice programmers eventually pick it up and understand it, even if in a limited sense, and start to write their code that way.

My experience says that it's entirely possible to train people to be better programmers without years and years of experience, and so it's never necessary to reduce yourself to writing for the lowest common denominator. Of course, if you're approaching the bound of complexity where even an experienced programmer would have difficulty reading the code, then that's a completely different issue to consider.

It all started at my first job supporting a database-driven app written by a COBOL programmer.

I spent more time putting out fires than I did adding new features. The database model wasn't locked down. Once I changed the database so that the data model was more explicitly defined through foreign keys and the like, the bugs in the application code that caused inconsistencies were easier to find. I was able to work on features on a regular basis and deal with bugs as they occurred rather than when they were discovered.

Fast forward 3 years and I'm working in a Python shop. All my functions start with:

assert isinstance(x, y)

Some of my co-workers complain that things fail when they pass an int instead of a float. It's easy to ignore their complaints as they also state that unit testing isn't important.

i recognize myself in your description of the software teenager, i definitely went through that phase.

i've grown since then and the biggest change i can point to is my move away from statically-typed programming languages. it feels like my arteries have been unclogged.

your statement about tackling technical challenges head-on is exactly the feeling i get when i code these days. i spend most my time solving the problem i need to solve instead of building up scaffolding to solve it.

if i read your post a few years ago i would have left an emotional comment listing reasons i thought you were so wrong. being a teenager is tough. i wonder what i'll think of my current self ten years from now.

"To date, the more "sound" a type system is, the more often it's wrong when you try to use it."

Boy are you ever wrong on this one. Both Oberon and Haskell have type systems which are very sound, and in my experience, I can count the number of times it's been "wrong" on one hand. ONE hand. Out of all the programming projects I've done with them.

In point of fact, Oberon's type system is nearly identical to C's in terms of expressivity, but is much stricter than C's to ensure proper coupling of mutually untrusted modules. Having many years of experience with both C and Oberon, I find that C offers *ZERO* productivity benefit over Oberon -- only a far more error-prone environment.

This is why every solid C coder will tell you, "Use -Wall -Werror." That makes the compiler (GCC in this case; Visual C/C++ has similar flags) enable the common warnings and treat them as errors. They'll also tell you to minimize the use of type-casting. These suggestions come from C coders with >20 years of experience. These rules of thumb in C are mandated in Oberon.

No, what makes C/C++ more popular than other languages is their relative brevity -- as Paul Graham points out, brevity is what makes a language "popular." This is why Oberon, for all its bad-a$$ness, failed to capture the market. Being a Modula-2-derived language, it was "too wordy."

Isn't it true in general that the average programmer doesn't comment enough? It seems rare to me that I see such a ridiculous narrative in the comments; far more often I see pages and pages of code with no comments, no factoring, etc. That seems the more common n00b problem than over-commenting, once they're out in industry and no longer turning in CS101 assignments, where they know the professor might spank them for not commenting.

Funny you should pick on database data modeling. The idea behind it is to eliminate redundancy in data and to represent it in a form usable by multiple applications for a wide assortment of purposes. This sort of modeling is supposed to improve flexibility in how data may be used. Compared to overcommenting or static typing, database metadata seems a very different animal.

Funny. I was just working on some code today and started pulling out pieces and modeled it as an interface so that I could mock it :) And yes the double meaning is intentional.

I think I've finally hit my rebellious teenage years. I've stopped listening to the man (aka authority figures aka "a"-list bloggers) about TDD, TFD, BDD, ADD :), etc. And also mock vs. stub and state vs. behavior testing, etc. Who friggin' cares? I just want to solve the problem at hand and write some tests that turn the bar green and give me a "good enough" feeling that the code is fine.

It is interesting that you praise code compression and then mock Haskell and its type system. It really looks like you're contradicting yourself. Besides being one of the tersest languages known to man (consider point-free code and the amount of plumbing that can be buried in a stack of monads), you could argue that Haskell types are the ultimate in code compression.

A function's type is typically a one-line compression of all of its code - enough detail to give you a mental model of what the function does, but abstracting away all of the details of how a function does its job. The correspondence between functions and types can be very tight as we see from tools like hoogle and djinn, where, in a magical reversal of type inference, functions can often be inferred (or found) based on their types.

And Haskell has an inference-based system (except for some of the more experimental corners), so you can get that compressed mental model for free - the compiler will compute it for you! Or if you want to check that your mental model corresponds to the code you're writing as you go, you write down a type signature and have the compiler check it. This is a great way to create useful documentation and get feedback while you're working - without taking you out of coding space and into testing space.

I'll agree that learning to use an advanced static type system is a difficult skill and that it can take even good programmers years to understand how a type system can be a sword that helps you destroy complexity rather than create it. But just because a skill is hard to master doesn't mean it isn't worth mastering. I'm more productive in Haskell than I am in any other language not just because I've been a full-time Haskell programmer (at least as much as I can be a full-time anything at a startup) for over 4 years, but because I've learned how to turn its type system into one of my most powerful development allies - helping me check my mental models, make sure I don't make silly mistakes when refactoring or extending code and, most importantly, giving me a shorthand language that lets me take the vague intuitions I want to implement and start giving them (minimal) concrete form. I can use this concrete form as a starting point for code, as an efficient way to communicate with colleagues and as a way to keep more parts of a fantastically complex system in my head at the same time.

It really sounds like you've just missed the point of modern type-inference systems. I'll agree that there are plenty of mediocre researchers trying to fill in ugly corners with arcane theories, but the core of a Hindley-Milner based type system is a beautiful thing - and the battle-tested set of extensions you can see in languages like Haskell and OCaml are worthy allies for any programmer.

"To date, the more "sound" a type system is, the more often it's wrong when you try to use it. This is half the reason that C++ and Java are so successful: they let you stop using the type system whenever it gets in your way."

Isn't there truth to the other side then too? The more dynamic a language's type system is the more comments you need to explain what function arguments are, and the more unit tests you need? So maybe Java's success is in striking a balance between the two.

I get the point about static types being just metadata, but I really feel like you're overstating your case. Types are much more functional as metadata than code comments are; they help ensure the correctness of your program. That doesn't mean they're always the right thing, just that the comparison is a bit stretched. They're closer to unit tests than to comments. People can get lost in writing the perfect set of tests, too, and unit tests also exert a serious maintenance drag by making changes to the system difficult, but that doesn't mean they're not useful. The larger the system is and the more people that work on it, the more useful the type information is. And yeah, I know ruby/python/lisp/etc. help keep projects smaller so you don't get in those situations as easily, but some systems just end up having a large surface area no matter what language they're in.

As to data modelling, I think you're completely off. Name-value tables? Really? Maybe for a prototype or your personal website. But for anything that needs to perform, or where you actually care whether or not your data is trashed, you kind of need some kind of schema with (gasp!) typed, named columns and (double gasp!) maybe even some foreign-key or nullability constraints. Code is much easier to change release to release than the database schema is, which means you do need to spend more time making sure you can live with whatever you ship/put into production. A bunch of name-value tables with self-pointers might seem more "flexible" at development time, but it makes interpreting the data basically impossible, and god help you if your application changes in such a way that it starts misinterpreting things. If your code is buggy, that's one thing. If your code loses or corrupts data, that's generally game over.

And maybe the majority of database applications out there never require any performance tuning, but as I'm sure you know if your application is going to expect any significant database load the name/value architecture probably isn't going to fly, and given how hard schemas are to change it's not something you want to find out after your server's fallen over under load.

In general you seem to be able to make a point properly, but not in this post. It sweeps from semi-point to semi-point without getting really TO the point, IMHO.

The main gripe I have with your post is that it tries to sell the idea that over-using comments and static types is bad, however it fails to illustrate what is NOT over-using and what is. This surprises me because it's so simple to describe:

- with static types: if introducing a static type DECREASES complexity, introduce it; otherwise DON'T
- with comments: if adding the comment makes it easier for a mortal to understand wtf is going on, add the comment; otherwise, don't.

Humans suck really badly at interpreting code. We need every bit of help we can get, and even then we suck at it. This implies two things: 1) a programmer who inherits a project generally has a hard time; 2) a programmer who just wrote a piece of code can't in ALL cases find the errors s/he made in that code right away by re-reading it.

In other words: if you as a programmer rely on "the code speaks for itself," you're mistaken: a human has to PARSE and INTERPRET the code to understand what it does and to know what the value of _foo is at line 243 after that wacky loop has completed. Comments can help in that area, and should be added as an AID to understanding the code when a human has to read it.

And we're not all equal. A programmer who thinks s/he's very very smart can perhaps decide not to add any comments because it's so straight forward, however a person who takes over the project might just because of that have a hard time understanding it, and might misinterpret what the code does, introduce a mistake because of that etc.

Is that progress? Did the team as a whole become better because of the lack of comments because the veteran was too snobby to realize that not everyone can program a C++ compiler in assembler?

I surely think not. That doesn't mean we should all write books inside comments. As I said: comments should describe what the code does for the human reader. Not as in `// increase i`, but as in `/* we have to check ... here because if we do it later we have a performance problem. */`

Steve, pay attention to the comments section. In particular, Mark, Alan, Samuel, and dipplego. Your critics tend to be right.

It just seems like you see a problem but don't understand it. Imparting this vision on others is one thing. Imparting an incorrect understanding is another. The former is an experience report, while the latter infects reader's minds.

Probably the most balanced explanation of the differences between C and Lisp is Richard Gabriel's "Counterpoint: Do programmers need seat belts?"

Please do not tell impressionable readers that Database Administrators know nothing about programming. I could've put up with most of your nonsense, but that comment is intolerable. Mark the Metadata Addict in the comments section explained why. Telling people to devalue the Schema is the biggest mistake you could possibly make.

Moreover, telling people models aren't important shows that you see a problem but don't understand it. Models are everywhere in languages like Lisp. Just pick up a book published by Springer-Verlag about AI and planning algorithms, and you'll see tons of code written in Lisp along with discussion about how the author's model stacks up against other models.

I read your blog posts mostly for the interesting metaphors, not for the technical advice.

I've got to disagree with your view that types are purely documentation in the same way that comments are.

Comments aren't understood by the compiler. Types are. There's a key difference between 'stuff that a compiler can do things with' and 'stuff that a compiler can't do things with'.

All programming is telling a computer what to do. Documentation is explaining to humans what it does. Some things, like well-chosen function names, function signatures (whether that's an informal "this takes two arguments" or a highly-specified combination of return types and typed arguments) are useful to both sides.

There's also a lot of misinformation. A well-typed program isn't any more correct than an untyped program; both can still have bugs. The former has a class of problems that a compiler (or IDE) can find for you in advance, but it's not the guarantee of 'correctness' that some people seem to claim, which is an easy point to pick on.

As for type information at runtime: there *is* information at runtime in some languages (Java's use of 'instanceof' or 'getClass', for example). In fact, some of this type information is available in dynamically typed languages as well; you can find a Python object's class at runtime and choose to do different things.
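A small sketch of what this commenter means by inspecting a class at runtime in Python (the function name and the categories are made up for illustration):

```python
def describe(value):
    # Dynamic languages keep the type attached to the value itself,
    # so we can branch on it at runtime, much like Java's instanceof.
    if isinstance(value, bool):   # check bool before int: bool subclasses int
        return "flag"
    if isinstance(value, int):
        return "counter"
    return type(value).__name__   # fall back to the class name itself

print(describe(True))   # flag
print(describe(7))      # counter
print(describe("hi"))   # str
```

The bool-before-int ordering is the kind of detail runtime dispatch forces you to remember, since `bool` is a subclass of `int` in Python.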

One advantage of Python/Ruby vs. Java at the moment is that the former allow functions to be passed around, which Java doesn't. However, that's not a failure of statically typed systems; Scala supports that, for example.

To conclude; there's merits in types, but the whole argument about types being 'good' or 'bad' is pretty polarising. I don't see why good systems can't take advantage of both in the right situations; and part of that is understanding what the limitations are in each place. I wouldn't use a J2EE EJB system for mailing me when someone's birthday is coming up; but then again, I wouldn't use Python to write a distributed transactional on-line banking system either. The big problem is people who are only exposed to one type of problem, then think that everything can be done with the same set of tools.

Beginners are much more comfortable with their native language, and have to code in both natural language and the target language: `// Make the number one bigger; i++;`

The type of compression I think you're talking about is mainly one of fluency: the ability to 'think in Russian' (http://www.imdb.com/title/tt0083943/). Once you've internalised 'i++', you don't need to explain it back to yourself.

On the other hand, Steve, you are overusing the metaphor that static types are metadata, as some before me have pointed out.

But, I would like to test a theory of mine. Me, myself and I am solidly anchored in the safety of strongly typed languages. But I can sort of "feel" the advantages of a loosely typed system, though my experience of such languages is limited to a short session with JavaScript, in a project where we were writing in the old style anyway. So my question is this:

Could it be that the productivity advantage sometimes seen in dynamic languages is greater for smaller projects? Could it be that development [of a fully debugged system] in fact becomes a lot harder the dynamic way for a project that is larger than, say, 10 man-years? Opinions? Anyone?

This article seems to be an unhealthy mix of overgeneralization and childish name-calling. (Do you really need to call people "metadata addicts" to convey the notion that there's a point of diminishing returns for modeling activities?)

In my own experience, what you're going through seems to be common among "senior" programmers. They tend to confuse "writing code" with "getting stuff done." I can only speculate, but I think it may be a natural consequence of the narrow perspective afforded by writing code most of the time.

The comments on database schemas are particularly telling. You seem to be assuming that all the potential use cases for the database occur within the confines of your current code base. Might be true initially, but just plain silly as a long-term assumption.

Thanks for yet another thought-provoking post Steve. Just quickly I have to say please keep posting and keep the posts just as long as they are now (or longer)!

Almost every line of code I write for my day job is then available to any of the hundreds of thousands of network creators on Ning. Those who actually delve into it (you don't have to) vary from experienced programmers to people that just want to add a new page to their website and it is their first experience with programming.

I wonder what your advice would be on commenting (and static typing, and everything else) in that scenario?

We tend to go moderately big (javadoc-style) on the comments at the class and function level. And we even have a little static typing in there, even though this is PHP.

A different question from the one you are addressing, for sure, but one that is particularly interesting to me!

Comments? We don't need no stinkin' comments, especially those awful ones that demarcate the end of a block or class:

public class FooBar{} // end class FooBar

I will admit that I'm getting uncomfortable about my addiction to curly-brace languages after reading Steve for a while.

I'd love to see how long his rant about paper architects and UML would be. That's the ultimate meta-data, in my opinion.

I'll bet that Google would laugh at the idea of someone with the title "architect" not writing code, but that's exactly the direction that many companies are taking. It dovetails well with their mental model of "software development as manufacturing", where UML takes the place of engineering drawings and overseas outsourced coders are the assembly line workers stamping out the widgets according to plan.

teenager ... seems roughly analogous to effects seen in Brooks, such as the "second-system effect".

My commenting style tends more toward somewhat richer "header block" comments about the purpose and general approach of the function, with fewer inline comments unless there's some nasty gotcha in there. ... avoid the nasty gotchas.

I program in English. After reviewing the English, I comment in Java after each sentence to let the computer know how to do it. In essence, I program in dual languages. IMHO the approach is what's important.

"in fact OOP was born and proven in dynamic languages like Smalltalk and Lisp long before it was picked up by the static-type camps"

(nitpick alert)

Proven, yes, but not born - Simula-67 was statically typed, and inheritance is arguably as static a concept as data modelling.

"As for me, at this point in my career I would rather puzzle through a small, dense, complex piece of code than a massive system with thousands of files containing mostly comments and whitespace. To some people this trait undoubtedly flags me as a cranky old dinosaur"

Yeah, me too, except I've always felt that way. If I can see something all at once, there's a much higher chance I can work it out all at once. (But then I also have the screenfulosaurus bit set.)

It's always a trade-off, but I think the tenet of "self-documenting code" is really important when you're working in a sizable team. In the older sections of our codebase it can take a long time to work out how some clever, succinct piece of code works before it can be edited for some simple maintenance. You recently advocated reading Fowler's Refactoring, which pushes this point throughout, and yet this piece seems to say almost the complete opposite. I really like your essays and think they lead to much healthy debate, but this latest one seems contradictory.

I'm not sure I should feel this enthusiastic about the article; enthusiasm makes you look like a n00b. I've been mulling over an article like this for months, and everything seems to fall into place after reading it. I'm really pleased you wrote it; I wouldn't have reached this kind of masterpiece, not having your style. This is the very true story of how a programmer grows. It's my story: I've gone through every stage you describe, from comments to metadata to bureaucracy. I wouldn't have suspected the parallel with two-year-old Emily's temporal narrative. This will relax some tension when I feel code is monstrously verbose, with for-loop indexes being iProdigiouslyLongLoopCount, comments, and so on. Thank you.

Well done! Just a small note about static typing: Haskell's type system is actually quite close to an ideal state, where you are not forced to state types explicitly in your code, but the types are still there automagically.

"They're like pedigree paperwork: it might make a certain insecure personality type happier about their dog, but the dog certainly doesn't care."

True, but the folks who invested the money in that dog probably care!

Maybe this is a n00b point of view, but bare code has no meaning. Why a piece of code exists or why it's written the way it is are left up to the maintenance programmer's imagination. In the words of Bruce Lee, "It is like a finger pointing away to the moon. Don't concentrate on the finger, or you will miss all the heavenly glory"; but the code is all finger and no moon.

Mandating comments will also save you from the worst excesses of geniuses, golfers, and optimizers.

Where I work, there is legacy code without legacy documentation. Now no one knows how it works, and we're basically stuck at a French cafe, trying to induce the language that we are under contract to extend.

Very interesting. (The nitpicking comment that Simula was the first OO language and it was static has already been made... darn!) I would say I'm the typical lazy teenager who wants to enjoy the free set of tests that comes with a nice compiler, instead of having to write them myself: i.e., I want somebody else to check that nobody slipped a couple of ducks into the collection of pedigree dogs I need to walk (without my going to each one to see if it walks like a duck and quacks like a duck...). That tends to be particularly nasty, especially in "production". I have some appreciation for the contract of types in a method declaration (I like to know whether the method expects a duck or a pedigree dog... they tend to be different), especially if I'm not the only one writing the code. So far (with lots of generalization), statically typed languages tend to be faster than dynamic ones, and type inference removes much of the verbosity. On the other hand, the ability to modify things at runtime allows a lot of flexibility.

A hardcore experienced programmer does not cloud his mind with such dogma (static typing bad, dynamic typing better, etc). The greatest asset a senior developer has is his ability to recognize a problem, recognize a good solution, or if he has no experience in a good solution, think one out. Thinking is your best asset.

Just as there are times when commenting makes sense, there are times when static typing makes sense.

Static typing is like saying there is this box and you can only put this kind of thing in it. The box is labeled.

Dynamic typing is like saying we have these generic transparent boxes that you have to look into to find out what is in there -- or just remember where you put everything.

Yeah, labelling boxes sucks, but sometimes it's very useful. Haskell has an auto-labeler, which is even more useful.
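The box analogy can be sketched in Python, where the "transparent box" style means peeking inside at runtime (all names here are illustrative, not from the original comment):

```python
# Dynamic style: a generic, transparent box -- you look inside
# (or just remember) to learn what's in there.
box = []
box.append(42)
box.append("oops, a string snuck in")   # nothing stops the mix-up

# Labeled-box style: enforce the label ourselves at insertion time.
class IntBox:
    """A box labeled 'int': only ints may go in."""
    def __init__(self):
        self._items = []

    def append(self, item):
        if not isinstance(item, int):
            raise TypeError("this box is labeled 'int'")
        self._items.append(item)

labeled = IntBox()
labeled.append(42)
# labeled.append("nope")  ->  TypeError, rejected at the door
```

A static type system does this labeling at compile time instead of at insertion time; the runtime check above is only a hand-rolled approximation.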

Different problems need different solutions and it takes a professional to see when and where to use different tools.

The dogma you talk about is from someone who has only solved a specific set of problems, or from someone who was not smart enough to use the right tools.

But knowing the right tools takes time. The best benefit to programmers is not Ruby, or dynamic languages, but just writing lots of code -- and good teaching as well (which we lack in the USA). By writing code in both static and dynamic languages.

Your apparent frustration is the result of you not understanding this basic premise. That going from noob to rock star takes time.

All anger and frustration come from ignorance. I just hope this post does not delude people who write in dynamic languages into thinking they are rock stars if they are not.

HTML and JavaScript are some of the most permissive languages for writing code, but I would hardly argue that someone who only writes in those is a rock star.

I'm kind of a n00b programmer, but it seems to me, when considering looking at other people's code, or when revisiting something I wrote long ago, that it's a lot easier to filter out unwanted comments than it is to cause them to appear. In fact, I suspect 90% of the readers here have written a quick script to do just that.

I grant a lot of comments are unhelpful, and some are even misleading. But there's always the chance that the comments can provide an additional insight.

It has not been my good fortune to work with lots of code written by really good programmers (my own included), so I say: err on the side of excessive commenting, if such a choice actually has to be made.

1) If you don't start declaring complex types, the compiler will start inferring data types that you can't begin to understand because of the complexity of the type. The type of a relatively simple function could easily be longer than the code for that function. So you have to play the meta-data game if you're in it for the type system.

2) The type system still gets in your way. E.g., many Lisp functions accept some value or nil (aka null) and just return nil on nil input. This is really helpful when you want to propagate a non-error situation where some data should be ignored. But in Haskell/OCaml you need to declare a Maybe/option type. Polymorphic data types help here, but then you end up with so many "case" expressions propagating the null value that you write a meta-function to take a function :: String -> String and turn it into Maybe String -> Maybe String -- but then you end up having to use that everywhere explicitly. Ugh.
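In a dynamic language the nil-propagation this commenter describes needs no declared Maybe type; the "meta-function" can be a one-liner. A sketch in Python (the names `lift_none` and `shout` are illustrative):

```python
def lift_none(f):
    """Wrap f so that None input propagates as None output instead of
    blowing up inside f -- the Lisp-style nil-in, nil-out behavior the
    commenter contrasts with Haskell's explicit Maybe plumbing."""
    def wrapped(x):
        return None if x is None else f(x)
    return wrapped

shout = lift_none(str.upper)
print(shout("hello"))   # HELLO
print(shout(None))      # None, silently passed through
```

This is essentially `fmap` over Maybe, except the wrapping is invisible at the call sites, which is exactly the convenience (and the danger) being argued about.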

Steel Bank Common Lisp shows the right way to do this: include static type checking that tells you when you make a mistake but assumes that you know what you're doing when the code is ambiguous.

Of course, no one will completely agree with you, but I think your post at the very least leads to a productive discussion.

In terms of years programming, I would be a teenager on your scale, but I happened to start programming a little later in life than most, with a little background in math and linguistics. Thus, my abilities lean more toward the logical/database side. But I have been developing working code for 9 years, as well as designing databases (I hate the term "modelling databases").

When it comes to application code, I completely sympathize with your frustration at the metadata addicts. In fact, I never had the patience to even become one of those temporarily (perhaps to my detriment).

But I think your analysis doesn't quite apply to relational database design. The situation is a little more complex and confused by other issues. Most people who call themselves "data modellers" have absorbed just barely enough of Database Design for Mere Mortals to be dangerous. DDMM is a decent book for beginners, but hardly begins to describe the flexibility of the relational model. Secondly, there is the whole "keeper of the tower" syndrome that happens with those who become experts at a specific product like Oracle or Sybase.

I found that after some more serious reading, such as the writings of CJ Date and Hugh Darwen, I had a completely different perspective on what is possible with databases. (Besides the obvious weighty tomes, "The Askew Wall" and other short texts by Darwen are *priceless*). In the end, I was able to produce systems with a fraction of the effort I would have spent previously. Good database design actually sped up the coding process. Expressiveness is what it's all about.

I must stress that by "good design" I don't mean the endless committees and power plays that occur in many corporate settings. I mean that a) the relational model allows us to express some things much more concisely and clearly than can be managed with any sort of programming approach -- but programmers tend to ignore those capacities because they don't like logic to be out of their hands, and I sympathize; I think programmers and database designers should be one and the same; and b) with a little foresight there is no need to follow the classic dual-model approach of handling the same logic in both code and database. ORM is probably the biggest culprit there.

From one point of view, code does things, and since we don't need static types to say what the code does, we might as well do without them. However, I want to do something besides say what my code does -- I want to say what the code shall NOT do. The first two hours of writing Python are fun, but then you start writing asserts or writing comments like "this function accepts a function returning foos" -- much more verbose than static types.

Dynamic languages are suited for little more than big shell scripts. Their prominence is only the result of the failure of statically typed languages to innovate (or to die a natural death).

Object orientation and duck-typing are astrology -- they make one stupid and afraid of knowledge. Type annotations are replaced with naming conventions, mathematical concepts are supplanted by made-up programmer talk, logic is rejected in favor of rules of thumb. To suppose that there are valid ways of thinking outside of mathematics is heterodoxy -- it is the cause of all our problems and the root of all our sins.

There's actually a theoretical limitation behind what you call the type system being "wrong."

The word you want is complete, and you can't get it. A type system is sound if well-typed programs "do not go wrong". A type system is complete if any program that "does not go wrong" is well-typed.

Turns out that you can't really create a complete and sound type system because that would solve the halting problem...

On the other hand, Haskell and Ocaml have sufficiently expressive type systems that it is more likely you who is screwing up if the program doesn't typecheck.

Then again, for some dirty hacking, like interacting with stuff others wrote a decade ago, a less sound type system can get out of the way. Maybe that is the real reason C(++) still enjoys such success.

On your behalf, I've submitted a JSR to the Java metadata modeling committee to implement the @Annotation( "Dinosaur" | "Noob" ) annotation, to be used as the ultimate filtering mechanism in your IDE of choice.

After a beautiful, meticulous modeling process, with especially careful consideration to backwards compatibility, it should be ready for inclusion in the language by 2023.

It sometimes seems to me that advocates of dynamic typing talk about writing applications that actually do stuff, while advocates of static typing talk about writing APIs that are easy to use. Maybe it's just me (static typing galore is my game), but I would rather use an API where the compiler can inform me beforehand that the only thing the API can handle is instances of a certain class (or objects with certain methods), rather than letting me prod it with unit tests until I get it right.

It would be interesting to program with an optionally typed language. My experience is with C++, Smalltalk, JavaScript, Ruby, and Lisp. I find that strong types get in my way more often than the lack of specified types hinders me. I'd like to see what being able to specify types only when you need to would be like.

The debates would be furious.
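One way to get a rough taste of "specify types only when you need to" is an opt-in checker bolted onto a dynamic language. A toy sketch in Python -- not a real gradual type system, and all names (`expects`, `add`, `concat`) are invented for illustration:

```python
def expects(*types):
    """Opt-in type checking: decorate only the functions where the
    labels earn their keep; every undecorated function stays dynamic."""
    def decorate(f):
        def wrapped(*args):
            for arg, t in zip(args, types):
                if not isinstance(arg, t):
                    raise TypeError("expected %s, got %r" % (t.__name__, arg))
            return f(*args)
        return wrapped
    return decorate

@expects(int, int)
def add(a, b):          # declared: ints only
    return a + b

def concat(a, b):       # left dynamic on purpose -- no declaration needed
    return a + b

print(add(2, 3))         # 5
print(concat("a", "b"))  # ab
# add("a", "b")  ->  TypeError: expected int, got 'a'
```

The check happens at runtime rather than compile time, which is the key thing a genuinely optionally typed language would improve on.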

I've worked with programmers who insist on commenting every function with a boilerplate comment section duplicating everything that's in the signature. They often also insist that each argument to a C++ function have an "in" or "out" comment, as if that weren't already obvious from the signature.

These programmers will insist that every type be specified. Then come the "wrapper" classes, the "shims", the "proxies", and the layers upon layer of nice little security blanket classes.

BTW: I think you tried to bring too many concepts into your rant. Leaving out data modeling would have helped condense the point, but maybe that would not have stirred the pot as much.

Yep. But I think you're being unfair: some problems just lend themselves better to being strongly typed, and some don't. It isn't a question of age or experience; it is what you are coding that matters (although when you are young, you fail to see this). Loosely typed sets of actions cut down on code; critical computations are always best strongly typed and very fragile. Mix and match to get the system built.

I think your "programmer's evolution" story only takes into account a certain type of programmer -- perhaps the one who formally learns programming at university. Myself, I started out with overlong programs with hundreds of global variables, barely any scoping, and rarely any comments. I didn't program to learn programming, though; I programmed to get games done. (Games, on that note, offer a good kind of direct visual feedback; bugs surface more quickly when, say, the enemy suddenly leaves the screen within seconds instead of attacking the player's ship. Compare this to how long it might take to discover that File -> Export as ZIP is broken when an existing ZIP is write-protected or something.)

I've learned since then. Still, today, when I try to get my head around code downloaded from somewhere, often the first thing I do is remove all the comments. Somehow the only comments I can bear are one per function header. Once the comments are gone, it's easier to see the actual program complexity and flow, and you're less distracted. (Also, as we all know, comments can lie, which is bad when you're debugging.)

As with many things, perfect is the enemy of good. A pragmatic, goal-oriented compromise goes a long way in getting things done.

A very interesting read. I guess you could call me a novice programmer, and I have mixed feelings about static typing. It always seems to incur more effort than the writing of code that actually does some work. Certainly it helps larger groups modularise when they're working together, but then it all needs to be *maintained*. It becomes extremely difficult to change, and is very difficult to debug. Personally I prefer languages like C, Perl and Python over C++ and Java. It's unfortunate that recent languages with "other" modern concepts always take the object-oriented and typing concepts to the extreme.

What I want is C with language supported lists, hashes, namespaces and "container" objects.

I always used to wonder about the 'years of experience' requirements on job openings. Paul Graham's assertion that it drives out general childishness (not necessarily programming childishness) just didn't seem quite right.

2.25 years of real developer/sysadmin work has given me enough experience to prove Graham wrong. It's not until you write an inscrutable, unmodifiable wad of closures that you understand moderation. Likewise with comments, static types, inheritance, magic methods or interfaces, and functions-as-data: until you've abused it, you don't know how to use it. Companies require X experience in hopes that you won't waste their money on the abuse phase.

As for relational modeling, I've observed that a strong schema tends to be defined in at least N+1 locations, where N is the number of programming languages that interface to it. Neither "Column 'foo' cannot be null" nor "Duplicate entry '1' for key 1" are acceptable error messages for end-users. You end up with code to ensure you put the right thing in the DB, and that code ends up knowing the schema.

The only way out is to write N interpreters for the database schema itself. I think they're known as "ORMs" because it sounds better than "Yes, we just wasted a bunch of time to automagically do stuff twice!" So much for the performance cult.

I want an XML database with built-in revision control. Probably to abuse.

1) If you don't start declaring complex types, the compiler will start inferring data types that you can't begin to understand because of the complexity of the type. The type of a relatively simple function could easily be longer than the code for that function. So you have to play the meta-data game if you're in it for the type system.

Uh... no? The compiler infers the most general type possible, but the only difference between that and a type you may define would be a few class constraints.

2) The type system still gets in your way. E.g. many Lisp functions accept some value or nil (aka null) and just return nil on nil input. This is really helpful when you want to propagate a non-error situation where some data should be ignored. But in Haskell/OCaml you need to declare a Nullable type. Polymorphic data types help here, but then you end up with so many "case" statements to propagate the null value that you write a meta-function to take a function :: String -> String and turn it into Nullable String -> Nullable String, but then you end up having to use that everywhere explicitly - ugh.

Aaaand no again. That "meta-function" by the way would be fmap and it exists and the "Nullable type" would be "Maybe" and there are a thousand idioms for using it cleanly, from using "do-notation" sugar for monads to using bind operators explicitly, to using applicatives. And the huge advantage here is no more "null pointer" errors... ever! Seriously.

That said, I'm sorta with SY on annotations, which are ridiculous. But if you can't specify reasonably well the type of a function, how can you say you have any idea what it does at all?
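For readers without Haskell at hand, here is a rough Python sketch of the Maybe/fmap propagation discussed above. The `fmap` helper is invented for illustration (Python has no built-in equivalent; Haskell's real `fmap` is far more general):

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def fmap(f: Callable[[T], U], value: Optional[T]) -> Optional[U]:
    """Mimic Haskell's fmap over Maybe: None (Nothing) propagates
    untouched; any other value (Just x) gets mapped through f."""
    return None if value is None else f(value)

# One small helper replaces the scattered "case" statements the
# earlier commenter complained about:
print(fmap(str.upper, "hello"))  # HELLO
print(fmap(str.upper, None))     # None
```

The point of the Haskell version is that the compiler then *forces* every caller to handle the Nothing case, which is where the "no more null pointer errors" claim comes from.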

Never let the facts get in the way of a heartfelt rant. But if you bothered to look at some serious OCaml code (like the compiler sources, say), you'd see that there is precious little static-type meta-data there. Type inference wins the day and the sources are beautifully compact. People doing serious programming in the Hindley-Milner type system world don't suffer from excessive meta-data, but do get irritated by know-it-alls calling them noobs.

Tactically, you are very right. Strategically... I don't know. You mix several things together: strict typing in general, abuse of UML and class hierarchies, and overuse of strict typing where it is not necessary (I mean JavaScript).

Regarding class structures - it's probably hard to force people that love bureaucracy not to fill their code with Managers, Handlers, Helpers, and the like (none of these does any work, but they pass it around, like in real life). But I'm afraid this anti-bureaucratic rant has nothing to do with the issue of modeling in general.

Sapphirecat is the perfect example of what I was talking about re: database design.

The only way out is to write N interpreters for the database schema itself.

No, that's just the default assumption of someone who hasn't really looked at the possibilities. The only long-term way out of this insanity is to design software that can derive at least *some* amount of intelligence by reflecting on the database design itself. For now, we are limited because most integration between programming language and DBMS is so clunky.

Even so, designing libraries around information_schema and DDL rather than hard-coding for specific entities takes you a long way toward breaking out of the cycle. Even in a few afternoons, I was able to come up with some interesting approaches. I believe much more is possible with serious R&D. Of course, SQL itself militates against some of this by its screwed-up design. It's time for a better relational language.
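As a minimal sketch of the idea of deriving validation from the schema itself rather than hand-maintaining it in a second place: the table and columns below are invented, and SQLite's PRAGMA table_info stands in for the information_schema views other DBMSs provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
)

# Reflect on the schema: PRAGMA table_info yields one row per column as
# (cid, name, type, notnull, dflt_value, pk).
required = [row[1] for row in conn.execute("PRAGMA table_info(users)")
            if row[3]]  # columns declared NOT NULL

def validate(record):
    """Raise a readable error instead of letting the DB tell the end
    user "Column 'name' cannot be null"."""
    missing = [col for col in required if record.get(col) is None]
    if missing:
        raise ValueError("Missing required fields: " + ", ".join(missing))

validate({"name": "Ada", "email": None})  # passes: only 'name' is NOT NULL
```

The constraint now lives in exactly one place (the DDL), and the application-side check is derived from it rather than duplicated.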

I like using code reviews to find out where to put comments. Your code reviewer points at a section of code and says "You did it wrong." Instead of just explaining that you did it right, add a comment explaining that rightness. If your code reviewer got confused, maybe future readers will, too.

Maybe that suggests a strategy for enforcing types. After someone sends you a nastygram that your function blew up when they pass in a string, don't just write back a snarky note saying that the function of course takes a tuple of ints. Add an assert istype. If one person was confused, probably other people will be, too.
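The "assert istype" suggestion might look something like this in Python; the function and its error message are invented for illustration:

```python
def scale(point, factor):
    """Scale a 2-tuple of ints by an integer factor."""
    # A confused caller once passed a string here; fail fast with a
    # readable message instead of blowing up deep in the arithmetic.
    assert isinstance(point, tuple) and len(point) == 2, (
        "scale() takes a 2-tuple of ints, got %r" % (point,))
    x, y = point
    return (x * factor, y * factor)

print(scale((2, 3), 10))  # (20, 30)
```

One caveat: Python strips assert statements under the -O flag, so boundary checks that must survive production are often written as explicit raises instead.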

The code you've chosen to demonstrate your compressed style is a function from a recursive descent parser. That's not really a fair example, because that one piece of information tells me almost everything I need to know about your function. Without reading a single line of code (or comments) I know the function's inputs, outputs, side effects, time complexity, engineering trade-offs, related literature, and even who it favors for president in 2008. Your comments fill me in with the only information I don't yet have: what kind of constructs are you actually parsing, and what is the meaning of those strange context variables you're being passed?

So in this case, it's easy to agree with you that experienced programmers (who will be familiar with recursive descent parsers) don't need lots of comments to help them understand this function. But that only works here because you're dealing with a well-established pattern.

That's one observation to make about comments: they are always complementing some assumed background that your readers have. Maybe you have a nice design doc that you expect your readers to have read and understood. Maybe you're working within an engineering culture where certain things are assumed to be common knowledge.

I think that in many cases, storyboarding is a highly appropriate commenting style. One example is in sequences of really side-effecty code. Why is this file getting deleted here instead of two steps later? How does your sequence of events help achieve consistency, idempotence, etc. in the case that the program is interrupted or something goes wrong?

Anyway, I think terseness comes down to being a stylistic choice just as comment verbosity is. You might find terseness more correlated with experience, but here's an example of code from SQLite (written by the 20-years-out-of-college D. Richard Hipp):

Comments should exist either to (a) document something in a way that can be extracted for users of the function/class/whatever, (b) explain why the code is doing what it's doing, or (c) explain some code that is particularly obscure. That said, and in the absence of pulling each significant section of functionality out of a routine into its own explanatorily-named subroutine, single-sentence comments that identify the purpose of the next chunk of code make it much, much easier to find your way around a long function. The other piece of documentation that is generally not available to people is how modules relate to each other. All of these things go toward maintaining the code, which is some large percentage of the life of a piece of code. People who do not document their code according to these rules either maintain their code themselves, or don't write anything that will last.

Sorry Steve, way off base and needlessly inflammatory. Seniors don't comment because it conserves energy and time; basically they're lazy (and that's ok). The urge to tell stories is deeply ingrained in all human beings. Your toddler example only proves that it's a universal need that crosses all boundaries including age. Good heavens, this post itself is a story! Does this mean you've reverted to being a 2-year old? No; it just means you're wrong about what story-telling does, or doesn't, imply.

Just another short note while the comments are open. When it comes to dynamic versus static languages, my experience has been the following:

I was dead-certain the lack of typing in JavaScript would completely kill me when I started out (some three years ago), and I don't know why it hasn't (yet), but I'm able to do moderately long (~5KLoC) projects in JavaScript only and still understand what I do, make other people understand what I do, and - most odd - no perceived increase in errors. [OK, high to begin with, what? :) ]

Also, something that one could argue about for hours while drinking beer proved not to be the case: no compulsory try-catch statements everywhere.

I do not know why, but I've had the exact same experience as above. I only insert try-catch'es in portions of code that I debug or when I'm trying to learn something new, otherwise I have discovered I don't need it.

And taking another tack at explaining the higher productivity (for me), and why I'm really, really looking forward to server-side JavaScript (Jaxer, RESTful Rhino on Rails, etc.): you _just_code_. If there is a domain-specific object you're tossing between functions, you just _create_ it and _populate_ it. No interfaces, no abstract classes; you just solve the problem.

And for some reason unknown to me, the code is still maintainable, readable and I get to the market quicker :)

Problems in the creative realm are generally unknown until the creation is finished. Don't decide your algorithms, architecture, and data models until you know everything: only then can you know how you want to build your code and data, and only then can you know what you really want to freeze. Freezing APIs, data structures, etc. usually helps people communicate better with their code: it's a guarantee that they don't need to look deeper than the API, which pays off by minimizing other people's need to understand your code. When you're sure you want to isolate the internals of some function, module or library from its users, that's a good time to model it and abstract it away. But for most of the program that's just too heavy.

Analogously, the stronger the stroke in a sketch drawing, the more it removes future possibilities. And the longer you can keep your sketch unobtrusive and open, the better result you get because you effectively postpone decisions until you know what you want to see in the final work.

I see you're parsing JavaScript code in emacs (at least, it's what the example code you posted suggests).

I've been coding a lot of JavaScript the last two years and, very recently, I've started learning about elisp.

It didn't take long before I hacked together a bunch of utilities in elisp to help me with the development.

Would it be possible that you share some (all?) of your elisp JavaScript helpers/tools with us? That would probably help me, as well as a bunch of other people; both by improving my development speed and by teaching me about elisp.

Somehow the idea of writing the metadata once and generating the rest of the application out of it is very appealing to a lot of people. Once in a while a new generation pops up and tries again.

Probably the 80/20 rule applies. You can probably generate 20% (CRUD), but the biggest chunk of a regular application still requires manual coding. Not to mention tweaking the generated part for those nice exceptions that occur in the real world.

Another great article, Steve. My favorite, though, is some of the comments. Some people just seem to overly personalize what is said.

For instance... The post mentioned that some people go overboard with schemas, modeling, etc. But, no, that doesn't mean the post is saying that those processes aren't valuable. It just means that being excessive with it can be counter-productive.

Ah, well. Obviously a good topic for discussion, though, since it seemed to hit a nerve.

Dearie me. It's obviously been a while since you've had to work with other people. Either that or you're stuck in a mindset of heavy coding and you are subjecting them to a world of torment.

I agree with your analogy; there is a lot of maturity and conciseness that comes naturally with time. But that doesn't mean huge unreadable code.

With maturity should come the wisdom that you (and others) must be able to change elements fast. Breaking your code into manageable, readable chunks is the only way to do this. But the flip side of maturity is that it can lead to bad habits and preconceived views.

If you regard anyone who writes considerate comments with readable formatting as a speccy teenage n00b, I'd suggest you may be turning into a bit of a tired old man.

Sometimes I feel the same way about abstraction itself. Oftentimes the machinery involved in crafting an abstraction that doesn't require heavy commenting to be legible adds the same sort of top-heavy momentousness you attribute to model-heavy code.

That you rightfully find certain kinds of metadata counter-productive, shouldn't imply that all metadata is bad.

After all, everything above machine code is metadata, right? And if we admit that high level languages are around for a purpose--that the linguistic fluff-stuff of human expression has indeed increased our capacity to engage with computers in ways before unthinkable--then we must admit that some metadata is good.

Further, we should seek new kinds of metadata that lower barriers of speaking to computers. It takes an elementary education to understand the technical *concepts* that drive today's most popular software, but it takes so much more to understand the language and components used to implement those concepts. I see that as a failing of language--an underdeveloped landscape of metadata.

I think there is a lot of truth in the main point about over-modeling, and I think model/don't-model ought to be a judgement call more often than it is.

However, I think the choice of static typing is rarely for the noob psychology reasons that you propose.

If you have 2 million bucks worth of hardware, doing highly optimized work in C/C++, you can't switch to python. Doing this in python would require 10x as many machines if not more, and years of rewriting.

If you want your language to dominate, it has to have performance first. It doesn't have to make sense to you, but the history of software shows that performance drives language decisions. A car that gets 8 miles to the gallon will not sell, no matter how fun it is to build.

What irritates me most about over-commenting by noobs is that when they copy and paste, they copy the comment along and change the code. So now the metadata is super bad, because it not only gets in your way, it doesn't even describe the code it's supposed to be metadata for. Bad comments are worse than no comments at all.

> Funny you should pick on database data modeling. The idea behind it is to eliminate redundancy in data and to represent it in a form usable by multiple applications for a wide assortment of purposes. This sort of modeling is supposed to improve flexibility in how data may be used. Compared to overcommenting or static typing, database metadata seems a very different animal.

In one light, it is different. In another light, it is much the same: you're imposing some extra model constraints on the developer in order to support some goal that is not exactly the same as the one the programmer is working on.

If it's a really tricky, complex algorithm, the solution is not to remove all the whitespace. The professional, mature thing is to add comments when something non-obvious is happening. We shouldn't be writing code as a challenge to see who can understand it unaided. For example, if you see this in Perl:

%x = map { $_ => 1 } @words; @words = keys %x;

without a comment, you should fire, or at the very least yell at, whoever wrote it. (There are efficiency reasons for not writing that as well, but mainly it's a clarity issue.) To me, you seem to view complicated code as a dare: if I can understand it on my own, then maybe I meet your standards. If so, I think it's a rather childish thing to do.

Making code legible with selective commenting is a sign of craft and care, not immaturity.
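For comparison, here is the same deduplication in Python, carrying the one-line comment the commenter is asking for. (Unlike the hash-based Perl version, `dict.fromkeys` also preserves first-seen order; the word list is made up.)

```python
words = ["foo", "bar", "foo", "baz", "bar"]

# Remove duplicate words, keeping the first occurrence of each
# (dict keys are insertion-ordered in Python 3.7+).
words = list(dict.fromkeys(words))

print(words)  # ['foo', 'bar', 'baz']
```

The comment costs one line and saves every future reader the pause of decoding an idiom.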

I still remember the first code I maintained that was written by another developer, and this comment:

# insert a BR line break
print br();

Rarely have I read such an eloquent burn of the enterprise mindset. Guys whose code and data have 100% integrity, but #$*@!$!-it if there aren't big gaping holes in the entire approach or data structures.

Coming from two-year-old steps in Perl, then the schemaless, dynamic-object-persistence, metadata-addled world of Zope (and the good ol' Zope Object Database), there is a balance that can take a long time to find between rush-and-implement, dynamic-everything-out-the-wazoo, and "we've got code so majyk the Ruby guys get queasy". Never having done anything serious in Java, I never learned interfaces. Attaching extensible metadata to classes is addicting. Of course, in Python's zope.interface you describe interfaces to such an extent that you can use them to implement a data persistence strategy or autogenerate a widget-based HTML form, so ...

Meta-data is only one part (as used by IDEs - a great part in itself) - contractual programming is the other great benefit of strongly typed systems, and will always be better because it helps contractually separate the concerns of a large problem domain and therefore manage complexity. Static systems also provide for superb optimizations by a compiler.

Dangling off cliffs can occur at any age - it has more to do with the number of risks one takes, which, as in all other real-world cases, decreases with age. Experience does prevent known risks - but well-known risks are often public knowledge, and as in other real-world cases, experience can be a deterrent to taking risks.

Over-commenting and over-modeling is a valid point, however, over-anything is always bad.

An earlier blog post on having "continuous", "living" systems made a good case for dynamic languages - however, even this can be managed with clusters of properly-separated static systems.

"Or you can spend years creating mountains of class hierarchies and volumes of UML in a heroic effort to tell people stories about all the great code you're going to write someday."

I recently turned down an engagement because, even though the potential client had a patented algorithm with which he was going to make big bucks, he insisted that everything be modeled in UML first, that Java was the only language choice, and that data interchange must be in SOAP.

If you have a purebred and you lose the paperwork, aka documentation, you still have a purebred. Agreed, but no one does bugfixes or enhancements on a dog. Say you want to leave your dog with your neighbour while you go out: you will certainly give your neighbour some information about the feed, the frequency, etc.

Nowadays systems are huge, and a one-man-army veteran cannot write the whole system by himself. Even if he were able to pull off this feat, I am sure it would be impossible for him to compress the code into one page. So he had better put comments into what he writes.

I don't know if I would consider myself a senior programmer or a junior programmer, but I've never been a fan of comments in code. I believe code should be self-documenting, i.e. the meaning should be obvious just by reading the code (I think I've read this somewhere as well - Elements of Java Style, maybe?). So much so that I'll refactor one method into two when the method signature doesn't obviously describe the behaviour of that method. Although this does contribute to code bloat, I still think it's better than writing a comment and distracting the reader from following the code flow. I only write (short, inline) comments when the code isn't intuitive. I have found that the amount of whitespace I have in code has tended to reduce over time, though, although I'm not really sure why.

When you compare over commenting to static typing, I think you're glossing over a fundamental difference. The first reiterates information that's right in front of you while the latter places a copy of the information in a useful place (the method signature) while the 'duplication' is several function calls away.

I appreciate an argument for dynamic typing that doesn't rail against the 'we must have tooling!' straw man.

I am a big fan of loosely typed languages. I've been programming continuously for over 40 years. I am not a dinosaur stuck in the past, I keep up with the tools, languages, concepts, and religions. With that out of the way...

Comments need to appear when there is something important that the code doesn't say. I agree that inexperienced programmers over-comment: looking at their comments, you see it's mostly a repeat of what the code says. But sometimes you make decisions and take actions that are vital to know about in the future, when you're looking at the code to maintain or adjust it. Without the background info leading up to the decision, you run the risk of breaking your own code. Well, unless you have a photographic memory for info and decisions made in years past.

If we're talking about comments and not analyzing programming languages, dissecting the static-vs-dynamic typing philosophies, or wondering whether software engineering is possible, then I have a comment.

Code says "how". Comments say "what" and "why". Comments should be AT LEAST one level of abstraction above the code. Comments (or documentation) ARE the system. Code implements it.

I use the style I learned while working on a COBOL system designed and written by Arthur Andersen. Delightful, elegant, and it keeps comments out of the code too.

Should a team write for the least common denominator? ... I suspect it's a good idea to encourage people to move their stories into design documents and leave them out of the code, since a junior programmer forced to work in a compressed code base may well grow up faster.

Hey, what about us QA people? We've got to be able read your code too, you know!

Our job is to help find defects and being able to understand how things work by reading the code is very helpful. We can learn a modicum of coding skills but we're not developers; please at least meet us half way by leaving enough comments in to make the code intelligible.

Obligatory Pascal allusion: "I have made this [letter] longer, because I have not had the time to make it shorter."

Stevey wrote: "This was an especially difficult entry to write."

I'm glad, because it was painful for me to read. I had fancied myself as an architect, but I find I might actually be a teenager. Hopefully, this dose of humility is good for my soul.

However, we are not paid just to "get the job done." We are also paid to deliver maintainable software. (On good projects, anyway.)

The examples of metadata that you malign might get in the way of banging out a quick solution. But they might also be indispensable for delivering a maintainable system of large scope.

Pascal's unedited letter was quick and effective point-to-point communication. However, I'm sure he would have felt additional editing effort would have been worth it, if it had been a published book to be read by many.

The analogy I'm drawing is that metadata as you describe it (static typing, comments, annotations) become more valuable as the number of people collaborating on the project grows.

I agree with most of what you say, Stevey. Just one thing: aren't the names of classes, methods, parameters, variables, properties etc. also a kind of metadata? At least they are not needed by the machine to run the program.

For me the question remains: What makes one type of metadata more valuable than another?