Documentation is a dreadful waste of energy if applied in the wrong direction. Say, nearing perfection. This can lead to resistance to change for fear of adding to the paper mountain. (see also: TheAlmightyThud)

Coding is also a dreadful waste of energy when applied in the wrong direction too.

In the end, the one part of the system that correctly explains how the program behaves is the code itself. However, code is made for computers, not humans, -and it consequently hard to read.

If I understand the code but the computer does not, have the programmers accomplished their goals?

"Code is made for humans, not computers."

Only machine code is made for computers. All other code is made for both
computers and humans. Actually, the computer is quite happy with machine language. Logically, then, programming languages are designed for humans as their interface to the machine. Extending this motivation further, we should aim that all code written in those languages be as readable to humans as possible.

Actually, even the computer would be 'unhappy' with machine language if it was trying to talk to other computers - even a society of robots would build communications standards, semantic compression, and higher-level languages as they attempt to optimize their communications (so they only need to say something OnceAndOnlyOnce, don't need to carry around a billion many-to-many library of encodecs and decodecs and compilers and disassemblers, etc.).

Absolutely, if a language is so ridiculous that only a computer can write it; what good is that? Except of course if you wanted a computer to write it. If a human must control the machine, then the human should do it by his own rules (his language - Java / Perl etc.) not the computer's (machine code). Then, once he has 'tamed' the computer (written his code) he doesn't mind what the computer reads as long as he can read what he needs to read (ie: the code has been compiled and the computer is understanding what is simplest for it. The human has written what was simplest for it and still has the original for later tweaking - the source.) -- MatthewTheobalds

The philosophy behind DonKnuth's LiterateProgramming methodology is that code should be written primarily to communicate its purpose to humans. LiterateProgramming uses automated tools to convert code written in a human-optimized form into a compilable form. It takes advantage of tools to aid the SelfDocumentingCode methodology. (Knuth attributes his ability to complete TexTheProgram to his use of LiterateProgramming).
I am currently dealing with some code that essentially implements a StateMachine, but in a "home-grown" manner, and has no file or document explaining the state machine at all. Now, I know that there are STM generators that can be used which take a text file description and generate C++ classes needing only a fleshing out, but if one of those is not used, and the state-transition code is not exactly obvious, regaining the STM from the code is really time-consuming. I would tend to state that SelfDocumentingCode is not yet a reality until we have tools that can generate the code for the DesignPatterns we use, or at least encapsulate the constraints in some source form other than code. -- PeteHardie

Applause. It's not just state tables, either. I have long been a proponent of the separation of Church and State -- er, sorry, of code and documents. Code is code and documents are documents. Perhaps because I have seen the not-so-great XP experiment at this client fail pretty miserably I am biased. A lot of the XP tenets of test before coding, etc., are great stand alone implementation techniques. The idea of putting the documentation into the code as code is not such a hot idea. I am pretty sure the FDA would not go for it, nor would any other alphabet soup governing bodies.

Regardless of what Da Guvmint wants me to do, I prefer to have some sort of description of my design that can be translated into code by anybody who looks at it. The code I create may not necessarily be exactly the code you create, but it will do the same job. We both worked off of the same sheet of music. -- MartySchrader

Code is indeed code and documents are documents. I couldn't agree more. Mixing code and documentation in the same file, distracts me too much from the original purpose of that file: being a container for the source code. I don't like it when I have to hunt for a method between huge blocks of javadoc-like documentation. Neither do I appreciate the valuable screen real estate consumed by those documentation blocks while I'm editing code. I can see the advantage to make it easy for the developer to find the piece of documentation relating to the code he's writing, but surely that can be done in other ways than putting the two together in the same file? Not to mention the issue of documentation changes triggering unnecessary recompilation because they changed the source files... And to add insult to injury, javadoc-like systems are often used as a poor substitute for _real_ documentation. A simple list of classes and a description of their methods is not a good way to explain the structure of a system. -- IvesAerts

The real problem is that modern languages don't yet show you the relationships between classes. Or even better, instances. (Or even better, prototypes--wait, we don't have those yet except in JavaScript.) There was an article a couple years ago in DrDobbsJournal (cf. anyone?) about schema representations in C++. The author basically suggested that instead of putting a collection inside a class, one puts it outside; further, put all the collection definitions in one place. In effect, one would explicitly state somewhere the m-n relationships used within system. Thus, the high level view was just as apparent as the low-level view. The author claimed this was more readable. He was probably right. However, you lose that nice encapsulation barrier thing that we never had in the first place. -- SunirShah

That roughly describes the diagram facilities in Microsoft's SQL Server Enterprise Manager: a high-level view of the m-n relationships of the system. These diagrams are definitely a point in favor of static typing (or at least declarative type constraints). But I still need supporting documentation to explain what those types mean outside the code and why the relationships are set up the way they are.
"SelfDocumentingCode is not yet a reality until we have tools that can generate the code for the DesignPatterns we use, or at least encapsulate the constraints in some source form other than code."

So what you end up with is a document and some code and no way to be sure that one of them actually corresponds to the other. That doesn't sound like a win either - face it, anyone who'd write a state machine you can't read probably wouldn't have kept the documentation current anyway. I think you're asking for more expressive programming languages, really. -- DanBarlow
I believe the whole documentation disucssion comes down to a simple evaluation. Does the cost of generating and maintaining documention out weigh the cost of tracking down a piece of information when needed? Yes, it would be nice to have some precise bit of written text to save me the effort of tracking through the source code to figure something out, but how much extraneous documention also needs to be developed to ensure the small piece I require gets written? -- WayneMack
This issue is a form of OnceAndOnlyOnce, carried out on a semantic level. That is, if you have to maintain both code and documentation, you're trying to write and maintain the program in two different forms, and you're creating a redundancy that will slow you down. (See RedundancyIsInertia) So maintain the program in only one form, learn to read the program in one form, and learn to write a readable program in one form. Of course, you always need the source code, which means good bye documentation.

While doing MIL-SPEC tech_docs in avionics R&D I conjured up a "write once" paradigm, convincing the engineers (both software and hardware) that I would bring together the entire document set (a formally described deliverable) without distracting them further IFF they wrote all salient information out at least once ... only once was, of course, the carrot, and undocumented configuration items (I could shift blame onto them easily enough) was the stick. (N.B.: I'm starting to see the tools that would allow this to be done very nicely these days ... anyone have a nasty big harware project that needs docs? *grin*) -- BenTremblay

By this analysis one can still document that which can't be read from the source. One must still decided if even this much documentation is worth keeping.
On MethodsShouldBePublic, someone wrote:

Another point to consider is that often breaking up code into smaller methods -- private, public, protected, or whatever -- helps to clarify the intentions of the code. I find that if I write code just for myself that isn't broken up, and then I come back to look at it a few months later, I don't have a very clear sense of what the code does. Method names help quite a bit -- instead of squinting at lines 154 through 175, and asking myself what it does, I can read the method name "calculatePercentages" and see that it calculates percentages. WellFactoredCodeIsSelfDocumenting?.

But method names don't tell you why. They also don't tell you about how something should be used (the object state machine). And most often they are just hints at functionality. The example that started this page is CalculatePercentages. With an understanding of the domain this method name doesn't make any sense. And it doesn't tell you the pre- or post-conditions. --AnonymousDonor

What domain are you referring to? I was using a rhetorical example from an unspecified domain. It's not hard to imagine a domain in which a method name "calculatePercentages" is fairly clear about intent.

A lot of this probably comes down to how much documentation you consider to be enough documentation. For example, I think a big emphasis on pre- or post-conditions smells a lot like MassiveFunctionHeaders. There are plenty of useful functions that don't need pre- or post-conditions noted, because those that do exist are fairly obvious anyway. There's that sweet spot of documentation where you get enough to give people what they need, but not too much to irritate the reader or the writer. I think WellFactoredCode goes a really long way towards getting to that sweet spot.But method names don't tell you why.

Method names don't need to tell you why or give you the context. The "why" and the context is provided by the caller of the method. The method doesn't know why it is to be called, it only knows what it does.

Then how does the caller know why and when it should be calling the method? What if there is no caller yet, then there is no purpose? What if callers call the method for different reasons? How does one divine intent? Methods have a purpose or they wouldn't exist. Not being able to learn the purpose by looking at the method is obfuscation. This happens on a large code base, even if the code is "good." -- AnonymousDonor

I am not sure of your development environment, but when I write functions, the calling function calls a method because it needs to. It is simply not feasible for the called method to know why it is called. The called method can only provide how it is implemented. In more concrete terms, I know what "printf()" does, but I have no idea why a particular section of code may choose to call it. If your calling code is well factored (and you use better names than printf), it should be obvious why it is calling the methods it does. Trying to maintain the reverse links from the called method to the calling methods is simply not worthwhile. When you need to find that information, just go ahead and use a grep/find utility. That has been standard practice in maintenance of large code bases for years.

Need defines context/environment/requirements. How would you learn printf is available? How would you learn what arguments printf takes and all the weird things you should know about printf? Why is the calling code using printf? Why isn't the code using cout or a gui or something else? What would happen if it used something else? Is it relying on buffering? Does it know what implications printf has on performance? The context of the programmer's mind can not be deduced or guessed. It must be stated by the programmer for future generations. -- AnonymousDonor

Why should one care what was in the original programmer's mind? If the function is performing satisfactorily, then there is no need to change it. If there is a need to change the function, one will make the change regardless of what the original programmer thought. Any operation can be implemented in a multitude of ways and it is immaterial why someone selected one particular way.it doesn't tell you the pre- or post-conditions

If the code doesn't tell you what the pre- or post-conditions are, nothing else will. Self-documenting means just that. You have to read the code to understand it.

Just like you need to read my DNA to know why i consume meat? -- AnonymousDonor

No, I expect every individual to have a comment tattooed to his face describing his dietary preferences. These comments should be updated should the individual decide to become vegetarian, or if his doctor advises him to give up bacon and sausage. Every individual should be required to have this information tattooed on his face, because my time is too important to have to stop and ask him should I ever need the information.

I like your humor. To state it directly: code is documenting only a limited scope of information, as any document do. Expecting everything from the code is not reasonable. There can be document which is not part of the code, for example, marketing material of the software.
There's a difference between what the code does and what it's intended to do. Usually you worry about when the code doesn't do what it's intended to do: We call those bugs. But conversely, sometimes you find that the code has properties you hadn't planned on, in ways that are not feature-creep. Maybe your code can be reused in a way that you hadn't anticipated, or maybe it's so elegant that it gracefully handles an odd input before you realized that it would ever receive that input.

The point is that the two -- what we want code to do, and what it actually does -- are always a little different. And they keep moving as the system grows and changes. Documentation is intended more to address the first, but certain techniques emphasized in ExtremeProgramming address the second. Both take time. But if you spend all your time on documentation you still have to take human time to ensure that the code does what the documentation says it does. Two moving targets, with constant human work to keep both in sync.

But what if you spent all your time on the code? If you decided to RefactorMercilessly, most of the code would be easy enough to figure out at a cursory glance. And then if you wrapped all your code in UnitTests, you could use them to learn about the code in the few cases where the code is a little more obtuse.

So perhaps this page should be retitled WellFactoringPlusUnitTestsEqualsSelfDocumentingCode. Well, not really, 'cause that's pretty ugly.

I don't find unit test to be documentation at all. They are just more code to understand with nothing to help understand it. And please, can we actually see some of this self documenting code for any kind of meaningful block of code? Most of us have seen a lot of code and know self documentation is an very ambitious goal. As i have never seen such code, and none has ever been produced for general viewing, i doubt it can actually be done. If you think documentation and code are hard to keep in sync, how much harder is it for developers to make their code and unit tests express everything meaningful about a system? This is a far more difficult task. Sure, you can counter that all these other issues don't need expressing, but i think they do because they are what you need to know. For everyone coming after you what the code does is almost less important than what it is supposed to do and why you chose to do in such a way. What tradeoffs were involved in your system? Why didn't you do it another way? What feature are you implementing that is not obvious from the code? Are you using a uint32 for the wrap around characteristics? Without a comment how will i know this? How will i know the consideration of wrap around is even important? What will happen if wrap around isn't handled correctly? Is this uint32 to int conversion correct? Why do you think it is? These are the simplest of examples, but it happens all the time at every level. For example, all the issues surrounding a simple printf call seem impossible to express in code. Expecting developers to "get it" from the overall impression of the code seems impossible.
-- AnonymousDonor

I think that a lot of the discussion here is based upon mistaken assumptions that modern programming languages make; there's no way to capture a lot of important information in the source code. The most important omitted thing that comes to mind is a listing of dependencies between modules, or a "build process" for static programs, but there are lots of high-level pieces of information that modern programming languages don't address. Another pretty major one is requirement dependencies; when a customer changes a requirement, why can't the compiler tell us what code probably needs to change? It sucks to require an expert programmer who is familiar with the whole codebase to decide how massive a particular requirement change would need to be -- that web ought to be part of the code. Currently we can only encapsulate it (poorly) as comments. (Note that both things I mentioned could be rather simply represented using a graph. Still, nobody does it.)

Source code which only describes the machine-executable parts of a program seems shortsighted. Many external factors may affect the code, and most of those can be quite easily represented in a machine readable, although possibly not executable, form. We (as an industry, or field) have pretty much started to agree that API documentation (javadoc & friends) and testing (UnitTests) should be part of the source code, and at least some technical documentation (comments) should be as well. Why not end-user documentation? Technical specifications? Requirements documents? It's not like files and directories can't handle this... but CASE-tools vendors really like holding their customers hostage in windowing GUIs and nasty licensing agreements which forbid reverse engineering. Maybe one day an open source tool will address this. TheSourceCodeShouldBeTheDesign?. (This is my first Wiki posting. Be gentle.)

"We...[agree] that API documentation (javadoc & friends)...should be part of the source code": Yes and no. Yes, in that the master copy of the documentation should be close to the code, so that if one needs to change the other isn't forgotten (hopefully). No, in that one doesn't have to look at the source code files to get the documentation (e.g., JavaDoc extracts the documentation comments and some code structure and generates separate documentation files).

The source code might be the detailed design, but it is not the high-level design. (Well, it's not the high-level design without all the lower-level details that obscure the high-level design).

Flashback: 1968-9, trying to work out how to hack a compiler to add a new device. Tried drawing a flowchart of a core dump; 20 double sided pages. Experienced guy walks in, says, "Why don't you ask the company for the documentation?" So I phone up the vendor and they send me...20 pages of code written in the language that they were compiling. Can't say it helped me very much. A very good compiler however.

Questions: Can our current forms of source code be improved?

Of course. And the quest to do so motivates much of the history of programming language design. My take on the article mentioned here is that it has always been fundamentally true, but the low levels of abstraction in programming languages make the source code a poor design document. As languages improve, our code can be closer to a direct expression of the design. More "essence" and less "accident", to use FredBrooks' terms. --GlennVanderburg

One of the primary goals of theEiffelLanguageis to capture both the design and the implementation together in one place, using a simple and clear notation. Alas, the Eiffel notation is so simple and clear that it's considered a fool's language by those whose careers depend on hacking out code in complex and arcane programming languages. Eiffel can only be appreciated by designers. Unfortunately, designers have been convinced that graphical representations (UML) are the only true way to express a design, and although the 'L' in UML stands for 'language', it is not a language which a machine can readily process to extract the essence and the elements of the design.

["[UML] is not a language which a machine can readily process...": Why not? The graphical language (the diagrams) is formally defined in terms of a model. Almost every single graphical variation means something.]

Until they very recently approved (did they really ?) UML 2, UML was most definitely not defined in terms of a formal model. UML had almost a decaded to damage its reputation, and entrench its snub architects core user base into the wrong attitude. It is for good reasons that DavidParnas pronounced himself "UndefinedModelingLanguages? are always a bad idea". Even if they manage to define something in the end, the jury is still out. There neeeds to be some significant scientific peer review of the model to make sure that the definition is tight, and some serious practical validation to convince ourselves that the newly defined model is a good model. It took more than a decade to refine the relational model.

After reading Jack Reeves' article, I believe he is essentially correct. It reminds me of a quote from the book, StructureAndInterpretationOfComputerPrograms by Abelson and Sussman, "Programs must be written for people to read, and only incidentally for machines to execute."

To avoid the problems of CASE tools where the generated code is often incomplete and the developer-corrected code is incompatible with the tool, perhaps what we need to do is to place the descriptive drawings (e.g., UML) directly in the compilable code artifacts.

Or vice versa, perhaps in the style of the TogetherTools. One problem here is that too much of the UML is C++ with lines drawn around it.

I believe (and practice) that the correct approach is to consider the code generator to be part of your source code. The design is the combination of the diagrams plus their translator(s) plus any other source code generators plus any other source code. Each fact should be expressed within the design OnceAndOnlyOnce, but the form of that fact could be as a diagram, as text (table), as translator code or as traditional source code (of course, this approach is applied recursively to the translator programs -- hence the ShlaerMellorMethod uses the term RecursiveDesign for this approach). See also, MetaRefactoring. -- DaveWhippmoved from DocumentationBeyondTheSourceCode

See RepresentingRelationships. Source code can either be an easy to understand design, or a hard to understand design. There are lots of techniques that make it easier to understand, at least if you understand the conventions.
moved from TheSourceCodeIsTheDesign

I have worked on projects that do have over 1,000 lines of code with the only documentation being the source code. I guess I fall in the category of those who believe that source code is the only useful documentation typically produced in software development.
"I don't find unit test to be documentation at all. They are just more code to understand with nothing to help understand it. -- AnonymousDonor"

Many people benefit from reading source code when first trying to understand a framework. The sorts of messages sent and objects created provide a nice cue. "I can make a Song containing Tracks which each contain Notes -- I can load Notes from a string, Tracks from a file, and then make a Song with those and -- oh, here's code that sends #loadFromFile: to a Song. Easy." One can't always be troubled to read through a lot of documentation on methods of an object. Everyday use of the object shows the sort of use that's usually desired -- someone just interested in playing a MIDI file won't want to necessarily load each note and track individually. Source code of the object itself can't always communicate all the intent of the object, even with IntentionRevealingSelectors and the like. However, UnitTests, as many have noted, immediately provide a user of the object. Those interested in learning the contract the object fulfills with its clients in the most common and therefore most important cases can just glance at the tests. For example, in Smalltalk (pardon me if it's wrong; I don't know Smalltalk nearly well enough):

This isn't a great example, but I think it gets across the idea that a UnitTest is code that actually exercises the object being tested as it's likely to be used. The test illustrates creating a Song via Tracks or from a file. This approach is quite nice, as comments are at best redundant and at worst meaningless. Can you imagine "This is for creating a Song out of Tracks..." and then "This tests creating a Song from a file..." or "This is the file we're using."? It's silly.
UnitTests, as I understand them, are written by the programmer working on the object. Therefore, whatever covenant between object and user-of-object the programmer has in his or her head is what's reflected in the UnitTest, and in the object itself.

Explicit documentation has its place, certainly-- usually, however, it notes exceptions to the contract-- "This method breaks when strings of more than 256 characters are passed it"--or its relation to other methods--"-[NSString isEqualToString:] tests for equality with a given string."(obvious)"It is usually faster than using -isEqualTo:"(not-so-obvious). In the latter case, documentation might be necessary. But why not put something into -[NSString isEqualTo:] like "if the object we're checking for equality is a string(or can respond to 'isEqualToString:'), return whatever 'isEqualToString:theObject' returns" and remove or inline or downplay isEqualToString:? That if-then is easily expressed in many popular programming languages and many unpopular ones as well.

Furthermore, UnitTests will often serve that explicit need, where it exists, with checks for boundary cases and the like.

UnitTests are guaranteed to be current with the code and have the same benefit over reading 'traditional' documentation that reading source code does.

My train of thought has jumped the tracks; anyone else have something to add? --JoeOsborn"You do not need to 'understand' code and usually cannot."

This is a rather shocking thought, but once you accept it, most of the justification for documentation fades away.

Software of a size requiring multiple programmers or an extended development period is simply too complex to be understood by an individual. Understanding is limited to very small pieces (a whole method at best) and then for only a short period of time. SelfDocumentingCode supports this definition by concentrating on understanding at the method level. A method can be understood if it is made short enough and written in a clear language.

Give up on understanding the code as a whole. It simply cannot be done and you will merely get caught in the trap of "improving" the documentation it you try. Keep your methods short and clear and they will be understood as needed.

Isn't this what modular design and encapsulation is for? You encapsulate complexity in modules so that you can shorthand its function in a larger module. -- PeteHardie

Encapsulation hides the complexity of a lower layer so that one does not have to understand it at a higher layer. At the top levels, one does not try to understand what is happening at the bottom levels.
I like the above, and was maybe subconsciously trying to get at it. Implementation doesn't matter. Therefore, UnitTests show how the interface works, in case you want to use a method and the interface isn't clear enough. I didn't even think about understanding a whole program, I was only thinking documentation for frameworks and similar -- where one does need to know basic relations between objects. But for programs, the only non-code definition should be for users. As for the rest, see the above, or treat the objects as if they were in a framework -- forget how they're implemented, forget everything about them, and if you want to use them, check out the UnitTests to see how their methods are used, or their interfaces if those are clear. -- JoeOsborn
Quote taken from LiterateProgramming: "Note that 'documentation' means a description of the implementation, not a user manual." [Emphasis mine -- MS]

This is exactly the problem that I see with the idea of so-called "self documenting code." The code describes itself quite well, but never captures the original intent of the designer, engineer, what have you. It is very possible for a well written, properly "self-documented" bit of code to implement something that has nothing to do with the intent of the design.

Please expand upon your concern. Then as a follow on, how does documentation resolve the problem?

I am concerned that an implementation may not match a requirement or that an implementation may be incomplete or carry excess baggage. Having a written instruction makes it kind of hard for the task implementer to screw it up. After the task is supposed to be accomplished the results can be compared to the original instruction by a third party to see if the job was done right.

Why is the documentation considered correct if the documentation does not agree with the implementation?

The document describes how the product is supposed to work, so if it doesn't work that way the code is in error. Who determines if the product works correctly? Anybody who can read the document.

But why is the documentation considered correct? The above merely asserts the documentation is correct.

Because something has to be correct, and the document is the one device that everybody can read and agree on the contents of. The document is a clear definition of the final product's intent for external parties to see, if required. The document has to be correct because it is the central focus of definition for the entire project.

Yes, something has to be correct. But this something is more likely to be the code, instead of the document. Programmers are forced by compilers, runtimes, QA to make the code correct. And few people force the document to be so much correct. There is some work called debugging, where we do tweaks to the code implementation and not always update the documents. It's especially true when the tweaks are small, but many small things added up can be big.

No. The fact that the code executes without failure to some requirement is not the same as the code meeting the original requirement. That's why you need a document that clearly states what the requirement is -- so that you can compare what is required to what actually happens. The code can "work" all day long and still be wrong.
Those who do not document history are doomed to repeat it.

LiterateProgramming shows just how much you can say about even simple code (often whole books). I think the problem here is that the level of documentation required depends on a number of factors, including its intended audience, its expected lifetime, its level of reuse and its expected rate of change. Interfaces need very different documentation to implementations. Presuming it's just other programmers in the same team, my preferred documentation style is some top-down documentation in separate files named in the source, together with comments scattered throughout explaining any aspects of what you intended that aren't clear from the source alone.

IntentionalProgramming should probably get a mention here too, though it's a bit different. Incidentally, last time I looked, MicrosoftResearch seemed to by trying to suppress it from their search engine. A search for "Simonyi" turned up a document titled "Intentional Programming" where a search for "Intentional Programming" had failed...but that may be attributing to malice what can be explained by incompetence.
Perhaps, the problem is, we think in a "file"-structure. Code and Documentation in one file is too much overhead to read. Code only is too little to understand. We think in file-structures, the file is only a medium to "save" the information for the compiler.

We don't have to edit code in files. A language should not only define synax, it should define the abstract GUI, in whitch we edit the code, it should define how to save the information, documentation, state of the code.

Is this the answer for the problem?

-- Mike Hummel

I think so. TomStambaugh says pretty much the same thing at GoodChangeLogEntry.
1 method with 1000 lines of code or 500 methods with 2 lines of code each... Now which is easier to decipher?