Thursday, 22 July 2010

Coding Style As A Failure Of Language Design

Variance in coding style is a huge problem. Reading code where the style varies all over the place is painful. Moving code from one place to another and having to restyle it is awful. Constantly adjusting the style in which you're writing code to conform to the local style of the project, module, file, function or line you're modifying is miring.

Therefore projects adopt style rules to encourage and enforce a uniform style for the project's code. However, these rules still have to be learned, and adherence to them checked and corrected, usually by humans. This takes a lot of time and effort, and imperfect enforcement means code style consistency gradually decays over time. And even if it were not so, code moving between projects looks out of place because style rules are rarely identical between projects --- unless you reformat it all, in which case you damage the relationship with the original code.

I see this as a failure of language design. Languages already make rules about syntax that are somewhat arbitrary. Projects imposing additional syntax restrictions indicate that the language did not constrain the syntax enough; if the language syntax were sufficiently constrained, projects would not feel the need to do it. Syntax would be uniform within and across projects, and developers would not need to learn multiple variants of the same language. More syntactic restrictions would be checked and enforced by the compiler, reducing the need for human (or even tool-assisted) review. IDE assistance could be more precise.

Two major counter-arguments arise. People will argue that coding style is a personal preference and therefore diversity should be allowed. This is true if you only participate in particularly small projects, but if you work in a large project then --- unless you are exceptionally fortunate --- you will have to deal with a coding style that is not your preference, no matter what. (Maciej Stachowiak once said that willingness to subjugate one's personal preferences to a project's preferences is a useful barometer of character, and I agree!)

A more interesting counter-argument is that many coding style rules aren't sufficiently formalized so as to be machine-checkable, and might even be very difficult to formalize at all. This is true; for example, line-breaking rules or variable naming rules might be very difficult to formalize. So I relax my thesis to claim that at least those rules which can be formalized should be baked into the language.
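To make "rules which can be formalized" a little more concrete, here is a minimal sketch of a checker for three whitespace rules of the kind style guides often contain. The function name `check_style`, the 4-space indent width, and the choice of rules are all assumptions made for illustration; they are not taken from any particular style guide or tool. The indentation rule is deliberately naive and would flag continuation lines aligned to an opening bracket.

```python
def check_style(source: str, indent_width: int = 4) -> list[str]:
    """Return one message per rule violation; an empty list means the text conforms.

    Hypothetical rules, chosen only because they are trivially formalizable:
      1. indentation contains no tab characters
      2. indentation is a multiple of `indent_width` spaces
      3. no line ends in whitespace
    """
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]
        if not stripped:
            # Empty line is fine; a whitespace-only line violates rule 3.
            if line:
                problems.append(f"line {number}: trailing whitespace")
            continue
        if "\t" in indent:
            problems.append(f"line {number}: tab in indentation")
        elif len(indent) % indent_width != 0:
            problems.append(
                f"line {number}: indentation is not a multiple of {indent_width} spaces"
            )
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
    return problems


if __name__ == "__main__":
    for message in check_style("if x:\n\ty = 1\n"):
        print(message)
```

A language that baked such rules in would report these as ordinary compile errors instead of leaving them to a separate tool or a human reviewer.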

(Figuring out exactly which rules can be formalized, and exploring alternative syntax designs that maximize automatic style checkability while still being nice syntax, sound like fun research! Programming language syntax is one of those areas that I think has been greatly under-researched, especially from the HCI point of view.)

27 comments:

The obvious response is "Way to go, Python", except that even Python's syntax is flexible enough to beget style guides, and not just for variable-naming reasons. The official style guide, PEP 8, is fairly comprehensive but doesn't touch on things *I* consider important to code readability, like how to indent expressions wrapped onto multiple lines.

> A more interesting counter-argument is that many coding style rules aren't sufficiently formalized so as to be machine-checkable, and might even be very difficult to formalize at all.

I think that something like StyleCop (stylecop.codeplex.com) shows that for some languages - even C heritage languages - this can be done.

I can't find an HTML formatted StyleCop rules list - only a .chm list :-( But trust me, the list is extensive.

The fact of project coding styles has always been a reason that I've not been able to understand why people get so worked up about Python's significant indentation.

Pretty much any formalized coding style I've worked with requires that you lay out your code pretty much exactly how Python tells you to, without mixing tabs and spaces randomly. So it's requiring you to write code how you're going to have to write it anyway, and you're near-guaranteed to be able to follow the style of any code you read by other people... and yet it's one of the first complaints you hear about the language.

...g nesting got one additional 8-character hard tab, and the result was ugly. Depending on the overall nesting, 4, 2 or even 1 space per indent level will be better.

And sorry for hitting tab-enter too early, so an unfinished comment (above) posted itself.

Coding style is not only a question of personal preference. If it were only that, there would be no question that the project leader, or the initial project leader, should impose his (her) preference on the whole project, and in fact, some leaders of well-known projects, for instance Linus Torvalds and Bram Moolenaar, have done just that.

But coding style may sometimes vary according to the underlying "logic style" of one or another file. I've seen HTML (in spam mail, usually ;-) ) where each level of tag nesting got one additional 8-column hard tab, and the result was ugly. Depending on the structure of the page, 4, 2, or even 1 space per indent level is sometimes better.

It would be interesting to see how compiler-forced style rules would work out in practice. I'm cautiously optimistic that it would work out to be a good idea. It would make reviews easier in some places.

Maybe someone working on a new experimental language could try the idea out :)

I'm very big on tabs over spaces as a semantic rather than presentational representation of indentation (yay, Go), but I would have much preferred if Python just chose an indent token and stuck with it, even if it was something silly like '4 spaces'.

What about the gofmt-style solution? Seems like a good practical approach to the problem.

Sorry, but I have to say you are wrong here. There are good reasons to let people have choice in style depending on what one is doing. Hard-coding one style into a language will only cause even more frustration for beginner programmers. "With code, you can do anything you want, but it has to be exactly formatted like this, and no, there is no physical reason it has to be this way." would be a creative buzz-killer to anybody just starting out.

If you find the answer or just come across some interesting research, be sure to let us know.

I guess really the problem is that it's all just plain text until it hits the compiler/interpreter.

Perhaps something like working on an editor-formatted view (to your preference) over a style-independent source (something like an AST maybe). Then perhaps the only rules could be naming conventions, or at least not things regarding whitespace.

But people like being able to use whatever editor they like (vi, emacs, notepad) to code in.

We've already got various code reformatting tools in IDEs or standalone; perhaps they could be improved and integrated with a compiler like clang as a project policy, and allow other tools like diff and merge to work at a style-independent level.

How do you manage exceptions to the rules, e.g. a per-line character limit? Set a soft limit of, say, 80 chars and a hard limit of 100 to prevent things being awkwardly reformatted for the sake of a couple of characters.

If it's a new language and the source can be stored in a machine-friendly text format then surely things get a lot simpler, but would anyone use it?

I've been thinking about a solution to this problem where code isn't just plain text anymore - more of a structured format in XML perhaps. That way, the IDE in conjunction with developer specific prefs, can lay it out however you like. Maybe even variable names get a level of indirection so you don't have to worry about mName/_Name or sztcnString/str issues too. I suspect the downsides would far outweigh the benefits for most projects though.

Enough people hate Python because of how it already forces them to write according to the official style. If it started throwing syntax errors if you put the wrong amount of whitespace around an operator, people would go ballistic. :)

But you'd think this would be a good option to have, at least, no? Kind of like the old -tt option, but much more comprehensive. I wonder if anyone's suggested it before. Some syntax like "import __strict__", perhaps.

@David Python's forced-indentation doesn't bother me, but the lack of visible end-of-block delimiters does. It makes it hard to see what's going on when multiple blocks end simultaneously. Which encourages the 4-space indent to make it at least somewhat easier to see, which eats up more of my 80-char limit than I'd like...

Reminds me of this:

"an API is not about programming, data structures, or algorithms—an API is a user interface, just as much as a GUI. The user at the using end of the API is a programmer—that is, a human being. Even though we tend to think of APIs as machine interfaces, they are not: they are human-machine interfaces."

http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext

Justin: totally agree.

Aryeh: yes, I'm sure it would rub some people the wrong way. But those people would probably not get along with project style guides either. See my quote from Maciej. I think you'd have a better chance of pulling this off in a new language since with existing languages people are used to coding any way they want.

Callum, Paul: "structured editing", where the source code is no longer plain text, was tried extensively in the 70s and 80s (see for example CMU's Gandalf project). Those efforts failed. I think there are very good reasons for preferring plain text: you can save "bad" intermediate states, and you can continue using existing editors, and version control systems, and bazillions of other tools that process text.

voracity: gofmt sounds like a step in the right direction but I think a slightly harder line is warranted. If you require that gofmt be run before checking in and before presenting code for review, why not before you run the compiler as well?

Havvy: personally I'd rather focus my creativity on what the program does rather than on where I put my curly brackets. By analogy, I'm much more productive writing plain text than in a WYSIWYG editor like Word, since in Word there's the temptation to fiddle with formatting instead of just writing.
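As a rough illustration of the "harder line" on formatters suggested above, here is a small sketch of a compile step that refuses any source text that is not already in canonical form. It uses Python's standard `ast.parse` and `ast.unparse` (Python 3.9+) as a stand-in for a real formatter such as gofmt; the names `canonical` and `compile_strict` are hypothetical. One known limitation of this stand-in: `ast.unparse` drops comments, so any commented source would be rejected. A real design would need a formatter that preserves them.

```python
import ast


def canonical(source: str) -> str:
    """Re-render the source from its AST, giving one fixed layout per program.

    Assumption: ast.unparse plays the role of the project's formatter.
    """
    return ast.unparse(ast.parse(source)) + "\n"


def compile_strict(source: str, filename: str = "<strict>"):
    """Compile only if the text is byte-for-byte identical to its canonical form."""
    if source != canonical(source):
        raise SyntaxError(f"{filename}: source is not in canonical form")
    return compile(source, filename, "exec")


if __name__ == "__main__":
    compile_strict("x = 1\n")      # accepted: already canonical
    try:
        compile_strict("x=1\n")    # rejected: spacing differs from canonical form
    except SyntaxError as error:
        print(error)
```

The ground truth stays a plain text file that any editor can open and save in any state; the only change is that the compiler accepts a single layout of it.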

I'm not sure how well this would work in practice.

I have a feeling it might be more realistic to make the feature optional. You could either have an option in the compiler or some kind of strict keyword that would enforce a certain style for projects that want it, and still let other people choose their own styles if they wanted to for their own projects.

Havvy: I recall that, just a bit over a decade ago, some Mac-based Pascal environment (sorry, I don't remember the name) had the auto-formatting pretty well done. So it's probably possible, at least, even with an introductory language - it relieves the new programmer of the need to figure out coding style.

--

The problem with forcing coding style is that sometimes you need to bend the rules for clarity; for example, most of the time various projects tend to try to go for a certain maximum line width (80, 120, whatever), but often it also comes with a caveat that just a bit over is fine too.

Any syntactic check will miss stylistic abuses, and furthermore is likely to prevent code that is actually written with good style. For example, how should functions be named? LongAndVeryDescriptive(), or short and concise? I think that partly depends on the purpose and usage of the function, and seems infeasible to enforce in the language design. Similarly, should a function have multiple return statements or a single return? The right choice is often dependent on the nature of the function and the amount of cleanup/etc. that needs to be done before returning. Requiring a single return would be perfectly possible, but not very useful, either. For yet another example, how long should functions be? Depending on the nature of the problem, long function bodies might well be the clearest way to express the solution, even if we might prefer small functions in general.

It seems clear to me that good programming style is simply a matter of taste and experience, and can't be encoded in a syntactic constraint.

Neil, the issues you talk about are higher level than most of the rules found in style guides. I agree those are unlikely to be formalizable in a useful way.

But there are lots of rules in project style guides, for example about indentation, or brace placement, that are.

Do you happen to have proper Mozilla code style formatter settings for Eclipse? (I find that the "Mozilla" settings bundled with CDT don't actually match the Mozilla style. Well, to the extent there is "the" style.)

Languages that use ";" and meaningless whitespace have a redundant syntax. You're supposed to indent with whitespace to express yourself to others, but then there's another syntactic layer for the compiler, and when the whitespace and the ";" get out of sync it creates confusing code. It's ridiculous that we have languages with redundant syntax.

Python is the closest we've got so far, but it needs to go further. The ":" at the end of an if/for should be removed... it's implied by the indentation on the following line. As others have said, they need to choose between tabs and spaces too. Tabs feel more semantically correct but, in my experience, spaces don't get as mangled in large projects.

@Callum, Paul: More structured and abstracted program notation is definitely a good step toward the solution.

@ROC: you say that structured editing is a failed experiment that's no longer used; but it's just a matter of perspective. Almost everyone uses syntax highlighting and that's just a limited form of structured editing. IDEs do various source code transforms & refactorings for you, including partial layouting - that's structured editing that again almost every IDE supports and most people use. Auto-complete or intellisense are simply variants of structured editing that most people appreciate. Code folding is common in IDEs and definitely in XML editing tools - that's structured editing.

These things just need to be taken further; having a comprehensible language serialization (i.e. source code) is important, but that doesn't mean the view needs to be the fixed-width terminal-style character matrix it is now.

It's not an all-or-nothing feature; and it's pretty clear which direction the arrow of time is pointing - and a good thing too. How much work would it be to have a large project where each contributor views and edits with his own spacing, indent, newline & bracketing style yet behind the scenes a code-formatter ensures a standardized format is actually exchanged via source control? That's possible with some hassle today...

There's a huge difference between what modern IDEs do and what the old structured editors did: in the modern IDEs, the ground truth is text files. In spite of the assistance the IDE gives you, you can enter whatever text you want in the file, you can save that text, and even check it in, whether the IDE thinks it's valid or not. The IDE may keep around an AST or other data structures, but at the end of the day the AST is constructed from the text file, not the other way around. That is fundamentally important.