CamelCase vs underscores: Scientific showdown

In the odd case that you are an experienced programmer who doesn’t have a preference between camel case and underscores for identifiers, try making up your mind now. Try choosing independently of (language) convention, habit or type of the identifiers. If you are a Lisper and like dashes, just vote for your next favorite.

Did you vote? Good! Now it’s my turn to do some work, as I will try to take you through a semi-scientific explanation to prove which formatting is best suited for programming.

I wouldn’t have written this post if I hadn’t read Koen’s Tao of Coding. As an ex-colleague, he converted me to the underscores camp. The trigger for writing this post was a reply I read in a formatting discussion.

“honestly the code is easier to read” Opinion or fact?

It inspired me to look for scientific resources. Surely, studies must have been done right? As it turns out, not too many, but I found one. But first, in case you never had this discussion, … the usual opinions, and rebuttals. If you are looking for the facts, skip to round 3.

Camel case is used by convention in a lot of major languages and libraries. (You weren’t allowed to use this argument when voting!)

Round 2: Rebuttals

Anti underscores

Underscores are ugly, camel case is more elegant.

Anti CamelCase

Underscores aren’t that hard to type. Seriously, as a programmer it is your duty to learn blind typing with all ten fingers. Learn qwerty, and save yourself the trouble of having to use the exotic AltGr button.

Use whitespaces and an IDE with color coding to easily see the difference between operators and identifiers.

When reading the abstract of the research paper, it seems science is on the camel case side.

Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style.

Existing research

Natural language research in psychology found that replacing spaces with Latin letters, Greek letters or digits had a negative impact on reading. However, shaded boxes (similar to underscores) have essentially no effect on reading times or on recognition of individual words. Removing spaces altogether slows down reading 10-20%.

Experiment setup

Empirical study of 135 programmers and non-programmers. Subjects have to correctly identify a matching phrase (maximum of 3 words long) out of 4 similar phrases. The important variables researched:

Correctness: whether the subject identified the correct phrase.

Find time: time taken to identify the phrase.

Training: how being a programmer affects the performance.

Results

(1) Camel casing has a larger probability of correctness than underscores. (odds are 51.5% higher)

(2) On average, camel case took 0.42 seconds longer, which is 13.5% longer.

(3) Training has no statistically significant impact on how style influences correctness.

(4) Those with more training were quicker on identifiers in the camel case style.

(5) Training in one style negatively impacts the find time for other styles.
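
To put the two headline numbers in perspective, here is a small arithmetic sketch. The 0.42 s, 13.5% and 51.5% figures are the ones quoted in the results above; the 80% baseline accuracy is purely a hypothetical value picked for illustration, since no baseline is quoted here. It also shows that "odds 51.5% higher" is not the same as "probability 51.5% higher".

```python
# Figures quoted in the results above.
EXTRA_SECONDS = 0.42       # camel case took this much longer on average
RELATIVE_SLOWDOWN = 0.135  # ... which is 13.5% longer
ODDS_INCREASE = 0.515      # odds of a correct answer are 51.5% higher

# Implied mean find time for underscores: 0.42 s is 13.5% of it.
underscore_time = EXTRA_SECONDS / RELATIVE_SLOWDOWN
camel_time = underscore_time + EXTRA_SECONDS


def camel_accuracy(underscore_accuracy: float) -> float:
    """Convert a baseline probability to odds, scale the odds, convert back."""
    odds = underscore_accuracy / (1.0 - underscore_accuracy)
    camel_odds = odds * (1.0 + ODDS_INCREASE)
    return camel_odds / (1.0 + camel_odds)


# HYPOTHETICAL baseline accuracy, not taken from the paper.
ASSUMED_UNDERSCORE_ACCURACY = 0.80

print(f"implied underscore find time: {underscore_time:.2f} s")
print(f"implied camel case find time: {camel_time:.2f} s")
print(f"camel case accuracy at the assumed baseline: "
      f"{camel_accuracy(ASSUMED_UNDERSCORE_ACCURACY):.1%}")
```

Under that assumed baseline, a 51.5% increase in odds moves accuracy by only a few percentage points, while the implied find times sit at roughly three seconds per identifier.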

The paper concludes:

Considering all four hypotheses together, it becomes evident that the camel case style leads to better all around performance once a subject is trained on this style. Training is required to quickly recognize such an identifier.

Discussion

Personally, I find the conclusion flawed for a couple of reasons.

Correctness isn’t of much importance when programming. Correctness refers to being able to correctly see the difference between similar identifiers, e.g. startTime vs startMime. This is not a common scenario when programming. Additionally, modern IDEs offer auto-completion and flag identifiers that don’t exist. This makes me believe results (1) and (3) are irrelevant. As a sidenote, I believe the correctness of camel casing is due to the slowness of the reading. When you need to take more time to read something, you will read it more accurately.

When discussing possible threats to validity, they mention the following: “Essentially all training was with camel casing, it would be interesting to replicate the study with subjects trained using underscores.” Results (4) and (5) just seem unfair when taking this into account. Isn’t it obvious that people who are used to camel case are better at it? Additionally, it has a proven negative impact on the “find time” for underscores.

So, only the slowness of reading camel case (2) remains. It takes 13.5% longer on average to read a camel case identifier than an underscore identifier. Multiply this across entire code blocks, and you have my semi-scientific opinion on the war between camel case and underscores!

For those brave enough to stick around until the end, what is your opinion now? Again, try choosing independently of convention, habit or the type of identifiers.

P.S.: If you still believe camel casing to be more appropriate for programming, it would be interesting to leave a comment with argumentation. 😉 I could update “Round 2: the rebuttals” to include your comments to make the article more balanced.

Update: I’ve discussed a follow-up study in a new post. They reproduced the study and measured that it takes 20% longer on average to read a camel case identifier. Additionally, using eye tracking, they found that camel case identifiers require a higher average fixation duration.

Author: Steven Jeuris

I have a PhD in Human-Computer Interaction and am currently working both as a software engineer at iMotions and as a postdoc at the Technical University of Denmark (DTU). This blend of research and development is the type of work which motivates and excites me the most. Currently, I am working on a distributed platform which enables researchers to conduct biometric research 'in the wild' (outside of the lab environment).
I have almost 10 years of professional software development experience. Prior to academia, I worked for several years as a professional full-stack software developer at a game development company in Belgium: AIM Productions. I liked the work and colleagues at the company too much to give up entirely for further studies, so I decided to combine the two. In 2009 I started studying for my master in Game and Media Technology at the University of Utrecht in the Netherlands, from which I graduated in 2012.

130 thoughts on “CamelCase vs underscores: Scientific showdown”

Well, it is a study following the scientific method. Furthermore, it references existing research in natural languages from which it can be inferred that underscores should be more readable. Their result of 13.5% slower reading corresponds to existing research which measured slower reading in the range of 10% – 20%.

Hi! Thanks for the input, but I do mention why I drop it in the discussion. “Essentially all training was with camel casing, it would be interesting to replicate the study with subjects trained using underscores.” Results (4) and (5) just seem unfair when taking this into account. Additionally, I found a recent follow-up study, “An Eye Tracking Study on camelCase and under_score Identifier Styles”, which attempts to solve these issues, and underscore casing comes out on top. Once I have time to analyze it I will update the blog post.

I can’t decide what is better but I think research (2) has nothing to do with camelCase. “replacing spaces with Latin letters, Greek letters or digits had a negative impact on reading” means camelxcase or camel4case, and later “Removing spaces altogether slows down reading”. True as you can see with camelcase. But all this is not camelCase.

I’m not 100% sure, as I read the papers a long time ago, but I recall that the part you are referring to is only a summary of previous natural language research. The findings from (2) are not the findings of the scenario you described, but from the experimental setup of this paper, which does compare camelCase with under_score.

With CamelCase it is simpler for me to distinguish the number of identifiers. In Haskell, where a space between two identifiers means function application, that is a killer example.
Consider:
a_function_with_a_very_long name_applied_to an argument_with_a_similarly_long_one
vs
a_function_with_a_very_long_name_applied_to an_argument_with_a_similarly_long_one
vs
a_function_with_a very_long name_applied_to_an_argument_with_a_similarly_long_one

None is too simple, but when reading carefully the CamelCase one, it can be discovered more easily which one is correct (perhaps due to the fact that identifiers for variable things in Haskell start with lowercase…)

To conclude, and to show you how different the three are, here are the same three examples with single letters:
f x y z
vs
g a
vs
h i j
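
The difference between the three lines can also be checked mechanically. The sketch below (Python rather than Haskell, and only an illustration) counts whitespace-separated tokens, which is what decides how many things take part in a function application; the helper name `count_identifiers` is made up for this example.

```python
# The three lines from the example above, copied verbatim.
candidates = [
    "a_function_with_a_very_long name_applied_to an argument_with_a_similarly_long_one",
    "a_function_with_a_very_long_name_applied_to an_argument_with_a_similarly_long_one",
    "a_function_with_a very_long name_applied_to_an_argument_with_a_similarly_long_one",
]


def count_identifiers(line: str) -> int:
    """Count whitespace-separated tokens (the function plus its arguments)."""
    return len(line.split())


counts = [count_identifiers(line) for line in candidates]
for n, line in zip(counts, candidates):
    print(n, line)
```

The counts come out as 4, 2 and 3, matching `f x y z`, `g a` and `h i j`.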

That basically boils down to “Camel case makes paragraphs easier to read.” as mentioned in the ‘Opinions’ section. As I reply in ‘The Rebuttals’ section: “Use whitespaces and an IDE with color coding to easily see the difference between operators and identifiers.” Different semantic meanings of keywords can easily be visualized in modern IDEs. I present this case even more strongly in “The code formatting fallacy”: https://whathecode.wordpress.com/2011/11/13/the-code-formatting-fallacy/

That is actually mentioned as one of the ‘pro’ arguments for underscores: “Abbreviations could still be kept uppercase easily. E.g.: TCP_IP_connection vs tcpIpConnection” I really dislike mixing the two, but of course all of that is highly subjective. 🙂

An argument in favor of underscores that I haven’t seen in this discussion concerns how you SAY the variable name in, for example, a code inspection. If you have a variable one_two_three, one would say “one underscore two underscore three”. On the other hand, how does one say oneTwoThree in a way that distinguishes it from onetwoThree?

I tend to pronounce — even in my own head — underscores with a slight pause, similar to spaces or hyphens. I tend to read CapCase with a bit more stress on the capital letter — it is internally “louder”. That does mean that my preference (and error rate) is affected by how tired I am, and by how “far” I have to keep it in mind. (Matching to a few lines up on the screen favors CapCase; retyping it in another program favors under_lines.) Of course, actually switching at such times would be folly, but it does mean that the artificial situation used in the tests for the papers above is even more of a problem.

This “CapCase is louder” convention does happen to work well with the convention of using a capital to indicate the type, and a lower case to indicate an instance.

Thank you for the link! I found the same study a while back, and mentioned it in a previous comment as well. Given the popularity of this post I should probably best incorporate it at some point, or write a follow up post. Unfortunately I’m quite busy finishing up my thesis at the moment. 🙂

As a Java programmer working with a large Java code base (libraries, JDK…), everything is in camel case, and I think the decision has already been made; pushing underscores creates a deviation that the majority won’t like.
Although I liked the argument for underscores, democracy and context (for Java programmers) triumph over correctness 🙂

“Underscores aren’t that hard to type. Seriously, as a programmer it is your duty to learn blind typing with all ten fingers.”

I’ve got quite short little fingers (my other fingers aren’t exactly those of a pianist either), so underscores actually ARE significantly harder for me to type – they require me to move my entire right hand. The plus/equals key is even further away – but I don’t need to type nearly as many of those!

I don’t disagree, but this argument can be taken too far. The best programmers I’ve ever met all knew Lisp (and used it seriously at some point in their careers). Therefore, in my experience, better coders use dashes.

I use Dvorak, too, and I know a lot of programmers who do. With how much faster and more comfortably I can type, I kind of wonder why anyone doesn’t. I think it’s interesting that so many programmers avoid learning a keyboard layout, when it’s less effort than (really) learning a new programming language or text editor or web framework — things that many do once a year just to keep their minds in shape.

Because arguably the main bottleneck in coding is not writing the actual code, but rather forming the required abstractions. Regardless, I have a similar opinion when it comes to touch typing. “Not being able touch-type as a programmer is like a cook who takes ages to slice his food up.” http://programmers.stackexchange.com/a/95372/15464

I like the underscore style for its readability but there are a couple of reasons why I use camel case:

1- If you’re programming in C, some compilers limit the maximum function name to 32 characters. This is a concern especially in embedded systems. If you’re going to prepend your functions with the module they are associated with, you start running into this limit.
Ex: SerialPortIsTxBufEmpty vs serial_port_is_tx_buf_empty

2- Most coding standards have a limit on the maximum width of a line (usually 80). Using underscore makes your line length much longer and forces you to break your statements into multiple lines, which is less readable. Using camel case allows you to put a little bit more on a single line. This is most noticeable when using nested if statements (which may not be as used in languages other than C).
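
As a rough illustration of point 1, the snippet below measures the two names from the example against the 32-character limit mentioned there. The limit is taken from the comment as stated; whether a given compiler actually enforces it is not checked here.

```python
MAX_NAME_LENGTH = 32  # limit mentioned in the comment above

names = ["SerialPortIsTxBufEmpty", "serial_port_is_tx_buf_empty"]

# Length of each name and how many characters are left before the limit.
lengths = {name: len(name) for name in names}
headroom = {name: MAX_NAME_LENGTH - length for name, length in lengths.items()}

for name in names:
    print(f"{name}: {lengths[name]} characters, {headroom[name]} to spare")
```

Each word boundary costs one extra character with underscores, so the six-word name grows from 22 to 27 characters.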

Although the arguments sound a bit dated, they are relevant nonetheless. Thanks! I personally don’t use the 80-character limit. With the screen resolutions lately you can easily fit 130 characters and still have all your tool bars open.

Font makes a difference. Camel case is probably harder to read if the font is narrow. I wonder how the study controlled for this?

Some people mentioned that underscores aren’t really “hard” to type if you can touch-type. But surely “hard” means “slower” or “more prone to error” rather than “makes my fingers hurt.” If that were true, it would be an argument against underscores.

I usually prefer camel case. But if I work on other people’s code and they use underscores, I can easily switch and have no problem adapting. The thing is, I usually don’t use long variable and function names and prefer to abbreviate obvious words, and doing that just looks better with camel case. A dumb example would be a variable called account name. You can have aName vs a_name or acntName vs acnt_name, which just looks weird with underscores. But like all the others, it’s just a matter of taste/habit/IDE/language/native language/etc., and if you do it one way or the other you’re going to get good at it and those reading/understanding times will drop.

It would be interesting to see the style of all the big/popular open source repos where many people commit and read code from and also grouped by programming language.

Are we really that concerned with the speed of typing? So I could type an entire program 13% faster using underscores, but unless everything is automated there will still be a variable amount of time for someone to approve the work or even get it into production. Is there that much productivity gained?

Some would say this combines the worst of both elements. I say it combines the best features. You’re literally “filling in the blanks,” and using long symbol name support for what it was intended for — human-readable code.

As a long-time Perl programmer I fall firmly in the underscores_for_names camp, but also in the $sigils @make %everything &more *readable camp. The fact that in other languages (C, Java, Python, Ruby) you can’t immediately see that a variable is a $scalar, @array or %hash is really weird to me and I find it much harder to read!

Obviously. But that’s why I started out in bold with: “independently of (language) convention, habit or type of the identifiers”. You aren’t the first to point out the importance of conventions. Look at the highest up voted answer on the post I linked to in the introduction. As I stated, I was merely interested in scientific resources, as that is something that I haven’t found discussed before, hence the desire to contribute in this way to the discussion instead of yet another subjective post on the topic.

I think one reason that I like camelCase better is that the variables look like single entities. The under_score versions tend to break apart into separate words in my mind. This takes more to sort out. I realize that IDE coloring helps this somewhat, but looking at code in other areas such as in the voting example above I don’t have those cues to work with..

I think you forget that speed of reading isn’t everything. Research on text reading shows that sentence/line length, word length, and word “difficulty” (whatever that can be; depends strongly on context and reader) have a strong effect on comprehension. Furthermore, words are easier to distinguish, if they differ towards their beginning. This seems to favour camelCase. That reading takes longer for camelCase might mean that people are forced to read more carefully and therefore remember/distinguish better … IMHO, finding a “good” identifier is more important than camelCase versus under_score. There is quite some recent research on naming. Check it out.

Indeed, speed of reading isn’t everything, but your remarks apply to both casing styles and thus don’t need to be controlled for. However, I don’t understand your reasoning that words differing towards their beginning are easier to distinguish, thus favoring camelCase. Perhaps I am missing something? Why would startTime / startMime be easier to distinguish than start_time / start_mime? One could even argue that, since an underscore identifier more closely resembles two separate words, it is easier to distinguish. However, I also mentioned, albeit tongue in cheek, “As a sidenote, I believe the correctness of camel casing is due to the slowness of the reading. When you need to take more time to read something, you will read it more accurately.”, so I might agree with you there. ;p

PS. I forgot one thing regarding the voting on the web page.
The comparison is not fair. If “camelCase” is used as one alternative, then the other one should be “under_score” and not “underscore”. I’d consider this as a threat to validity, since it gives all identifiers of the under_score group fewer parts than the identifiers in the camelCase group.

“Underscores aren’t that hard to type. Seriously, as a programmer it is your duty to learn blind typing with all ten fingers. Learn qwerty”

Coders who use camel case *can* type. They know qwerty. However, if something is awkward, you will always be slower at it than you could be at a non-awkward alternative.

Also, I can apply the same rebuttal to underscore’s reading speed benefit: “Camel case isn’t that hard to read. Seriously, as a programmer it is your duty to learn to read all kinds of code. Learn to read code!”

Actually underscores are (or can be) easier to type. You only have that one thing you hit the shift key for, so you always shift with your left pinkie and if you’re hitting shift, your right hand knows where that finger is going, to the only thing you shift for in a variable name.

It’s true that with Perl 5’s sigils you have to type shifted characters at the beginning of the variable name; what I was referring to was the actual name itself. In camel case you’re having to shift at any letter starting a new word, which would be “shift+whatever-letter-of-the-alphabet-is-needed”, which is a lot more “stuff” than being able to hit one frequently used shifted character. Also, you’re going to have to type those sigils anyway, and they share this characteristic with the underscore, in that they are a limited set of very frequently used characters.

What this argument does not take into account is not only the remote location on the keyboard, and the fact that the pinky is the shortest finger, but also, it is actually three buttons versus two.. (“shift-underscore and letter” versus just “shift-letter”)

@Surest I believe you misunderstood @michaeljsouth’s valid (objective, no subjectivity here) comments regarding the use of the shift key under both conditions. It is always the same motion (thus rapidly ingrained in muscle memory) using underscore names. For example, the pinky needs to be used regardless for ‘p’ as well. (Think, validPoint). But of course, people are already used to writing capital letters as part of the start of each sentence. 🙂

I think that a more fair comparison would be had if most languages were not case-sensitive. In that case, I would consider camelCase as superior to underscores. (since there is no case error situation possible) Otherwise, I like underscores more.

In languages that aren’t case sensitive, camelCase is dangerous because of the possibility of collisions (i.e. between thisOldMan and thIsOldMan). Underscores are then needed to work around that flaw in the language.
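
The collision described above can be simulated in a few lines. This is only a sketch: it models a case-insensitive language by comparing case-folded names, and the underscore spellings `this_old_man` and `th_is_old_man` are hypothetical equivalents added for contrast.

```python
def collide(a: str, b: str) -> bool:
    """True when two names are identical to a case-insensitive language."""
    return a.casefold() == b.casefold()


# The two camel case names from the comment above.
camel_collision = collide("thisOldMan", "thIsOldMan")

# Hypothetical underscore spellings of the same two word sequences.
snake_collision = collide("this_old_man", "th_is_old_man")

print("camel case names collide:", camel_collision)
print("underscore names collide:", snake_collision)
```

Once case is ignored, the word boundaries that camel case encodes disappear, whereas the underscore keeps the two names apart.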

This is a reason not to use camel case for PHP class method names in particular, which is what I was thinking about when I came here…

But most languages are case sensitive, primarily for the reason that uppercase and lowercase letters are encoded differently. Unix is a large factor in this and until Visual Basic nobody actually bothered with case insensitivity because many programmers find case insensitivity to be more restrictive than case sensitivity.

As someone with an A level in psychology, I can tell you that “shaded boxes (similar to underscores)” doesn’t cut it. That is what is known as a ‘generalisation’ and making generalisations gets you laughed out of the debate room frankly.

As a programmer I could tell you that if there was a shaded box button, I would prefer it to the underscore button (providing it wasn’t as far away as the escape key or the number pad).

I accept your argument that underscores aren’t too hard to type, in fact the underscore shares its button with a common operator and is no further away than the += button.
Personally I would argue that because it is a symbol it looks too much like an operator to be acceptable in identifiers. By extension I would also argue that the dollar sign or pound sign should not be used unless the language in question uses it for a specific reason (such as perl’s use of $ to identify scalar variables).

I do not accept “Use whitespaces and an IDE with color coding to easily see the difference between operators and identifiers” as an argument. That is a work around, not an argument. Particularly the “Use whitespaces” part, as many compilers purposely allow programmers to be more lax with whitespace to prevent annoying ‘incorrect syntax’ errors that slow productivity and could otherwise be avoided.

That is no valid point. The letter should be capitalized where a new word begins, just the same as where you would put an underscore. I can’t see any difference here, just feels like seeking for arguments and not able to find some more.

That’s a good point, … Quite worrisome that after so many visits you are the first to notice. 🙂 I don’t remember where I got that one from. I think I was merely summarizing stuff I read around the internet, but this one indeed has little foundation. It is listed in the section ‘opinions’ for a reason. ;p I’ll look into where I got it from when I have time to see whether there is any ‘stronger’ reasoning behind it, and will otherwise remove it since it doesn’t make any sense indeed, as you point out.

If you have a variable by the name of “user_name”, for example, and you have a get- and set-method: “set_user_name” / “get_user_name” it becomes easier to rename your variable later on.
You just search for all instances of “user_name” and replace them with, for example, “player_name”.

But if your variable is called “userName” and your methods are “getUserName” and “setUserName” you have to replace “userName” with “playerName” and “UserName” with “PlayerName”.
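
A minimal sketch of that rename, using plain string replacement on a made-up list of identifiers (a real refactoring tool would work on the syntax tree instead):

```python
# Made-up identifier lists standing in for a code base.
snake_source = "user_name get_user_name set_user_name"
camel_source = "userName getUserName setUserName"

# With underscores, a single search-and-replace covers all three names.
snake_renamed = snake_source.replace("user_name", "player_name")

# With camel case, the first pass misses the getter and setter,
# because they contain "UserName" with a capital U.
camel_one_pass = camel_source.replace("userName", "playerName")
camel_two_pass = camel_one_pass.replace("UserName", "PlayerName")

print(snake_renamed)
print(camel_one_pass)
print(camel_two_pass)
```

The second pass is needed only for camel case, which is exactly the extra step the comment describes.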

Research aside, I like CamelCase better… my reason is less clutter. I think your eyes need ‘something’ to delineate the characters into words, and since age 2 I have seen every sentence, name, place, or thing start with an uppercase letter. Uppercase means you are starting something. How else could your brain EVER see that. (Lol catch that?) However, with underscores, I have something that is supposed to represent nothing. Pay no attention to the man behind the curtain! Think about it, we see capital letters starting words everywhere in our world, but nowhere are underscores used, just code.

Interesting observation, but I think it is a better argument for underscores than for CamelCase. Most users of camel case actually start with a lower case letter as in “thisIsCamelCase”. Users of underscores generally start with a cap as in “This_Is_Under_Scores”. Also, while as you say, things start with an uppercase letter, they actually start with an uppercase letter preceded by a blank, which is a much better description of typical underscore usage than CamelCase.

One last thought… it would be an interesting experiment to ask everyday people to sit at a computer and type a paragraph. The twist is you tell them the space bar is broken, so please use something else. Would we get a bunch of underscores? Dashes? Lol just thinking about trying to type with a broken space bar makes me uber-frustrated-you-know-what-i-mean? Ahh!

ok i calmed down
i figured out that the solution is neither
underscores are better because then i can have the solution which is…

a button on a ide or preference settings switch
that just lets me switch everything to what ever style i feel like using that day when i load in code

you know though no one does reverse capitalization mYOBJECT lol
or if we had another alt type key that added strong soft sounds like the old curve over a letter
or if we had another alt type key that put a square around a letter
or if we had another alt type key that underlined a whole word or continuous character sequence

im changing all the colors in vs for everything to a unique color right now

Programming is more like logic or mathematics than English. If you turned in a mathematical derivation that consisted entirely of full English words, your professor would probably not grade it very highly, either!

I’ve been having a deep internal debate about all that for a few months. My whole Python codebase is camelCase even though it’s against PEP 8 recommendations (PEP 8 has some strange mixes, like CamelCase classes with the_class_method methods, that I don’t really like, and my employer has a full camelCase codebase).

Anyway I came across some cases where underscore is superior to a strict camelCase convention:

Let’s say that you have a definition that converts some data to sRGB color space, you would write it this way in lowerCamelCase:

def convertDataToSrgb(data):
    pass

If you want to respect the naming convention you have to change the case of sRGB to Srgb.

With an underscore naming convention one could keep convert_data_to_sRGB.

I have more annoying cases over my blog and would be happy to hear what you think about it.

Basically naming a definition that converts from CIE XYZ to CIE xyY (Notice the case importance in the different colorspaces names).
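
The loss of case information can be shown with a naive converter. This is a sketch only: `to_lower_camel` is a made-up helper, and `XYZ_to_xyY` is a hypothetical name for the CIE conversion mentioned above.

```python
def to_lower_camel(snake_name: str) -> str:
    """Naive snake_case to lowerCamelCase conversion.

    The first word is lowercased; every later word gets an uppercase
    first letter and lowercase remainder (str.capitalize).
    """
    first, *rest = snake_name.split("_")
    return first.lower() + "".join(word.capitalize() for word in rest)


print(to_lower_camel("convert_data_to_sRGB"))  # casing of sRGB is lost
print(to_lower_camel("XYZ_to_xyY"))            # XYZ and xyY lose their casing
```

A strict camelCase rule has to overwrite the original capitalisation of each word, so `sRGB` becomes `Srgb` and the distinction between `XYZ` and `xyY` can no longer be read from the case alone.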

As a scientific programmer, I find that camelCase — or a mixture, I suppose — suits my needs best. It’s important to explicitly define the units of many of the variables I work with, and I find that using camel case for the variable name with the unit tied on via underscore makes for an easy way to read the variable name (what the variable is) and the unit of measurement:
centerFrequency_hz
symbolPeriod_usecs
incidentAngle_rad
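
One side effect of this convention is that the unit can be recovered mechanically, since the only underscore in the name separates it from the unit. A minimal sketch, with `split_unit` being a made-up helper:

```python
def split_unit(identifier: str) -> tuple[str, str]:
    """Split 'someName_unit' into ('someName', 'unit') at the last underscore.

    Names without an underscore are returned with an empty unit.
    """
    name, separator, unit = identifier.rpartition("_")
    if not separator:
        return identifier, ""
    return name, unit


for identifier in ["centerFrequency_hz", "symbolPeriod_usecs", "incidentAngle_rad"]:
    print(split_unit(identifier))
```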

Personally I prefer camel case, because the _ is really far away from the other letters, even on a QWERTY layout. AltGr is a good key; if the underscore were on AltGr+F, then it would be useful. For those who do not know where AltGr is: it is just the right Alt key and can be pressed with the thumb.

I personally find code with underscores more clumsy/cluttered when you look at a full code block, especially if leading underscores are used to set the constants.

“Underscores aren’t that hard to type. Seriously, as a programmer it is your duty to learn blind typing with all ten fingers.”

That has nothing at all to do with relative typing ease. This kind of thing comes up in vim/emacs discussions – how far do you have to stretch your hand, and with which finger? The shift key has to be pressed for both camelCase and underscore. With camelCase, you often hit the letter and shift with the same hand, and letters are always easier to reach, and naturally faster – everyone types letters much more often, and so you aren’t interrupting the flow of typing. For underscore, you need to stretch up to hit the underscore key, while hitting shift with the opposite hand. That interrupts typing flow.

“You should learn to type” is no excuse for typing in a relatively less efficient manner 😉

What was left out of the conversation is the probability that the underscore is misread or misinterpreted as a space, causing coding problems and headaches. This can happen in I/O systems such as input forms, web pages, printouts, or outside scripting/programming languages. It is no longer safe to say that we will be sticking to only one coding language or syntax method. Many people with limited eyesight may also misread it and be unable to distinguish the underscore, seeing it as a space, depending on their sight, the display, or a framed form field where it gets hidden. Everyone gets older, with their eyesight suffering, and at the same time displays are getting smaller instead of larger these days.

The research paper you cite mentions demographics briefly, but says nothing about the spoken language background of the participants.

One of the best arguments against camelCase that I’ve heard is that it’s much harder for non-native speakers to read. I’m a native English speaker but when I’ve had to read source code written in German camelCase, even though I speak conversational German, I certainly have a bear of a time figuring it out.

This is not just a theoretical problem. In my current professional environment, 50% of the team speak English as a second or third language. That’s actually high: globally, English is the language of source code, but far fewer than 50% of programmers speak English as a first language.

“A wide variety of students took part in the study. Based on the demographics data, they represent a fair cross section of Loyola students”, which seems to be “Loyola University of Chicago”. I would expect a majority of English speaking students, including possible exchange students who would still be proficient enough in English.

Maybe it’s because I’ve been programming in languages that primarily use camelCase for so long, but I have a lot of difficulty reading underscore identifiers. Part of it may be that I’ve trained myself to use non-alphanumeric characters to tokenize what I’m reading, and the underscore conflicts with that predisposition. When reading python code I find I have to re-read things once or twice to make sure I read it correctly.

Also, as someone who codes a lot of Scala, the “_” is used as a placeholder for lambda functions, and Scala also allows for infix notation of code, so “foo _ bar” means something totally different from “foo_bar”: the former is a lambda invocation of “foo” with an anonymous parameter followed by an invocation of bar, whereas the latter is simply an identifier.

A few things… Your first poll asks which “LooksAppealing” (or “looks_appealing”). That’s a different question than what you focused on in the article, and a different question than you ask at the end.

Also, In Round 1, allowing the “Pro Underscore” group to take the “Camel Case is OK in x situation” is unfair, unless you let the Camel people do the same, at which point everyone is just agreeing on a hybrid and maybe disagreeing over where the hybrid lines are drawn. Cheating… deduct 5 points from Gryff^H^H^H^H^HUnderscore.

In Round 2, the rebuttal about using text editors with syntax highlighting applies to both camps. Again you’re giving points to the Underscore camp that can be used in both places. You’re showing bias here.

Round 3 clearly shows that the only third party research you link to sides with CamelCase, then you dismiss their conclusions. So… again I cry bias. Also, this is an unbalanced dismissal… you argue that reading speed is more important than correctness, and I think that’s very subjective, particularly with code. Yes it’s possible to agree that a good naming convention would limit the apparent naming collisions, so limit the importance of “correctness”, so you may have a point, but you didn’t dive as deeply into the reading-speed issue. It’s possible that reading speed isn’t that important once you’re inside structured code. The experiment you linked to says “On the […] screen four clouds move around the screen. Each cloud contains an identifier written in one of the two styles under investigation.”

This experiment is nothing like real programming. So I argue that your dismissal of the importance of correctness can equally be applied to a dismissal of the importance of speed; both are flawed proxies for overall utility.

Those questions and associated polls were just intended to frame (and initiate) the discussion. The meat of the article really lies in the presentation and the discussion of the paper, since I’ve never seen the ‘objective’ angle on this discussion before. Round one and two are just a summary of different opinions I encountered myself, and are only intended to be ‘fair’ in the sense of providing an overview of the different arguments I found online. I make no claims about them being fair or valid arguments.

If you find any other relevant studies, you are more than welcome to post them here, as I prefer steering the discussion on this post away from the subjective and towards objective results on the topic. Overall, I agree, there are some serious flaws with the methodology of this study, but the one strong result it did find (and which was replicated in the follow-up study) does seem to favor underscores in this laboratory setup.

If you search for “comic sans promotes better reading comprehension?” you’ll find studies that show that slowing reading speed might be a good idea 🙂 So I think that the correctness part of the mentioned study might be on an interesting track.

Your isIllicitIgloo example is flawed because you use a font that no one in their right mind would use in a programming textpad or IDE. Try opening it in Notepad++ and it is very clear exactly what it says. Conversely, convert your font to Wingdings and read any of your underscored variables and they are equally unreadable.

Camel casing reduces the horizontal width of your variable names, which means less horizontal scanning is needed. If the width of a variable name affects how long a programmer decides to make it, then variable names may be shortened by omitting words and therefore convey less meaning. This would be compounded in an expression that uses multiple such variables.

Above you mention that correctness when reading variable names is not a significant factor in programming. I’d argue that while lexical correctness when parsing the text is not a big factor, LOGICAL correctness of being able to quickly grok a variable name’s origin and purpose IS a big factor.

So if camel casing leads to variable names which are easier to understand because they contain more words (and therefore more meaning) in the same screen space, then even if they take longer to parse, they help the reader understand the code more quickly, which I would argue makes up the lion’s share of the time spent reading code.

That said, I have no evidence of any of this, it’s just my intuition, and you asked for counterarguments. Thanks for the great article! 🙂

Some anti-underscore arguments that are deal-breakers for me:
1. Underscores become invisible when the text is underlined.
2. Underscores are always abused by using multiple__consecutive___underscores, _leading_underscores, identifiers consisting only of underscores, etc.

I use camel case primarily, although I do use underscores occasionally, often as a delimiter in HTML. Repetitive form fields share a root ID, but then have some unique identifier tacked on to the end that I can easily parse out using the underscore. Could certainly still be done without one, but it does make it more readable in the rendered code.
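A minimal sketch of the kind of parsing described above, in Python. The function name and the example IDs (`phone_3`, `billing_address_2`) are made up for illustration and are not taken from the commenter's code; it simply assumes the unique part is whatever follows the last underscore.

```python
def split_field_id(field_id: str) -> tuple:
    """Split an HTML field ID into (root, unique suffix) at the LAST underscore."""
    root, separator, suffix = field_id.rpartition("_")
    if not separator:
        # No underscore at all: the whole ID is the root, there is no suffix.
        return field_id, ""
    return root, suffix


# Hypothetical IDs of repeated form fields sharing a root.
for field_id in ["phone_1", "phone_2", "billing_address_2", "email"]:
    print(field_id, "->", split_field_id(field_id))
```

Splitting at the last underscore rather than the first keeps roots that themselves contain underscores intact.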

I believe the relevant field of study is actually in typography, eye-tracking studies of reading English (or similar phonetic languages), graphic design, etc. There are many issues which are neglected in your post (and in the experiment):

# Font and Kerning
Choice and design of font and other aspects of typography can degrade performance for both conventions:
1. when_using_underscores_frequently, the stroke should be thicker, and the length shorter; underscore is not designed as a word delimiter, which is why its length is significantly longer than that of a space
2. WhenUsingCamelCaseFrequently, upper-case letters should have a higher height (or lower-case have a lower height) for a stronger distinction; the kerning can be increased when an upper is preceded by a lower; fonts need to be lighter in general to compensate for the increased ‘ink’ density

Even when using fixed-width fonts (which are somehow popular with programmers), many aspects (but not underscore length) of typography still apply.

The typography choices will strongly influence the effects of either convention.

# Line spacing
Bigger line spacing => advantage for underscore.
Smaller line spacing => advantage for CamelCase: when line spacing is small, it is harder to distinguish the boundary between words. Clumping letters and glyphs together will improve word perception (both speed and accuracy).

# Using capitals with underscore
There is CamelCase, and there is Camel_case, and Camel_Case. Underscores allow greater expressiveness, and clearer contours, if they are used with upper case. Word contours have been identified as one of the more significant features that human visual neural circuits use to recognise words, especially words at the periphery of vision (generally things more than 15 characters away horizontally). However, I often_see_underscores_used_like_this. But_not_like_this_which_should_be_superior.

# Font size, avg word length
As font size increases, words become easier to perceive. However, at some point, the number of words that can fit into the field of vision decreases below the processing speed of the rest of the brain (really depends on complexity and verbosity of code). CamelCase is clearly more compact; But short-forms should be even better when there are only a few variable names (doesn’t quite work when there are like 30+ names to express, or when the language doesn’t have good aliases, like type/class-name aliases).

# Screen Size
Smaller screens amplify the effects of most factors.

# Scanning vs Reading
When the section of code is more frequently scanned than read in detail (i.e. when there is a lot of boilerplate, little logic but a list of property/config settings, depends on language), then all factors for quick word perception (contour, word length, grayscale weight, kerning) become more important; when reading in detail, word recognition is seldom the limiting factor.

# etc
Quite a few other factors, see the papers and textbooks. Probably dependent on IDE and font-availability as well.

All of these mentioned factors were controlled for. Unless you want to make the argument that one of your listed factors contributed more to the measured result than the difference between the manipulated variable, I do not see how this should invalidate the study. They used common fonts, used in actual editors. (p.s. monospaced fonts are preferred so that indentation of brackets is aligned)

Personally, I feel the threats to validity listed in my discussion will likely impact the study more than your listed factors. For example, you mention line spacing, but there were no multiple lines used in the study. I do agree this implies a possible lack of ecological validity, but as always, you need to interpret study results in the context of the experiment which was conducted.

Yes, I should have been more explicit: I do think that there is a lack of ecological validity, as there is a non-negligible interaction effect (I only elaborated the interaction effects) between the factors I listed and camelCase vs under_score. Based on the factors and their interaction effects I described, I think that in a different ecology, the effects of camelCase vs under_score could and should reverse.

We need to interpret the study results in the context of the experiment which was conducted, but at the same time, if we are to apply them, we also need to interpret them in the context of our actual work. What I said does not invalidate the study – I am highlighting its very narrow applicability.

Or perhaps my ecology is somewhat different: I code on a 42″ screen, almost entirely using distraction-free mode in Sublime Text and Vi, and turn the font-size to between 20px and 34px. I use a wide variety of programming and natural languages, and notice differences depending on the language (and whether I can open it in Sublime/Vi vs need to squint on some proprietary editor/website-that-can’t-be-zoomed).

p.s. I’ve been using proportional fonts for coding Javascript in Sublime Text for the past 2 years, and have met with no issues except being unable to tell when I’ve hit 100 characters on a line; I have not switched back. Many languages + formatting-styles only benefit from indent alignment, especially the ones that are enforceable by code-formatters and linters.

Thank you for raising the topic and supporting it with actual studies.
Being a programmer myself, I believe that the reason the 2 conventions exist in the first place is to be able to distinguish what they represent. Indeed, when programming, I follow this simple rule: camelCase for a set of values/attributes, underscore_case for an actual value. This makes my code easier to read because this way you always know what you’re talking about. It increases your efficiency, because sometimes you may want to apply a value to a given set, and the rule avoids confusion between the two. Those using AngularJS will know what I’m talking about, for this rule allows you to significantly reduce your code length (using directives for example).

I guess that elegance won’t matter to you as long as you’re a gray, boring person. I just simply cannot stand underscores, they bring me down. Every time I type a nice camelcased name I find it beautiful and it makes my coding more enjoyable.

In snake_case you have to do something like:
message_generator message_generator_;
or
message_generator msg_gen;

While the first one is just a pure annoyance, the latter can lead to names which are harder to read (the above is not an ideal example for this).

2. When you compile in C/C++, usually errors in output are printed underlined. But with an underlined snake_case identifier, it’s really hard to see if there is an underscore at the end or not.

3. Another big pro for CamelCase is IDE support:
For example, in NetBeans opening a CamelCaseFile using the keyboard only is really easy and fast compared to grabbing your mouse:
Open the “Go to file” dialog (alt+shift+o), type in “CCF” (all the capitals) and the IDE will find all files which have these capital letters, which usually are just a few.
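To make the idea concrete, here is a rough Python sketch of matching file names by their capital letters. It is not how NetBeans actually implements its dialog (the real matching is presumably more elaborate); the function names and the file list are invented for illustration, and the sketch assumes a query matches when it is a prefix of the file name's capitals.

```python
def capital_initials(name: str) -> str:
    """Collect the upper-case letters of a name, e.g. 'CamelCaseFile' -> 'CCF'."""
    return "".join(ch for ch in name if ch.isupper())


def find_by_capitals(query: str, filenames: list) -> list:
    """Return the file names whose capitals start with the (case-insensitive) query."""
    query = query.upper()
    return [f for f in filenames if capital_initials(f).startswith(query)]


# Hypothetical project files.
files = [
    "CamelCaseFile.java",
    "CamelCaseFileTest.java",
    "CustomerCache.java",
    "snake_case_file.py",
]
print(find_by_capitals("CCF", files))
```

A snake_case name has no capitals to abbreviate, so it can never match this kind of query; an equivalent shortcut for underscores would have to use the first letter after each underscore instead.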

I guess you could use CamelCase for classes, while methods and attributes/variables stay snake_case, but I don’t really like mixing styles.

This is probably because (as the study noted) being trained in one style negatively influences the others, but I do not find underscores easier or quicker to read, especially (as you noted) in blocks of code. The underscore is a weighty separator, so new_truck_color looks like three different things to me, while newTruckColor is clearly one thing.

Or maybe because I spent a lot of time coding in PHP, where they love underscores. Now I’d rather avoid them.

I don’t think it is valid to say “Correctness isn’t of much importance when programming.” I believe the opposite.

But the study mentions something important: “The variable Training has no statistically significant impact on how Style affects Correctness”
Combined with: “essentially all training was with camel casing”

This essentially means:
Training with camel casing does not improve the correctness towards camel case, just the overall correctness.

So either something is fishy, or the term “training” is only used for advanced experience and a lot more of the students had basic experience with camel case.

When you say you believe the ‘opposite’, do you mean that correctness is more important than reading speed? I.e., being able to correctly see the difference between similar identifiers (e.g., “startTime” vs. “startMime”) is more important than “start_time” being quicker to read than “startTime”?

As one man said “without data you’re just another person with an opinion”. I also think that this subject is so unimportant that I absolutely have no guilt or shame using names like `AkStreamType_VIDEO_H264_BYTESTREAM` where appropriate.

Every language has its own conventions. For example in Ruby using underscores for variables suits the logic of the language while in JavaScript the convention is rather to use camelCase. It also depends on the language being case sensitive or insensitive.

The mass move to camelCase was a silly step, because:
1) These styles are more or less equal and have no noticeable advantages or disadvantages
2) Lots of old libraries and languages are using underscores
3) Now we have lots of code mixed in different naming styles – and this is a REAL DISADVANTAGE

I am not a die-hard camelCase person, but I think underscores are harder to read. My eyes are constantly drawn downward, away from the central line of the text. In filename contexts, for example, this is particularly hard, and I go for hyphens if possible.

I don’t think we should diminish the importance of ease of typing. The extra keystrokes definitely add up. Using underscores, even with a modern IDE, you end up typing what is essentially a filler character many times. There’s a reason why the space bar is by your thumbs and very long.

I’m pretty OK with CamelCase*, but I have a strong objection to lowerCamelCase, for a reason not mentioned above:
Prepending a word to a name “modifies” the first letter’s case:
sortedWords = sort(words)
This freaks me out, as after many years with case-sensitive languages, I feel very strongly that case is an “immutable” part of a name 🙂
It’d be OK to write `sortedWords = sort(Words)` or `sorted_words = sort(words)`.

I'd much prefer `if (max_width < width) max_width = width;` where I derive related names by prefixing with `max_` rather than the weird “prefix with max and change the case of the next letter” rule.
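A small Python sketch of the difference, using two hypothetical helper functions (not from any library). With underscores the original name survives unchanged inside the derived name; with lowerCamelCase its first letter has to change case. The substring check at the end is my own illustration of one practical consequence, not something claimed above.

```python
def prefix_snake(prefix: str, name: str) -> str:
    """Derive a snake_case name: the original name is kept verbatim."""
    return f"{prefix}_{name}"


def prefix_lower_camel(prefix: str, name: str) -> str:
    """Derive a lowerCamelCase name: the first letter of the original is upper-cased."""
    return prefix + name[:1].upper() + name[1:]


snake = prefix_snake("sorted", "words")        # sorted_words
camel = prefix_lower_camel("sorted", "words")  # sortedWords
print(snake, camel)

# A case-sensitive search for the original name only finds the snake_case variant.
print("words" in snake, "words" in camel)
```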

* I’m used to CamelCase for classes, though I don’t enjoy it when used for all functions/methods, which is a rare practice, but encouraged in Google’s style guides.
That practice probably influenced the design of the Go language, where the case of the first letter is what determines if a name is public/private!
PublicFunction, PublicClass, PublicMethod, PublicGlobal / privateFunction, privateClass, privateMethod, privateGlobal.
This is weird but does encode important info visually, which I have to concede is a benefit, though I still don’t enjoy it aesthetically…
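As far as I understand the Go specification, a name is exported when its first character is a Unicode upper-case letter. A minimal Python sketch of that check follows; `is_exported` is a name made up for this illustration, and it only mimics the rule rather than reproducing anything from the Go toolchain.

```python
import unicodedata


def is_exported(identifier: str) -> bool:
    """Return True if the first character is a Unicode upper-case letter."""
    if not identifier:
        return False
    # "Lu" is the Unicode general category "Letter, uppercase".
    return unicodedata.category(identifier[0]) == "Lu"


for name in ["PublicFunction", "privateFunction", "_helper"]:
    print(name, "->", "exported" if is_exported(name) else "unexported")
```

Note that under this rule a leading underscore does not make a name public, so the visibility information lives entirely in the case of one letter.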

—-

I’m curious how sensitive reading speed/correctness results are to fonts.
For example, it’s unfortunate that in proportional fonts “_” underscore is much wider than “ ” space. For code, I actually want underscores to “bind stronger” than a space. (I’ve once experimented with Emacs glasses-mode styling the inserted underscores to use a much smaller font and/or to use tiny spaces, but I’m generally editing in monospace and half-spaces looked weird…)
I wonder if any monospace fonts for programmers could improve CamelCase legibility by “kerning” the letters a bit to add a tiny bit of space between the words. Opened https://github.com/larsenwork/monoid/issues/232 (the designer of Monoid discovered a way to simulate kerning but stay monospace, so might be interested).

I’m surprised that neither of the two studies mentions fonts, as if it doesn’t matter!?
From screenshots, the 1st (Binkley et al.) seems to use Arial or something similar, the 2nd (Sharif, Maletic) used Courier or something similar.