This has become a major frustration with the codebase I'm currently working in: many of our variable names are short and undescriptive. I'm the only developer left on the project, and there's no documentation explaining what most of them mean, so I have to spend extra time tracking down what they represent.

For example, I was reading over some code that updates the definition of an optical surface. The variables set at the start were as follows:

Maybe it's just me, but this told me essentially nothing about what they represented, which made understanding the code further down difficult. All I knew was that each one was a value parsed from a specific row of a specific table, somewhere. After some searching, I found out what they meant:

I renamed them to essentially what I have up there. It lengthens some lines, but I feel that's a fair trade-off. This kind of naming scheme is used throughout much of the code, however. I'm not sure if it's an artifact of developers who learned on older systems, or if there's a deeper reason behind it. Is there a good reason to name variables this way, or am I justified in updating them to more descriptive names as I come across them?

You're working with physicists

It appears that these variable names are based on the abbreviations you'd expect to find in a physics textbook working through various optics problems. This is one of the situations where short variable names are often preferable to longer ones. If the people reading the code are physicists (or others accustomed to working the equations out by hand) who habitually use common abbreviations like Rin and Rout, the code will be much clearer with those abbreviations than with longer variable names. Short names also make it much easier to compare formulas from papers and textbooks against the code, to make sure the code is actually doing the computations properly.

Anyone who is familiar with optics will immediately recognize something like Rin as the inner radius (in a physics paper, the "in" would be rendered as a subscript), Rout as the outer radius, and so on. Although they would almost certainly be able to mentally translate something like innerRadius into the more familiar nomenclature, doing so would make the code less clear to that person. It would make it harder to spot a familiar formula that had been coded incorrectly, and harder to translate between the equations in the code and the equations they would find in a paper or a textbook.

If you are the only person who will ever look at this code, you never need to translate between the code and a standard optics equation, and it is unlikely that a physicist will ever need to look at the code in the future, then perhaps it does make sense to refactor, because the benefit of the abbreviations no longer outweighs the cost. If this were new development, however, it would almost certainly make sense to use the same abbreviations in the code that you would find in the literature.
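To make that side-by-side comparison concrete, here is a small Haskell sketch (Haskell being the language of the code samples further down this thread). The annulus-area formula is only an illustration, not something from the question's code, and Haskell requires lowercase variable names, so Rin and Rout become rIn and rOut.

```haskell
-- Textbook form: A = pi * (R_out^2 - R_in^2).
-- With literature-style names, each term can be checked against the equation.
annulusArea :: Double -> Double -> Double
annulusArea rIn rOut = pi * (rOut ^ 2 - rIn ^ 2)

-- The same computation with long names: still correct, but the reader has
-- to translate every identifier back into the familiar notation.
annulusAreaVerbose :: Double -> Double -> Double
annulusAreaVerbose innerRadius outerRadius =
  pi * (outerRadius ^ 2 - innerRadius ^ 2)
```

Both functions compute the same value; the difference is only in how quickly someone holding the printed formula can verify them.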

How long will that variable live?

Variables with short lifetimes should have short names. As an example, you don't write for(int arrayCounter = 0; arrayCounter < 10; arrayCounter++) { … }. Instead, you use for(int i …).

In general, it could be said that the shorter the variable scope the shorter the name should be. Loop counters are often only single letters, say i, j and k. Local variables are something like base or from and to. Global variables are then somewhat more elaborate, for example EntityTablePointer.
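In Haskell, a list comprehension plays the role of the for loop, and the same scope rule applies. In this hypothetical sketch (the function names are made up for illustration), the top-level functions and their parameters get descriptive names, while the bindings that live for a single expression get one letter.

```haskell
-- Descriptive names for things that are visible outside this expression...
sumOfSquaredErrors :: [Double] -> [Double] -> Double
sumOfSquaredErrors predictions targets =
  -- ...and one-letter names for values that exist only inside the comprehension.
  sum [ (p - t) ^ 2 | (p, t) <- zip predictions targets ]

-- Nested "loops": i and j are only in scope inside the comprehension.
multiplicationTable :: Int -> [[Int]]
multiplicationTable size = [ [ i * j | j <- [1 .. size] ] | i <- [1 .. size] ]
```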

Perhaps a rule like this isn't being followed with the codebase you work with. It's a good reason to do some refactoring though!

Where are the comments!?

The problem with the code is not the short names, but rather the lack of a comment which would explain the abbreviations, or point to some helpful materials about the formulas from which the variables are derived.

That is fine, since problem domain familiarity is probably required to understand and maintain the code, especially in the role of someone who "owns" it, so it behooves you to acquire the familiarity rather than to go around lengthening names.

But it would be nice if the code provided some hints to serve as springboards. Even a domain expert could forget that dK is a conic constant. Adding a little "cheat sheet" in a comment block wouldn't hurt.
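A sketch of such a cheat sheet, in Haskell to match the code elsewhere in this thread. The question's actual formulas aren't shown, so this assumes the standard conic-section sag equation purely as an illustration; `sag`, `c`, `k` and `r` are hypothetical names (the question's dK would play the role of `k`).

```haskell
-- Cheat sheet (usual optics symbols):
--   r : radial distance from the optical axis
--   c : vertex curvature, c = 1 / R, where R is the vertex radius of curvature
--   k : conic constant (k = 0 sphere, k = -1 paraboloid)
--
-- Sag (surface height along the axis) of a conic surface:
--   z(r) = c r^2 / (1 + sqrt(1 - (1 + k) c^2 r^2))
sag :: Double -> Double -> Double -> Double
sag c k r = c * r ^ 2 / (1 + sqrt (1 - (1 + k) * c ^ 2 * r ^ 2))
```

For k = 0 this reduces to the sphere, R - sqrt(R^2 - r^2), so with R = 10 and r = 6 the sag is 2; that kind of quick check is exactly what the short, literature-style names make easy.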

This Q&A originally appeared on Programmers, a Stack Exchange site for conceptual programming questions.

They can be a problem, but in this example the only thing that stands out is the wretched use of Hungarian notation.

You shouldn't be maintaining this code if you're not familiar with the underlying physics, and if you're familiar with the underlying physics you'll be familiar with the relevant terminology, and that includes conventions such as using 'r' to represent the radius, and so on.

I agree with the answers in the article. There are 2 situations in which to use short names:

1. when programming in a problem domain where short names are idiomatic, as in this article (usually math):

Code:

integrate f min max dx = (dx *) . sum . map f $ [min, min + dx..max]

This code is perfectly understandable because in the context of integration, "f" and "dx" have standard meanings.
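A quick way to try the one-liner, assuming GHC: integrate x^2 over [0, 1], whose exact value is 1/3. The type signature is an addition. Note that the list [min, min + dx .. max] includes both endpoints, so this is a simple Riemann-style sum whose error shrinks as dx shrinks, not an exact integral.

```haskell
-- The answer's one-liner with a type signature added. min and max shadow the
-- Prelude functions of the same name; GHC accepts this (-Wall only warns).
integrate :: (Double -> Double) -> Double -> Double -> Double -> Double
integrate f min max dx = (dx *) . sum . map f $ [min, min + dx .. max]

-- Approximates the integral of x^2 over [0, 1] (exactly 1/3).
approxThird :: Double
approxThird = integrate (\x -> x ^ 2) 0 1 0.001
```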

2. The variable has very short scope:

Code:

import Prelude hiding (foldr)
foldr f x [] = x
foldr f x (y:ys) = y `f` (foldr f x ys)

Here an explanation of the variables isn't even necessary because the scope is so small that the usage immediately implies the meaning. The most enlightening thing about such short code is the structure of the code, from which one can infer that we're repeatedly unrolling the third argument and applying "f" recursively, not the particular variables used. Thus IMO using single letter variables actually improves clarity.
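To see the structure the answer describes, the definition can be run as-is, hiding the Prelude's own foldr to avoid a name clash. The two usage values below (`total` and `copy`) are illustrative names, not part of the original answer.

```haskell
import Prelude hiding (foldr)

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f x []     = x
foldr f x (y:ys) = y `f` (foldr f x ys)

-- foldr (+) 0 [1, 2, 3] unrolls to 1 + (2 + (3 + 0)), i.e. 6.
total :: Int
total = foldr (+) 0 [1, 2, 3]

-- Rebuilding a list with (:) and [] gives back the same list.
copy :: [Int]
copy = foldr (:) [] [1, 2, 3]
```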

The first answer makes a fair point, but regardless, I'm not sold on the Hungarian notation of prefixing the variables with a superfluous "d" for double.

There are actually two types of Hungarian notation: "Systems" and "Apps". "Systems" seems to be the type that everyone uses and hates; the prefix represents the data type. In "Apps" Hungarian, on the other hand, the prefix encodes meta-information instead. For example, in an app that takes data from potentially hostile sources, you could use the prefix 'us' for 'unsafe string' and 'ss' for 'sanitized string'. IMO, "Apps" notation would actually be very useful in some circumstances, while "Systems" notation seems like it would only be useful if you were using Notepad or some other editor where you can't quickly and trivially discover a variable's type.

Though, personally, I don't use Hungarian notation, as I'm in favor of longer variable names (especially when using an editor with completion). They make things more explicit (prepending "unsafe" is a lot more obvious than "u") and make it a bit easier to come back to code later without having to read the (likely never written) documentation. Of course, I'm also a fan of Objective-C, which is big on obscenely long names (which I've grown to love).

To clarify, when I use a short name for a variable with a small scope the rule of thumb I use is that you should be able to see the declaration and all uses of the variable without scrolling so no one has to say "what does 'f' mean?".

Oh, and Hungarian notation can die in a fire. I hate looking in the IDE and seeing a list of variables (in IntelliSense, for example) that all start the same way (nUserId, nReceiverId, nMsgCount, nWhatever); it makes them harder to tell apart. In my example, without the prefixes, typing 'M' would get me MsgCount right away, along with any other message-related variables. However, for UI code I do like to append data type information, especially for variables bound to UI elements: userIDLabel, messageTextbox, etc.

The problem is rarely how long or short names are or how well commented things are. The problem is almost invariably structure.

If your function is very long and the logic is complex to where you have to repeat a variable many times, the real problem is that the function doesn't express a clear solution to a problem. It needs to be broken out into functions that are clear and concise. Then the variables can be reasonably descriptive (as you're not typing them many times over) and you only need a brief description of what it does.

Similarly, no amount of comments or long variable names can explain convoluted logic. If anything, they'll just become a maintenance issue.

And Hungarian notation: if your variable is used so far from its declaration that you feel the need to document that declaration in every use, again, you have a structure problem.

Hungarian Notation is used because the common datatypes are not people friendly. For example, go out onto the streets and ask, "How do you multiply by ten?" Most of the time, the answer you'll get is, "Add a zero on the end." This is not arithmetic, this is string manipulation. People automatically switch from numbers to strings and back again. And yet, many languages not only insist that strings and numbers are different, but there are different types of numbers. Create a language that's closer to how people think and you reduce many of the problems with variables.

No, there really isn't. There's only one type of Hungarian notation, and that is "superfluous encoding of type information into variable names, where it's never type-checked and routinely out-of-date".

Quote:

"Systems" seems to be the type that everyone uses and hate, which is where the prefix represents the data type. "Apps", on the other hand, you're supposed to encode more so meta information, such as in an app that takes data from potentially-hostile sources, you could use the prefix 'us' to represent 'unsafe string' and 'ss' for 'sanitized string'.

But these are the same thing. The type of unsafe strings should be different from the type of safe strings, so that you can never use an unsafe string in an unsafe way.
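A minimal Haskell sketch of that idea, using newtypes so the unsafe/sanitized distinction costs nothing at runtime. The names UnsafeString, SafeString, sanitize and renderHtml are hypothetical, and the escaping is a toy that only handles <, > and &; it is not a complete HTML sanitizer.

```haskell
-- The "us"/"ss" distinction, moved from variable names into types.
newtype UnsafeString = UnsafeString String
newtype SafeString = SafeString String

-- In a real module the SafeString constructor would not be exported,
-- so sanitize would be the only way to obtain one.
sanitize :: UnsafeString -> SafeString
sanitize (UnsafeString s) = SafeString (concatMap escape s)
  where
    escape '<' = "&lt;"
    escape '>' = "&gt;"
    escape '&' = "&amp;"
    escape ch  = [ch]

-- A sink that only accepts sanitized text: passing an UnsafeString here is
-- a compile-time type error rather than a naming-convention violation.
renderHtml :: SafeString -> String
renderHtml (SafeString s) = "<p>" ++ s ++ "</p>"
```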

Hungarian Notation is used because the common datatypes are not people friendly. For example, go out onto the streets and ask, "How do you multiply by ten?" Most of the time, the answer you'll get is, "Add a zero on the end." This is not arithmetic, this is string manipulation. People automatically switch from numbers to strings and back again. And yet, many languages not only insist that strings and numbers are different, but there are different types of numbers. Create a language that's closer to how people think and you reduce many of the problems with variables.

In your example, the person on the street is simply wrong. Confusing/conflating different types of data doesn't make things simpler; it makes things far more complex. Consider PHP's and JS's notions of equality.
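The "add a zero" rule can be taken literally to show where it breaks. In this hypothetical sketch, appending "0" to the decimal text of a whole number does multiply it by ten, but for a fractional Double the zero lands after the decimal point and the value is unchanged.

```haskell
-- "Add a zero on the end", implemented as string manipulation.
appendZero :: String -> String
appendZero digits = digits ++ "0"

-- For whole numbers it agrees with arithmetic: 42 -> "420" -> 420.
timesTenViaText :: Integer -> Integer
timesTenViaText n = read (appendZero (show n))

-- For a Double it does not: 2.5 -> "2.50" -> 2.5, not 25.
timesTenViaTextD :: Double -> Double
timesTenViaTextD x = read (appendZero (show x))
```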

I tend to follow Microsoft's conventions on variable names since all my current projects are .NET ones, but I think they are pretty good guidelines in general.

Do not use Hungarian notation.
Do favor readability over brevity.
Do choose easily readable identifier names.
Do not use non-alphanumeric characters.

I do agree with the second answer about short loops using something like "i" instead of "arrayCounter", that's something that I do as well. In general, there's not a very good reason to avoid using descriptive variable names.

There are obviously different opinions about the usefulness of things like short variable names within a very narrow scope, or Hungarian notation, so it's a good idea to flesh all that out in your coding standards documentation, imo.

Hungarian Notation is used because the common datatypes are not people friendly.

No, it's not. It was designed for systems languages like C, by and for professional software engineers.

Quote:

And yet, many languages not only insist that strings and numbers are different, but there are different types of numbers. Create a language that's closer to how people think and you reduce many of the problems with variables.

If you have a straightforward type system, you have to learn a few counter-intuitive rules about how types work.

When you have implicit casting à la JavaScript, you have to learn all the gotchas, then all the workarounds to get the actual types you need, and then you still need to learn the counter-intuitive rules about how types work.

When I see a bunch of variables starting with a lower-case d in code that was obviously transcribed from a physics textbook, I assume it means "derivative of." So in this case, Hungarian notation is not just ugly; it's misleading.


Yes, my initial read of the code was the same.

But author, if you think this code is hard to read, you should try some of the implicitly typed (with almost all variables one letter: i, x, y, etc.), goto-laden f77 code we physicists often have to work with every day. I could show you scores of routines that have more than 50 goto statements in them. What's perhaps most stunning is that these code bits work robustly and have done so for 25+ years (but woe be unto you if you even contemplate significant changes to such code).

I often found myself in a similar situation described in the problem (and as explained in the first answer). In my last job I often had to transcribe scientific formulas from papers into code.

While I usually stick to a longer, more descriptive naming convention, I didn't in the case of scientific formulas, because if you do a calculation described in a scientific paper or textbook, it is usually easier to stay close to the original nomenclature. Otherwise, it's very hard to find errors or deviations from the source.

But, what the original coder should have done, and which I usually did in such a case, is describe the formula, the variables and cite the source in the comments.
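As a sketch of that practice (the formula is an illustrative choice, not one from the comment): the thin-lens lensmaker's equation, with the formula, the symbols and a placeholder for the citation in the comment, and variable names that match the printed equation.

```haskell
-- Lensmaker's equation, thin lens in air:
--   1/f = (n - 1) (1/R1 - 1/R2)
-- Source: [cite the textbook or paper and equation number here]
--
--   f  : focal length
--   n  : refractive index of the lens material
--   r1 : radius of curvature of the first surface (R1)
--   r2 : radius of curvature of the second surface (R2)
-- Sign convention: a radius is positive when its centre of curvature lies
-- on the side of the surface that the light travels towards.
focalLength :: Double -> Double -> Double -> Double
focalLength n r1 r2 = 1 / ((n - 1) * (1 / r1 - 1 / r2))
```

A symmetric biconvex lens with n = 1.5 and radii of 10 and -10 should come out with f = 10, which is easy to confirm against the formula because the names line up one to one.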

But author, if you think this code is hard to read, you should try some of the implicitly typed (with almost all variables one letter: i, x, y, etc.), goto-laden f77 code we physicists often have to work with every day. I could show you scores of routines that have more than 50 goto statements in them. What's perhaps most stunning is that these code bits work robustly and have done so for 25+ years.

That's the great thing about programs: they will always do exactly what you tell them to, without fail. The issue with code like this, and one I'm sure you've had to deal with, is that it becomes an extensibility and maintainability nightmare later on down the road.

Hungarian Notation is used because the common datatypes are not people friendly. For example, go out onto the streets and ask, "How do you multiply by ten?" Most of the time, the answer you'll get is, "Add a zero on the end." This is not arithmetic, this is string manipulation. People automatically switch from numbers to strings and back again. And yet, many languages not only insist that strings and numbers are different, but there are different types of numbers. Create a language that's closer to how people think and you reduce many of the problems with variables.

In your example, the person on the street is simply wrong. Confusing/conflating different types of data doesn't make things simpler; it makes things far more complex. Consider PHP's and JS's notions of equality.

You both are confusing how you've been taught to think in computerese and how the average person thinks. That's what makes programming hard; you have to think in an unnatural way.

If you want to compare two strings, you do so byte by byte. If you want to compare two words or two phrases, it's not trivial. Look at a search engine. Comparing text is a difficult task, and you're usually better off using irregular-expression matching.

When I see a bunch of variables starting with a lower-case d in code that was obviously transcribed from a physics textbook, I assume it means "derivative of." So in this case, Hungarian notation is not just ugly; it's misleading.

That's what I thought at first. But I had my doubts when I saw that all of them started with "d". It would have to be some weird formula to work only with derivatives (although it's possible).

You both are confusing how you've been taught to think in computerese and how the average person thinks. That's what makes programming hard; you have to think in an unnatural way.

The average person doesn't think precisely enough to program, period. (I'm not saying that the average person couldn't learn, but you don't need to be that precise in everyday life.) So if you make a computer work the way an average person does, all of a sudden the computer is doing things in a not completely defined way, second-guessing and re-interpreting your logic. Programming anything nontrivial in such an environment would be hell. Bridging the gap between the ways computers and humans think shouldn't be done by making computers more seemingly intuitive while behind the scenes causing all sorts of issues; it's to make programmers think more precisely.

You both are confusing how you've been taught to think in computerese and how the average person thinks. That's what makes programming hard; you have to think in an unnatural way.

What makes you think that thinking in a natural way is better? Most of the hard stuff I learned at university involved overcoming my "natural way of thinking" and learning how to think logically. I know that I specifically had to learn how to think logically; I wasn't born that way. Maybe some people are.

No, there really isn't. There's only one type of Hungarian notation, and that is "superfluous encoding of type information into variable names, where it's never type-checked and routinely out-of-date".

Yes, there really is. Just because you hate it doesn't mean there aren't multiple variants, some of which suck slightly less than others.

DrPizza wrote:

Quote:

"Systems" seems to be the type that everyone uses and hate, which is where the prefix represents the data type. "Apps", on the other hand, you're supposed to encode more so meta information, such as in an app that takes data from potentially-hostile sources, you could use the prefix 'us' to represent 'unsafe string' and 'ss' for 'sanitized string'.

But these are the same thing. The type of unsafe strings should be different from the type of safe strings, so that you can never use an unsafe string in an unsafe way.

Sure, using the language's type system to enforce those things is better. But, sadly, in the real world language designers seem to be running as fast as they can in the opposite direction, to make it as easy as possible to convert between types on the fly.

Google's Go is the only quasi-mainstream language I've used that actually lets you do those sorts of things without stupid, convoluted (thus error-prone) hacks.

Using i in for loops is a classic, but I don't know about the whole i, j, k thing. When you have three nested loops, it might be a good idea to make the variables a bit more descriptive in order to differentiate between them...

But author, if you think this code is hard to read, you should try some of the implicitly typed (with almost all variables one letter: i, x, y, etc.), goto-laden f77 code we physicists often have to work with every day. I could show you scores of routines that have more than 50 goto statements in them. What's perhaps most stunning is that these code bits work robustly and have done so for 25+ years (but woe be unto you if you even contemplate significant changes to such code).

It's not just the variable names, lack of comments and goto statements; it's the entire code formatting. Plus, Fortran doesn't help things by using the same characters (parentheses) for array indexes and function parameters. From one of the codes I needed to get working:

For those who don't read Fortran: in " 1+f*grsq(ix,ny) " on the second line, the "1" is being used as a continuation character. Ignoring the array index ordering, the equivalent in C would be:

When programming I take the approach that your code should be clear to be maintainable, because some poor bastard needs to understand your code in six months time. Sometimes, that poor bastard is yourself, so you end up benefiting yourself and everyone else for producing code that is easy to understand. Sometimes, due to the nature of the problem, good comments are the only way around some coding implementations.

When a domain-specific equation is needed, it is best to confine it to one small function rather than letting it get lost, and confusing, inside a larger one. That also makes it clearer what the function is for and easier to reuse.
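One way to do that in Haskell, sketched with a hypothetical surface record: the rest of the program sees descriptive names, and the domain equation (the paraxial power of a single refracting surface, P = (n2 - n1) / R) is confined to one small function whose where-clause uses the textbook symbols.

```haskell
-- Hypothetical record; the rest of the program uses these descriptive names.
data RefractingSurface = RefractingSurface
  { radiusOfCurvature :: Double  -- R, in metres
  , indexBefore       :: Double  -- n1, refractive index on the incoming side
  , indexAfter        :: Double  -- n2, refractive index on the outgoing side
  }

-- Paraxial power of a single refracting surface, P = (n2 - n1) / R,
-- in diopters when R is in metres. The equation is confined to this
-- function, and the where-clause uses the textbook symbols.
surfacePower :: RefractingSurface -> Double
surfacePower surface = (n2 - n1) / r
  where
    n1 = indexBefore surface
    n2 = indexAfter surface
    r  = radiusOfCurvature surface
```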

The weirdest challenge I have had is code with multilingual variable names. This can happen when ownership has shifted between teams in different regions. This is part of the reason I believe English should be used, especially since the programming language (Java, in my case) already uses English for its keywords and APIs.

One other thing is to take the time to understand the accepted style for the language you are using. This includes the naming and case conventions for variables and constants, or how many levels of nested functions is too many. Just because a language allows you to do something doesn't mean you should abuse it.

I'm a computational physicist, and I immediately thought "equation" when I saw the list of variables. Even without domain knowledge it just feels right, including the subscripts. Well, except for the Hungarian notation. I work with a lot of differentials, deltas, uncertainties, etc. and those consume all the initial "d"s I care to give. That said, I'd rather have a naming pattern I don't particularly like used consistently than one I do used inconsistently.

Thus, when I got down to the first answer I burst out laughing. Work with physicists? He or she certainly does. A comment description of any non-trivial equation is still good practice, though. I mean, the radius is a relevant variable in many thousands of equations, why leave someone to guess?

For variables without a straightforward mathematical interpretation, though, I prefer something reasonably descriptive. I often find myself abbreviating the more descriptive term in limited contexts, of course. For example, if I have a list of peak functions called "peakfunctions", in a for-loop or list comprehension an element of that list will often be called "pf". Even if the abbreviation has no standard meaning within the program, if the longer description lurks nearby (and usually when the variable is initialized) I find it an intuitive naming pattern.
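A hypothetical Haskell version of that pattern, using Gaussians as the peak functions (the comment doesn't say what shape the peaks have) and camelCase for the list name. The long name, peakFunctions, appears on the same line as the short one, pf.

```haskell
-- Hypothetical peak shape: a Gaussian with the given amplitude, centre and width.
gaussian :: Double -> Double -> Double -> Double -> Double
gaussian amplitude centre width x =
  amplitude * exp (negate ((x - centre) ^ 2) / (2 * width ^ 2))

peakFunctions :: [Double -> Double]
peakFunctions = [gaussian 1 0 1, gaussian 2 5 0.5]

-- Inside the comprehension each element is just "pf"; the long name
-- it was drawn from sits on the same line.
modelAt :: Double -> Double
modelAt x = sum [ pf x | pf <- peakFunctions ]
```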

Using i in for loops is a classic, but I don't know about the whole i, j, k thing. When you have three nested loops, it might be a good idea to make the variables a bit more descriptive in order to differentiate between them...

This is pretty standard math/physics notation (e.g. the Levi-Civita symbol). It is especially reasonable when the variables all refer to the various dimensions of a single object. If what is being indexed doesn't have a similar interpretation then something more descriptive may very well be a better idea.
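For instance, the cross product can be written directly from its index form, (a × b)_i = Σ_jk ε_ijk a_j b_k, where i, j and k all index components of the same three-dimensional vectors. A Haskell sketch using 0-based indices; the closed form used for ε is one convenient choice, and lists are used for brevity rather than speed.

```haskell
-- Levi-Civita symbol for indices in {0, 1, 2}: +1 for even permutations,
-- -1 for odd ones, 0 if any index repeats.
epsilon :: Int -> Int -> Int -> Int
epsilon i j k = (i - j) * (j - k) * (k - i) `div` 2

-- Cross product written as (a x b)_i = sum over j, k of eps_ijk a_j b_k.
cross :: [Double] -> [Double] -> [Double]
cross a b =
  [ sum [ fromIntegral (epsilon i j k) * (a !! j) * (b !! k)
        | j <- [0 .. 2], k <- [0 .. 2] ]
  | i <- [0 .. 2] ]
```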

Definitely agree; Hungarian and abbreviated variable names are my worst nightmare. The ONLY time I've ever seen Hungarian be slightly useful is for booleans, and even there an 'is' prefix (IsOpen) works much better, I think.

Although yeah I've used abbreviations in complex formulas. Love using i, j, and k for nested loops too :-p

While I agree with others that longer variable names are fine and useful... the main thing I add to code like this is comments. I have to deal with a lot of stuff written by others without comment. So much time is spent trying to figure out what's actually there that it doesn't make sense to spend all that time without taking a few moments to document what I figure out.

As a computational physicist, a lot of my variable names look like that, and as the top voted post in the article indicates, it is in fact more readable that way. Parity between the literature and your code makes it easy for anybody with the requisite skill set to sit down and immediately know what's going on.

A comment explaining the variable's meanings couldn't hurt, but I'm also not surprised it isn't in there. The original author probably assumed the code would be maintained by somebody trained in optics. I guess that didn't end up being the case, though.

My comments tend to be pretty thorough. If a set of nested loops represents some sums that I can write out, I'll write them up in LaTeX and stick that in a comment. It may take a quick cut-and-paste for the reader to see clearly what it says, but that often ends up being far more descriptive than plain text.
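A sketch of what that can look like, using a quadratic form as a hypothetical example: the comment carries the LaTeX for the double sum, and the nested generators in the code mirror its two indices.

```haskell
-- Quadratic form, written out in LaTeX:
--   Q = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j
-- The outer generator walks the rows of A (index i), the inner one walks
-- the entries of each row (index j); lists are 0-indexed in the code.
quadraticForm :: [[Double]] -> [Double] -> Double
quadraticForm a x =
  sum [ aij * xi * xj | (row, xi) <- zip a x, (aij, xj) <- zip row x ]
```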

You both are confusing how you've been taught to think in computerese and how the average person thinks. That's what makes programming hard; you have to think in an unnatural way.

I agree, programming is hard. I think many attempts to make it easier have wound up making it harder.

Quote:

If you want to compare two strings, you do so byte by byte. If you want to compare two words or two phrases, it's not trivial. Look at a search engine. Comparing text is a difficult task, and you're usually better off using irregular-expression matching.

You don't make it simple through a few tweaks to the type system. A general purpose language has to express logical values and abstract away the machine implementation.

And, typically, when you're working with actually comparing text, your standard library is only going to get you so far. If I am indexing phone numbers, for instance, I might want an n-gram field, whereas names might be matched using a metaphone field. I can't take those and apply them to my string type: the n-grams are going to generate multiple values for each phone number, whereas the metaphone will create an unreadable hash.
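To illustrate the "multiple values for each phone number" point, here is a minimal character n-gram function in Haskell. It is a sketch, not how any particular search engine tokenizes, and `ngrams` is a hypothetical name.

```haskell
import Data.List (tails)

-- Character n-grams: every run of n consecutive characters in the string.
-- A single phone number yields several index terms rather than one value.
ngrams :: Int -> String -> [String]
ngrams n s = [ take n t | t <- tails s, length t >= n ]
```

For example, ngrams 3 "5551234" yields five terms ("555", "551", "512", "123", "234"), which is why such a field can't simply stand in for the original string type.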

And the user pretty much has to know how those algorithms work, at least roughly. That hardness is inherent in the problem being solved, it's not a result of language designers being lazy or bloody-minded.

"I tend to follow Microsoft's conventions on variable names...Do not use Hungarian Notation" - huh? Microsoft is where the Hungarian disease originated! It's infected everything now, but they're the ones who came up with it and promoted it. Like the walking dead, Hungarian variables have spread everywhere, eating the brains of developers who have to deal with them.

Otherwise, there's insufficient information to answer the question. Not sure if these are physics variables (dT/dx) or what.

Oh and i j k as loop index names is entirely acceptable because nearly every programmer in the world learned 'for' loops using them. So if any one of us sees i, j or k in code we instantly believe it's a loop index.

In my opinion, Hungarian notation aids good programming practices and reduces potential bugs. Then again, trying to use it in these newer programming paradigms can be impossible.

My other opinion is that as long as everyone on a project uses the same naming standards and coding practices it doesn't really matter.

No, there really isn't. There's only one type of Hungarian notation, and that is "superfluous encoding of type information into variable names, where it's never type-checked and routinely out-of-date".

Yes, there really is. Just because you hate it doesn't mean there aren't multiple variants, some of which suck slightly less than others.

No, there really isn't. Both "systems" and "apps" Hungarian are identical: putting type information in variable names. All the type information should be compiler-enforced. All of it is prone to becoming out of date. There's no objective measure for differentiating between "systems" and "apps".

Quote:

Sure, using the language's type system to enforce those things is better. But, sadly, in the real world language designers seem to be running as fast as they can in the opposite direction, to make it as easy as possible to convert between types on the fly.

No, it's not merely "better"; it's the only thing that works. Everything else is a lie.

To be fair, though, the real problem here is lack of indentation and lack of comments, not Fortran per se. For example, although f77 allows any character to be used as a continuation character, a sane coding style will always use the dollar sign $ as the continuation character, because it appears nowhere else in f77 (unless you're calling VMS library functions, in which case "abandon all hope ye who enter here"). And if that continuation line is indented, so much the better.

Most f77 compilers from the mid-'80s on would also handle variable names of 20+ characters instead of the 6 characters specified by the standard. Another important tool was IMPLICIT NONE, which turns off f77's default variable typing and forces declaration of all variables. And f77 fully supported if-then-else flow control structures, to cut down on the GOTOs.

Much f77 code was written by people who just had a computation they needed to run, and were not particularly interested in nice code structure until they had to maintain somebody else's spaghetti code. Hence stuff like what Mark showed...

Company-standard Hungarian? Count yourself lucky you got fired! I hate Hungarian, but I'd never tell anyone they can't use it... I'd grumble and maybe change it if I had to maintain it, and if I really hated it I'd get a new programmer, but ugh... nobody writes perfect code (on time), and you can't mandate that, so why bother...

1. Spell out the most relevant part. If all the variables in your application were to have "Radius" in them, I might decide to abbreviate that to just "R", but then have "r_inner" and "r_outer" for example. Knowing they'll all be radiuses (OK, radii), the most important information to the variable name is /which/ radius it represents.

2. Abbreviate consistently. If you're going to abbreviate (because words like "controller" are long and repetitive to type), decide up front whether it's ctrl or cntrl or cntrlr or ctrlr. Imagine how you'd do a "Find" if you had to. Would you know what to search on?

3. Comment it. The comment ahead of the code should be describing the logic, which should include human-language words spelled out fully for each data item involved. "/* Multiply the inner radius by the outer radius and divide by the input radius plus the standard radius: */" That way if there is any doubt, reading the explanation should shed light on the variable name conventions in use.

4. Standardize it. Style manuals are awesome. They can be "living documents," but they're the fastest way to help new people find their way around your environment, and they're a great arbiter when you have any doubts yourself. It can be a single page, even. But it would say things like, "Do not use Hungarian notation. Use camel case or don't. Use underscores between all words, or between the first and all subsequent words. Initial caps on method names, lowercase variable names. Etc." All you have to do is decide how you're going to do it, document that, and move on. If your style manual is web-accessible (and I imagine it should be, a wiki or Trac or whatever makes this easy), put a link to the style manual in the header comments in your source file.

The point is, make a plan, document the plan, and execute the plan. As you should with anything, really.