With modern compilers, the compiled binary could well end up the same for both examples, perhaps even with the routine inlined; in fact, even a debug build might optimize variables away to the point that you cannot examine intermediate values in the debugger. It is handy to be able to write expansive, verbose code knowing that it will get streamlined by the compiler (I remember when this was not always the case). My question is whether there is some kind of broad consensus as to how lines of code are counted.

(I realize that Linux, in its original form, was written with heavy use of asm{ … } blocks, so its line count was probably rather high relative to what the equivalent code would look like today.)

A quote from a previous boss: "Any not completely incompetent programmer can double their productivity according to any performance metric, without any increase in actual productivity."

It's four lines and two lines. Comments count when you count lines of code.

What _actually_ counts is how much the code achieves. That's the value. The actual code is not a benefit; it is a cost. It needs to be examined when you look for bugs, and it needs to be modified when specs change. However you look at it, every line of code written is a cost.

That's at its worst when someone uses copy and paste to create code. Take a thousand-line function, copy it, change the name, change two lines, and the inexperienced developer or manager thinks they just created lots of value. What they actually did is create two lines' worth of value, minus a thousand lines' worth of cost.

My question is whether there is some kind of broad consensus as to how lines of code are counted.

No, there isn't. Some simply count the number of lines in every source file, some don't count blank lines, some don't count comments, some count only the lines containing a ";" (assuming a language, like C, that ends its statements with ";"), and some count the total number of ";" characters (so that a single line holding multiple statements counts as multiple lines), etc.

In the end, it doesn't really matter. That metric is only useful in the sense that it gives you a ballpark figure for how much code there is. Honestly, the exact number is pretty useless.

When I'm doing this, I tend to just run every file through "wc -l" and be done with it. Even then, I only do it when I'm curious about the order of magnitude. In other words, am I dealing with hundreds, thousands, or millions of lines? I don't care about the exact number.
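For what it's worth, here is a minimal sketch (in C++, since the thread is mostly about C-family code) of a few of the counting conventions mentioned above. It's an illustration, not a standard; the comment handling is deliberately naive and only skips full-line "//" comments.

#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>

// Count a C-like source file several different ways: total lines,
// non-blank lines, non-comment lines, and total semicolons.
int main(int argc, char* argv[]) {
    if (argc < 2) { std::cerr << "usage: loc <file>\n"; return 1; }
    std::ifstream in(argv[1]);
    std::string line;
    long total = 0, nonBlank = 0, nonComment = 0, semicolons = 0;
    while (std::getline(in, line)) {
        ++total;
        auto first = line.find_first_not_of(" \t");
        if (first != std::string::npos) {
            ++nonBlank;
            // Naive: only catches comments that start the line.
            if (line.compare(first, 2, "//") != 0) ++nonComment;
        }
        semicolons += std::count(line.begin(), line.end(), ';');
    }
    std::cout << "total lines:       " << total << '\n'
              << "non-blank lines:   " << nonBlank << '\n'
              << "non-comment lines: " << nonComment << '\n'
              << "semicolons:        " << semicolons << '\n';
}

Run over the same source tree, the four numbers can differ substantially, which is exactly the point: pick one convention and stick with it.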

What _actually_ counts is how much the code achieves. That's the value. The actual code is not a benefit; it is a cost. It needs to be examined when you look for bugs, and it needs to be modified when specs change. However you look at it, every line of code written is a cost.

In the PBS documentary Triumph of the Nerds, Steve Ballmer of Microsoft spoke about their partnership with IBM. The IBM team would keep talking about KLOCs (thousands of lines of code) as a good thing. The more, the better. Steve thought this was nuts. Read his response at the Wikipedia entry. BTW, Triumph of the Nerds is a great documentary, and the book it's based on digs much deeper into the stories behind Silicon Valley and the PC revolution.

As for a standard for counting lines of code: I've never heard of one in the 25 years I've been in the computing business. The only reason to count lines is to answer one's curiosity.

When I'm personally curious, I may do what mslide mentioned, or I may add some extra passes to remove blank lines and comment-only lines. I think the last time I did this was several years ago, to compare a system I had worked on for 9 years to the original code from when I walked in the door. I was curious how much I had added. Added, because we did add to the system; in some cases I had ripped out chunks of code to shrink individual source files.

One of my greatest achievements working in software was culling thousands, maybe tens of thousands, of lines of unused code from a system. The president of the company told me this was not worthwhile, because there "are no bugs in code that doesn't run". The truth is that there are bugs, but you don't know if they'll ever be run or not, so you don't know if you should fix them. Our build was faster, our greps were faster, and it was easier to follow what was going on.

Other instances of reducing LOC, such as removing duplication, reducing complexity, or increasing modularity, are a joy. One way to look at LOC is as a measure of how expensive a system is to maintain and enhance.

The way I count how many lines of code there are is I scroll to the bottom and look at the gutter on the left side. Bam. Number of lines of code. I don't allow code to go beyond the 80th column.

I try to get between 50 and 500 lines of code in each file. If there are fewer, it suggests that the file could be merged with others; if there are many more, it's probably time to refactor into multiple files.

It helps ensure my code is easy to read. I used to allow myself to put thousands of lines in each file and use however many columns I liked, but I've come to realize that this makes reading and interpreting the code suck.

Also, don't copy and paste blocks of code. If you're tempted to, it's probably a better idea to cut that block, paste it into its own function, and call that function from multiple places.
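A contrived sketch of that idea (all the names here are invented for illustration): the same check used to be pasted into both callers; extracted, a future rule change happens in one place.

#include <iostream>
#include <string>

// One shared helper instead of two pasted copies of the same check.
bool isValidName(const std::string& name) {
    return !name.empty() && name.size() <= 64;
}

bool registerUser(const std::string& name) {
    if (!isValidName(name)) return false;
    std::cout << "registered " << name << '\n';
    return true;
}

bool renameUser(const std::string& name) {
    if (!isValidName(name)) return false;
    std::cout << "renamed to " << name << '\n';
    return true;
}

int main() {
    registerUser("alice");
    renameUser("");  // rejected by the shared check
}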

I try to get between 50 and 500 lines of code in each file. If there are fewer, it suggests that the file could be merged with others; if there are many more, it's probably time to refactor into multiple files.

Sorry, that approach does not make sense to me, primarily because I mostly use Objective-C. A file should be logically consistent, so adding functions from another file, IMHO, should only be done if there is some theme that makes them go together. A file with 20 or 30 lines of code should be fine if the function(s) belong together; making files make sense is more important to me than making them adhere to an arbitrary size standard. And, of course, in Objective-C you sometimes subclass an object and add only one or two methods/overrides; adding other stuff to that .m file would only create confusion.

I read somewhere that Linux (the kernel) has upwards of 1.7 million lines of code, which got me wondering whether there is some kind of standard for what constitutes a line of code.

As someone said, "The problem with standards is that there are so many of them."

Yes, there are lots of ways to count code. As it turns out, one is just as good as any other, as long as you always count using the same method. In other words, "LOC" is a relative measure. It should NEVER be used as an absolute unit.

For example, say you find it cost you $1,000 to write 100 lines. Count them any way you like. The only use of counting is so that next time you might know how much it would cost to write (say) 125 lines. You could then guess $1,250.

Using it as an absolute is pointless. So saying Linux has 1,000,000 lines is of no use until you compare it with something else that was counted the same way.

What I always did was eliminate comments and then simply count semicolons. That works as well as anything else. Some count every end-of-line character, and others remove blank lines first.

Some count "code volume" and try to assign a complexity value to each line, so "a = b;" counts as 1 but "if (a < b) {" counts higher.

I've found, after doing this for years, that none of these are very accurate, and counting semicolons works well enough.
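As an illustration of the "code volume" idea mentioned above, here's a crude sketch. The keyword list and weights are arbitrary, and a substring match this naive would also hit keywords buried inside identifiers; it's meant to show the shape of the approach, not to be used as-is.

#include <iostream>
#include <string>
#include <vector>

// Weight a line by a rough complexity guess instead of counting it as 1.
int lineWeight(const std::string& line) {
    static const std::vector<std::string> branchy =
        {"if", "for", "while", "switch", "case"};
    int w = 1;
    for (const auto& kw : branchy)
        if (line.find(kw) != std::string::npos) w += 2;  // arbitrary weight
    return w;
}

int main() {
    std::cout << lineWeight("a = b;") << '\n';        // prints 1
    std::cout << lineWeight("if (a < b) {") << '\n';  // prints 3
}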

The only good motivation for this is cost estimation. We would look at how many lines other projects used and what they cost per line. There are better ways; Google "COCOMO". It was a decent approach, but still only good enough for a rough order of magnitude. Some of the projects I worked on had 1M lines, some were as small as 50K lines. Estimating was never good. What really happens is that you write code until you use up the budget. If the budget was big, the customer got some really nice error handling and testing. If the budget was small, he got rather limited robustness.
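For reference, the Basic COCOMO model mentioned above estimates effort from KLOC directly. Here is a sketch using Boehm's published coefficients for an "organic" (small, familiar, in-house) project; the 50 KLOC input is just an example. As the poster says, treat the output as a rough order of magnitude at best.

#include <cmath>
#include <iostream>

int main() {
    double kloc = 50.0;  // example project size, in thousands of lines

    // Basic COCOMO, organic mode (Boehm, 1981):
    double effort   = 2.4 * std::pow(kloc, 1.05);    // person-months
    double schedule = 2.5 * std::pow(effort, 0.38);  // calendar months

    std::cout << "effort:   " << effort << " person-months\n"
              << "schedule: " << schedule << " months\n";
}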

Sorry, that approach does not make sense to me, primarily because I mostly use Objective-C. A file should be logically consistent, so adding functions from another file, IMHO, should only be done if there is some theme that makes them go together. A file with 20 or 30 lines of code should be fine if the function(s) belong together; making files make sense is more important to me than making them adhere to an arbitrary size standard. And, of course, in Objective-C you sometimes subclass an object and add only one or two methods/overrides; adding other stuff to that .m file would only create confusion.

Breaking up large files, OTOH, is usually a good idea, when possible.

Obviously I do whatever makes for the best code design - I wouldn't randomly merge things together - but I consider it a code smell if you have dozens of files with just one or two functions in each. It's not necessarily a problem, but it's a simple indicator that my design choices may not be as good as they should be.

We had a very short-lived metric at work around lines of code. We had a web app at the time that had lots of graphics, and we counted each graphic as so many lines, based on the old saying "a picture is worth a thousand words".

In the big scheme of things, it's another meaningless management metric.

In the small scheme of things, for a developer or team of developers who use a very similar coding and commenting style, and who stick to code that is very uniformly necessary to solving the problem (a big if), it's a slightly more objective and in some cases more accurate measure than asking the average developer for their subjective opinion ("yup, it's 90% coded" when only 10% of the way toward the first alpha).

Some coders are decent or even good at estimation; most aren't. In that case, a bad metric may be better than a worse metric.

It seems to me that the length of the resulting assembly code is much more worth talking about than the length of the higher-level code... a line of high-level code could compile down to a single assembly instruction, or it could compile into dozens of assembly instructions. More instructions means the code takes longer to execute (ignoring the variance in how long different assembly instructions take... I feel like that variance is a lot smaller than the variance between lines of a higher-level language).

Of course... loops... function calls... maybe the number of instructions that actually get executed is more worth talking about than the number of instructions in the binary.

It seems to me that the length of the resulting assembly code is much more worth talking about than the length of the higher-level code... a line of high-level code could compile down to a single assembly instruction, or it could compile into dozens of assembly instructions. More instructions means the code takes longer to execute (ignoring the variance in how long different assembly instructions take... I feel like that variance is a lot smaller than the variance between lines of a higher-level language).

Of course... loops... function calls... maybe the number of instructions that actually get executed is more worth talking about than the number of instructions in the binary.

Consider C++ with tons of inlined functions, where some developer puts all the code into the header file, leading to an explosion of assembler code. Or template code: every little sort operation in C++ generates code for the complete sort algorithm.
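A contrived sketch of the template case: three one-line sort calls in the source, but the compiler instantiates a complete copy of std::sort for each distinct element type.

#include <algorithm>
#include <string>
#include <vector>

// Three "lines" of sorting, three separate instantiations in the binary.
void sortAll(std::vector<int>& a, std::vector<double>& b,
             std::vector<std::string>& c) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::sort(c.begin(), c.end());
}

int main() {
    std::vector<int> a{3, 1, 2};
    std::vector<double> b{3.0, 1.0, 2.0};
    std::vector<std::string> c{"c", "a", "b"};
    sortAll(a, b, c);
}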

----------

Quote:

Originally Posted by lee1210

One of my greatest achievements working in software was culling thousands, maybe tens of thousands, of lines of unused code from a system. The president of the company told me this was not worthwhile, because there "are no bugs in code that doesn't run". The truth is that there are bugs, but you don't know if they'll ever be run or not, so you don't know if you should fix them. Our build was faster, our greps were faster, and it was easier to follow what was going on.

Let's say there is a function that you think needs to change its behaviour. So you examine who calls it and whether the callers are affected. And three of the five callers are in dead code that is never executed. So you waste hours figuring out why this dead code uses the function in a weird way (and discover it's because the code is dead and, having never been maintained, wouldn't work anymore). You wasted your time on that dead code.

We had a very short-lived metric at work around lines of code. We had a web app at the time that had lots of graphics, and we counted each graphic as so many lines, based on the old saying "a picture is worth a thousand words".

I've been working on a project here for 6 months. When I started working on it, it was 47,000 lines of code (just measured by wc -l).

Now, after 6 months of work, it's at 23,000, just under half what it was. Performance is now about 1.5x what it was for the vast majority of it, and its maintainability has increased drastically.

So that works out to -4,000 lines of code per month, if you measure my productivity in LOC.

Wrong measure. You need to average that with all the coding and all the months that had been done before you. It's like the standard claim that one line of code per day is average productivity. What that really means is that someone writes a few hundred lines before lunch, then spends the rest of the year in meetings, doing specs, fixing bugs, throwing code away and rewriting it because of a change in requirements or a bug requiring a new architecture to fix, more meetings, more reviews, etc. Average those hundreds of morning LOC over all of that, and you end up with 1.5 LOC/day by corporate project end of life.

Why? If that is what you are interested in, then why not just talk about the size of the executable? Lines of code is just a rough estimate of the magnitude of a project.

The size of the executable can grow by adding functionality, by hiring clumsy programmers who write inefficient code, or by using language features that lead to code explosion. (Recent personal experience: graphics designers who can't use their tools and turn a simple icon into a 50 KB file.)

The size of the executable can grow by adding functionality, by hiring clumsy programmers who write inefficient code, or by using language features that lead to code explosion. (Recent personal experience: graphics designers who can't use their tools and turn a simple icon into a 50 KB file.)

Absolutely, or by using (or not using) dynamic libraries, by highly optimized code with loop unrolling, or by the fact that x86 uses a variable-size instruction set. I was simply questioning why assembly instructions would be a more useful measure than lines of code for getting a ballpark figure of project size.

"As a Real World Programmer. You won't have to write more than 3 lines of code"

Is this true in any way?

Given that "As a Real World Programmer." is not a complete sentence, it is hard to assess. What I suspect the professor might have been trying to say is that there are only three distinct lines of code that make up the bulk of a program, and that you will write them many thousands of times with slight variations.

"As a Real World Programmer. You won't have to write more than 3 lines of code"

I think some large government or aerospace project took the actual number of lines of code in the finished product and divided it by the man-years of salary and contract time they had paid for since the project started, and it came out to somewhere between 3 and 10 lines of code per day. That's what happens when software teams have to spend 90%+ of their time in meetings, writing specs and project charts and reviews and process documentation, etc.