I empathise ... It sounds like you have suffered from idiotic applications of misguided rules

No need for empathy. It was a long time ago, and we didn't suffer :) The result of the study was that we rejected both the tool and the idea.

With respect to your statistics. One set thereof does not a case make.

I would derive two things from my reading of the numbers.

Three modules are heavily over-commented.

Large modules are more complicated than small ones.

What I cannot say from those numbers is whether the complexity arises as a result of

the needs of the algorithm required to perform the function of those methods.

because a bad algorithm has been used.

because a good algorithm has been badly implemented.

because the "complex" (read:big) methods do too much.

Indeed, without inspecting the source code, I cannot even tell the accuracy of those metrics.

It could be that "HitTheEdge" contains a recursive algorithm that is extremely complicated to follow and modify, but simple in its code representation.

Or that "GenMoves" is a huge if/then/else structure that would be better implemented as a dispatch table.

Or that "Slide" uses a string eval to replace itself with an extremely complicated subroutine that the source code analyser sees simply as a big string constant.
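To make those last possibilities concrete, here is a minimal sketch in Python (chosen only for illustration; the language of the measured program is not stated). The `naive_complexity` counter, the `move` function and the piece names are all hypothetical, not taken from the program under discussion or from any real analysis tool. It scores the same lookup written as an if/elif chain, as a dispatch table, and as a string that is only compiled at run time.

```python
import ast

# Hypothetical branch counter: 1 + the number of if/for/while nodes found in
# the parsed source. A rough stand-in for a cyclomatic-style score.
def naive_complexity(source: str) -> int:
    tree = ast.parse(source)
    branches = sum(isinstance(node, (ast.If, ast.For, ast.While))
                   for node in ast.walk(tree))
    return 1 + branches

# Variant 1: an explicit if/elif chain.
IF_CHAIN = '''
def move(piece):
    if piece == "pawn":
        return 1
    elif piece == "knight":
        return 3
    elif piece == "rook":
        return 5
    else:
        return 0
'''

# Variant 2: the same lookup as a dispatch table.
DISPATCH = '''
STEPS = {"pawn": 1, "knight": 3, "rook": 5}
def move(piece):
    return STEPS.get(piece, 0)
'''

# Variant 3: the if/elif chain hidden inside a string constant and only
# compiled when the program runs.
HIDDEN = "SOURCE = " + repr(IF_CHAIN) + "\nexec(SOURCE)\n"

def run(source: str, piece: str) -> int:
    """Execute a variant in a fresh namespace and call its move()."""
    namespace = {}
    exec(source, namespace)
    return namespace["move"](piece)

variants = [("if_chain", IF_CHAIN), ("dispatch", DISPATCH), ("hidden", HIDDEN)]
scores = {name: naive_complexity(src) for name, src in variants}

for name, src in variants:
    print(name, "score:", scores[name], "knight ->", run(src, "knight"))
```

All three variants return the same answers, yet this counter scores the chain at 4 and the other two at 1. The string-eval variant contains the whole chain, but the counter only ever sees a string constant.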

And that's my point. You are already looking to derive further metrics from the generated metrics, but there is no way to validate the efficacy of those metrics you have, beyond inspecting the code and making a value judgement.

So you are already falling into the trap of allowing the metrics to become self-serving, but the metrics themselves are not reproducible, scientific measurements; they are simply "indicator values".

When you measure the length, mass, hardness, reflectivity, temperature, elasticity, expansion coefficient etc. of a piece of steel, you are collecting a metric which can be reproduced by anyone, anywhere, any time. Even if the tools used to make the measurement are calibrated to a different scale, it is a matter of a simple piece of math, or a lookup table to convert from that scale to whichever scale is needed or preferred. This is not the case for any of the metrics in your table.

You don't say what language your program is coded in, but I could (probably) take all of your methods and reduce them to a single line. It would be a very long line, but in most languages it would still run perfectly well. What effect does that have on your metrics?

Equally, we could get half a dozen monks (assuming Perl) to refactor your methods according to their own formatting and coding preferences and skill-levels. And even if they all do a bang-up job of making sure that they reproduce the function of your originals--bugs and all--and if we then use the same program to measure their code as you have used, they will all produce different sets of numbers.
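A minimal sketch of that effect, again in Python purely for illustration. The `line_metrics` function is a hypothetical stand-in for a line-based tool (non-blank line count and comment ratio), and `total` is an invented example function, not one of the methods from the table.

```python
# Hypothetical line-based "metrics": non-blank line count and the fraction
# of those lines that are comments.
def line_metrics(source: str) -> dict:
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    comments = sum(ln.startswith("#") for ln in lines)
    return {"lines": len(lines), "comment_ratio": comments / len(lines)}

# One coder's version: spread out and commented.
VERBOSE = '''
# Sum the squares of the even numbers.
def total(values):
    # Start from zero.
    result = 0
    for v in values:
        # Skip odd numbers.
        if v % 2 == 0:
            result += v * v
    return result
'''

# Another coder's version: the same function squeezed onto one line.
TERSE = "def total(values): return sum(v * v for v in values if v % 2 == 0)\n"

def run(source: str, values: list) -> int:
    """Execute a version in a fresh namespace and call its total()."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](values)

verbose_metrics = line_metrics(VERBOSE)
terse_metrics = line_metrics(TERSE)

print("verbose:", verbose_metrics, "->", run(VERBOSE, [1, 2, 3, 4]))
print("terse:  ", terse_metrics, "->", run(TERSE, [1, 2, 3, 4]))
```

Both versions compute the same result for the same input, but the first reports 9 lines with a third of them comments, and the second reports 1 line with none.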

And that is the crux of my distaste for such numbers. They are not metrics. They do not measure anything! They generate a number, according to some heuristic.

They do not measure anything about the correctness, efficiency or maintainability of the code that gets run, they only make some guesses, based upon the way the coder formatted his source code.

They do not conform to any standards.

They are not transferable between programmers.

They are not transferable between algorithms.

They are not transferable between programs.

They are not transferable between languages.

They are not transferable between sites.

They are not transferable between design or coding methods (OO, procedural, functional etc.).

They are not transferable between assessment methods or tools.

In short, they are not comparable, and you cannot perform math with them.

As proof of this, take a look at your "PieceToString" and "HitTheEdge" methods. They have an equal 'complexity' when measured by the same tool. Is this obvious, or even definable, from looking at the source code? If I am given two pieces of steel 10 cm long, even without measuring them with a rule, I can easily tell they are the same length. No such comparison is possible for source code.

The tool has become the only way of comparing source code, and as it does not (and could not) adhere to any standard, all measurements are relative, not absolute. So, unless everyone agrees on which tool/language/coding standards etc. etc. to use, there is no way to compare two versions of the same thing.

That means that in order to make comparisons, you have to implement every possible (or interesting) version of the source code, before you can make any inference about whether any one is good or bad.

And even if you could code every possible implementation of a given algorithm, and could prove that they all produced exactly the same results, and you generated your numbers: What would it tell you?

Should you pick the version with the lowest complexity rating? The shortest? The longest? The one with the highest ratio of comments?

Would you make any choice based on the numbers alone? Or would you have to look at the source code?

If you admit that you would have to look at the source code, then you have just thrown your "metrics" in the bin in favour of your own value judgement.

And if you didn't, then you should publish the formulae by which you are going to juggle all those numbers in order to make your decision. It should make for interesting reading.

You seem to be confused about the purpose of metrics. They are NOT an end in themselves, and they are NOT useful in isolation from the code. In demanding such things, and exaggerating the abuses therein, you are setting up straw men and knocking them down in puffs of rhetoric.

I am anything but confused. On this subject, I am very clear. But okay. I'll play. You explain it to me.

Exactly what use are you going to make of the numbers in your table above?

But be warned! I've been here before. The moment you explain a use of those numbers in your table, you will be making judgements based upon them. And the moment you do that, you are using the numbers to in some way represent the code they are derived from.

If those numbers can only be used in conjunction with the code itself, then what part do the numbers play? What purpose do they serve?

If you answer that question honestly, you'll see that nothing in my post was rhetoric. It is all based upon having been there, and done that, and seen the effects that it has.

By themselves, nothing. But I am (and was) arguing that they are useful as clues that can lead to profitable conclusions.

I would also (and have, recently) argue against the idea that they should serve as commentary on code quality, or heaven forbid, "quality gates" for code review.

So, if they can't be used in isolation, why have them?

I routinely deal with large amounts of source code, and most of it is the product of many, many unique minds over many years, combined with an endless treadmill of maintenance and tweaking. Apart from asking the individuals with the knowledge - who have often left the company by this stage - what else can I use to help focus my (our) attention for code review, unit testing, and so on?

Obviously, I can also find clues in reported problems, but isn't that just another metric?
