Summary
Part II - We let the C.R.A.P. out of the bag.
We explain our thinking behind the first version of the CRAP metric and then unveil the formula.

In Part I, we noted that there’s a lot of crappy code in the world. We also noted that most developers don’t consider their own code to fall into that category. Finally, we talked about the fact that, as long as the application does what it’s supposed to do, most software organizations don’t seem to worry too much about bad code – until the code has to be maintained or enhanced and the original developers have skipped town.

From a developer’s point of view, crap is other people’s code that they have to maintain. Unfortunately, most developers spend most of their career working with code that they have not written themselves – code that they consider crappy because they find it nasty, ugly, convoluted, hard to understand, and hard to work with.

Crappy code is a big problem for software organizations, so my Agitar Labs colleague Bob Evans and I decided to do some research to see whether the level of crappiness in code could be analyzed and quantified in some useful form – which brings us to the C.R.A.P. index.

The C.R.A.P. (Change Risk Analysis and Predictions) index is designed to analyze and predict the amount of effort, pain, and time required to maintain an existing body of code.

Below is some of our thinking behind the C.R.A.P. index:

- We believe that software metrics, in general, are just tools. No single metric can tell the whole story; it’s just one more data point. Metrics are meant to be used by developers, not the other way around – the metric should work for you, you should not have to work for the metric. Metrics should never be an end unto themselves. Metrics are meant to help you think, not to do the thinking for you.

- We believe that, in order to be useful and become widely adopted, a software metric should be easy to understand, easy to use, and – most importantly – easy to act upon. You should not have to acquire a bunch of additional knowledge in order to use a new metric. If a metric tells you that your inter-class coupling and coherence score (I am making this up) is 3.7, would you know if that’s good or bad? Would you know what you need to do to improve it? Are you even in a position to make the kind of deep and pervasive architectural changes that might be required to improve this number?

- We believe that the formula for the metric, along with various implementations of the software to calculate it, should be open-source. We will get things started by hosting a Java implementation of the C.R.A.P. metric (called crap4j) on SourceForge.

- The way we design, develop, and deploy software changes all the time. We believe that with software metrics, as with software itself, you should plan for, and expect, changes and additions as you gain experience with them. Therefore the C.R.A.P. index will evolve and, hopefully, improve over time. In that spirit, what we present today is version 0.1, and we solicit your input and suggestions for the next version.

- We believe that a good metric should have a clear and very specific purpose. It should be optimized for that purpose, and it should be used only for that purpose. The more general and generic a metric is, the weaker it is. The C.R.A.P. index focuses on the risk and effort associated with maintaining and changing an existing body of code by people other than the original developers. It should not be abused or misused as a proxy for code quality, evaluating programmers’ skills, or betting on a software company’s stock price.

- Once the objective for the metric is established, the metric should be designed to measure the major factors that impact that objective and encourage actions that will move the code closer to the desired state with respect to that objective. In the case of C.R.A.P., the objective is to measure and help reduce the risks associated with code changes and software maintenance – especially when such work is to be performed by people other than the original developers. Based on our initial studies and research on metrics with similar aims (e.g., the Maintainability Index from CMU’s Software Engineering Institute), we decided that the formula for version 0.1 of the C.R.A.P. index should be based on method complexity and test coverage.

- There are always corner cases, special situations, etc., and any metric might misfire on occasion. For example, C.R.A.P. takes into account complexity because there is good research showing that, as complexity increases, the understandability and maintainability of a piece of code decreases and the risk of defects increases. This suggests that measuring code complexity at the method/function level and making an effort to minimize it (e.g., through refactoring) is a good thing. But, based on our experience, there are cases where a single method might be easier to understand, test, and maintain than a refactored version with two or three methods. That’s OK. We know that the way we measure and use complexity is not perfect. We have yet to find a software metric that’s right in all cases. Our goal is to have a metric that’s right in most cases.

Now that you have some insight into our thinking, beliefs, and preferences, it’s time to unveil version 0.1 of C.R.A.P. for Java.

Given a Java method m, C.R.A.P. for m is calculated as follows:

C.R.A.P.(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m)

Where comp(m) is the cyclomatic complexity of method m, and cov(m) is the test coverage provided by automated tests (e.g., JUnit tests, not manual QA).
Cyclomatic complexity is a well-known and widely used metric; it’s calculated as one plus the number of unique decisions in the method. For code coverage, we use basis path coverage.
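To make the formula concrete, here is a minimal sketch of it in Java. The class and method names (`CrapIndex`, `crapScore`) are ours, for illustration only; this is not the crap4j API.

```java
// Hypothetical illustration of the version 0.1 C.R.A.P. formula.
public class CrapIndex {

    // C.R.A.P.(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m)
    // comp: cyclomatic complexity of the method (>= 1)
    // cov:  automated-test coverage of the method, in percent (0..100)
    public static double crapScore(int comp, double cov) {
        double uncovered = 1.0 - cov / 100.0; // fraction of the method NOT covered
        return comp * comp * Math.pow(uncovered, 3) + comp;
    }

    public static void main(String[] args) {
        System.out.println(crapScore(5, 0));   // no coverage:   25 * 1 + 5 = 30.0
        System.out.println(crapScore(5, 100)); // full coverage: 25 * 0 + 5 = 5.0
    }
}
```

Note how the coverage term gates the quadratic component: with full coverage only the linear comp(m) term survives.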

Low C.R.A.P. numbers indicate code with relatively low change and maintenance risk – because it’s not too complex and/or it’s well-protected by automated and repeatable tests. High C.R.A.P. numbers indicate code that’s risky to change because of a hazardous combination of high complexity and low, or no, automated test coverage to make sure you have not introduced any unintentional changes. We’ll cover in much more detail how the C.R.A.P. score should be interpreted and applied in Part III.

As the code coverage approaches 100%, the formula reduces to:

C.R.A.P.(m) = comp(m)

In other words, the change risk is linearly coupled with the complexity. We had a lot of heated discussions about this. Some very smart people we talked with believe that if you have 100% test coverage, your risk of change should be 0. Bob and I, however, believe that more complex code, even if it’s fully protected by tests, is usually riskier to change than less complex code. If you have 100% test coverage and your C.R.A.P. index is still too high, you should consider some refactoring (e.g., method extraction).

As the code coverage approaches 0%, the formula reduces to:

C.R.A.P.(m) = comp(m)^2 + comp(m)

In other words, if you have no tests, your change risk increases, roughly, as the square of the method complexity. This indicates that it’s time to write some tests.

Generally speaking, you can lower your C.R.A.P. index either by adding automated tests or by refactoring to reduce complexity. Preferably both (and it’s a good idea to write the tests first so you can refactor more safely).
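To make that trade-off concrete, here is a hedged sketch using the version 0.1 formula on a hypothetical method with complexity 15. The helper name `crap()` is ours, not part of crap4j.

```java
// Illustrative numbers only, computed with the version 0.1 formula:
// crap = comp^2 * (1 - cov/100)^3 + comp
public class CrapTradeoffs {
    static double crap(int comp, double cov) {
        double uncovered = 1.0 - cov / 100.0;
        return comp * comp * Math.pow(uncovered, 3) + comp;
    }

    public static void main(String[] args) {
        System.out.println(crap(15, 0));  // untested and complex: 225 + 15 = 240.0
        System.out.println(crap(15, 80)); // add tests (80%): 225 * 0.008 + 15, about 16.8
        System.out.println(crap(5, 0));   // refactor to comp 5, still untested: 25 + 5 = 30.0
    }
}
```

Either remedy helps dramatically; doing both drives the score toward its floor, which is the method’s complexity itself.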

We believe that unit/developer testing and refactoring are good practices and since either of these activities improves the C.R.A.P. index, we believe that we are on the right path – though we know there is still a LOT of room for improvement.

Software metrics have always been a very touchy topic; they are perfect can-of-worms openers and an easy target. When we started this effort, we knew that we’d be in for a wild ride, a lot of criticism, and lots of conflicting opinions. But I am hopeful that – working together and with an open-source mindset – we can fine-tune the C.R.A.P. index and have a metric that will help reduce the amount of crappy code in the world.

OK. Time for some feedback – preferably of the constructive type so that C.R.A.P. 0.2 will be better than C.R.A.P. 0.1.

In Part III, I am going to spend some time discussing how to interpret and use the C.R.A.P. index. Is there a threshold where a piece of code turns into crappy code? If some code will never be changed, do you care what its C.R.A.P. index is? We are going to explore these and other interesting questions. I hope you come back for it.

(1) Does the test coverage itself add to the complexity measurement? That is to say, is it true that the more unit tests embedded in the source, the more complex the code is?

(2) Should 'volume' of code be incorporated in the complexity measurement? The cyclomatic complexity measure means that if a block of code has no branches/loops, then its measure is 1, regardless of how many actual statements that block contains. But for a C.R.A.P. index that seeks to measure the cost of maintenance, the more code one has, the greater the chance of screwing up some of it during maintenance.

> (1) Does the test coverage itself add to the complexity measurement? That is to say, is it true that the more unit tests embedded in the source, the more complex the code is?

Although it's true that tests have to be maintained and kept up to date, they don't add risk, but reduce it. So the test code is not included in the C.R.A.P. calculation.

I am not sure what you mean by "embedded in the source". I am not a big fan of putting the tests for a module (e.g. a class in Java) in the same file, or the same source directory. Typically, if I have a class Foo.java, I have a JUnit test class called FooTest.java in a testsrc directory. This allows us to differentiate between code-under-test and tests and exclude the latter from the C.R.A.P. calculation.

> (2) Should 'volume' of code be incorporated in the complexity measurement? The cyclomatic complexity measure means that if a block of code has no branches/loops, then its measure is 1, regardless of how many actual statements that block contains. But for a C.R.A.P. index that seeks to measure the cost of maintenance, the more code one has, the greater the chance of screwing up some of it during maintenance.

We considered including volume of code in addition to complexity. We also considered including # of parameters, # of fields, etc.

There are many things we could include, but we decided to start small and add new components based on input (such as yours - thank you) and experience with the metric, rather than start with lots of components off the bat.

We believe that convoluted code (i.e. lots of decision points, loops, branches - i.e. high cyclomatic complexity) presents more of a problem for understanding and maintainability than "straight-line" code - and convoluted methods seem to be more common than methods with too many lines of code. Having said that, we have seen some very long Java methods where understandability would be greatly increased by having, say, 4 methods with descriptive names and 10 lines of code (LOC) each, instead of one method with 40 LOC.

Thank you for the feedback and input.

Alberto

PS: Volume-of-code measures (Halstead Volume and/or plain LOC) are two of the top candidates for inclusion. They are already key components of the CMU SEI Maintainability Index (see below).

> So, the definition of good code is simply that it consists of small methods with unit tests?

No. No. No.

Please read the blog entry more carefully.

I haven't said anything about code "goodness". I actually wrote that this index should not be used to judge things such as code quality:

"We believe that a good metric should have a clear and very specific purpose. It should be optimized for that purpose, and it should be used only for that purpose. The more general and generic a metric is, the weaker it is. The C.R.A.P. index focuses on the risk and effort associated with maintaining and changing an existing body of code by people other than the original developers. It should not be abused or misused as a proxy for code quality, evaluating programmers’ skills, or betting on a software company’s stock price."

To re-iterate:

The C.R.A.P. index has one purpose, and one purpose only: to help predict the risk, effort, and pain associated with maintaining an existing body of code (particularly when the original developers are no longer around).

I've seen code where there were way too many methods. Essentially each method contained one statement. Furthermore, in many cases conditionals were avoided by over-abundant use of the factory method.

Not being familiar with the formal definition of cyclomatic complexity, I'd want the cyclomatic complexity to account for dynamic method dispatch. In other words, within a method, every method call that is not precisely statically resolvable counts as a "decision point". Is this already the case?

To deal with the "lots of one-line methods" complexity, simply include some measure of the number of methods in the class in the C.R.A.P. measure. Maybe for a given total class cyclomatic complexity, you're allowed some number of methods "for free", but you're penalized as you exceed that.

I don’t think that LOC would add much to that indicator. I am a big fan of the cyclomatic complexity metric. I think it is probably the most useful metric of code complexity. Since it directly measures the number of linearly independent paths through a program, it also gives you the minimum number of tests you have to write to achieve 100% coverage.

I also really like your formula, but contrary to your smart colleagues I almost feel that the metric should be higher than linear with full test coverage, somewhere in between linear and quadratic, because although you might be able to change things without breaking anything, doing so will be pretty painful. Then again, if the risk doesn’t include the developer’s despair but only the correctness of the change, linear is probably good.

> To deal with the "lots of one-line methods" complexity, simply include some measure of the number of methods in the class in the C.R.A.P. measure. Maybe for a given total class cyclomatic complexity, you're allowed some number of methods "for free", but you're penalized as you exceed that.
>
> As a naive proposal, consider
>
> methodCountPenalty = min(0, numMethods - classTotalComp())
>
> perhaps with an exponential backoff.

Hi Jay,

Thanks a bunch for the suggestion (I especially appreciate the first draft for a formula). One of the things we talked about a lot, and will probably tackle in the near future, is class-level complexity and other unsavory class-level practices.

I know that there will be exceptions (e.g. having 50 methods in a StringUtil class might be OK - even the right thing to do). But, just out of curiosity and for a data point, what would you consider to be an acceptable threshold for # of methods in a class?

> I don’t think that LOC would add much to that indicator. I am a big fan of the cyclomatic complexity metric. I think it is probably the most useful metric of code complexity. Since it directly measures the number of linearly independent paths through a program, it also gives you the minimum number of tests you have to write to achieve 100% coverage.

Hi,

I can see that you have a better understanding and appreciation of cyclomatic complexity than most developers :-).

> I also really like your formula, but contrary to your smart colleagues I almost feel that the metric should be higher than linear with full test coverage, somewhere in between linear and quadratic, because although you might be able to change things without breaking anything, doing so will be pretty painful. Then again, if the risk doesn’t include the developer’s despair but only the correctness of the change, linear is probably good.

I think we should probably take into account developer despair :-). A desperate developer is not in the right frame of mind for doing complex, precision work.

I have the same intuition as yours that the right exponent for the (+ comp(m)) component should be somewhere between linear and quadratic - and that complexity should always be factored in even if you have great coverage.

We are currently doing a lot of in-house experiments with the first iteration of the crap4j calculator. We'll publish some concrete code examples soon so we can all test our opinions and intuition while looking at actual code.

> One thing that jumps out at me is the term 100% test coverage. This is a fairy tale. It doesn't exist.

Hi James,

I agree with you. I have yet to see a non-trivial project with 100% coverage. On top of that, we need to be very careful about turning code coverage into THE OBJECTIVE, instead of using it as a tool. It's very easy to lose track of the fact that software metrics are tools to help you reach objectives and they are not themselves the objectives.

People also need to understand that 100% code coverage - if it existed - does not guarantee that you'll catch any bugs (e.g. you can have tests with perfect coverage and no assertions to check on what actually happens).

On the other hand, 0% coverage guarantees that you will NOT catch any bugs (not with the non-existent tests, anyway – perhaps with manual QA at some later stage in development).

Having said that, it's possible and quite common to achieve, or at least approach, 100% code coverage at the method level.

> Hi Vincent,
>
> > So, the definition of good code is simply that it consists of small methods with unit tests?
>
> No. No. No.
>
> Please read the blog entry more carefully.
>
> I haven't said anything about code "goodness". I actually wrote that this index should not be used to judge things such as code quality:
>
> "We believe that a good metric should have a clear and very specific purpose. It should be optimized for that purpose, and it should be used only for that purpose. The more general and generic a metric is, the weaker it is. The C.R.A.P. index focuses on the risk and effort associated with maintaining and changing an existing body of code by people other than the original developers. It should not be abused or misused as a proxy for code quality, evaluating programmers’ skills, or betting on a software company’s stock price."

Ah, Ok. It makes a lot more sense when you say that a single metric measures a single property of the code. Unfortunately, the word "crap" as a description of code is such a very general term that it masked that.

> The C.R.A.P. index has one purpose, and one purpose only: to help predict the risk, effort, and pain...

Er, that would be three things. In particular measuring risk of change and effort of change seem very different. I'm not sure what is meant by pain here.

Again, two things appear to be being conflated. Complexity and safety. I can imagine situations where making code safer might require added complexity.

> If you don't think that this is the case, please feel free to give us suggestions and recommendations for improving the metric.

I think the only way to improve the metrics is to measure them for real. In the formula you gave, you implied that "crapness" was proportional to the square of the complexity and the cube of the proportion of the code not covered by automated tests. Do these suspiciously precise ratios have any empirical basis or are they just plucked out of the air?

I applaud your motivation for seeking to measure code quality; however, I do think you need to be a lot more precise in defining what it is that you are seeking to measure and, when you present mathematical formulae, you must be able to justify the terms in some way.

> Having said that, it's possible and quite common to achieve, or at least approach, 100% code coverage at the method level.

If we define '100% coverage' to mean every line is tested, then yes it exists. My point is that '100% coverage' is a bogus and misleading term. Above you write "does not guarantee that you'll catch any bugs". I'd rephrase this to say, it doesn't guarantee you'll catch any bug that exists. It's impossible to write tests that will validate every possible state for any non-trivial application.

I guess I'm mostly appalled that there are (ostensibly intelligent) people who believe that there is no risk to changing code just because every line is run by a test. I applaud you for holding your ground on this.

> I think the only way to improve the metrics is to measure them for real. In the formula you gave, you implied that "crapness" was proportional to the square of the complexity and the cube of the proportion of the code not covered by automated tests. Do these suspiciously precise ratios have any empirical basis or are they just plucked out of the air?

I think this is covered in the original post with this sentence:

"Cyclomatic complexity is a well-known and widely used metric; it’s calculated as one plus the number of unique decisions in the method. For code coverage, we use basis path coverage."

To me, crap code is code with a high degree of change sensitivity, or chaos. It means I have a hard time determining just what might be affected by changing a method or class. Things like global variables, thread locals, etc., increase the chaos; number of methods, number of classes, class dependency cycles, replication of constants, etc., all contribute to this.

The cyclomatic complexity of a particular method is one small measure here. However, the crapfulness of a codebase is far more than the sum of the individual crapfulnesses of its methods or classes. I really think that, to be an accurate measure of change risk, you're going to need to find more holistic metrics.