Cyclomatic Complexity

published:
Tue, 13-Dec-2005
|
updated: Wed, 14-Dec-2005

Something that came up in a conversation: what is cyclomatic
complexity (CC)? If I gave you a method, how would you calculate the
CC of the method from scratch? Actually the second question assumes
you know what CC is, so I'm cheating a little. But bear this in mind:
managing complexity is an important part of being a good software
designer and programmer and CC is a measure of complexity.

You're writing some code. Is it simple to understand? Is it complex?
Why is it important to think of your code in such terms? Well, I'd
have to say that, unless you are writing the mythical one-off program,
someone, maybe you, is going to have to read and understand the code
at some stage in order to modify it, or fix it, or add some extra
functionality.

Perhaps you'd say one way to describe the complexity of a method is to
count the number of lines of code in the method. The more lines of
code, the more complex the method is; the fewer lines of code there
are, the simpler it is. Easy, right?

You're left with a problem though. What is a line of code? Does the
syntactical fluff of your language of preference count? (That is, the
begins, ends, braces, and other tokens that
delineate blocks of code in the method.) So, for example, are there
two lines of code in the following Delphi fragment or three?

if (i >= 0) then begin
List.Delete(i);
end;

Well, if you count the end statement there are three, but
the end statement is not really about complexity, it's
there to satisfy the language syntax. Really we should ignore these
pieces of syntactic fluff and just count the statements that are doing
something.

But we're not finished yet. What about statements that, to make the
code easier to read, are continued on several lines? Are they one line
of code, or several, one per continuation line? What about
declarations of variables (the var block in Delphi is an
example)? What about the declaration of the method itself and so on?

If a method is an initialization method for some object it will
probably consist of a dozen or more statements that set properties,
arranged in a sequence. No conditional or looping statements to be
found. Is that more complex than another method that does the same for
an object with fewer properties? Probably, but I'd have to say, in
essence, they're both as complex as each other.

So, I think you can see that taking the lines of code measurement as a
complexity metric is just not good enough. It doesn't adequately
express the complexity of the method in any meaningful way. More
likely it is a very approximate way to describe complexity (methods
that are hard to understand tend to have more lines of code -- or use
deliberately confusing identifiers, perhaps -- than ones that are
easier to understand).

Can we do better?

One better metric was invented by Tom McCabe in, believe it or not,
1976 (Next year, the paper he wrote will be 30 years old and yet,
still, many developers have not heard of either the paper or the
concept.) He called it cyclomatic complexity, and it is a description
of the complexity of the control flow of a program (or a method, in
our case). To put it another way, it's a numerical description of the
complexity of a flowchart: the more decision boxes in a flowchart, the
more complex the control flow.

By analyzing many software projects, both before and after the
software was released, he showed that the reliability of a program is
inversely correlated and the number of errors a program exhibits is
directly correlated to its control flow complexity. In other words,
the more complex a program is, the less reliable it is and the more
errors show up after the program is released.

McCabe's research has been confirmed in later years by other
researchers. His findings have been extended by additional metrics
(such as, the number of local variables coupled with the cyclomatic
complexity is a better indicator of a program's reliability), but the
CC metric is still an extremely good indicator of how "good" a program
is.

Reread the previous two paragraphs and mentally replace the word
"program" with "method". In essence, the reliability (whether we'll
find bugs after it has been used) of a method is inversely correlated
with its CC value. If we write simple methods we'll be less likely to
write buggy ones. Well, d'oh!

So how do you calculate the CC value for a method? McCabe's method was
defined in terms of decision points in flowcharts, but we tend not to
draw those these days. Instead here’s his algorithm for calculating
the CC value for a method defined in terms of the tokens of the
language you use.

Start with 1. All this says is that, if called, the method will
finish; there is at least one path through the method.

Add 1 for each conditional statement or looping statement. (So, add 1
for an if, or a while, or a foreach,
or a ternary operator, etc.)

Add 1 for each AND or OR logical operator used in a condition.

Add 1 for each case in a case or switch statement. If the case/switch
statement doesn't have an explicit default case, add 1.

Add 1 for each catch statement (for Delphi: the
except statement).

That's all. Pretty easy to calculate really. Here's a couple of
examples to illustrate the calculation and to exercise your counting
abilities:

This method has a CC value of 3. Count 1 for the main path, 1 for the
foreach, and 1 for the if.

Anyway, at the end of this little calculation you'll get a value
that's at least 1. If it is exactly 1, then the CC value shows that
there is only one path through the method; it is merely a sequence of
one or more statements. Between 1 and 5, you shouldn't worry
particularly about the method: it's not that complex. If the value
were between 5 and 10, I'd have to say that the method could do with
some refactoring. It's getting a little too complex to understand in
one go. Any method that has a CC value of 10 or more is definitely too
complex and should be refactored.

Put it another way. In essence, the CC value is a measure of the
number of paths through the method and so indicates at least the
number of tests you need to write to properly exercise the method. "At
least", note. There will probably be more. So a method of complexity 10
will require, at a minimum, 10 tests to be written against it to
ensure that all paths are exercised and checked. Which would you
rather: test a method with CC 2, or a method with CC 20?

All this plays nicely into the TDD method of designing and writing
software. Once you've mastered the CC calculation, you'll be able to
intuitively decide whether a method is getting too complex and needs
refactoring. You'll get into the habit of avoiding big case
statements. You'll learn the various design patterns that enable you
to refactor your code into something less complex. You'll become a
better developer. Personally I like methods with a CC value of 5 or
less, generally way less than 5. 1 is pure goodness, 2 or 3 is OK.
With 4 or more I tend to be looking for ways to simplify the method.

Update: 14-Dec-2005

Although in the algorithm I wrote try in the fifth step,
I really meant catch, since it is this statement that implies
a conditional statement ("if an exception is thrown, do this"). A
finally doesn't really mean a conditional branch since it
implies that the code should happen no matter what. McCabe's original
algorithm doesn't really have anything to say about this (it was devised
well before exceptions were invented), and I'm not really sure how I'd
show a try..catch in a flowchart anyway, unless there were a
conditional test after every statement in a try block. I think
that's overkill, otherwise adding a try..catch block would blow
your CC value sky-high very easily.