What are some practical coding examples of the different factors that go into calculating cyclomatic complexity? Specifically, for the Wikipedia equation M = E − N + 2P, I want to better understand what each of the following terms means:

E = the number of edges of the graph

N = the number of nodes of the graph

P = the number of connected components

I suspect that either E or N may be the number of decision points (if, else if, for, foreach, etc.) in a block of code, but I'm not quite sure which is which or what the other signifies. I'm also guessing that P refers to function calls and class instantiations, but I can't see a clear definition given anywhere. If someone could shed a little more light with some clear code examples of each, it would help.

As a follow-up, does the Cyclomatic Complexity directly correlate to the number of unit tests needed for 100% path coverage? As an example, does a method with a complexity of 4 indicate that 4 unit tests are needed to cover that method?

I found that you can get the original paper by McCabe from Wikipedia, and Google Books yields the book that McCabe used for his original paper. Interestingly, you will then find that McCabe applied the original theorem incorrectly (and explains it confusingly: he should start off with an undirected graph, and there is no need to make it strongly connected in the first place), but the numbers come out correctly anyway. The correct formula would be M = E + 1 − N + P, but as P is always 1, it fits. The thought occurs that modern exception handling throws a spanner into the works of that metric.
–
David Tonhofer, May 3 '14 at 10:15

...and what about recursive calls (possibly going via a chain of functions)? Does one fuse the function graphs? How about short-circuiting boolean operators like "&&"? Or guarded operators like "ref?.x", which yield null if ref is null? Oh well, it's just another metric. But there is some work for a little university project here.
–
David Tonhofer, May 3 '14 at 10:22


Regarding the formula: nodes represent states, edges represent state changes. In every program, statements bring changes in the program state. Each consecutive statement is represented by an edge, and the state of the program after (or before...) the execution of the statement is the node.

If you have a branching statement (an if, for example), then two edges come out of that node, because the state can change in two ways.
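Here is a minimal sketch that makes E, N and P concrete. It is not part of the original answer. The control-flow graph is built by hand for a hypothetical function sign_label, with one node per statement plus an entry node and an exit node. The node names and this level of granularity are assumptions, and tools draw the graph differently. Merging straight-line statements into blocks removes equal numbers of nodes and edges, though, so M comes out the same.

```python
# Hypothetical function being analysed (for illustration only):
#
#   def sign_label(x):
#       if x < 0:                 # node "if"
#           s = "negative"        # node "then"
#       else:
#           s = "non-negative"    # node "else"
#       return s                  # node "return"

# Hand-built control-flow graph: each edge is one possible transfer of control.
edges = [
    ("entry", "if"),
    ("if", "then"),      # condition true
    ("if", "else"),      # condition false
    ("then", "return"),
    ("else", "return"),
    ("return", "exit"),
]
nodes = {n for edge in edges for n in edge}


def count_components(nodes, edges):
    """Count connected components, ignoring edge direction (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


E = len(edges)                       # 6
N = len(nodes)                       # 6
P = count_components(nodes, edges)   # 1: a single function
M = E - N + 2 * P
print(E, N, P, M)  # 6 6 1 2
```

A straight-line statement adds one node and one edge, so it cancels out. Only the extra outgoing edge of the if lifts M above 1. P counts separate graphs: analysing two unrelated functions together would give P = 2.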

Another way to calculate the Cyclomatic Complexity Number (CCN) is to count how many "regions" the execution graph has (where an "independent region" is a circle that doesn't contain other circles). The CCN is then the number of independent regions plus 1, which is exactly the number the previous formula gives you.
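The "regions plus 1" rule follows from Euler's formula. This assumes the flow graph can be drawn in the plane without crossing edges, which holds for small structured examples but is an assumption in general. F counts the faces of the drawing, including the unbounded outer one, and the graph is taken to be a single connected piece (P = 1):

```latex
N - E + F = 2 \quad\Longrightarrow\quad F = E - N + 2
% F - 1 bounded regions, plus 1:
(F - 1) + 1 = E - N + 2 = M \qquad (P = 1)
```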

The CCN is used for branch coverage, or path coverage, which is the same. The CCN equals the number of different branching paths theoretically possible in a single-threaded application. That may include branches like "if x < 2 and x > 5 then", but a good compiler should catch those as unreachable code. You need at least that many different test cases. You may need more, since some test cases might repeat paths covered by previous ones, but not fewer, assuming each case covers a single path. If no possible test case can cover a path, you have found unreachable code (although you'll need to prove to yourself why it is unreachable; probably some nested x < 2 and x > 5 is lurking somewhere).

As to regular expressions: of course they affect it, as any other piece of code does. However, the CCN of the regex construct is probably too high to cover in a single unit test. You can assume that the regex engine has been tested and ignore the expressions' branching potential for your testing needs (unless you're testing your regex engine, of course).

+1: Actually, you must trust that the regex engine has been tested. If you don't trust it, get one that you do trust.
–
S.Lott, Sep 27 '11 at 10:05

"The CCN equals to the number of different execution paths possible in a single threaded application" This is wrong as the CCN is based just on the code's topology not on its meaning. A good percentage of these paths may be impossible to exercise as they demand input state that cannot be set (some x being 5 and also less than 2 for example). Frankly, I think using the CCN to decide on test cases to run is perverse. CCN is a number to tell the developer "you may have gone overboard here, please consider refactoring". And even then, there may be good reason for high CCN.
–
David Tonhofer, May 3 '14 at 10:30


@David I added a sentence to address that. CCN is about branch coverage, and there are never good reasons for a high CCN at a lower level (generally I suggest enforcing it per individual function).
–
littleadv, May 3 '14 at 18:50

Branch coverage and path coverage are not the same. Branch coverage aims at covering all branches whereas path coverage aims at covering all combinations of branches.
–
mouviciel, May 8 '14 at 11:44
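The distinction in the comment above can be made concrete with a minimal sketch. The function f is hypothetical and has two independent, sequential decisions, so its cyclomatic complexity is 3. Two tests already take every branch, but there are four distinct entry-to-exit paths. The CCN sits between those two numbers.

```python
from itertools import product


def f(a, b):
    """Two independent, sequential decisions; records which branches ran."""
    trace = []
    if a:
        trace.append("A-true")
    else:
        trace.append("A-false")
    if b:
        trace.append("B-true")
    else:
        trace.append("B-false")
    return trace


# Path coverage: every combination of branch outcomes is a distinct path.
all_paths = {tuple(f(a, b)) for a, b in product([True, False], repeat=2)}

# Branch coverage: two tests already take every branch at least once.
branch_tests = [(True, True), (False, False)]
branches_hit = {br for a, b in branch_tests for br in f(a, b)}

cyclomatic = 2 + 1  # two decision points + 1

print(len(all_paths), len(branches_hit), cyclomatic)  # 4 4 3
```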

For some reason, McCabe indeed uses it in his original paper ("A Complexity Measure", IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, December 1976), but without justifying it and after actually citing the correct formula on the first page, which is:

The cyclomatic number v(G) of an (undirected) graph G (which may have
several disconnected components) is defined as:

v(G) = e - v + p

where e = number of edges, v = number of vertices, p = number of
connected components

Theorem (not used by McCabe):

The cyclomatic number v(G) of a graph G is equal to the maximum number
of independent cycles

A cycle is a sequence of vertices starting and ending at the same vertex, with each two consecutive vertices in the sequence adjacent to each other in the graph.

Intuitively, a set of cycles is independent if none of the cycles can be constructed from the others by superimposing the walks.
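The theorem can be checked mechanically with a sketch like the one below. It builds a spanning forest with union-find. Every edge that joins two vertices already in the same tree closes exactly one new "fundamental" cycle, and the number of such edges equals e − v + p. The graph is made up for illustration and deliberately has two components (p = 2).

```python
def cyclomatic_number(vertices, edges):
    """Return (e - v + p, number of cycle-closing edges) for an undirected graph."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    closing = 0
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            closing += 1        # edge closes a fundamental cycle
        else:
            parent[ra] = rb     # edge extends the spanning forest
    p = len({find(v) for v in vertices})
    return len(edges) - len(vertices) + p, closing


# Component 1: a square with one diagonal (two independent cycles).
# Component 2: a triangle (one cycle).
V = ["a", "b", "c", "d", "x", "y", "z"]
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c"),
     ("x", "y"), ("y", "z"), ("z", "x")]

print(cyclomatic_number(V, E))  # (3, 3)
```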

Theorem (as used by McCabe):

In a strongly connected graph G, the cyclomatic number is
equal to the maximum number of linearly independent circuits.

A circuit is a cycle with no repetitions of vertices and edges allowed.

A directed graph is said to be strongly connected if every vertex is reachable from every other vertex by passing through the edges in their designated direction.

Note that here we passed from undirected graphs to strongly connected graphs (which are directed; Berge doesn't make this entirely clear).

McCabe now applies the above theorem to derive a simple way to compute a “McCabe Cyclomatic Complexity Number” (CCN) thusly:

Given a directed graph representing the “jump topology” of a procedure (the instruction flow graph), with a designated vertex representing the unique entry point and a designated vertex representing the unique exit point (the exit point vertex may need to be “constructed” by adding it in case of multiple returns), create a strongly connected graph by adding a directed edge from the exit point vertex to the entry point vertex, thus making the entry point vertex reachable from any other vertex.
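The construction above can be sketched as follows, under stated assumptions. The flow graph is a hypothetical single loop with one entry and one exit. Strong connectivity is tested by checking that every vertex is reachable from one vertex both forwards and along reversed edges. Only after the exit-to-entry edge is added does the check pass, and e − v + 1 then gives 2 for the loop.

```python
from collections import deque


def reachable(graph, start):
    """Set of vertices reachable from start, following edge directions."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_strongly_connected(graph, vertices):
    """True if some vertex reaches all others and all others reach it."""
    v0 = next(iter(vertices))
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return reachable(graph, v0) == vertices and reachable(reverse, v0) == vertices


# Hypothetical flow graph of a procedure containing one loop.
cfg = {
    "entry": ["test"],
    "test": ["body", "exit"],   # loop condition
    "body": ["test"],           # back to the condition
    "exit": [],
}
vertices = set(cfg)

print(is_strongly_connected(cfg, vertices))  # False: nothing leads back to entry
cfg["exit"].append("entry")                  # the added exit -> entry edge
print(is_strongly_connected(cfg, vertices))  # True

e = sum(len(dsts) for dsts in cfg.values())
print(e - len(vertices) + 1)                 # 2
```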

McCabe now posits (rather confusingly, I might say) that the cyclomatic number of the modified instruction flow graph "conforms to our intuitive notion of 'minimum number of paths'", and so we shall use that number as the complexity measure.

Cool, so:

The cyclomatic complexity number of the modified instruction flow graph can be determined by counting the "smallest" circuits in the undirected graph. This is not particularly hard to do by man or machine, but applying the above theorem gives us an even easier way to determine it:

v(G) = e - v + p

if one disregards the directionality of the edges.

In all cases, we just consider a single procedure, so there is only one connected component in the whole graph, and so:

v(G) = e - v + 1.

In case one considers the original graph without the added "exit-to-entry" edge, one obtains simply:

ṽ(G) = ẽ - v + 2

as ẽ = e - 1
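The same bookkeeping can be done without assuming a single procedure. Each component gets its own added exit-to-entry edge, so e = ẽ + p. This recovers exactly the Wikipedia formula from the question, with E = ẽ, N = v and P = p:

```latex
v(G) = e - v + p, \qquad e = \tilde{e} + p
\;\Longrightarrow\;
v(G) = \tilde{e} + p - v + p = \tilde{e} - v + 2p
```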

Let's illustrate by using McCabe's example from his paper:

Here we have:

e = 10

v = 6

p = 1 (one component)

v(G) = 5 (we are clearly counting 5 cycles)

The formula for the cyclomatic number says:

v(G) = e - v + p

which yields 5 = 10 - 6 + 1 and so correct!

The "McCabe cyclomatic complexity number" as given in his paper is

5 = 9 - 6 + 2 (no further explanations are given in the paper as to how)

which happens to be correct (it yields v(G)) but for the wrong reasons, i.e. we use:

ṽ(G) = ẽ - v + 2

and thus ṽ(G) = v(G) ... phew!

But is this measure any good?

In two words: not very.

It is not entirely clear how to establish the "instruction flow graph" of a procedure, especially if exception handling and recursion enter the picture. Note that McCabe applied his idea to code written in FORTRAN 66, a language with no recursion, no exceptions and a straightforward execution structure.

The fact that a procedure with a decision and a procedure with a loop yield the same CCN is not a good sign.

Even less good is the fact that for loops and while loops are handled in the same way (note that in C, one can abuse for to express a while loop; here I am talking about the strict for (int i=0; i<const_val; i++) loop). We know from theoretical computer science that these two constructs yield totally different computational powers: primitive-recursive functions if you are only equipped with for, and partial μ-recursive functions if you are equipped with while.
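One rough way to see this in practice is to count decision points with Python's standard ast module and add 1. The set of node types counted here is an assumption, and real tools apply their own rules. With it, an if, a while and a bounded for each contribute exactly one, so all three snippets below score 2. The optional count_boolops flag is hypothetical and touches on the short-circuit question from the comments: whether "a and b" counts as an extra branch is a per-tool choice.

```python
import ast

# Node types treated as decision points (an assumption, not a standard).
DECISIONS = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic(source, count_boolops=False):
    """Approximate McCabe number: 1 + number of decision points in source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISIONS):
            complexity += 1
        elif count_boolops and isinstance(node, ast.BoolOp):
            # "a and b and c" adds one branch per extra operand
            complexity += len(node.values) - 1
    return complexity


with_if = """
def f(x):
    if x > 0:
        x -= 1
    return x
"""

with_while = """
def f(x):
    while x > 0:
        x -= 1
    return x
"""

with_for = """
def f(n):
    total = 0
    for i in range(n):
        total += i
    return total
"""

print([cyclomatic(s) for s in (with_if, with_while, with_for)])  # [2, 2, 2]
print(cyclomatic("ok = a and b and c"))                        # 1
print(cyclomatic("ok = a and b and c", count_boolops=True))    # 3
```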

As a follow-up, does the Cyclomatic Complexity directly correlate to the number of unit tests needed for 100% path coverage?

Yes, basically. It's also a good idea to make use of cyclomatic complexity as an indicator of when to refactor. In my experience, testability and reusability greatly increase for lower CC (although you should be practical - don't over-refactor, and some methods will have high CC due to their nature - it doesn't always make sense to try and force it lower).
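For a chain of mutually exclusive branches, as in the hypothetical function below, the numbers line up exactly. Three decisions give CC = 3 + 1 = 4, and each of the four outcomes needs its own test. As the comments on the first answer point out, this equality is not guaranteed in general. Independent sequential decisions can need fewer tests for branch coverage and more for full path coverage.

```python
def shipping_cost(weight_kg):
    """Hypothetical pricing rule with three decisions, so CC = 3 + 1 = 4."""
    if weight_kg <= 1:
        return 5
    elif weight_kg <= 5:
        return 10
    elif weight_kg <= 20:
        return 25
    else:
        return 60


# One test per independent path: four tests for CC = 4.
def test_shipping_cost():
    assert shipping_cost(0.5) == 5    # first branch
    assert shipping_cost(3) == 10     # second branch
    assert shipping_cost(10) == 25    # third branch
    assert shipping_cost(50) == 60    # else branch


test_shipping_cost()
```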

Yes, if you want to be exact, although most code analysis tools don't seem to take them into consideration in that way. Regular expressions are just finite state machines, so I'm guessing their CC could be calculated from the FSM graph, but it would be quite a large number.
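As a rough sketch of that guess, the example below hand-builds a DFA for the regular expression a(b|c)*d and applies E − N + 2P to its transition graph. No regex engine is involved. Counting each labelled transition, including self-loops, as one edge is an assumption; merging parallel transitions would give a smaller number.

```python
# Hand-built DFA for the regex a(b|c)*d (illustrative only).
# State 0 = start, state 2 = accepting; a missing transition means "reject".
transitions = {
    (0, "a"): 1,
    (1, "b"): 1,   # self-loop
    (1, "c"): 1,   # self-loop
    (1, "d"): 2,
}
states = {0, 1, 2}


def accepts(text):
    """Run the DFA over text and report whether it ends in the accepting state."""
    state = 0
    for ch in text:
        if (state, ch) not in transitions:
            return False
        state = transitions[(state, ch)]
    return state == 2


E = len(transitions)  # each labelled transition counted as one edge
N = len(states)
P = 1
print(E - N + 2 * P)  # 3

print(accepts("abcbd"), accepts("ad"), accepts("abx"))  # True True False
```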