Myths about static analysis. The fifth myth – a small test program is enough to evaluate a tool

Here is how this claim typically appears in forum discussions (a composite picture):

I’ve written a special 100-line program, but the analyzer doesn’t report anything even though all the warning levels are enabled. This [tool of yours] / [static analysis] in general is just rubbish.

It is not the static analysis methodology that is rubbish, but this approach to evaluating a particular tool. The approach is flawed in two respects:

1. Programmers think they don’t make simple mistakes. This phenomenon was discussed in Myth 2. So they feed the analyzer a tricky sample and are secretly pleased when it fails to find the error. This game is entertaining but pointless.

Keep in mind that most errors are painfully simple, and static analyzers detect them very well. The paradox is that it is much harder to invent a simple mistake than a complicated one. Here is an example: would you ever think of writing a test sample built around the expression “sizeof(threadcounts) / sizeof(threadcounts)”?

I doubt it. It is hard to imagine anyone making such a silly mistake, so a sample like this will never be written on purpose. Yet the expression comes not from a student’s lab work but from the Chromium project. The PVS-Studio analyzer diagnoses it easily, of course.
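
To make the pattern concrete, here is a minimal sketch of that kind of mistake. It is not the Chromium source: only the name threadcounts comes from the quoted expression, while the array contents and the names buggy_count and fixed_count are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical array; only its name echoes the expression quoted above.
static const int threadcounts[] = {1, 2, 4, 8, 16};

// The slip: the array size is divided by itself, so the result is always 1.
constexpr std::size_t buggy_count =
    sizeof(threadcounts) / sizeof(threadcounts);

// What was presumably intended: divide by the size of a single element.
constexpr std::size_t fixed_count =
    sizeof(threadcounts) / sizeof(threadcounts[0]);

static_assert(buggy_count == 1, "self-division always yields 1");
static_assert(fixed_count == 5, "the array holds five elements");
```

Any loop bounded by buggy_count would process only the first element, and the code would still compile and run without complaint.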

2. Hand-written samples are few and arbitrary, so the results depend heavily on chance. You may invent 5 errors that one analyzer finds and another misses, or you may write a different program with five errors and see the two analyzers swap places. The sample is simply too small for such a study. To compare and study tools with at least somewhat reliable results, you would have to write program text containing at least 500 different errors. An investigation based on 5-10 errors is not reliable.
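
The effect of sample size can be estimated with a rough calculation. The sketch below assumes, purely for illustration, that every seeded error is detected independently with the same probability p. Real errors are not independent in this way, and the function names are hypothetical, so treat this as a back-of-the-envelope model and not a measurement method.

```cpp
#include <cmath>

// Probability that an analyzer with detection rate p finds none of n
// seeded errors, assuming independent detections (a simplification).
double chance_of_finding_nothing(double p, int n) {
    return std::pow(1.0 - p, n);
}

// Standard error of the detection rate measured on n seeded errors,
// under the same independence assumption (binomial model).
double detection_rate_std_error(double p, int n) {
    return std::sqrt(p * (1.0 - p) / n);
}
```

Under these assumptions, an analyzer that catches half of all errors still has about a 3% chance of finding none of 5 seeded errors. The uncertainty of the measured rate is roughly 22 percentage points with 5 errors, against about 2 with 500.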

Moreover, programmers expect diagnostic messages for errors of one particular type and forget about the rest. For example, almost all programmers write one and the same sample with a memory release defect.

Why does nobody write samples with other kinds of errors? Note that PVS-Studio has found an error of a kind that testers rarely think to write in the MySQL project.
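
For illustration, the contrast might look like the sketch below. Both functions are hypothetical: the first is the kind of memory release sample that evaluators habitually write, and the second is an ordinary copy-paste slip of the sort they rarely think of. Neither is the original sample or the MySQL fragment.

```cpp
#include <cstdlib>

// The habitual test sample: memory is allocated and never released.
int leaky_sum(int n) {
    int *buf = static_cast<int *>(std::malloc(sizeof(int) * n));
    if (buf == nullptr) return -1;
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        buf[i] = i;
        sum += buf[i];
    }
    return sum;  // defect: std::free(buf) is missing
}

// A sample hardly anyone writes on purpose: a copy-paste slip.
struct Point { int x; int y; };

bool same_point(const Point &a, const Point &b) {
    return a.x == b.x && a.y == a.y;  // defect: a.y is compared with itself
}
```

With the slip in place, same_point reports two points as equal whenever their x coordinates match, whatever the y values are.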

The conclusion is that an adequate investigation or comparison of tools can be carried out only on real projects. Take project A, check it with PC-Lint / Visual C++ / PVS-Studio / C++Test, study all the messages carefully, and draw up a table of results (how many and which errors each analyzer has found). This is the only meaningful way to study and compare tools. For an example, see “Comparing Analysis Capabilities of PVS-Studio and Visual Studio 2015’s Analyzer”.
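
The table of results can be kept in any form. One possible way to tally manually reviewed warnings is sketched below; the Finding and Tally types and the tally_by_analyzer function are hypothetical names invented for this example, not part of any analyzer's output format.

```cpp
#include <map>
#include <string>
#include <vector>

// One warning after manual review (all fields are illustrative).
struct Finding {
    std::string analyzer;  // tool that issued the warning
    std::string location;  // for example "file.cpp:42"
    bool real_error;       // verdict reached by a human reviewer
};

// One row of the results table.
struct Tally {
    int real_errors = 0;
    int false_positives = 0;
};

// Counts confirmed errors and false positives for each analyzer.
std::map<std::string, Tally> tally_by_analyzer(
        const std::vector<Finding> &findings) {
    std::map<std::string, Tally> table;
    for (const Finding &f : findings) {
        Tally &row = table[f.analyzer];
        if (f.real_error) {
            ++row.real_errors;
        } else {
            ++row.false_positives;
        }
    }
    return table;
}
```

Keeping the location of each finding also makes it possible to see which errors were found by several analyzers and which by only one.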