Questions from Tuesday’s talk

Thanks to everyone who attended our talk this morning. Apologies for the difficulties with the slides at the beginning. The slide sharing system we were using was being very unresponsive. Tomorrow should go more smoothly.

The talk was recorded; when we have the recording and slides available I’ll post a link here.

Unfortunately we ran a little over – sorry – and did not have time for all the questions at the end, so I’ll answer a few of the unanswered technical questions now.

You said that your statistical technique was “if 19 times out of 20 you check the return value of a method call for null, then the 20th where you do not is likely to be a defect”. How do you come up with the figure “19 out of 20”?

We don’t actually use a 95% threshold; that was just a simplified example to get the point across. The actual threshold is not a straight percentage but rather a function that varies the threshold percentage based on how many instances were found. For example: suppose there are three calls to method M(), and two of them have null checks on the returned value. And suppose there are 300 calls to method N() and 200 of them have null checks. Both methods have 67% of the call sites with a check, but I think you’d agree with me that it seems like the one call to M() that is missing a null check is more likely to be a defect than all of the 100 calls to N() that are missing a null check. The function we use biases towards a lower threshold when the number of instances is small.
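The idea of a count-sensitive threshold can be sketched as follows. The functional form and the constants here are invented for illustration (Coverity's actual function is not public), and the example is in Python rather than C# for brevity:

```python
def check_threshold(total_sites: int, base: float = 0.85, bias: float = 0.6) -> float:
    """Return the fraction of call sites that must have a null check
    before the remaining unchecked sites are reported as defects.

    The threshold is lowered for small sample sizes, so one outlier
    among a handful of call sites is flagged more readily than 100
    outliers among 300. The shape of this function is a guess."""
    # Blend the base threshold toward a more permissive value as the
    # number of observed call sites shrinks.
    return base - bias / total_sites

def is_likely_defect(checked: int, total: int) -> bool:
    """Report the unchecked call sites when the checked fraction
    clears the count-sensitive threshold."""
    return checked / total >= check_threshold(total)

# Method M: 2 of 3 call sites check for null (67%) -> report the third.
print(is_likely_defect(2, 3))      # True
# Method N: 200 of 300 call sites check for null (67%) -> too noisy.
print(is_likely_defect(200, 300))  # False
```

Both methods have the same 67% check rate, but only the small-sample case clears the (lowered) threshold and gets reported.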

So how do we actually define the parameters for this function? Before I go on I need to define some terms: a “true positive” (TP) is when we correctly report a real defect. A “false positive” (FP) is when we incorrectly report a defect on correct code. A “false negative” (FN) is when we fail to report a real defect. The trick is to keep the TP rate high and the FP and FN rates low.
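In information-retrieval terms, keeping the TP rate high relative to FPs and FNs is precision and recall; a quick sketch in Python (the counts below are made up):

```python
def precision(tp: int, fp: int) -> float:
    # Of all the defects we reported, what fraction were real?
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # Of all the real defects, what fraction did we report?
    return tp / (tp + fn)

# Suppose an experimental run reports 90 real defects, 10 spurious
# ones, and misses 30 real defects.
print(precision(90, 10))  # 0.9
print(recall(90, 30))     # 0.75
```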

To set the default threshold function we do experiments on real-world codebases, vary the function parameters, and analyze the TP, FP and FN rates. The parameters that produce a lot of TPs without many FPs or FNs become the defaults.

We also have “be aggressive” and “be more aggressive” switches that you can throw which decrease the thresholds. And if you want to fiddle with the parameters yourself, there is an option in the analyzer to tweak them as you see fit.

You mentioned that one of the most dangerous defects you detect is inverted lock orderings that can lead to deadlocks. How can we avoid deadlocks?

As I described in the talk, the classic deadlock is when thread 1 takes lock A and thread 2 takes lock B. Then in the body of the lock, thread 1 attempts to also acquire lock B, and thread 2 attempts to acquire lock A. Now both threads are waiting for the other to leave the lock, and they will wait forever. The lock ordering is said to be “inverted”.
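The inverted ordering can be sketched like this; the example is Python rather than C#, a barrier forces the inversion to actually occur, and timeouts stand in for the infinite wait so the example terminates:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
both_hold_first = threading.Barrier(2)  # make sure the inversion really happens
results = {}

def worker(first, second, key):
    with first:                            # take this thread's first lock
        both_hold_first.wait()             # wait until the other thread holds its lock too
        got = second.acquire(timeout=0.5)  # now try to take the other thread's lock
        results[key] = got
        if got:
            second.release()

t1 = threading.Thread(target=worker, args=(lock_a, lock_b, "t1_got_b"))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, "t2_got_a"))
t1.start(); t2.start()
t1.join(); t2.join()

# At least one acquire must fail: each thread is waiting for a lock
# the other holds. Without the timeouts, both would wait forever.
print(results)
```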

Let me describe to you several ways to avoid deadlocks.

Don’t write multithreaded code! Seriously, multithreaded code is in my opinion a bad idea to begin with. Having multiple threads of control in the same program is confusing.

When writing multithreaded code, use the highest-level tools available. Instead of manipulating threads (workers) directly, manipulate tasks, and let the Task Parallel Library figure out how best to parallelize the work. If you are writing locks then you are writing very low-level threading code; bump it up a few levels of abstraction. Your code will be easier to reason about.
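The Task Parallel Library is a .NET facility; Python's rough analogue is `concurrent.futures`. The task-level style looks like this (the work function is a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def expensive(n: int) -> int:
    # Stand-in for real work; note: no locks, no explicit threads.
    return n * n

# Describe the work as tasks; the executor decides how to schedule
# them across its pool of worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(expensive, range(10)))

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The code says *what* can be done in parallel and leaves *how* to the library, which is the level of abstraction the answer recommends.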

If you must write multithreaded code, don’t write shared-memory multithreaded code. You put locks around data structures because they can be mutated on multiple threads. Well, if there are no such shared data structures then there are no locks, and therefore no nested lock ordering inversions. Instead write code where there is one main thread and many worker threads. If you give a worker thread a problem to solve and that problem can be solved without looking at any shared memory then locks are unnecessary.
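A minimal sketch of the main-thread-plus-workers pattern, again in Python: all communication goes through queues, so user code never touches shared mutable state and never takes a lock.

```python
import queue
import threading

tasks = queue.Queue()
done = queue.Queue()

def worker():
    # Each task is self-contained; the worker never reads or writes
    # shared memory, so no locks appear in this code.
    while True:
        item = tasks.get()
        if item is None:            # sentinel: shut down this worker
            break
        done.put((item, item * 2))  # the result travels back via the queue

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for n in range(5):
    tasks.put(n)
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()

results = dict(done.get() for _ in range(5))
print(results)  # {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
```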

If you must write shared-memory multithreaded code at the thread level, then make sure that every lock body does exactly one thing; it shouldn’t call any methods that do anything else interesting involving shared memory. If no lock ever takes a second lock then the lock order cannot be inverted!
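A lock body that "does exactly one thing" might look like this Python sketch: the body touches one data structure and calls no other methods, so the lock can never be held while a second lock is acquired.

```python
import threading

class Counter:
    """Each lock body does exactly one thing: touch this counter.
    It never calls out to code that might take another lock, so no
    lock is ever held while acquiring a second one."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self) -> None:
        with self._lock:
            self._value += 1    # the entire lock body: one mutation

    def read(self) -> int:
        with self._lock:
            return self._value  # the entire lock body: one read

c = Counter()
threads = [threading.Thread(target=lambda: [c.increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c.read())  # 4000
```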

If you must write shared-memory multithreaded code and have locks that take other locks, then you have to do a global program analysis that describes every lock body that can take another lock, and come up with a correct “canonical order”; you have to say “lock A is permitted to take lock B, but lock B is not permitted to take lock A”. Then do an analysis of your program to make sure that there is no code path on which the lock order can be inverted. This is really hard, which is why you should avoid this situation in the first place if at all possible.
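One mechanical way to enforce a canonical order is to give every lock a rank and always acquire in rank order, regardless of the order the caller asks for. This ranking scheme is an illustration I made up, not Coverity's analysis:

```python
import threading

class RankedLock:
    """A lock with a globally unique rank; the canonical order is
    'lower rank first'."""
    _next_rank = 0
    _rank_lock = threading.Lock()

    def __init__(self):
        with RankedLock._rank_lock:
            self.rank = RankedLock._next_rank
            RankedLock._next_rank += 1
        self.lock = threading.Lock()

def acquire_in_order(*locks):
    """Take any set of locks in canonical (rank) order, so two code
    paths can never hold them in inverted order."""
    ordered = sorted(locks, key=lambda l: l.rank)
    for l in ordered:
        l.lock.acquire()
    return ordered

def release(ordered):
    for l in reversed(ordered):
        l.lock.release()

a, b = RankedLock(), RankedLock()
# This call site asks for (b, a), but acquisition still happens
# a-then-b, so no code path can invert the order.
held = acquire_in_order(b, a)
print([l.rank for l in held])  # [0, 1]
release(held)
```

This removes the inversion by construction, but you still need the global analysis the answer describes to be sure every code path actually goes through a discipline like this.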

Your tool could identify code that could be refactored to improve its clarity. Do you automatically detect such code and recommend refactoring?

Not for C#. However we do also make a product called Architecture Analysis which can perform these sorts of analyses on C, C++ and Java. See the linked brochure for details.

When doing code reviews I often find that my colleagues write “bad code smells” such as creating an instance method which does not use instance members and could therefore be static. Does your tool detect these sorts of issues?

Generally speaking, no, though there are a few exceptions. Our primary focus is on finding defects that cause the software to behave incorrectly in a way that your customers could notice, because that should be the highest-priority class of defects.

You would not delay the release of your product if you discovered at the last minute that a correct, tested, debugged instance method could actually be static, or that a hundred-line method could be broken up into two shorter methods that were easier to read. You might, though, delay a release if you discovered a code path that deadlocked and thereby lost customer data.

“Coding style” defect checkers tend to have a high false positive rate, and if you get a lot of them then you can end up obscuring higher-priority defects. Tools like FXCop and StyleCop are better for the sort of issue you describe. As I mentioned during the talk, you can import FXCop analysis results into the Coverity Connect tool and triage them alongside issues discovered by the Coverity analysis.

Does the analysis tool suggest solutions to the defects it discovers?

Sometimes, but typically only when we have high confidence that the suggestion is correct. A tool which suggests making the wrong change to code is a tool for automatically confusing developers and producing bugs, which is the opposite of what we want![1. These same considerations apply to compiler diagnostics; see my earlier article on that subject for more details.]

For example, in the “cut-n-paste defect” example that Jon and I showed, the tool will highlight the presumed-cut code, highlight the presumed-pasted code and say “it looks like the code was cut from here and pasted to here; did you intend to change this Y to an X?”
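The slide itself isn't reproduced here, but the shape of the defect is roughly this (names and code invented for illustration, sketched in Python):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def clamp(point, x_min, x_max, y_min, y_max):
    # Original code, written for the x coordinate:
    if point.x < x_min:
        point.x = x_min
    if point.x > x_max:
        point.x = x_max
    # Pasted copy, incompletely edited for y: one x_max survived the
    # paste and should have been changed to y_max. This is exactly
    # the kind of cut/paste pair the tool highlights.
    if point.y < y_min:
        point.y = y_min
    if point.y > x_max:   # defect: should compare against y_max
        point.y = y_max
    return point

# y = 15 is within [0, 20], but the pasted comparison against
# x_max = 10 wrongly clamps it down.
p = clamp(Point(5, 15), x_min=0, x_max=10, y_min=0, y_max=20)
print(p.y)  # 20, where the intended code would leave 15
```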

In other cases it is hard to know what the intended code is. In those cases we try to describe the defect clearly without giving advice that could be bad. If the defect is described clearly enough then the developer can figure out what the right fix is.

Does the analysis discover cases where, say, a web page back end is not doing adequate validation on the value of a field?

The Coverity Java Security analysis engine uses both static and dynamic analysis to verify that input validators are doing their job to protect users from SQL injection and cross-site scripting attacks. These features are not available for C#. I hope that they can be made available in a future release but we aren’t discussing feature sets of unannounced future products at this time.

You said that you analyze libraries looking for interesting facts about methods, like “dereferences its argument” or “releases a resource”. Can we create custom models, for instance, to say that method X allocates a resource that must be released by method Y?

The Coverity Java, C and C++ analysis engines support user-created models. The C# analysis does not at this time. Again, this is a feature that I hope can be made available in the future, but we aren’t discussing schedules or feature sets of future releases.

How do you deal with false positives?

Harshly.

We try hard to eliminate common sources of FPs before a customer sees them by analyzing as much real-world code as we can. Of course, it is inevitable that some will slip through. If a customer reports a particular FP pattern that they’re seeing crop up all the time then we treat that as a bug in the product and try to come up with a heuristic that detects the FP without eliminating too many TPs along the way.

Also, once you mark a defect as “FP” or “Intentional” in Coverity Connect, any subsequent analysis that reports the same defect will cause the new defect to be merged with the existing defect, so that it does not get brought to your attention again.


3 thoughts on “Questions from Tuesday’s talk”

If I understand the last paragraph correctly, there’s a way to acknowledge a false positive and get it out of your face, without changing code. This is great. I hate analysis tools that require you to leave some annotation or formatted comment in the code to disable a check.

If you use several tools, the code can get littered with multiple different types of mess. If one dev wanting to use a tool messes up the codebase for everyone, that discourages adoption and increases the impact of pilot trials.