Who is this guy?
• Compiler developer / language designer at Microsoft from 1996 through 2012
  • Visual Basic, VBScript, JScript, VS Tools for Office, C# / Roslyn
• Static analysis architect for C# at Coverity since January
• I will use “we” totally inconsistently
• I have no formal background in static analysis
• I take an engineering rather than academic approach

The business case for C#
• Productive, successful professional developers who target Microsoft platforms make those platforms more attractive to Microsoft’s customers
• Original design goal was “a simple, modern, general-purpose language”
• Any language with an 800 page specification is no longer simple, but modern and general-purpose still apply
• Understanding developer psychology is key to achieving wide adoption of any developer tool

Conservatism
• C# developers hate breaking changes imposed by tools
• Even trivial breaking changes are agonized over
• In 11 years and 6 releases C# has never added a new reserved keyword
• New keywords are contextual so as to not be breaking
• This imposes considerable restrictions on new syntaxes
• For example, consider iterator blocks:
    double yield = 123.4;
    yield return yield;
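The iterator-block example can be placed in a complete method to show that it compiles. This is a minimal sketch; the class name `ContextualYieldDemo` and the method name `Values` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class ContextualYieldDemo
{
    // "yield" is only special when followed by "return" or "break" inside an
    // iterator block; everywhere else it is an ordinary identifier.
    public static IEnumerable<double> Values()
    {
        double yield = 123.4;   // a local variable named "yield"
        yield return yield;     // contextual keyword, then the local variable
    }

    static void Main()
    {
        foreach (double value in Values())
        {
            Console.WriteLine(value);
        }
    }
}
```

Code written before iterator blocks existed that used `yield` as a variable name keeps compiling, which is the point of making the keyword contextual.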

Conservatism
• C# 4.0 added dynamic dispatch to facilitate interoperability with dynamic languages and “legacy” object models
• Enormous MVP community pushback
  • “I will use this feature correctly but my coworkers are going to abuse it and then I’m going to have to fix their god-awful hacked-up code”
• Anything that makes the compiler less capable of finding bugs is met with skepticism and resistance
• The feature was completely redesigned based on early feedback
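The concern about the compiler finding fewer bugs can be illustrated with a small sketch. The class and method names are hypothetical, and the project is assumed to reference the `Microsoft.CSharp` assembly, which the runtime binder needs.

```csharp
using System;
using System.Text;
using Microsoft.CSharp.RuntimeBinder;

static class DynamicDispatchDemo
{
    // Member lookup on a dynamic receiver is deferred until run time, so this
    // works for any object that happens to have a Length property.
    public static int LengthOf(dynamic value)
    {
        return value.Length;
    }

    // The misspelled call below compiles; the mistake surfaces only when run.
    public static bool MissingMemberFailsAtRuntime()
    {
        dynamic text = "hello";
        try
        {
            text.NoSuchMember();
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(LengthOf("hello"));                   // 5
        Console.WriteLine(LengthOf(new StringBuilder("abc")));  // 3
        Console.WriteLine(MissingMemberFailsAtRuntime());       // True
    }
}
```

With a statically typed receiver, the call to `NoSuchMember` would be rejected at compile time instead.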

Error reporting psychology
• Dealing with correct code is literally the smallest problem
• “Roslyn” does syntactic analysis of broken code in the time between keystrokes; semantic analysis takes a little longer
• Error messages need to be understandable, accurate, polite and diagnostic rather than prescriptive
• Let’s take a look at some examples

Error reporting psychology
A params parameter must be the last parameter in a formal parameter list
Is this saying:
• If there is a params parameter, it must be the last one? or
• The last parameter and only the last parameter must always be a params parameter? or
• The last parameter must be a params parameter; if others are as well, that’s fine too?
The error is only clear if the feature is already understood
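For readers who do not already know the feature, the first reading is the one the language enforces. A minimal sketch with hypothetical method names:

```csharp
using System;

static class ParamsDemo
{
    // Legal: the params parameter is the last one in the list.
    public static int Sum(string label, params int[] values)
    {
        int total = 0;
        foreach (int value in values)
        {
            total += value;
        }
        return total;
    }

    // Also legal: a method need not have a params parameter at all.
    public static int Add(int x, int y)
    {
        return x + y;
    }

    // Illegal if uncommented, because the params parameter is not last:
    // public static int Bad(params int[] values, string label) { return 0; }

    static void Main()
    {
        Console.WriteLine(Sum("three numbers", 1, 2, 3));  // 6
        Console.WriteLine(Sum("no numbers"));              // 0
        Console.WriteLine(Add(2, 3));                      // 5
    }
}
```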

Error reporting psychology
Complex operator +(Complex x, Complex y) { ...
User-defined operator must be declared static and public
• This is an example of a prescriptive error done right
• The user absolutely positively has to do this to overload an operator
• Odds that they were not trying to overload an operator are low
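Following the error message's prescription gives a declaration that compiles. The slide shows only the operator signature, so the fields and constructor of `Complex` below are assumptions made to keep the sketch self-contained.

```csharp
using System;

struct Complex
{
    public readonly double Real;
    public readonly double Imaginary;

    public Complex(double real, double imaginary)
    {
        Real = real;
        Imaginary = imaginary;
    }

    // Both modifiers are required on a user-defined operator.
    public static Complex operator +(Complex x, Complex y)
    {
        return new Complex(x.Real + y.Real, x.Imaginary + y.Imaginary);
    }
}

static class OperatorDemo
{
    static void Main()
    {
        Complex sum = new Complex(1, 2) + new Complex(3, 4);
        Console.WriteLine(sum.Real);       // 4
        Console.WriteLine(sum.Imaginary);  // 6
    }
}
```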

Warnings are harder than errors
• Must infer developers’ erroneous thoughts
• Compiler must be fast
  • This makes an opportunity for third-party tools
• Must be plausibly wrong
  • A warning for code that no one would reasonably type is unhelpful
• Must be able to eliminate warning
  • And ideally the warning should tell you how
• Must have low false positive rate
  • Encouraging developers to change correct code is harmful
  • We will return to this point later

What do C# developers want?
Rigidly defined areas of doubt and uncertainty
• Static type checking, type safety, memory safety…
  • … that can be disabled if necessary.
• A compiler that infers developer intent…
  • … with predictable behavior and understandable rules
• Actionable errors when inference fails…
  • … rather than muddling on through and getting it wrong

C# was originally called SafeC
C# throws developers into the “Pit of Success”:
• Eliminate unimportant dangerous features entirely
  • switch fall through
• Restrict dangerous features to clearly-marked unsafe code regions
• Eliminate implementation-defined behaviours
  • x = ++x + x++; is well-defined in C# …
  • … but still a bad idea.
• Define common undefined behaviours
  • Accessing an array out of bounds causes an exception
• Mandate compiler warnings

There are numerous defects that the Coverity C/C++ analysis checkers detect which are impossible, unlikely, or already warnings in C#. Let’s look at a few dozen. Quickly.

These are all defects found by Coverity in C/C++ that are not worth checking in C#…
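Two of these guarantees can be observed directly. The sketch below uses hypothetical names and relies on C#'s left-to-right operand evaluation to work out the result of the increment expression.

```csharp
using System;

static class WellDefinedDemo
{
    // Operands are evaluated left to right:
    //   ++x  sets x to 2 and yields 2
    //   x++  yields 2 and sets x to 3
    // The sum, 4, is then assigned to x.
    public static int IncrementPuzzle()
    {
        int x = 1;
        x = ++x + x++;
        return x;
    }

    // An out-of-bounds write raises an exception instead of corrupting memory.
    public static bool WriteThrows(int[] items, int index)
    {
        try
        {
            items[index] = 1;
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(IncrementPuzzle());             // 4
        Console.WriteLine(WriteThrows(new int[3], 3));    // True
        Console.WriteLine(WriteThrows(new int[3], 2));    // False
    }
}
```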

C/C++ defects inapplicable to C#:
• Local read before assignment
  • C# rejects programs that use uninitialized locals
• Uninitialized fields / arrays
  • Fields and arrays are automatically zeroed out
• Treating a pointer to a variable as a pointer to an array
  • Rare, must be marked as unsafe
• Buffer length arithmetic errors
  • Strings and arrays know their lengths; checked at runtime
• Pointer/integer/char/bool/enum type errors
  • Not inter-assignable in C# without explicit cast operators
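The first two items can be sketched as follows. The type `Counter` and its members are hypothetical; the rejected local read is left as a comment because the program would not compile with it.

```csharp
using System;

public sealed class Counter
{
    // Fields start at their default values: 0 for int, null for references.
    public int Count;
    public string Name;
}

public static class InitializationDemo
{
    // Every element of a freshly allocated array is zeroed.
    public static int[] FreshArray(int length)
    {
        return new int[length];
    }

    static void Main()
    {
        // Rejected by the compiler if uncommented: use of an unassigned local.
        // int local;
        // Console.WriteLine(local);

        Counter counter = new Counter();
        Console.WriteLine(counter.Count);          // 0
        Console.WriteLine(counter.Name == null);   // True
        Console.WriteLine(FreshArray(3)[2]);       // 0
    }
}
```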

C/C++ defects inapplicable to C#:
• Failure to consistently check error return codes
  • C# uses exceptions
• Accidental sign extension
  • Either error or warning
• Implementation-defined side effect order
  • Side effect order is well-defined
• Statement with no effect
  • is actually a parse time error in C#
• Accidental use of ambiguous names
  • C# requires that a simple name have a unique meaning in a block

C/C++ defects inapplicable to C#:
• sizeof mistakes
  • C#’s sizeof operator only takes types
• Unintentional switch fall-through
  • Is an error
• Unreachable code
  • Is a warning
• Accidental assignment or comparison of variable to itself
  • Yep, that’s a warning too
• Field never written or never read
  • Man that’s a lot of warnings
• Missing return statement
  • Is illegal
• malloc without free / free without malloc / allocator – deallocator mismatch / use after free
  • Not needed in a garbage-collected language
• Dereferencing an address that lived longer than the storage it refers to
  • References to variables may not be stored in long-term storage
• Accidental use of function pointer
  • Method group expressions can only be used in strictly limited locations
• Overriding errors
  • The language was designed to mitigate brittle base class failures by default
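As one illustration from this list, the switch rules look like this in practice. A minimal sketch with hypothetical names: stacked labels are allowed, but a section that contains statements must end in an explicit transfer of control.

```csharp
using System;

static class SwitchDemo
{
    public static string Describe(int n)
    {
        string prefix = "";
        switch (n)
        {
            case 0:
            case 1:                  // stacked labels with no statements between them are fine
                return "small";
            case 2:
                prefix = "exactly two, ";
                goto case 3;         // continuing into the next section must be spelled out
            case 3:
                return prefix + "medium";
            default:
                return "large";
        }
    }

    static void Main()
    {
        Console.WriteLine(Describe(1));  // small
        Console.WriteLine(Describe(2));  // exactly two, medium
        Console.WriteLine(Describe(3));  // medium
        Console.WriteLine(Describe(9));  // large
    }
}
```

Deleting the `goto case 3;` line turns the program into a compile-time error rather than a silent fall-through.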

Defects common to C/C++ and C#
• Copy paste mistakes
• Expression contains variables but always has the same result
• You checked for null here, you dereferenced without checking there.
• Some infinite loops
• Dangling else and other indentation issues
• Array index out of bounds
• Integer overflow
  • checked arithmetic is off by default
• Non-memory resource leaks
  • Such as forgetting to close a file
• Stray semicolons
• Swapped arguments
• Unused return value
• Uncaught exception
• Missing or misordered critical sections
  • Including non-atomic operations inconsistently inside critical sections
• And many more!

And these are just a few that are common to C and C#; there are a whole host of defects specific to C# programs that we could find statically.

Let’s consider the psychological aspects of static analysis tools beyond the compiler.
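The integer overflow item is easy to demonstrate. Because the default overflow context can be changed by a compiler option, this sketch (hypothetical names) spells out both contexts explicitly so that it behaves the same under any project settings.

```csharp
using System;

static class OverflowDemo
{
    // Unchecked context, which is what non-constant arithmetic gets by default
    // under the usual compiler settings: the result silently wraps around.
    public static int AddUnchecked(int a, int b)
    {
        return unchecked(a + b);
    }

    // Checked context: overflow raises OverflowException.
    public static int AddChecked(int a, int b)
    {
        return checked(a + b);
    }

    static void Main()
    {
        Console.WriteLine(AddUnchecked(int.MaxValue, 1) == int.MinValue);  // True

        try
        {
            AddChecked(int.MaxValue, 1);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }
    }
}
```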

Developer Adoption is Key
• Soundness is explicitly a non-goal
• We don’t want to find all defects or even most defects
• We want every defect reported to be a customer-affecting bug
• Developers won’t adopt a product that they perceive as making their jobs harder for no customer benefit
• Our business model requires adoption to drive renewals
• How do developers – who, remember, are using C# because they like a statically-typed language – react to static analysis tools?

Developer psychology WRT analysis tools
• Any change in what defects are reported on the same code over time – a.k.a. “churn” – is the enemy
  • Randomized analysis is right out, unfortunately
  • Any improvement to our analysis heuristics can cause unwanted churn
  • We try to keep churn below 5% on every release
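The slides do not say how churn is computed. One plausible reading, offered purely as an assumption, is the number of defect reports that appeared or disappeared between two releases, divided by the number reported by the earlier release. The sketch below uses that hypothetical definition with made-up defect identifiers.

```csharp
using System;
using System.Collections.Generic;

static class ChurnDemo
{
    // Hypothetical metric: (reports added + reports removed) / previous report count,
    // measured on the same code base analyzed by two tool releases.
    public static double ChurnRate(ISet<string> previous, ISet<string> current)
    {
        if (previous.Count == 0)
        {
            // Arbitrary convention for this sketch: no change is 0, anything else is 100%.
            return current.Count == 0 ? 0.0 : 1.0;
        }

        var changed = new HashSet<string>(previous);
        changed.SymmetricExceptWith(current);   // keeps reports present in exactly one set
        return (double)changed.Count / previous.Count;
    }

    static void Main()
    {
        var previous = new HashSet<string> { "D1", "D2", "D3", "D4" };
        var current = new HashSet<string> { "D1", "D2", "D3", "D5" };

        // D4 disappeared and D5 appeared: 2 changes over 4 previous reports.
        Console.WriteLine(ChurnRate(previous, current) == 0.5);  // True
    }
}
```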

Developer psychology WRT analysis tools
• Responds well to perverse incentives
  • Hard-to-understand defect reports are easy to ignore
  • No downside to incorrectly triaging true positives as false positives
• Finding defects is hard; presenting evidence that prevents incorrect classification as a false positive is harder
  • Deep analysis with theorem provers can be worse than shallow analysis with cheap heuristics.
  • Presenting the result is insufficient; the developer must understand the proof to fix the defect.

Management psychology
• The first time static analysis runs there may be thousands of errors; typical rate is one defect per thousand LOC
• Academic answer: rank heuristics
• Pragmatic answer: ignore them all
  • Simply ignore all defects in existing code
  • Triage and fix defects in new code
  • “Someday” get around to fixing defects in old code
• Why is this so popular?
  • Old code is in the field. It works well enough. Risk is low.
  • New code is unproven. It might work, or it might not. Risk is high.

Management psychology
• Management actually pays for the developer tools
  • And typically has no idea how to use them effectively
• Middle management has perverse incentives too
  • Time, cost and complexity are easily measured; quality is not
  • “Never upgrade the static analysis tool before release”
  • Worse tools are better; better tools are worse

Worse is better; better is worse
[Two graphs: known defects (vertical axis) over time (horizontal axis)]
No tool improvements == Management gets bonus
Tool upgrades find more defects == Management gets no bonus
The fix rate is the same in these two graphs, but if the tool improves faster than the fix rate, no bonus.
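The arithmetic behind the two graphs can be sketched with a toy model. Every number and name here is invented for illustration; the model assumes a constant number of fixes per period and a constant number of defects newly found by tool upgrades per period.

```csharp
using System;

static class KnownDefectsDemo
{
    // Toy model: each period the team fixes a fixed number of defects while tool
    // upgrades surface a fixed number of new ones. All parameters are hypothetical.
    public static int KnownDefectsAfter(
        int initial, int fixedPerPeriod, int newlyFoundPerPeriod, int periods)
    {
        int known = initial;
        for (int i = 0; i < periods; i++)
        {
            known = Math.Max(0, known - fixedPerPeriod + newlyFoundPerPeriod);
        }
        return known;
    }

    static void Main()
    {
        // Same fix rate in both runs; only the tool's improvement rate differs.
        Console.WriteLine(KnownDefectsAfter(1000, 100, 0, 5));    // 500: the count falls
        Console.WriteLine(KnownDefectsAfter(1000, 100, 150, 5));  // 1250: the count rises
    }
}
```

A manager measured on the known-defect count looks better in the first run, even though the code in the second run is being fixed just as fast and is better understood.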

Good news
If you have a well-engineered product that:
• makes good use of theoretical and pragmatic approaches,
• finds real-world, user-affecting defects, and
• takes developer and management psychology into account
Then you can make a positive difference

Conclusion
• Theoretical static analysis techniques are awesome; we can and do use them in industry…
  • … but doing all that math is actually only one small part of shipping a static analysis product
• Understanding developer and management psychology is necessary to ensure adoption of any developer tools
• C# was carefully designed to match a target developer mindset
• Coverity thinks about developer and manager psychology at every stage in the analysis and overall product design
• Research into better ways to present defects would be awesome

More information
• Learn about Coverity at www.Coverity.com
• Read “A Few Billion Lines Of Code Later”
• Find me on Twitter at @ericlippert
• Or read my C# blog at www.EricLippert.com
• Or ask me about C# at www.StackOverflow.com