Introduction to Static Analysis

Erik Dietrich – Tue, 08/01/2017 - 08:48

I am, admittedly, not an expert in the finer points of marketing strategy. Building a brand, coming up with a design, and picking a name that "pops" fall outside of my main areas of expertise. But even I could probably have come up with something better than "static analysis" as a name for something.

I mean, the name practically exudes boring and unapproachable. Static means unchanging, and the dictionary actually includes the word uninteresting in its very first definition. And analysis? That doesn't exactly evoke images of dance parties and jet skiing. So when you hear the term static analysis for the first time, I would forgive you for feeling instantly drowsy.

In all seriousness, I think the dry, academic-sounding nature of the term does indeed deter people from becoming interested, if only mildly. Many software developers I talk to seem to hesitate at the topic, as if convinced that a master's degree in computer science is a prerequisite for understanding it.

And that's unfortunate because static analysis is an extremely valuable tool in any programmer's tool belt. So let's take an introductory look at the concept of static analysis.

The Definition (Making it Sound Less Boring)

So why do they call it static analysis, anyway? I'll get to that shortly. But first, I'll digress a bit into the ways we reason about and analyze code.

When you write code and then build it (or run it through an interpreter), you're initiating a translation process. The compiler or interpreter turns your human-readable code into machine-executable code. And then, on your command, the machine executes that code.

Software development thus falls into a certain cadence. Write some code. Then build it. Then run it and see what happens. As a software developer, you do this over and over again until you're finished.

If you want to get a bit formal, you could think of this as dynamic analysis. When you hunt for defects or observe program behavior, you examine it while it runs. What is the runtime behavior?

Static analysis draws its name in contrast. You analyze the code without ever executing it. Compiling the code to execute involves changing its form, making this a dynamic consideration. But with static analysis, you examine it as-is.
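To make "without ever executing it" concrete, here is a minimal sketch in Python. It assumes nothing beyond the standard library's ast module, and the snippet being inspected, along with the function name find_divisions_by_zero, is made up for illustration. The code under inspection is only ever handled as text, so neither the print call nor the division actually runs.

```python
import ast

# Hypothetical snippet to inspect. It is treated purely as text.
SOURCE = '''
def ratio(total):
    return total / 0

print("this line only runs under dynamic analysis")
'''


def find_divisions_by_zero(source):
    """Return the line numbers where code divides by a literal zero."""
    tree = ast.parse(source)  # parses the text; nothing is executed
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.BinOp)
                and isinstance(node.op, (ast.Div, ast.FloorDiv, ast.Mod))
                and isinstance(node.right, ast.Constant)
                and type(node.right.value) in (int, float)
                and node.right.value == 0):
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    print(find_divisions_by_zero(SOURCE))
```

Running ratio(10) would crash with a ZeroDivisionError, and you would only find out at runtime. The check above flags line 3 of the snippet just by reading it.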

Understanding through History

Before I get to some examples of static analyzers, let's clarify a bit by walking through the history of static analysis. Interestingly enough, you could argue that static analysis of programs predated computers. People like Alan Turing reasoned about algorithms and program behavior before technology actually caught up. And, in a way, that makes sense. You'd need to reason about programs before building compilers and interpreters.

Fast forward several decades, and you get so-called first generation static analysis. In the 1970s, C programmers saw the emergence of a utility called Lint. This program would flag "suspicious constructs" in code that could indicate a programmer mistake. But, while Lint survived and lent its name to a whole array of modern tools, it yielded a lot of false positives, limiting its adoption and effectiveness.

The 90s saw a renaissance with the second generation of static analysis tools. While the first generation made a surface examination of the code, the second generation took advantage of increased processing power to do something called path analysis. This involved predictive reasoning about runtime behavior and generally a deeper treatment of the code.

This proved helpful and it created fewer false positives. But it tended to take so much processing power and cost so much that it also did not find widespread adoption.

In more recent years, we have seen the rise of third generation tools. Using abstract syntax trees, this generation of tools allows for faithful modeling of the codebase. These tools treat the code as data, and they bring us to the modern state of the art.
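Here is a rough sketch of why treating code as data helps with false positives. It compares a naive text search with a query against the abstract syntax tree, both looking for calls to eval. The text search is only a stand-in for a surface-level check (I am not claiming that Lint or any real tool worked this way), and the snippet and function names are invented for the example.

```python
import ast

# Hypothetical snippet: only the last line really calls eval.
SOURCE = '''
# never call eval( on user input
message = "eval(x) is risky"
result = eval("1 + 1")
'''


def text_search(source, needle="eval("):
    """Surface check: report every line whose text contains the needle."""
    return [number
            for number, line in enumerate(source.splitlines(), start=1)
            if needle in line]


def ast_search(source, name="eval"):
    """Tree check: report only lines holding a real call to `name`."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )


if __name__ == "__main__":
    print("text search:", text_search(SOURCE))
    print("ast search: ", ast_search(SOURCE))
```

The text search reports lines 2, 3, and 4, because it cannot tell a comment or a string from a call. The tree query reports only line 4, since comments never make it into the tree and the string is just a constant.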

The Types of Static Analysis Tools

As we've come to the present, static analyzers have grown more and more diverse. Not all of them are created equal, and not all of them have the same goals. And I don't say this simply because we have so many languages and frameworks. They really do tend to solve different problems.

Let's take a look at some of them. And bear in mind that I'm not trying to create an exhaustive list, just to capture some common types of tools.

Cosmetic checkers. These tools check the formatting of your code to see if it conforms with standard formatting or your team's formatting.

Code reviewers/practice checkers. These types of analyzers go a bit deeper and look for common programming mistakes and potential design flaws.

Metrics gatherers. With this type of tool, you get data about your code. This might include things like quantified complexity, average method length, cohesion, etc.

Defect detectors. This category overlaps with code reviewers, but with the specific goal of finding or predicting runtime defects.

Specialized analyzers. These analyzers look for specific issues or categories of a problem, such as security vulnerabilities or potential performance issues.

You should also bear in mind that these categories are not mutually exclusive. In fact, many static analysis tools will actually serve more than one of these purposes and can even blur the lines a bit with their offering.
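As one illustration of the categories above, here is a minimal sketch of a metrics gatherer. It assumes Python 3.8 or later (for end_lineno), and the count of decision points is only a rough proxy for complexity that I chose for the example, not a standard metric. The sample functions and the name gather_metrics are hypothetical.

```python
import ast

# Hypothetical code to measure.
SOURCE = '''
def small(x):
    return x + 1

def branchy(x):
    if x > 10:
        return "big"
    for i in range(x):
        if i % 2:
            return "odd"
    return "none"
'''

# Node types counted as decision points (an assumption for this sketch).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)


def gather_metrics(source):
    """Map each function name to its length in lines and decision points."""
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            # Nested functions are counted as part of their parent too.
            branches = sum(isinstance(child, BRANCH_NODES)
                           for child in ast.walk(node))
            metrics[node.name] = {"lines": length,
                                  "decision_points": branches}
    return metrics


if __name__ == "__main__":
    for name, data in gather_metrics(SOURCE).items():
        print(name, data)
```

Add a threshold on either number and the same tool starts to look like a practice checker, which is the kind of line blurring I mean.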

The point to take away is that the modern static analysis landscape offers a wide variety of capabilities. And that field is only growing.

Incorporating Static Analysis into Your Work

In fact, because of the relatively explosive growth in the field over the last decade or so, the prospect of adoption can seem somewhat daunting. And there are too many possible tools across too many languages for me to give you a meaningful how-to guide in the scope of this post. So I won't try. What I'll do instead is offer a philosophical framework for adopting static analysis tooling.

Whether you realize it or not, your software development efforts involve the constant seeking of feedback. This can come from people, such as QA or from fellow developers reviewing code. Or it can come from programs, such as your unit test runner or your compiler. We take this feedback, incorporate it, make some changes, and hopefully improve.

At its core, static analysis offers you another vector for feedback and improvement. And, often, it short-circuits more time-consuming processes. You can check over your code for proper indenting yourself, or you can let a static analyzer do it. You can check for potential null dereferences, or you can let an analyzer do it. And so on and so forth.
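To show how little it can take to automate one piece of that feedback, here is a deliberately crude sketch of an indentation check. The rule it enforces (spaces only, in multiples of four) and the names check_indentation and SAMPLE are assumptions for the example, not anyone's official standard.

```python
# Hypothetical input: line 2 is indented by three spaces, line 3 by a tab.
SAMPLE = "def f():\n   return 1\n\tpass\n    ok = True\n"


def check_indentation(source, width=4):
    """Return (line number, message) pairs for suspicious indentation."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation to judge
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % width != 0:
            problems.append(
                (number,
                 f"indent of {len(indent)} is not a multiple of {width}"))
    return problems


if __name__ == "__main__":
    for number, message in check_indentation(SAMPLE):
        print(f"line {number}: {message}")
```

A real formatter handles far more than this, such as continuation lines aligned to an open bracket, which this sketch would wrongly flag. But the feedback loop is the same: the machine does the tedious reading so a human reviewer doesn't have to.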

So you have a relatively simple but powerful framework for incorporating static analysis, but you have to use your imagination a bit. Look at your current software development process, and ask yourself if some of your feedback about code couldn't be automated. If you think it can be, there's a decent chance someone has done it (or at least tried). Static analysis may have an incredibly boring name, but it has an incredibly non-boring potential to make you much more efficient.