Using Static Analysis to Detect API Usage Anomalies

Application programming interfaces (APIs) are used heavily in modern programming: they allow developers to reuse functionality and build software faster. Unfortunately, most APIs have no formally defined specification, and many are complex and non-trivial to use. As a result, misuse of APIs is a source of defects and vulnerabilities. There is little automated tool support to help a developer verify correct usage, since most tools don’t understand the semantics of the APIs being used. Manually writing rules for every API function and building custom checkers in static analysis tools is time-consuming and doesn’t scale.

GrammaTech was contracted by the Department of Homeland Security to investigate ways to automate the detection of API usage anomalies. In this ongoing project, GrammaTech analyzes a large body of code to detect API usage patterns, discovering which APIs exist and how they are used. This “code corpus” consists of about 7000 C/C++ projects and about 498 million lines of code. Using the rules automatically inferred from the corpus, GrammaTech was able to use CodeSonar to detect API anomalies in new code.
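To make the idea of inferring usage rules from a corpus concrete, here is a minimal sketch in Python. It is not GrammaTech's method or anything CodeSonar exposes. It assumes, purely for illustration, that each function in the corpus has already been reduced to an ordered list of API calls. The toy corpus, the function names `infer_rules` and `find_anomalies`, and the support and confidence thresholds are all hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus: each entry maps a function name to the ordered
# list of API calls it makes. A real corpus would be extracted from source.
CORPUS = {
    "load_config":  ["fopen", "fread", "fclose"],
    "save_log":     ["fopen", "fwrite", "fclose"],
    "copy_file":    ["fopen", "fread", "fwrite", "fclose"],
    "write_report": ["fopen", "fwrite", "fclose"],
    "read_header":  ["fopen", "fgets", "fclose"],
    "leaky_reader": ["fopen", "fread"],  # never closes the file
}


def infer_rules(corpus, min_support=4, min_confidence=0.8):
    """Infer rules of the form 'a call to `first` is later followed by `then`'.

    A pair becomes a rule when it is seen in at least `min_support` functions
    and holds in at least `min_confidence` of the functions that call `first`.
    Both thresholds are illustrative assumptions.
    """
    first_counts = Counter()  # number of functions that call each API
    pair_counts = Counter()   # number of functions where `then` follows `first`
    for calls in corpus.values():
        pairs = set()
        for i, first in enumerate(calls):
            for then in calls[i + 1:]:
                if then != first:
                    pairs.add((first, then))
        first_counts.update(set(calls))
        pair_counts.update(pairs)

    rules = {}
    for (first, then), support in pair_counts.items():
        confidence = support / first_counts[first]
        if support >= min_support and confidence >= min_confidence:
            rules[(first, then)] = confidence
    return rules


def find_anomalies(rules, calls):
    """Return the rules violated by one function's call sequence."""
    violations = []
    for first, then in rules:
        if first in calls and then not in calls[calls.index(first) + 1:]:
            violations.append((first, then))
    return violations


rules = infer_rules(CORPUS)
print(rules)
print(find_anomalies(rules, ["fopen", "fgets"]))            # new code, no fclose
print(find_anomalies(rules, ["fopen", "fgets", "fclose"]))  # new code, closes file
```

On this toy corpus the only pair that clears both thresholds is `fopen` followed by `fclose`, so a new function that opens a file without closing it is flagged while one that closes it is not. A production analysis would need to reason about control flow, data flow and error paths rather than flat call lists.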

Because analyzing the code corpus is computationally expensive and its overall statistical character does not change frequently, the analysis is run only occasionally. The automatically generated rules, however, can be used in a normal static analysis to uncover API usage issues that were previously undetectable. These new rules can uncover security vulnerabilities, memory leaks and other critical errors. The following video explains the analysis process and provides some examples of the API anomaly detection in use: