Case Studies of a Machine Learning Process for Improving the Accuracy of Static Analysis Tools

Abstract

Static analysis tools analyze source code and report suspected problems as warnings
to the user. The use of these tools is a key feature of most modern software development
processes; however, the tools tend to generate large result sets that can be hard to process
and prioritize in an automated way. Two particular problems are (a) a high false positive
rate, where warnings are generated for code that is not problematic, and (b) a high rate
of non-actionable true positives, where the warnings are not acted on or do not represent
significant risks to the quality of the source code as perceived by the developers. Previous
work has explored the use of machine learning to build models that predict legitimate
warnings, applying logistic regression to the Google Java codebase [38]. Heckman [19]
experimented with 15 machine learning algorithms on two open source projects to classify
actionable static analysis alerts.
In our work, we seek to replicate these ideas on different target systems, using different
static analysis tools along with a broader set of machine learning techniques, and with an
emphasis on security-related warnings. Our experiments indicate that these models can
achieve high accuracy in classifying actionable warnings. We found that in most cases, our
models outperform those of Heckman [19].
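
As a minimal illustration of the kind of model used in this line of work, the sketch below trains a logistic regression classifier, as in the prior work cited above, to separate actionable from non-actionable warnings. The features, synthetic data, and split are assumptions for illustration only; they are not the actual feature sets, tools, or datasets used in this thesis.

```python
# Hypothetical sketch: classify static analysis warnings as actionable (1)
# or not actionable (0) with logistic regression. All features and data
# below are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(seed=0)

# Hypothetical per-warning features, e.g. [warning priority, file churn,
# days the warning has been open, number of past fixes in the same file].
n = 500
X = rng.normal(size=(n, 4))
# Synthetic labels loosely tied to the features, standing in for
# developer triage decisions.
y = (X @ np.array([0.8, 0.5, -0.6, 0.7])
     + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit the classifier and evaluate on held-out warnings.
model = LogisticRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(f"Accuracy on held-out warnings: {accuracy_score(y_test, pred):.2f}")
```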