Big Commit Analysis — Towards an Infrastructure for Commit Analysis

Abstract

Developers commit changes to a project's code base to, for instance, fix bugs, add features, or refactor code. In empirical studies, researchers often need to link commits to issues in issue trackers in order to audit the purpose of code changes. Unfortunately, no general-purpose tool meets this need across different studies. The task is complicated by noisy data: in theory each commit should serve one purpose, but in practice developers may pursue several goals in one commit, and issues in issue trackers are often miscategorized. We present BICO (BIg COmmit analyzer), a tool that links the source code management system with the issue tracker. BICO relies on the issue IDs that developers include in commit messages to link commits to issues, and it presents the linked information in a navigable form that makes it easier to analyze and reason about the evolution of a project. BICO also provides dedicated analytics that use statistical outlier detection to detect big commits, i.e., multi-purpose and miscategorized commits. In an initial evaluation, we use BICO to analyze bug-fix commits in Apache Kafka, where it reports 9.6% of the bug-fixing commits as miscategorized or multi-purpose with a precision of 85%. This high precision demonstrates the applicability of the outlier detection method implemented in BICO. A further case study with Apache Storm shows that the precision of detecting multi-purpose commits can vary between projects. Finally, BICO comes with a built-in metric suite extractor for calculating change metrics, source code metrics, and defect counts.
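The two mechanisms named above, linking commits to issues through IDs in commit messages and flagging unusually large commits as outliers, can be illustrated with a minimal Python sketch. This is not BICO's implementation. The abstract does not state the issue-key format, the size metric, or the outlier test, so the sketch assumes JIRA-style keys such as KAFKA-1234, lines changed as the size metric, and Tukey's upper fence (Q3 + 1.5 × IQR) as the outlier rule. The names `issue_ids`, `link_commits`, `big_commits`, and the dictionary fields are hypothetical.

```python
import re
from collections import defaultdict
from statistics import quantiles


def issue_ids(message, project="KAFKA"):
    """Return the distinct issue keys for `project` in a commit message.

    Assumes JIRA-style keys of the form PROJECT-<number>.
    """
    pattern = re.compile(rf"\b{re.escape(project)}-\d+\b")
    seen = []
    for key in pattern.findall(message):
        if key not in seen:  # keep first-mention order, drop repeats
            seen.append(key)
    return seen


def link_commits(commits, project="KAFKA"):
    """Map each issue key to the hashes of the commits that mention it."""
    links = defaultdict(list)
    for commit in commits:
        for key in issue_ids(commit["message"], project):
            links[key].append(commit["sha"])
    return dict(links)


def big_commits(commits, k=1.5):
    """Flag commits whose size lies above Tukey's upper fence.

    Assumption: size is measured in lines changed, and a commit is an
    outlier when its size exceeds Q3 + k * IQR. Needs at least two commits.
    """
    sizes = [c["lines_changed"] for c in commits]
    q1, _, q3 = quantiles(sizes, n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return [c for c in commits if c["lines_changed"] > fence]


if __name__ == "__main__":
    # Made-up commits, for illustration only.
    history = [
        {"sha": "a1", "message": "KAFKA-101: fix offset bug", "lines_changed": 10},
        {"sha": "b2", "message": "KAFKA-102; handle null key", "lines_changed": 12},
        {"sha": "c3", "message": "KAFKA-101: follow-up fix", "lines_changed": 15},
        {"sha": "d4", "message": "MINOR: fix typo", "lines_changed": 11},
        {"sha": "e5", "message": "KAFKA-103: fix leak", "lines_changed": 14},
        {"sha": "f6", "message": "KAFKA-104: fix race", "lines_changed": 13},
        {"sha": "g7", "message": "KAFKA-105 KAFKA-106: fix and refactor", "lines_changed": 400},
    ]
    print(link_commits(history))
    print([c["sha"] for c in big_commits(history)])
```

Restricting the pattern to a known project key avoids false matches on tokens such as UTF-8, at the cost of having to configure the key per project. A flagged commit is only a candidate: deciding whether it is truly multi-purpose or miscategorized still requires inspection, which is what the reported precision measures.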