Abstract

Software defects can lead to undesired results. Correcting defects consumes 50% to 75% of total software development budgets. To predict defective files, a prediction model must be built with predictors (e.g., software metrics) obtained either from a project itself (within-project) or from other projects (cross-project). A universal defect prediction model built from a large set of diverse projects would relieve the need to build and tailor prediction models for each individual project. A formidable obstacle to building a universal model is the variation in the distributions of predictors among projects of diverse contexts (e.g., size and programming language). Hence, we propose to cluster projects based on the similarity of the distributions of their predictors, and to derive rank transformations using the quantiles of the predictors in each cluster. We fit the universal model on the transformed data of 1,385 open source projects hosted on SourceForge and GoogleCode. The universal model obtains prediction performance comparable to that of within-project models, yields similar results when applied to five external projects (one Apache and four Eclipse projects), and performs similarly across projects with different context factors. Finally, we investigate which predictors should be included in the universal model. We expect that this work can form a basis for future work on building a universal model and lead to software support tools that incorporate it into a regular development workflow.
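To make the rank-transformation idea concrete, the following is a minimal sketch (not the authors' implementation): given the values of one predictor for all files in a cluster, each raw value is replaced by the index of the quantile bin it falls into. The ten-bin granularity, the `rank_transform` helper name, and the example lines-of-code values are illustrative assumptions.

```python
import numpy as np

def rank_transform(values, n_bins=10):
    """Map raw predictor values to ranks 1..n_bins using the
    quantiles of the cluster's own distribution (illustrative
    sketch of a quantile-based rank transformation)."""
    values = np.asarray(values, dtype=float)
    # Cut points at the 10%, 20%, ..., 90% quantiles of the cluster
    cuts = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    # A value below the k-th cut point receives rank k
    return np.searchsorted(cuts, values, side="right") + 1

# Hypothetical lines-of-code metric for files in one cluster
loc = [10, 25, 40, 80, 160, 320, 640, 1280, 2560, 5120]
print(rank_transform(loc))  # ranks 1 through 10
```

Because each cluster supplies its own quantile cut points, predictors from projects with very different scales end up on a common 1-to-10 scale before the universal model is fit.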

Acknowledgments

The authors would like to thank Professor Ahmed E. Hassan from the Software Analysis and Intelligence Lab (SAIL) at Queen's University for his strong support during this work. The authors would also like to thank Professor Daniel German from the University of Victoria for his insightful advice. The authors appreciate the great help of Mr. Shane McIntosh, also from SAIL at Queen's University, during the improvement of this work. The authors are also grateful to the anonymous reviewers of MSR and EMSE for their valuable and insightful comments.
