"... Programs usually follow many implicit programming rules, most of which are too tedious to be documented by programmers. When these rules are violated by programmers who are unaware of or forget about them, defects can be easily introduced. Therefore, it is highly desirable to have tools to automatic ..."

Programs usually follow many implicit programming rules, most of which are too tedious to be documented by programmers. When these rules are violated by programmers who are unaware of or forget about them, defects can be easily introduced. Therefore, it is highly desirable to have tools to automatically extract such rules and also to automatically detect violations. Previous work in this direction focuses on simple function-pair based programming rules and additionally requires programmers to provide rule templates. This paper proposes a general method called PR-Miner that uses a data mining technique called frequent itemset mining to efficiently extract implicit programming rules from large software code written in an industrial programming language such as C, requiring little effort from programmers and no prior knowledge of the software. Benefiting from frequent itemset mining, PR-Miner can extract programming
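As a rough, hypothetical sketch of the frequent itemset idea (not PR-Miner's actual parser-based implementation), the snippet below mines call names that frequently appear together across function bodies; a frequent set such as {lock, unlock} becomes a candidate rule, and a function containing only part of such a set would be flagged as a potential violation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of functions called inside each function body,
# as a PR-Miner-style front end would extract from parsed C source.
call_sets = [
    {"lock", "unlock", "read_data"},
    {"lock", "unlock", "write_data"},
    {"open_file", "read_data", "close_file"},
    {"lock", "unlock", "log_error"},
    {"open_file", "write_data", "close_file"},
]

def frequent_itemsets(transactions, min_support=2, max_size=3):
    """Naive Apriori-style enumeration of call combinations seen in at least
    `min_support` function bodies; real miners prune far more aggressively."""
    frequent = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for calls in transactions:
            for combo in combinations(sorted(calls), size):
                counts[combo] += 1
        level = {combo: n for combo, n in counts.items() if n >= min_support}
        if not level:
            break
        frequent.update(level)
    return frequent

for itemset, support in sorted(frequent_itemsets(call_sets).items(), key=lambda kv: -kv[1]):
    if len(itemset) > 1:  # multi-element sets are the interesting rule candidates
        print(itemset, "support =", support)
```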

...dence. Because several functions may have the same violation, the potential bugs in these functions are strongly correlated. Therefore, some other advanced ranking schemes such as correlation ranking [17] can be used here to further improve the accuracy of our ranking function, which remains as our future work. 4. EVALUATION 4.1 Experiment Setup We have evaluated PR-Miner with the latest versions of L...

"... Benchmarking provides an effective way to evaluate different tools. Unfortunately, so far there is no good benchmark suite to systematically evaluate software bug detection tools. As a result, it is difficult to quantitatively compare the strengths and limitations of existing or newly proposed bug d ..."

Benchmarking provides an effective way to evaluate different tools. Unfortunately, so far there is no good benchmark suite to systematically evaluate software bug detection tools. As a result, it is difficult to quantitatively compare the strengths and limitations of existing or newly proposed bug detection tools. In this paper, we share our experience of building a bug benchmark suite called BugBench. Specifically, we first summarize the general guidelines on the criteria for selecting representative bug benchmarks, and the metrics for evaluating a bug detection tool. Second, we present a set of buggy applications collected by us, with various types of software bugs. Third, we conduct a preliminary study on the application and bug characteristics in the context of software bug detection. Finally, we evaluate several existing bug detection tools including Purify, Valgrind, and CCured to validate the selection of our benchmarks.

...re bug detection tools are also not standardized. Some work evaluated only the execution overhead using SPEC benchmarks, completely overlooking the bug detection functionality. In contrast, some work [16, 18] did a much more thorough evaluation. They not only reported false positives and/or false negatives, but also provided the ranking of reported bugs. As the research area of software bug detection starts...

"... Tools and analyses that find bugs in software are becoming increasingly prevalent. However, even after the potential false alarms raised by such tools are dealt with, many real reported errors may go unfixed. In such cases the programmers have judged the benefit of fixing the bug to be less than the ..."

Tools and analyses that find bugs in software are becoming increasingly prevalent. However, even after the potential false alarms raised by such tools are dealt with, many real reported errors may go unfixed. In such cases the programmers have judged the benefit of fixing the bug to be less than the time cost of understanding and fixing it. The true utility of a bug-finding tool lies not in the number of bugs it finds but in the number of bugs it causes to be fixed. Analyses that find safety-policy violations typically give error reports as annotated backtraces or counterexamples. We propose that bug reports additionally contain a specially-constructed patch describing an example way in which the program could be modified to avoid the reported policy violation. Programmers viewing the analysis output can use such patches as guides, starting points, or as an additional way of understanding what went wrong. We present an algorithm for automatically constructing such patches given model-checking and policy information typically already produced by most such analyses. We are not aware of any previous automatic techniques for generating patches in response to safety policy violations. Our patches can suggest additional code not present in the original program, and can thus help to explain bugs related to missing program elements. In addition, our patches do not introduce any new violations of the given safety policy. To evaluate our method we performed a software engineering experiment, applying our algorithm to over 70 bug reports produced by two off-the-shelf bug-finding tools running on large Java programs. Bug reports that were also accompanied by patches were three times as likely to be addressed as standard bug reports. This work represents an early step toward developing new ways to report bugs and to make it easier for programmers to fix them. Even a minor increase in our ability to fix bugs would be a significant gain for software quality.
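To make the idea of a report-accompanying patch concrete, here is a deliberately tiny sketch, not the paper's algorithm: given a hypothetical typestate-style policy (every open must eventually be matched by a close) and an abstract error trace, it proposes an insertion that would discharge the unmet obligation, which is roughly the kind of suggestion a patch-augmented report could carry.

```python
# Toy illustration only: propose a fix for an unmet "open must be closed" obligation.
# The policy, trace format, and placement heuristic are all invented for this sketch.

POLICY = {"open": "close"}  # each call on the left obliges the call on the right

def suggest_patch(trace):
    """trace: ordered list of (line_number, called_function) pairs from an error path."""
    pending = []  # (line, obligated_call) pairs not yet satisfied
    for line, call in trace:
        if call in POLICY:
            pending.append((line, POLICY[call]))
        elif pending and call == pending[-1][1]:
            pending.pop()
    last_line = trace[-1][0]
    # Suggest inserting each missing call near the end of the reported path.
    return [f"after line {last_line}: insert a call to {needed}()" for _, needed in pending]

error_trace = [(10, "open"), (12, "read"), (15, "return")]
for suggestion in suggest_patch(error_trace):
    print(suggestion)  # after line 15: insert a call to close()
```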

"... We present a study of how Linux kernel developers respond to bug reports issued by a static analysis tool. We found that developers prefer to triage reports in younger, smaller, and more actively-maintained files (§2), first address easy-to-fix bugs and defer difficult (but possibly critical) bugs ( ..."

We present a study of how Linux kernel developers respond to bug reports issued by a static analysis tool. We found that developers prefer to triage reports in younger, smaller, and more actively-maintained files (§2), first address easy-to-fix bugs and defer difficult (but possibly critical) bugs (§3), and triage bugs in batches rather than individually (§4). Also, although automated tools cannot find many types of bugs, they can be effective at directing developers’ attention towards parts of the codebase that contain up to 3X more user-reported bugs (§5). Our insights into developer attitudes towards static analysis tools allow us to make suggestions for improving their usability and effectiveness. We feel that it could be effective to run static analysis tools continuously while programming and before committing code, to rank reports so that those most likely to be triaged are shown to developers first, to show the easiest reports to new developers, to perform deeper analysis on more actively-maintained code, and to use reports as indirect indicators of code quality and importance.

...iaged reports. This pattern shows that either all reports in a session are triaged or left un-triaged. (Kremenek et al. used a similar diagram to visualize clustering of true bugs vs. false positives [10].) Table 4 quantifies the amount of clustering: the probability that all reports in a session are triaged (or un-triaged) rises markedly when at least 1 or 2 reports are triaged (or un-triaged). The lar...

by Trishul M. Chilimbi - In International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

"... We present the design, implementation, and evaluation of HeapMD, a dynamic analysis tool that finds heap-based bugs using anomaly detection. HeapMD is based upon the observation that, in spite of the evolving nature of the heap, several of its properties remain stable. HeapMD uses this observation i ..."

We present the design, implementation, and evaluation of HeapMD, a dynamic analysis tool that finds heap-based bugs using anomaly detection. HeapMD is based upon the observation that, in spite of the evolving nature of the heap, several of its properties remain stable. HeapMD uses this observation in a novel way: periodically, during the execution of the program, it computes a suite of metrics which are sensitive to the state of the heap. These metrics track heap behavior, and the stability of the heap reflects quantitatively in the values of these metrics. The “normal” ranges of stable metrics, obtained by running a program on multiple inputs, are then treated as indicators of correct behavior, and are used in conjunction with an anomaly detector to find heap-based bugs. Using HeapMD, we were able to find 40 heap-based bugs, 31 of them previously unknown, in 5 large, commercial applications.
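The core mechanism can be sketched in a few lines, assuming some invented metric names and tolerance (HeapMD's actual metric suite and thresholds differ): ranges observed over known-good training runs become the model, and a later run is flagged when a metric leaves its learned range.

```python
# Hypothetical range-based anomaly detection over per-run heap metrics.

def learn_ranges(training_runs):
    """training_runs: list of dicts mapping metric name -> observed value."""
    ranges = {}
    for run in training_runs:
        for name, value in run.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges

def detect_anomalies(run, ranges, tolerance=0.05):
    """Report metrics that fall outside their learned range by more than
    `tolerance` of that range's width."""
    anomalies = []
    for name, value in run.items():
        lo, hi = ranges[name]
        slack = tolerance * max(hi - lo, 1e-9)
        if value < lo - slack or value > hi + slack:
            anomalies.append((name, value, (lo, hi)))
    return anomalies

training = [
    {"frac_leaf_nodes": 0.42, "mean_out_degree": 1.9},  # metric names are made up
    {"frac_leaf_nodes": 0.45, "mean_out_degree": 2.0},
    {"frac_leaf_nodes": 0.44, "mean_out_degree": 1.8},
]
model = learn_ranges(training)
print(detect_anomalies({"frac_leaf_nodes": 0.61, "mean_out_degree": 1.9}, model))
```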

...s algorithms can use the information available in the entire trace, they can potentially reduce the “cascade-effect”, where a single mistake in the analysis leads to a large number of false positives [15, 16]. The third design, currently not supported by HeapMD, simultaneously uses the model constructor and anomaly detector, in an online fashion. In this approach, employed by DIDUCE [11], the model is con...

"... Static analysis tools report software defects that may or may not be detected by other verification methods. Two challenges complicating the adoption of these tools are spurious false positive warnings and legitimate warnings that are not acted on. This paper reports automated support to help addres ..."

Static analysis tools report software defects that may or may not be detected by other verification methods. Two challenges complicating the adoption of these tools are spurious false positive warnings and legitimate warnings that are not acted on. This paper reports automated support to help address these challenges using logistic regression models that predict the foregoing types of warnings from signals in the warnings and implicated code. Because examining many potential signaling factors in large software development settings can be expensive, we use a screening methodology to quickly discard factors with low predictive power and cost-effectively build predictive models. Our empirical evaluation indicates that these models can achieve high accuracy in predicting accurate and actionable static analysis warnings, and suggests that the models are competitive with alternative models built without screening.
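A minimal sketch of that two-stage shape, on synthetic data: a cheap univariate screen discards candidate factors with little predictive power, then a logistic regression over the survivors scores each warning's probability of being actionable. The feature names, threshold, and use of scikit-learn are assumptions made for illustration, not the paper's actual setup.

```python
# Illustrative sketch (not the paper's models): screen candidate factors by a
# simple univariate correlation with the label, then fit logistic regression
# on the survivors. All feature names and data are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["file_age_days", "warning_priority", "code_churn", "file_length"]
X = rng.normal(size=(200, len(feature_names)))
# Synthetic ground truth: only the first two factors actually carry signal.
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Screening step: keep factors whose absolute correlation with the label
# exceeds a (hypothetical) threshold, discarding weak signals cheaply.
keep = [i for i in range(X.shape[1]) if abs(np.corrcoef(X[:, i], y)[0, 1]) > 0.2]
print("kept factors:", [feature_names[i] for i in keep])

model = LogisticRegression().fit(X[:, keep], y)
# Probability that a new warning is actionable, given its screened signals.
new_warning = rng.normal(size=(1, len(feature_names)))
print("P(actionable) =", model.predict_proba(new_warning[:, keep])[0, 1])
```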

...al program behavior will be. They often over-estimate possible program behaviors, leading to spurious warnings (“false positives”) that do not correspond to true defects. For example, Kremenek et al. [13] report that at least 30% of the warnings reported by sophisticated tools are false positives. At Google, we have observed that tools can be more accurate for certain types of warnings. Our experience...

"... Benchmarks provide an experimental basis for evaluating software engineering processes or techniques in an objective and repeatable manner. We present the FAULTBENCH benchmark, as a contribution to current benchmark materials, for evaluation and comparison of techniques that prioritize and classify ..."

Benchmarks provide an experimental basis for evaluating software engineering processes or techniques in an objective and repeatable manner. We present the FAULTBENCH benchmark, as a contribution to current benchmark materials, for evaluation and comparison of techniques that prioritize and classify alerts generated by static analysis tools. Alert prioritization and classification addresses a problem common to many static analysis tools: numerous alerts that are not an indication of a fault or are unimportant to the developer. We utilized FAULTBENCH to evaluate three versions of the AWARE adaptive ranking model to prioritize and classify static analysis alerts. Individual FAULTBENCH subjects have different best prioritization and classification techniques, so using a single subject to evaluate a prioritization and classification technique could provide incorrect results. Together, FAULTBENCH subjects provide a precise and general evaluation of alert prioritization and classification techniques.

...oritization and classification techniques. The literature in the realm of static analysis alert prioritization and classification is moving towards a definition for conducting and evaluating research [9, 10, 12, 14, 20, 22]. FAULTBENCH provides a basis for comparison of static analysis alert prioritization and classification techniques and contributes subject programs; an analysis procedure; and evaluation metrics. The ...

"... To improve the reporting of results from model checking and programanalysis systems, we introduce the notion of an error projection and annotated error projection. An error projection is a set of program nodes N such that for each node n ∈ N there exists an (abstract) error path from the program ent ..."

To improve the reporting of results from model checking and program analysis systems, we introduce the notion of an error projection and annotated error projection. An error projection is a set of program nodes N such that for each node n ∈ N there exists an (abstract) error path from the program entry s through n to a specified target node t. An annotated error projection associates with each node n in the error projection an (abstract) counterexample that validates the error, along with an abstract store whose presence at n induces the error. We present novel algorithms for computing (annotated) error projections and discuss additional applications for these algorithms. Our experiments show that error projections can be computed efficiently.
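Read as a plain graph property, the definition says a node belongs to the error projection exactly when it is reachable from the entry s and can itself reach the target t. The sketch below computes that intersection with two breadth-first searches over a made-up control-flow graph; the paper's algorithms operate on abstract models rather than explicit graphs, so this is only the graph-level intuition.

```python
# Toy computation of an error projection over an explicit directed graph:
# nodes reachable from the entry AND able to reach the error target.
from collections import deque

def reachable(graph, start):
    """Nodes reachable from `start` via BFS. graph: dict node -> list of successors."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

def error_projection(graph, entry, target):
    reversed_graph = {}
    for node, succs in graph.items():
        for succ in succs:
            reversed_graph.setdefault(succ, []).append(node)
    forward = reachable(graph, entry)             # reachable from entry s
    backward = reachable(reversed_graph, target)  # can reach target t
    return forward & backward

cfg = {"s": ["a", "b"], "a": ["t"], "b": ["c"], "c": []}
print(error_projection(cfg, "s", "t"))  # contains s, a, t; b and c lie on no error path
```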

...t localization. It would be interesting to explore further use of error projections for fault localization. Such error-reporting techniques have also been used outside model checking. Kremenek et al. [14] use statistical analysis to rank counterexamples found by the xgcc [9] compiler. Their goal is to present to the user an ordered list of counterexamples sorted by their confidence rank. The goal of bo...

by Sarah Heckman, Laurie Williams - In International Conference on Software Testing, 2009

"... Automated static analysis can identify potential source code anomalies early in the software process that could lead to field failures. However, only a small portion of static analysis alerts may be important to the developer (actionable). The remainder are false positives (unactionable). We propose ..."

Automated static analysis can identify potential source code anomalies early in the software process that could lead to field failures. However, only a small portion of static analysis alerts may be important to the developer (actionable). The remainder are false positives (unactionable). We propose a process for building false positive mitigation models to classify static analysis alerts as actionable or unactionable using machine learning techniques. For two open source projects, we identify sets of alert characteristics predictive of actionable and unactionable alerts out of 51 candidate characteristics. From these selected characteristics, we evaluate 15 machine learning algorithms, which build models to classify alerts. We were able to obtain 88-97% average accuracy for both projects in classifying alerts using three to 14 alert characteristics. Additionally, the set of selected alert characteristics and best models differed between the two projects, suggesting that false positive mitigation models should be project-specific.
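As a rough sketch of the model-building step, with invented alert characteristics and a shortlist of learners standing in for the 15 evaluated in the paper: each candidate is scored by cross-validated accuracy on one project's labeled alerts, and the best scorer is kept for that project, matching the observation that the best model is project-specific.

```python
# Hypothetical per-project model selection: compare a few classifiers on a
# project's labeled alerts (actionable vs. unactionable) and keep the best.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))            # 6 made-up alert characteristics
y = (X[:, 0] - X[:, 3] > 0).astype(int)  # synthetic actionable/unactionable labels

candidates = {
    "logistic_regression": LogisticRegression(),
    "decision_tree": DecisionTreeClassifier(max_depth=4),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("best model for this project:", best)
```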

...s investigates the k nearest neighbors and weighs the contribution of each neighbor by a distance measure to classify alerts [20]. Bayesian networks are a probabilistic model of the selected attributes [12, 20]. Each machine learner was run with default options in Weka [20] unless otherwise stated. 6. Research results. We hypothesize that the important ACs and machine learners will vary by project. The selec...
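For the distance-weighted nearest-neighbor scheme described here, a bare-bones version looks like the following (all data invented; the study itself used Weka's implementations with default options): each of the k closest labeled alerts votes for its class with weight inversely proportional to its distance from the query alert.

```python
# Minimal distance-weighted k-nearest-neighbor vote over made-up alert feature vectors.
import math

def knn_classify(query, labeled_points, k=3):
    """labeled_points: list of (feature_vector, label). Returns the weighted-vote label."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(labeled_points, key=lambda p: dist(query, p[0]))[:k]
    votes = {}
    for features, label in nearest:
        weight = 1.0 / (dist(query, features) + 1e-9)  # closer neighbors count more
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)

alerts = [((0.1, 0.2), "actionable"), ((0.15, 0.25), "actionable"),
          ((0.9, 0.8), "unactionable"), ((0.85, 0.9), "unactionable")]
print(knn_classify((0.2, 0.2), alerts))  # expected: "actionable"
```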

"... The longer a fault remains in the code from the time it was injected, the more time it will take to fix the fault. Increasingly, automated fault detection (AFD) tools are providing developers with prompt feedback on recently-introduced faults to reduce fault fix time. If, however, the frequency and ..."

The longer a fault remains in the code from the time it was injected, the more time it will take to fix the fault. Increasingly, automated fault detection (AFD) tools are providing developers with prompt feedback on recently-introduced faults to reduce fault fix time. If, however, the frequency and content of this feedback do not match the developer’s goals and/or workflow, the developer may ignore the information. We conducted a controlled study with 18 developers to explore what factors are used by developers to decide whether or not to address a fault when notified of the error. The findings of our study lead to several conjectures about the design of AFD tools to effectively notify developers of faults in the coding phase. The AFD tools should present fault information that is relevant to the primary programming task with accurate and precise descriptions. The fault severity and the specific timing of fault notification should be customizable. Finally, the AFD tool must be accurate and reliable to build trust with the developer.

...when the developer is not familiar with the code. However, accurately identifying faults can be problematic for tools that employ static analysis, which is known to generate high false positive rates [20]. In AWARE, each detected fault is provided with a probability that the fault is a true positive. Concurrent research on AWARE is investigating techniques to improve the accuracy of the true positive ...