
Detecting Null Pointer Violations in Java Programs

Xiaoping Jia, Sushant Sawant, Jiangyu Zhou, Sotiris Skevoulis
Division of Software Engineering
School of Computer Science, Telecommunication, and Information Systems
DePaul University, Chicago, Illinois, U.S.A.

Abstract

The use of formal methods has been growing steadily, and there have been a number of successful applications of formal methods in a range of application areas. It seems agreed that quality should be assured by applying testing, analysis and formal methods to rigorously defined pre-code artifacts. The detection of null pointer violation errors is definitely such a goal. This way of applying formal methods has great potential to increase our confidence in the software. Our goal is to provide a practical mechanism to assist the application of formal methods in the early detection of null pointer violation errors in programs. Our solution is theorem proving based and focuses on identifying the places in which a theorem prover could assist in the detection of null pointer violation errors, and on formulating the necessary proof obligations.

1. Introduction

The strong type system that most programming languages use cannot prevent tedious bugs from occurring. Such bugs include dereferencing a null pointer or an index falling out of bounds. These kinds of problems are beyond the capabilities of compilers and are consequently deferred to the run-time environment, where some languages offer mechanisms to deal with them (exception handling mechanisms).

In general, the term static analysis is used to describe a spectrum of tasks that vary from enabling compilers to generate better code [8, 10] to the discovery of code anomalies or deficiencies [9] and the reduction of inconsistencies within the program. Over the past twenty years there have been approaches [1] that use assertions that have to be true at some point or points during program execution. Such approaches can be found in LCLint [2], which uses annotations to perform error checking. Anna [7] uses mathematical descriptions of certain program properties inserted in Ada code as formal comments. Ongoing research work at DEC laboratories has resulted in Extended Static Checking (ESC) [4, 5] and checking object invariants [6]. These approaches attempt to use formal methods to identify particular kinds of bugs in a programming language and also provide some feedback to the programmer about those potential bugs.

In this paper we present a theorem proving based static program analysis technique that is capable of detecting illegal dereferences in Java. Our technique is completely automatic and does not require the programmer to provide specifications, although such specifications could be helpful. The theorem prover helps in evaluating the implications of conditions defined in the program. Our approach is based on the concept of weakest precondition as defined by Gries [3]. The technique attempts to formulate certain global program properties from the class under examination and evaluates them with a theorem prover. These properties are classified as class invariants. The class invariants are combined with the precondition for a variable under examination to evaluate the postcondition. Formulating obligations inferred from the static properties of the program under examination and evaluating them with a theorem prover shows the benefit of mathematically verifying the expected postcondition against the prevailing preconditions for each program property under examination. The theorem prover partially assists in checking the conditions used to control the logical flow of a program by evaluating the proof obligations formed from the path under examination. A research prototype is under development; the preliminary results are encouraging and demonstrate the feasibility and effectiveness of our approach.

The rest of the paper is organized as follows. Section 2 briefly discusses the errors in Java that can be detected by our technique and the foundation of our analysis. It also presents the details of our technique applied in detecting null pointer violations in Java programs. Section 3 describes the prototype tool. Section 4 discusses the effectiveness of this technique. We conclude and discuss our future work in Section 5.

2. Null Pointer Detection

Null pointer detection is fundamental to the problem of program analysis. Some of the problems related to detecting null pointer errors are aliasing, name collision, unexecutable paths, and uninitialized pointers. Any tool involving automatic program analysis is likely to benefit from information about null pointers in a program. Null pointer information is essential to knowing the state of a pointer at an arbitrary point in a program for conducting other analyses (array index out of bounds, class cast exception).

    public class MyClass1 {
        protected String s1, s2;

        // constructor 1
        public MyClass1(..) {
            s1 = new String(...);
        }

        // constructor 2
        public MyClass1(...) {
            s1 = new String(...);
            s2 = new String(...);
        }

        public void method1() {
            ... s2.length() ...
        }

        public void method2() {
            String s3;
            if (cond1) {
                s3 = new String(...);
            }
            //...
            if (cond2) {
                ... s3.length() ...
            }
        }
    }

    Figure 1: A null pointer example

Consider the example in Fig. 1. In MyClass1, two instance variables, s1 and s2, are declared. However, the programmer forgets to initialize s2 in constructor 1, which is a very common mistake. The consequence is that the dereferencing of s2 in method1 may cause a null pointer exception at run-time. In method2, the local variable s3 is initialized inside an if statement, and is dereferenced inside a different if statement.
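The scenario of Fig. 1 can be made concrete with a small runnable variant. This is only a sketch: the constructor arguments, the boolean parameters standing in for cond1 and cond2, the helper throwsNpe, and the explicit initialization of s3 to null are assumptions added for illustration (a Java compiler rejects a read of a local variable that is not definitely assigned, so the figure's method2 needs some initial value to compile).

```java
// Hypothetical, runnable variant of the class in Fig. 1. The constructor
// arguments, the boolean parameters, and the explicit "s3 = null" are
// assumptions added for illustration only.
class MyClass1 {
    protected String s1, s2;

    // constructor 1: forgets to initialize s2, so s2 keeps its default null
    MyClass1(String a) {
        s1 = new String(a);
    }

    // constructor 2: initializes both fields
    MyClass1(String a, String b) {
        s1 = new String(a);
        s2 = new String(b);
    }

    // dereferences s2; fails for instances built by constructor 1
    int method1() {
        return s2.length();
    }

    // s3 is assigned under cond1 but dereferenced under cond2
    int method2(boolean cond1, boolean cond2) {
        String s3 = null;
        if (cond1) {
            s3 = new String("abc");
        }
        if (cond2) {
            return s3.length();
        }
        return -1;
    }

    // Runs an action and reports whether it raised NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        MyClass1 viaCtor1 = new MyClass1("x");
        MyClass1 viaCtor2 = new MyClass1("x", "yz");
        System.out.println(throwsNpe(() -> viaCtor1.method1()));            // s2 was never set
        System.out.println(throwsNpe(() -> viaCtor2.method1()));            // s2 was set
        System.out.println(throwsNpe(() -> viaCtor2.method2(false, true))); // cond2 without cond1
        System.out.println(throwsNpe(() -> viaCtor2.method2(true, true)));  // cond1 and cond2
    }
}
```

Whether method1 fails depends on which constructor built the instance, and whether method2 fails depends on the relation between the two conditions. Both are facts a static analysis has to establish without running the program.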
The dereference of s3 may also cause a null pointer exception at run-time if cond2 is not implied by cond1. These types of errors are very common in programs.

2.1. Foundation of the Analysis

The static analysis is based on the concept of class invariant. A class invariant is a condition satisfied by all non-transient states of the instances of the class. The analysis is based on the following two requirements of class invariants. Assume that CI is an invariant of class C; then

I-1) All public constructors must ensure the invariant, i.e., for each public constructor cntr of C

    {true} cntr {CI}

I-2) All public methods must preserve the invariant, i.e., for each public method mthd of C

    {CI} mthd {CI}

The implication is that, for an instance created by any public constructor of the class, the public methods may be invoked in any order (assuming the methods are thread safe). Informally, the main steps of detecting errors regarding a certain property p are the following:

- Use I-1 to establish a class invariant CIp involving p, preferably as strong as possible.
- Use I-2 to verify that CIp is indeed an invariant; weaken the invariant if necessary.
- Use CIp as the pre-condition of methods to prove certain assertions. We insert assertions regarding p at certain places and try to prove them. If a proof fails, a potential violation is detected.

The detection of null-pointer violations using this technique involves the property ¬isnull(v), which asserts that variable v is not null. The analysis consists of the following steps:

1. Establish a class invariant regarding the nullity of reference variables. For each reference variable v, check:
   a) if ¬isnull(v) is a post-condition of all public constructors
   b) if ¬isnull(v) is preserved by all public methods.
   If so, add ¬isnull(v) to the class invariant.

2. Analyze all public methods using the invariant as the pre-condition of public methods. For each dereference of reference variable v, attempt to prove ¬isnull(v).

2.2. Our Approach

Intra-class analysis is restricted to information obtained within the class; intra-method analysis is restricted to information obtained within the method. Inter-class analysis is restricted to information obtained from all the classes participating in the code to be analyzed; inter-method analysis is restricted to information obtained from all the methods participating in the class to be analyzed. In this paper we focus on the intra-method and intra-class analysis.

2.3. Determining Class Invariant

To establish a class invariant, we analyze all the constructors of the class. If a reference variable v defined in the class is found to be always initialized through invocation of these constructors, or initialized during static initialization, we can consider ¬isnull(v) a potential candidate to be part of the class invariants. Assume CI is a set of variables v, each representing ¬isnull(v). CI candidate stores a set of candidate variables for CI, which need to be verified. Condition represents the conditions attached to each path. There are three steps to detect a class invariant:

Step 1: Collect all the not-null static fields into CI. Since the static fields defined in the class hold just one copy of an object for the whole class, they are definitely members of CI.

Step 2: Check all constructors of the class to detect CI candidates. If all constructors initialize a field, it is a candidate of CI.

Step 3: Check all methods to verify the candidates. If no method changes the candidate field to null, then the field can be a member of CI.

Let us informally define the following:

    SI candidate is a static class invariant candidate.
    SI is a static class invariant.
    NSI candidate is a non-static class invariant candidate.
    NSI is a non-static class invariant.
    CI is the class invariant.

In order to determine static invariants we need to:

- Check all static fields for the not-null condition in master static paths, to get SI candidate.
- Check all non-static blocks, constructors and methods (static and non-static methods) to verify SI candidate. If no method changes the candidate field to null, then the field can be a member of SI.

    {SI candidate} non-static-block {SI candidate}
    {SI candidate} constructor {SI candidate}
    {SI candidate} method {SI candidate}

The determination of non-static invariants proceeds as follows:

- Use SI as the precondition for checking all constructors to determine NSI candidates. If all constructors initialize a field, it is a candidate of NSI.

    {SI} constructor {NSI candidate}

- Check all methods to verify NSI candidates. If no method changes the candidate field to null, then the field can be a member of NSI.

    {NSI candidate} method {NSI candidate}

After we get the class invariants, we can use them to detect potential null pointer violations. We insert assertions {¬isnull(v)} wherever a dereference of reference variable v is attempted. We attempt to prove that these assertions hold with the class invariants as the pre-condition of the methods. If the proof fails, a potential violation is detected.

2.4. Violation Detection

Program operations that may signal run-time errors during an execution are a starting point in the static debugging process. By inspecting the invariants for the arguments of each program operation we can identify those program operations that can cause run-time errors and flag them for inspection by the programmer. The process for detecting a null pointer violation can be summarized as follows. Let:

    WC be the weakest precondition
    v be any variable that is dereferenced

- Check: if

    {WC} ⇒ ¬isnull(v)

  is not proved, there is a potential null pointer violation.

- Check all master constructor paths: if

    {SI} ∧ {WC} ⇒ ¬isnull(v)

  is not proved, there is a potential null pointer violation.

- Check all non-static methods: if

    {SI} ∧ {CI} ∧ {WC} ⇒ ¬isnull(v)

  is not proved, there is a potential null pointer violation.

Appropriate proof obligations are created for each statement in a given path where the field to be proved is dereferenced.

3. The Prototype Tool

The model tree component represents a Java class using a logical model. It accepts the output of the Java parser and uses it to create an abstract model of the physical structure of the program. The modeling activity depends on an extensive set of classes that provide support for every Java programming construct. The Name Analyzer gives a unique name to each variable declared in the program under examination to avoid name clashes. The model tree is used to create a control flow graph, which creates paths mapping the serial execution of the source code. Other pertinent information required for analyzing the program and pinpointing those program operations that may cause run-time errors is also attached to this control flow graph.

The number of possible paths is a function of the number of condition statements and their relationships in the program under examination. In order to establish all the possible paths for a given program we need to identify all the conditional constructs in the program that result in defining more paths of execution. Compound conditions need to be broken down into simple conditions, thus defining more paths of execution and helping us pinpoint the exact sequence of conditions that resulted in a given path. The path generator creates all the possible paths for each control flow graph and serializes them in a file associated with each control flow graph. After identifying all the paths, the analysis is performed on all the possible paths.
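The path generation described above can be sketched as a depth-first enumeration over a small graph. This is a minimal illustration under stated assumptions, not the prototype's implementation: the class PathSketch, its node labels, and the restriction to acyclic graphs (loops assumed to be already unrolled) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a control flow graph as an adjacency map over node
// labels, with a depth-first enumeration of every entry-to-exit path.
// It assumes an acyclic graph (loops already unrolled into fixed iterations).
class PathSketch {
    private final Map<String, List<String>> successors = new LinkedHashMap<>();

    // Adds a directed edge and makes sure both nodes are known to the graph.
    void edge(String from, String to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        successors.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Returns every path from entry to exit, in depth-first order.
    List<List<String>> paths(String entry, String exit) {
        List<List<String>> result = new ArrayList<>();
        walk(entry, exit, new ArrayList<>(), result);
        return result;
    }

    private void walk(String node, String exit, List<String> prefix, List<List<String>> out) {
        prefix.add(node);
        if (node.equals(exit)) {
            out.add(new ArrayList<>(prefix));
        } else {
            for (String next : successors.getOrDefault(node, List.of())) {
                walk(next, exit, prefix, out);
            }
        }
        prefix.remove(prefix.size() - 1); // backtrack
    }

    // Graph for: if (a && b) { then } else { otherwise }
    // with the compound condition split into the simple tests a and b.
    static PathSketch compoundExample() {
        PathSketch g = new PathSketch();
        g.edge("entry", "a");
        g.edge("a", "b");           // a is true: test b next
        g.edge("a", "otherwise");   // a is false
        g.edge("b", "then");        // b is true
        g.edge("b", "otherwise");   // b is false
        g.edge("then", "exit");
        g.edge("otherwise", "exit");
        return g;
    }

    public static void main(String[] args) {
        for (List<String> p : compoundExample().paths("entry", "exit")) {
            System.out.println(String.join(" -> ", p));
        }
    }
}
```

Treating `a && b` as a single test would give two paths. Splitting it yields three, and the two paths that reach the else branch record whether a or b was the condition that failed.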
A program is determined to be correct only when the result is correct for all the possible paths. An overview of our prototype tool is shown in Fig. 2.

[Figure 2: Prototype Components. INPUT: Java Source Code. SYSTEM MODEL: Model Tree, Name Analysis, Control Flow Graph, Paths. ANALYSIS: Initialization Sequence Analyzer, Determine Null Pointer Invariant, Check Null Pointer Violation, Theorem Prover. OUTPUT: Result Presentation (text, GUI).]

3.1. Program Model Analyzer

Once the program model is ready, the common properties of the class can be determined by establishing the class invariants. The information obtained from class invariants can then be used to detect potential null pointer violations.

A class in the Java programming language may contain zero or more initialization block definitions. Initialization blocks play a significant role in the initialization process of the object being instantiated. The process that determines invariants needs to know the exact initialization sequence of the object being created in order to analyze the post-condition of each member variable defined in the class. Initialization blocks may be static or non-static depending on the use of the keyword static. Static and non-static blocks can have branch constructs. In order to track the initialization process of an object, it is important to know the paths through which an object could get initialized. Static blocks can have only static variable declarations. Non-static blocks can have both static and non-static variable declarations. Static variables get initialized on the first usage of the class. Non-static variables get initialized only when an instance of the class is created.

The net effect of all the static blocks should be analyzed, followed by the net effect of all non-static blocks for each individual constructor. This requires identifying all the possible paths that result from combining all the static blocks. Further, there is a need to identify all the possible paths that result from combining all the non-static paths for each constructor defined for an arbitrary class. This is implemented by inferring master static paths for each class and master constructor paths for each constructor.

3.2. Theorem Prover

For a variable of some reference type we form the predicate ¬isnull(var). This predicate is modified by substituting the values assigned to the variables as we track each program path bottom up; any conditions are added as preconditions to this predicate. The theorem prover returns a true result if the obligation can be proved, which indicates that the variable cannot cause a null pointer exception on that path. Conversely, a false result returned by the theorem prover may indicate a potential null pointer exception, or the inability of the theorem prover to prove that a null pointer exception cannot occur.

4. Experiments

A prototype has been developed to evaluate the effectiveness of this technique. We evaluated the effectiveness by giving several different program inputs. The program inputs considered only the non-distributed nature of the Java programming language. The potential null pointer violations were checked for every variable dereferenced in the given class. Prior to using the prototype, we verified that the program compiles correctly. We investigated a variety of different scenarios that may cause a null pointer violation. We have identified the following possible scenarios and assigned each of them a unique name, which facilitates the presentation of the coverage of our experiment. The categorization is shown in Fig. 3, where var denotes a Java variable of reference type.
    Cause Id   Description
    C1         class level var not initialized
    C2         class level var becomes null in an arbitrary method
    C3         var is dereferenced under two condition blocks
    C4         null var dereferenced in an unexecutable path
    C5         var initialized but randomly becomes null
    C6         var becomes null before end of iteration of a loop
    C7         var not initialized but dereference is guarded

    Figure 3: Categorization of Causes

The cases were classified by the type of variable (static, non-static), the scope of the variable (local, class), and the location of the error (static block, non-static block, constructor, method). A summary of our classification is shown in Figure 4.

[Figure 4: Potential Error Classification. Columns: Type (static, instance), Scope (local, class), Location (init block, static block, constructor), Cause.]

The same cases were extended further to check potential null pointer violations in inner classes. Test cases for a predefined confidence level were generated using templates and applied to each mutated version. The generation of new mutated versions of the program was repeated, with different faults under different contexts, until enough samples were collected to evaluate the prototype performance in a meaningful way.

Initial tests were conducted to check for null pointer violations in programs where variables were uninitialized. The program representation takes into consideration the initial values of variables as per the Java language specification and is able to identify null pointer violations arising from uninitialized variables. The invariants help us detect null pointer errors for variables which become null arbitrarily. The prototype is able to detect null pointer violations in statements as well as conditions.
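The role the invariants play in these experiments can be illustrated with the candidate-and-verify steps of Section 2: a field is a candidate if every constructor initializes it, and it is kept only if no method may set it to null. The sketch below is a simplified stand-in, not the prototype's algorithm. The class InvariantSketch, its method name, and the per-constructor and per-method summary sets are hypothetical, and the summaries are assumed to have been computed by an earlier pass.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the candidate-and-verify idea: a field is a
// not-null invariant candidate if every constructor initializes it, and it
// stays in the invariant only if no method may assign null to it.
// The summary sets passed in are assumed to come from an earlier analysis.
class InvariantSketch {
    static Set<String> notNullInvariant(Set<String> fields,
                                        List<Set<String>> initializedByConstructor,
                                        List<Set<String>> nulledByMethod) {
        Set<String> invariant = new LinkedHashSet<>(fields);
        // Candidates: keep only fields that every constructor initializes.
        for (Set<String> initialized : initializedByConstructor) {
            invariant.retainAll(initialized);
        }
        // Verification: drop fields that some method may set to null.
        for (Set<String> nulled : nulledByMethod) {
            invariant.removeAll(nulled);
        }
        return invariant;
    }

    public static void main(String[] args) {
        // Summaries for MyClass1 of Fig. 1: constructor 1 sets only s1,
        // constructor 2 sets s1 and s2, and no method assigns null.
        Set<String> result = notNullInvariant(
                Set.of("s1", "s2"),
                List.of(Set.of("s1"), Set.of("s1", "s2")),
                List.of(Set.of(), Set.of()));
        System.out.println(result);
    }
}
```

For the class of Fig. 1 only s1 survives, so a dereference of s2 cannot be discharged from the invariant and is reported as a potential violation. One caveat of this simplification: a class that declares no constructor has to be modeled as a single constructor with an empty summary set, otherwise every field would be kept.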

The program representation evaluates more paths for compound conditions by splitting them into simple conditions; loops are evaluated as paths with zero or more iterations. This simplification helps us identify the exact path and the sequence of statements that will result in a potential null pointer exception. The collection of obligations evaluates any aliases arising during analysis and makes appropriate substitutions in the proof obligations submitted to the theorem prover.

In a nutshell, it was found that in many cases it was cost effective to use formal program proof techniques as compared to systematic testing. In most of the cases the prototype was successful in identifying invariants and detecting null pointer violation errors. The prototype is free from the side-effects of aliasing. The prototype also does not report errors in unexecutable paths and hence provides a partial solution to the problem of unexecutable paths.

5. Conclusion

The aim of the work presented here is to foster the use of formal methods in specific stages of the software development process. We have attempted to make a case for the potential benefits of the use of formal methods without most of their unacceptable costs. The integration of an automatic theorem prover is a major step in this process. Our technique does not require programmers to provide specifications and it is fully automatic. It is capable of detecting null-pointer violations which could otherwise incur major costs in testing and debugging. The prototype is able to identify a majority of null pointer violation errors based on the information available in an intra-class, intra-method domain. The prototype does not have restrictions on simple control and data flow facilities; it also defines more paths of execution by simplifying branch conditions. This helps the developer segment the paths at a finer level.
The tool also addresses issues related to aliasing and does not report errors in those paths that are unexecutable, and thus provides a partial solution to the problem of unexecutable paths. The presented approach identifies the program operations that may signal errors during an execution and describes the sets of erroneous argument values that may cause those errors. The combination of the control-flow graph with the theorem prover based approach gives us a mathematical basis to reason about correctness. This analysis could be extended to inter-method and inter-class analysis. The approach could also be applied to problems related to illegal downcasts and array bounds checks.

In the future, we plan to extend the technique to inter-method and inter-class analysis, and we will handle inheritance and overriding of methods, recursive invocation and cyclic class dependency. We will also apply our technique to detect illegal downcasts and array indices out of bounds.

References

[1] E. Dijkstra. Guarded commands, nondeterminacy and formal derivation of programs. Communications of the ACM, 18(8):453-458.
[2] D. Evans. Using Specifications to Check Source Code. MS Thesis, Massachusetts Institute of Technology.
[3] D. Gries. The Science of Programming. Springer-Verlag.
[4] D. L. Detlefs. An overview of the extended static checking. In Proc. The First Workshop on Formal Methods in Software Practice, pages 1-9. ACM SIGSOFT.
[5] K. R. M. Leino. Ecstatic: An object-oriented programming language with axiomatic semantics. In Proc. FOOL4, Fourth International Workshop on Foundations of Object-Oriented Languages, 1997.
[6] K. R. M. Leino and R. Stata. Checking object invariants. Technical report, Digital Equipment Corporation Research Center, Palo Alto, CA.
[7] D. Luckham. Programming with Specifications: An Introduction to Anna, a Language for Specifying Ada Programs. Springer-Verlag.
[8] F. Nielson. Semantics-directed program analysis: a toolmaker perspective. In Proc. SAS96, pages 2-21. Springer-Verlag, LNCS.
[9] A. Rosskopf. Use of a static analysis tool for safety-critical applications. In Proc. ADA-EUROPE96, pages 182-197. Springer-Verlag, LNCS.
[10] B. Steffen. Property-oriented expansion. In Proc. SAS96. Springer-Verlag, LNCS.
