Alias Analysis for Object-Oriented Programs

Manu Sridharan¹, Satish Chandra¹, Julian Dolby¹, Stephen J. Fink¹, and Eran Yahav²

¹ IBM T.J. Watson Research Center
² Technion
{msridhar,satishchandra,dolby,sjfink}@us.ibm.com, yahave@cs.technion.ac.il

Abstract. We present a high-level survey of state-of-the-art alias analyses for object-oriented programs, based on a years-long effort developing industrial-strength static analyses for Java. We first present common variants of points-to analysis, including a discussion of key implementation techniques. We then describe flow-sensitive techniques based on tracking of access paths, which can yield greater precision for certain clients. We also discuss how whole-program alias analysis has become less useful for modern Java programs, due to increasing use of reflection in libraries and frameworks. We have found that for real-world programs, an under-approximate alias analysis based on access-path tracking often provides the best results for a variety of practical clients.

1 Introduction

Effective analysis of pointer aliasing plays an essential role in nearly all non-trivial program analyses for object-oriented programs. For example, computing a precise inter-procedural control-flow graph, a necessity for many program analyses, often requires significant pointer reasoning to resolve virtual dispatch. Furthermore, any program analysis attempting to discover non-trivial properties of an object must reason about mutations to that object through pointer aliases.

Building alias analyses that simultaneously scale to realistic object-oriented programs and libraries while providing sufficient precision has been a longstanding challenge for the program analysis community. The twin goals of scalability and precision often conflict with each other, leading to subtle tradeoffs that make choosing the right alias analysis for a task non-obvious. Moreover, as large object-oriented frameworks (e.g., Eclipse³ for desktop applications or Spring⁴ for server-side code) have proliferated, achieving scalability
and precision has become increasingly difficult.

³ http://www.eclipse.org
⁴ http://www.springsource.org

In this work, we give a high-level survey of the alias-analysis techniques that we have found most useful during a years-long effort developing industrial-strength analyses for Java programs. We focus on two main techniques:

1. Points-to analysis, specifically variants of Andersen's analysis [3] for Java. A points-to analysis result can be used to determine may-alias information, i.e., whether it is possible for two pointers to be aliased during program execution.
2. Flow-sensitive tracking of the access paths that name an object, where an access path is a variable and a (possibly empty) sequence of field names (see Section 5 for details). Access-path tracking enables determination of must-alias information, i.e., whether two pointers must be aliased at some program point.

We also aim to explain particular challenges we have encountered in building analyses that scale to modern Java programs. We have found that as standard libraries and frameworks have grown, difficulties in handling reflection have led us to reduce or eliminate our reliance on traditional points-to analysis. Instead, we have developed an under-approximate approach to alias analysis based on type-based call graph construction and tracking of access paths. We have found this approach to be more effective for analyzing large Java programs, though traditional points-to analysis remains relevant in other scenarios. Our experiences may shed light on issues in designing analyses for other languages, and in designing future languages to be more analyzable.

This chapter is not intended to be an exhaustive survey of alias analysis. Over the past few decades, computer scientists have published hundreds of papers on alias-analysis techniques. The techniques vary widely depending on myriad analysis details, such as policies for flow sensitivity, context sensitivity, demand-driven computation, and optimization tradeoffs. We cannot hope to adequately cover this vast
space, and the literature grows each year. Here we focus on alias analyses that we have significant experience implementing and applying to real programs. To the best of our knowledge, the presented alias analyses are the state of the art for our desired analysis clients and target programs. For many of the analyses described here, a corresponding implementation is available as part of the open-source Watson Libraries for Analysis (WALA) [69].

Organization. This chapter is organized as follows. In Section 2, we motivate the alias-analysis problem by showing the importance of precise aliasing information for analysis clients. Then, we discuss points-to analysis for Java-like languages: Section 3 gives formulations of several variants of Andersen's analysis [3], and Section 4 discusses key implementation techniques. Section 5 discusses must-alias analysis based on access-path tracking, which provides greater precision than a typical points-to analysis. In Section 6, we describe challenges in applying points-to analysis to modern Java programs and how under-approximate techniques can be used instead. Finally, Section 7 concludes and suggests directions for future work.

Some of the material presented here has appeared in previous work by the authors [20,62,67].

2 Motivating Analyses

Many program analyses for object-oriented languages rely on an effective alias analysis. Here we illustrate a number of alias analysis concerns in the context of an analysis for detecting resource leaks in Java programs, and discuss how these concerns also pertain to other analyses.

2.1 Resource Leaks

     1  public void test(File file, String enc) throws IOException {
     2    PrintWriter out = null;
     3    try {
     4      try {
     5        out = new PrintWriter(
     6          new OutputStreamWriter(
     7            new FileOutputStream(file), enc));
     8      } catch (UnsupportedEncodingException ue) {
     9        out = new PrintWriter(new FileWriter(file));
    10      }
    11      out.append('c');
    12    } catch (IOException e) {
    13    } finally {
    14      if (out != null) {
    15        out.close();
    16      }
    17    }
    18  }

Fig. 1. Example of a resource leak.

While garbage
collection frees the programmer from the responsibility of memory management, it does not help with the management of finite system resources, such as sockets or database connections. When a program written in a Java-like language acquires an instance of a finite system resource, it must release that instance by explicitly calling a dispose or close method. Letting the last handle to an unreleased resource go out of scope leaks the resource. Leaks can gradually deplete the finite supply of system resources, leading to performance degradation and system crashes. Ensuring that resources are always released, however, is tricky and error-prone.

As an example, consider the Java program in Fig. 1, adapted from code in Apache Ant.⁵ The allocation of a FileOutputStream on line 7 acquires a stream, which is a system resource that needs to be released by calling close() on the stream handle. The acquired stream object then passes into the constructor of OutputStreamWriter, which remembers it in a private field. The OutputStreamWriter object, in turn, passes into the constructor of PrintWriter. In the finally block, the programmer calls close() on the PrintWriter object. This close() method calls close() on the "nested" OutputStreamWriter object, which in turn calls close() on the nested FileOutputStream object. By using finally, it would appear that the program closes the stream, even in the event of an exception.

⁵ http://ant.apache.org

However, a potential resource leak lurks in this code. The constructor of OutputStreamWriter might throw an exception: notice that the programmer anticipates the possibility that an UnsupportedEncodingException may occur. If it does, the assignment to the variable out on line 5 will not execute, and consequently the stream allocated on line 7 is never closed. A resource leak analysis aims to statically detect leaks like this one.

2.2 Role of alias analysis

A resource leak analysis should report a potential leak of the stream allocated at line 7 of Fig. 1. But a bug-finding client should not trivially
report all resource acquisitions as potentially leaking, as that would generate too many false positives. Hence, the key challenge of resource leak analysis is in reasoning that a resource in fact does not leak, and it is this reasoning that requires effective alias analysis. Here we will consider what it takes to prove that the resource allocated at line 7 does not leak along the exception-free path.

Figure 2 shows the relevant parts of the CFG for Fig. 1, along with the CFGs of some of the called methods. We introduced temporary variables t1 and t2 when constructing the CFG (a).

Consider the program path 7-6-5-11-14-15 (the exception-free path), starting with the resource allocation on line 7. The constructor on line 6 stores its argument into an instance field a; see CFG (b). Likewise, the constructor on line 5 stores its argument into an instance field b; see CFG (c). The call out.close() on line 15 transitively calls close() on expressions out.b and out.b.a (notice that this in CFGs (d) and (e) would be bound appropriately), the last one releasing the tracked resource as it is equal to t1. At this point, the (same) resource referred to by the expressions t1, t2.a, and out.b.a is released.

What reasoning is needed for an analysis to prove that the resource allocated on line 7 is definitely released along the path 7-6-5-11-14-15?

1. Data flow must be tracked inter-procedurally, as the call to the close() method of the FileOutputStream occurs in a callee. For this reason, an accurate call graph must be built.
2. The analysis must establish that this.a in CFG (e), when encountered along this path, must refer to the same object assigned to t1 in CFG (a).

Both cases demand effective alias analysis.

[Figure 2 panels, recovered from the extracted text:
  (a) CFG of test: entry; 2: out = null; 7: t1 = new FileOutputStream(file); 6: t2 = new OutputStreamWriter(t1, enc); 5: out = new PrintWriter(t2); 8: catch UnsupportedEncodingException (reached via an edge labeled UEE); 9: t3 = new FileWriter(file); out = new PrintWriter(t3); 11: out.append(); 14: out != null; 15: out.close(); exit
  (b) CFG of OutputStreamWriter.<init>(os): entry; this.a = os; exit
  (c) CFG of PrintWriter.<init>(w): entry; this.b = w; exit
  (d) CFG of PrintWriter.close(): entry; this.b.close(); exit
  (e) CFG of OutputStreamWriter.close(): entry; this.a.close(); exit]

Fig. 2. Control-flow graph of the procedure shown in Fig. 1. Numbers to the left are line numbers. Dotted edges represent inter-procedural control transfers.

Call Graph Construction. A call graph indicates the possibly-invoked methods at each call site in a program. In object-oriented languages, virtual calls make building a call graph non-trivial. Consider CFG (d). Say the field b in class PrintWriter has declared type Writer, which has a number of different subtypes, one of which is OutputStreamWriter. A call graph based solely on the program's class hierarchy (a class hierarchy analysis [13]) allows implementations of close() in all possible Writer subtypes to be potential targets of the call this.b.close(). In principle, some subtype of Writer other than OutputStreamWriter could implement close() in a way that does not close the resource, causing a false positive here.⁶

⁶ Rapid type analysis (RTA) [4], which only considers allocated types, is generally more accurate than class hierarchy analysis [66], but it would still cause a false positive if the bad Writer subtype were allocated anywhere in the program. We have found that the difference between RTA and class hierarchy analysis tends to vanish in large framework-dependent programs.

A points-to analysis (see Section 3) creates an over-approximation of all the heap values that can possibly flow into each reference by tracing data flow through assignments. In this program, the only heap values that flow into field b of PrintWriter are of type OutputStreamWriter (via the call at line 5) or FileWriter (via line 9). The class FileWriter inherits its close() method from OutputStreamWriter. Thus, a call graph based on points-to analysis can correctly narrow down the call target of this.b.close() to the close() method in the class OutputStreamWriter, as shown in edges from CFG (d) to CFG
(e), yielding greater precision in the resource leak analysis.

Note that call graph construction and alias analysis are often inter-dependent. In our example, the assignment to the b field of PrintWriter occurs in a callee (the constructor <init>) via the calls at lines 5 and 9. Hence, the points-to analysis must know that this <init> method is in the call graph to properly trace the flow of an OutputStreamWriter into the b field, which in turn implies that OutputStreamWriter.close() can be called from CFG (d). Multiple approaches exist to address this inter-dependency, to be discussed in Section 3.

Equality. Recall that to prove leak freedom for the path of interest, the analysis needs to show that this.a in CFG (e) refers to the same object as t1 in CFG (a), to ensure that the resource is released. However, this fact cannot be proved using points-to analysis alone, as it only provides may-alias information, i.e., it can only state that this.a may refer to the same object as t1. Given may-alias information alone, the analysis must consider a case where this.a does not alias t1 on the path, and a false leak will be reported.

To avoid this false report, must-alias information is needed, indicating alias relationships that must hold at a program point. This reasoning can be accomplished by tracking must access paths naming the resource as part of the resource-leak analysis. Much as in the informal reasoning described previously, must access paths are expressions of the form t1, t2.a, and out.b.a, with the property that in the current program state, they must equal the tracked resource. By tracking access paths along the control-flow path of interest, the equality of this.a and t1 in our example can be established, and the false leak report is avoided. As we shall show in Section 5, access-path tracking can provide useful aliasing information for a number of important client analyses.

2.3 Other Analysis Clients

Many client analyses share some or all of the alias analysis needs shown for
the resource leak analysis above. Any static analysis performing significant reasoning across procedure boundaries (quite a large set) is likely to benefit from a precise call graph produced via alias analysis, due to the pervasiveness of method calls and virtual dispatch in Java-like languages. Other analyses can be rather directly formulated in terms of possible heap data flow and aliasing, for example, static race detection [43] and taint analysis [68]. Access-path tracking is most often used for analyses that need to track changing properties of objects, like resource leak analysis or typestate verification [12,20], but other analyses may also benefit from the additional precision.

3 Formulating Points-To Analysis

Here we formulate several common variants of Andersen's points-to analysis [3] for Java-like object-oriented languages; implementation techniques will be discussed in Section 4. We begin with a standard formulation of context-insensitive Andersen's analysis that captures its essential points. Then, we extend the formulation with a generic template for context sensitivity, and we present various context-sensitive analyses in the literature as instantiations of the template.

3.1 Context-Insensitive Formulation

A points-to analysis computes an over-approximation of the heap locations that each program pointer may point to. Pointers include program variables and also pointers within heap-allocated objects, e.g., instance fields. The result of the analysis is a points-to relation $pt$, with $pt(p)$ representing the points-to set of a pointer $p$. For decidability and scalability, points-to analyses must employ abstraction to finitize the possibly-infinite set of pointers and heap locations arising at runtime. In particular, a heap abstraction represents dynamic heap locations with a finite set of abstract locations.

Andersen's points-to analysis [3] has the following properties:

- Flow insensitive: The analysis assumes statements can execute in any order and any number of times.
- Subset based: The analysis models
directionality of assignments, i.e., a statement x = y implies $pt(y) \subseteq pt(x)$. In contrast, an equality-based analysis (e.g., that of Steensgaard [65]) would require $pt(y) = pt(x)$ for the same statement, a coarser approximation.

As is typical for Java points-to analyses, we also desire field sensitivity, which requires separate reasoning about each instance field of each abstract location. Field-based analyses for Java, in which instance field values are merged across abstract locations, may provide sufficient precision for certain clients [32,64]. However, field sensitivity typically adds little expense to a context-insensitive analysis [32], and for context-sensitive analyses (to be discussed in Section 3.2), field sensitivity is essential for precision.

Table 1 gives a standard formulation of context-insensitive, field-sensitive Andersen's analysis for Java, equivalent to those appearing elsewhere in the literature [32,52,64,72].⁷ Canonical statements for the analysis are given in the first column. In order, the four statement types enable object allocation, copying pointers, and reading and writing instance fields. More complex memory-access statements (e.g., x.f = y.g.h) are handled through suitable introduction of temporary variables.

⁷ Points-to analysis has also been formulated as an abstract interpretation [11] (e.g., by Might et al. [41]), yielding a systematic characterization of the analysis result in terms of the target program's concrete semantics. See Might et al. [41] for details of such a formulation and a discussion of the relationship of context-sensitive points-to analysis to control-flow analysis for functional languages [56].

    Statement          Constraint                                                Rule
    i: x = new T()     $\{o_i\} \subseteq pt(x)$                                 [New]
    x = y              $pt(y) \subseteq pt(x)$                                   [Assign]
    x = y.f            $o_i \in pt(y) \implies pt(o_i.f) \subseteq pt(x)$        [Load]
    x.f = y            $o_i \in pt(x) \implies pt(y) \subseteq pt(o_i.f)$        [Store]

Table 1. Canonical statements for context-insensitive Java points-to analysis and the corresponding points-to set constraints.

Array objects are modeled as having a single field arr that may point to any value stored in the array
(so, x[i] = y is modeled as x.arr = y). Section 3.2 discusses handling of method calls.

The inference rules in Table 1 describe how each statement type affects the corresponding points-to sets. Note that since the analysis is field sensitive, points-to sets are maintained both for variables (e.g., $pt(x)$) and for instance fields of abstract locations (e.g., $pt(o_i.f)$). Also note that in the New rule, the abstract location $o_i$ is named based on the statement label $i$, the standard heap abstraction used in Andersen's analysis.

Example. Consider the following program (assume type T has a field f):

    1  a = new T();
    2  b = new T();
    3  a.f = b;
    4  c = a.f;

The following is a derivation of $o_2 \in pt(c)$ according to the rules of Table 1, with rule applications labeled by the corresponding program statement's line number:

    $o_2 \in pt(b)$                                                  [l2]
    $o_1 \in pt(a)$ [l1], so $pt(b) \subseteq pt(o_1.f)$             [l3]
    hence $o_2 \in pt(o_1.f)$
    $o_1 \in pt(a)$ [l1], so $pt(o_1.f) \subseteq pt(c)$             [l4]
    hence $o_2 \in pt(c)$

3.2 Context Sensitivity

We now extend our points-to analysis formulation to incorporate context-sensitive handling of method calls. We formulate context sensitivity in a generic manner and then show how to instantiate the formulation to derive standard analysis variants.

A context-sensitive points-to analysis separately analyzes a method m for each calling context that arises at call sites of m. A calling context (or, simply, a context) is some abstraction of the program states that may arise at a call site. Separately analyzing a method for each context removes imprecision due to conflation of analysis results across its invocations.

For example, consider the following program:

    1  id(p) { return p; }
    2  x = new Object();  // o1
    3  y = new Object();  // o2
    4  a = id(x);
    5  b = id(y);

A context-insensitive analysis conflates the effects of all calls to id, in effect assuming that either object $o_1$ or $o_2$ may be passed as the parameter at the calls on lines 4 and 5. This assumption leads to the imprecise conclusions that a may point to $o_2$ and b to $o_1$. Now, consider a context-sensitive points-to analysis that uses a distinct context for each method call
site. This analysis will process id separately for its two call sites, thereby precisely concluding that a may only point to $o_1$ and b only to $o_2$.

Formulation. Our generic formulation of context-sensitive points-to analysis appears in Table 2. Compared to Table 1, the two additional statement types respectively allow for invoking and returning from procedures. We assume that a method $m$ has formal parameters $m_{this}$ for the receiver and $m_{p_1}, \ldots, m_{p_n}$ for the remaining parameters, and we use a pseudo-variable $m_{ret}$ to hold its return value.⁸

⁸ We elide static fields (global variables) and static methods from our formulation, as their handling is straightforward. Since method contexts cannot be applied to global variables, their usage may blunt precision gains from context sensitivity.

The analysis formulated in Table 2 maintains a set $contexts(m)$ of the contexts that have arisen at call sites of each method $m$. For each local pointer variable $x$, the analysis maintains a separate abstract pointer $\langle x, c \rangle$ to represent $x$'s possible values when its enclosing method is invoked in context $c$. Abstract locations $\langle o_i, c \rangle$ are similarly parameterized by a context. Finally, note that each constraint in the second column of Table 2 is written under the assumption that the corresponding statement in the first column is from method $m$.

Our formulation is parameterized by two key functions, which together specify a context-sensitivity policy:

- The selector function, which determines what context to use for a callee at some call site, and
- The heapSelector function, which determines what context $c$ to use in an abstract location $\langle o_i, c \rangle$ at allocation site $i$.

    Statement in method m      Constraint (premises, then conclusions)

    [New]     i: x = new T()
              $c \in contexts(m)$
              $\implies \langle o_i, heapSelector(c) \rangle \in pt(\langle x, c \rangle)$

    [Assign]  x = y
              $c \in contexts(m)$
              $\implies pt(\langle y, c \rangle) \subseteq pt(\langle x, c \rangle)$

    [Load]    x = y.f
              $c \in contexts(m)$, $\langle o_i, c' \rangle \in pt(\langle y, c \rangle)$
              $\implies pt(\langle o_i, c' \rangle.f) \subseteq pt(\langle x, c \rangle)$

    [Store]   x.f = y
              $c \in contexts(m)$, $\langle o_i, c' \rangle \in pt(\langle x, c \rangle)$
              $\implies pt(\langle y, c \rangle) \subseteq pt(\langle o_i, c' \rangle.f)$

    [Invoke]  j: x = r.g(a1,...,an)
              $c \in contexts(m)$, $\langle o_i, c' \rangle \in pt(\langle r, c \rangle)$
              $m' = dispatch(\langle o_i, c' \rangle, g)$
              $argvals = [\{\langle o_i, c' \rangle\}, pt(\langle a_1, c \rangle), \ldots, pt(\langle a_n, c \rangle)]$
              $c'' \in selector(m', c, j, argvals)$
              $\implies c'' \in contexts(m')$
                        $\langle o_i, c' \rangle \in pt(\langle m'_{this}, c'' \rangle)$
                        $pt(\langle a_k, c \rangle) \subseteq pt(\langle m'_{p_k}, c'' \rangle)$, $1 \le k \le n$
                        $pt(\langle m'_{ret}, c'' \rangle) \subseteq pt(\langle x, c \rangle)$

    [Return]  return x
              $c \in contexts(m)$
              $\implies pt(\langle x, c \rangle) \subseteq pt(\langle m_{ret}, c \rangle)$

Table 2. Inference rules for context-sensitive points-to analysis.

We first present the inference rules for the analysis without specifying these functions. Then, we show how standard variants of context-sensitive points-to analysis can be expressed by instantiating selector and heapSelector appropriately.

Inference Rules. For the first four statement types, the inference rules in Table 2 are modified from those in Table 1 to include appropriate parameterization with contexts. In each rule, a pre-condition chooses a context $c$ from those that have been created for the enclosing method $m$, and $c$ is used in the rule's conclusions. The heapSelector function from the context-sensitivity policy is used in the New rule to obtain contexts for abstract locations. The final Return rule in Table 2 models return statements by simulating a copy from the returned variable $x$ to the pseudo-variable $m_{ret}$ for the method. So, $pt(\langle m_{ret}, c \rangle)$ will include all abstract objects possibly returned by $m$ in context $c$.

By far, the Invoke rule is the most complex. The first two lines of the rule model reasoning about virtual dispatch. Given a location $\langle o_i, c' \rangle$ that the receiver argument $\langle r, c \rangle$ may point to, a dispatch function is invoked to resolve the virtual dispatch of $g$ on $\langle o_i, c' \rangle$ to a target method $m'$. (For Java, dispatch would be implemented based on the type hierarchy and the concrete type of $\langle o_i, c' \rangle$.)
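As a minimal sketch of the dispatch function just mentioned, the following Java code resolves a method name by walking up a class hierarchy from the concrete type of the receiver's abstract location. The class name DispatchSketch, the two maps, and the string encoding of methods are illustrative assumptions, not part of the formulation or of WALA; the toy hierarchy mirrors the Writer classes of Fig. 1, and the context component of the abstract location is dropped because only the concrete type matters here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class DispatchSketch {
    // Toy class hierarchy (assumption for illustration): class name -> superclass name.
    static final Map<String, String> SUPERCLASS = new HashMap<>();
    // Toy method table (assumption): class name -> names of methods it declares itself.
    static final Map<String, Set<String>> DECLARED = new HashMap<>();

    static {
        SUPERCLASS.put("FileWriter", "OutputStreamWriter");
        SUPERCLASS.put("OutputStreamWriter", "Writer");
        SUPERCLASS.put("Writer", "Object");
        DECLARED.put("OutputStreamWriter", Set.of("close"));
        DECLARED.put("Writer", Set.of("close"));
    }

    /**
     * Resolves a call to `method` on a receiver whose abstract location has
     * concrete type `concreteType`: the nearest declaring class on the
     * superclass chain wins. Returns null if no class declares the method.
     */
    public static String dispatch(String concreteType, String method) {
        for (String t = concreteType; t != null; t = SUPERCLASS.get(t)) {
            if (DECLARED.getOrDefault(t, Set.of()).contains(method)) {
                return t + "." + method;
            }
        }
        return null; // no implementation found in the toy hierarchy
    }

    public static void main(String[] args) {
        // Both receiver types of this.b.close() in Fig. 2 resolve to the same target.
        System.out.println(dispatch("FileWriter", "close"));         // OutputStreamWriter.close
        System.out.println(dispatch("OutputStreamWriter", "close")); // OutputStreamWriter.close
    }
}
```

Because the lookup starts from the concrete type of each abstract location in the receiver's points-to set, a more precise points-to set directly yields fewer call targets.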
This direct reasoning about virtual dispatch implies that the analysis computes its call graph on-the-fly [32,52,72], rather than relying on a call graph computed with some less precise analysis (recall the inter-dependence of points-to analysis and call graph construction, discussed in Section 2). The tradeoffs between on-the-fly call graph construction and using a pre-computed call graph have been explored extensively in the literature [21,32]; we have found that on-the-fly call graph construction usually improves both precision and performance (see Section 4.4).

Once the target $m'$ of the virtual call is discovered, the analysis uses the selector function from the context-sensitivity policy to determine which context(s) to use for this call of $m'$. selector can discriminate contexts for the target method $m'$ based on the caller's context $c$, the call site id $j$, and a list of possible parameter values $argvals$. Given a context $c''$ returned by selector, the first conclusion of the Invoke rule ensures that $c''$ is in the set of observed contexts for $m'$. The final three conclusions of the rule model parameter passing and return-value copying for the call.

Entrypoints. Points-to analyses with on-the-fly call graph construction must be provided with a set of entrypoint methods $E$ that may be invoked by the environment to begin execution (e.g., a main method for a standard Java program); these methods are assumed to be reachable by the analysis. Given $E$, the contexts sets referenced in Table 2 should be initialized as follows:

- If $m \in E$, then $contexts(m) = \{Default\}$, where Default is a special dummy context value.
- If $m \notin E$, then $contexts(m) = \emptyset$.

With these initial conditions, results will only be computed for methods deemed reachable by the analysis itself, as desired.

Note that an entrypoint method may rely on initialization being performed before it is invoked, e.g., the creation of the String[] array parameter for a main method. In WALA [69], such behavior is modeled in a synthetic "fake root method" that serves as a single root
for the call graph and contains invocations of the real entrypoints. The fake root method includes code to pass objects to entrypoint parameters based on customizable heuristics (e.g., passing an object whose concrete type matches the parameter's declared type). In general, precisely modeling how an environment initializes objects before executing an entrypoint can be quite difficult (e.g., for framework-based applications [59]), and this modeling can be critical to getting useful results from a points-to analysis.

Context Sensitivity Variants. In this section, we discuss several standard variants of context-sensitive Andersen's-style points-to analysis, and we show how the analyses can be expressed by instantiating the selector and heapSelector functions used in Table 2. Note that a context-insensitive analysis (with on-the-fly call graph construction) can be expressed using the dummy Default context (we use '_' for an unused argument):

$$selector(\_, \_, \_, \_) = \{Default\} \qquad heapSelector(\_) = Default$$

Call Strings. A standard technique to distinguish contexts is via call strings [55], which abstract the possible call stacks under which a method may be invoked. Call strings are typically represented as a sequence of call site identifiers, corresponding to a (partial) call stack. The following selector function gives a call-string-sensitive context for a callee at site $j$, given the caller context $[j_0, \ldots, j_n]$:

$$selector(\_, [j_0, j_1, \ldots, j_n], j, \_) = \{[j, j_0, j_1, \ldots, j_n]\} \tag{1}$$

For full precision, the heapSelector function should simply re-use the contexts provided by selector, i.e., $heapSelector(c) = c$. This choice of heapSelector yields a context-sensitive heap abstraction.

As a simple example, consider the following program.

    1  Object f1(T x) { return x.f; }
    2  Object f2(T x) { return f1(x); }
    3  ...
    4  p = f2(q);
    5  r = f2(s);

Given selector as defined above, method f2() will be analyzed in contexts $[s_4]$ and $[s_5]$ (we write $s_i$ for the call site on line $i$), and f1() will be analyzed in contexts $[s_2, s_4]$ and $[s_2, s_5]$.

Unfortunately, the naive selector function
above could cause non-termination in the presence of method recursion, as call strings may grow without bound. Even without recursion, analysis time grows exponentially in the number of methods in the worst case, as the worst-case number of paths in a program's call graph is exponential in the number of methods. In practice, over $10^{14}$ possible call strings have been observed for a medium-sized program [72], making straightforward use of long call strings intractable.

A standard method for improving scalability of the call-string approach in practice is k-limiting [55], where the maximum call-string length is bounded by a small constant $k$. This approach has been employed in various previous systems, though the consensus seems to be that bounded object sensitivity (discussed below) provides greater precision for the same or less cost [34]. Instead of using k-limiting in selector, Whaley and Lam [72] achieve scalability by using compact BDD data structures (see Section 4) and a context-insensitive heap abstraction, i.e., with $heapSelector(c) = Default$. (In essence, k-limiting is performed in the heap selector, with $k = 0$.)
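The k-limited variant of the call-string selector in Equation 1 can be sketched in Java as follows. This is an illustrative sketch under stated assumptions rather than any system's actual implementation: contexts are modeled as lists of call-site labels, the unused arguments of selector are omitted, a single context is returned instead of a singleton set, and the class name KLimitedCallStrings is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class KLimitedCallStrings {
    /**
     * Equation 1 with k-limiting: push the call site onto the caller's call
     * string, then keep only the k most recent call sites. With k = 0 every
     * callee gets the empty context, i.e., the analysis is context-insensitive.
     */
    public static List<String> selector(List<String> callerContext, String callSite, int k) {
        List<String> callee = new ArrayList<>();
        if (k > 0) {
            callee.add(callSite);
        }
        for (String site : callerContext) {
            if (callee.size() >= k) {
                break; // older call sites are dropped
            }
            callee.add(site);
        }
        return callee;
    }

    public static void main(String[] args) {
        // Mirrors the f1/f2 example: f2 is called at s4, and f1 is called from f2 at s2.
        List<String> f2 = selector(List.of(), "s4", 2);
        List<String> f1 = selector(f2, "s2", 2);
        // A hypothetical further call at site s9 forces truncation when k = 2.
        List<String> deeper = selector(f1, "s9", 2);
        System.out.println(f2 + " " + f1 + " " + deeper); // [s4] [s2, s4] [s9, s2]
    }
}
```

Since the context length is bounded by k, the number of distinct contexts per method is finite even for recursive programs, which is what restores termination.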
While the scalability of Whaley and Lam's analysis was impressive, later work showed that its precision was lacking for typical clients due to the coarse heap abstraction [34].

For certain classes of program analyses, a result equivalent to using arbitrary-length call strings can be computed efficiently, for example, so-called IFDS problems [51].⁹ For this level of precision, the analysis result is typically computed using a summary-based approach [51,55] that is not directly expressible in the formulation of Table 2. However, Reps has shown that full context sensitivity for a field-sensitive analysis is undecidable [50]. While summary-based points-to analyses have been developed [73,74], we are unaware of any such analysis that scales to large Java programs.

Object Sensitivity. Rather than distinguishing a method's invocations based on call strings, an object-sensitive analysis [42]¹⁰ uses the (abstract) objects passed as the receiver argument to the method. The intuition behind object sensitivity is that in typical object-oriented design, the state of an object is accessed or mutated via its instance methods (e.g., "setter" and "getter" methods for instance fields). Hence, by using receiver objects to distinguish contexts, an object-sensitive analysis can avoid conflation of operations performed on distinct objects.

In terms of our Table 2 formulation, object-sensitive analysis and more recent variants [58] can be expressed via the following selector function:

$$selector(\_, \_, \_, argvals) = \bigcup_{\langle o, c \rangle \in argvals[0]} locToContext(\langle o, c \rangle) \tag{2}$$

locToContext converts an abstract location (which includes context information) into a context. For standard object sensitivity [42],¹¹ a context is a list of allocation sites, and locToContext simply adds to that list:

$$locToContext(\langle o_i, l \rangle) = cons(o_i, l) \tag{3}$$

(As in Lisp, $cons(o_i, [o_1, o_2, \ldots]) = [o_i, o_1, o_2, \ldots]$.)
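Equations 2 and 3 can be sketched in Java as follows, assuming that allocation sites are represented by string labels and contexts by lists of such labels. The names ObjectSensitivitySketch and AbstractLoc are hypothetical, and selector takes only the receiver entry argvals[0], since the other arguments are unused in Equation 2.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ObjectSensitivitySketch {
    /** Abstract location: an allocation site o_i paired with its heap context l. */
    public static final class AbstractLoc {
        final String site;
        final List<String> context;

        public AbstractLoc(String site, List<String> context) {
            this.site = site;
            this.context = context;
        }
    }

    /** Equation 3: locToContext(<o_i, l>) = cons(o_i, l). */
    public static List<String> locToContext(AbstractLoc loc) {
        List<String> ctx = new ArrayList<>();
        ctx.add(loc.site);
        ctx.addAll(loc.context);
        return ctx;
    }

    /** Equation 2: one callee context per abstract location the receiver may point to. */
    public static Set<List<String>> selector(List<AbstractLoc> receivers) {
        Set<List<String>> contexts = new LinkedHashSet<>();
        for (AbstractLoc receiver : receivers) {
            contexts.add(locToContext(receiver));
        }
        return contexts;
    }

    public static void main(String[] args) {
        // Two receivers allocated at the same site o1 but under different heap contexts.
        AbstractLoc first = new AbstractLoc("o1", List.of("o4"));
        AbstractLoc second = new AbstractLoc("o1", List.of("o5"));
        System.out.println(selector(List.of(first, second))); // [[o1, o4], [o1, o5]]
    }
}
```

Contexts are collected in a set, so call sites whose receivers point to the same abstract location share a single callee context.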
As discussed previously, using $heapSelector(c) = c$ yields a context-sensitive heap abstraction.

To illustrate object sensitivity, consider the following example:

    1  class A { B makeB() { return new B(); } }
    2  class B { Object makeObj() { return new Object(); } }
    3  ...
    4  A a1 = new A();
    5  A a2 = new A();
    6  B b1 = a1.makeB();
    7  B b2 = a2.makeB();
    8  Object p1 = b1.makeObj();
    9  Object p2 = b2.makeObj();

⁹ In the literature, an analysis computing such a result is often termed "context-sensitive," but we avoid that usage, as we consider contexts other than call strings.
¹⁰ While Milanova's work [42] introduced the term "object sensitivity," similar ideas were employed in earlier work on object-oriented type inference [1,45].
¹¹ While alternate object-sensitivity definitions have appeared [35], Smaragdakis et al. [58] showed that Milanova's definition [42] is most effective.

With object-sensitive analysis defined by the selector and heapSelector functions above, makeB() will be analyzed in contexts $[o_4]$ and $[o_5]$ (with abstract objects labeled by allocating line number), and we have $pt(b1) = \{\langle o_1, [o_4] \rangle\}$ and $pt(b2) = \{\langle o_1, [o_5] \rangle\}$ due to the context-sensitive heap abstraction. Similarly, makeObj() is analyzed in contexts $[o_1, o_4]$ and $[o_1, o_5]$, $pt(p1) = \{\langle o_2, [o_1, o_4] \rangle\}$, and $pt(p2) = \{\langle o_2, [o_1, o_5] \rangle\}$.

As with call-string sensitivity, k-limiting, either in selector or heapSelector, is necessary to achieve scalability to realistic programs. The literature contains inconsistent definitions of what exactly it means to limit object-sensitive contexts with a particular value of $k$; see Smaragdakis et al. [58] for an extended discussion.

In general, the precision of an object-sensitive analysis is incomparable to that of a call-string-sensitive analysis [42]. Object sensitivity can lose precision compared to call-string sensitivity by merging across call sites that pass the same receiver object, but it may gain precision by using multiple contexts at a single call site (when multiple receiver objects are possible). Work by Lhoták and Hendren [34] has shown
that for small values of k, object-sensitive analysis yields more precise results for common clients than a call-string-sensitive analysis. In practice, a mix of object- and call-string sensitivity is often used, e.g., with call-string sensitivity being employed only for static methods (which have no receiver argument).

Recently, Smaragdakis et al. [58] have identified type sensitivity as a useful technique for obtaining much of the precision of object sensitivity with greater scalability. In one variant of type sensitivity, a context is a list of types rather than allocation sites, and locToContext converts the allocation site from the abstract location into a type:

    $$\mathit{locToContext}(\langle o_i, l \rangle) = \mathit{cons}(\mathit{enclosingClass}(o_i), l) \qquad (4)$$

Rather than using the concrete type of $o_i$ in the context, the concrete type of the enclosing class for the allocation site is used, yielding greater precision in practice [58]. Another variant of type sensitivity allows for one abstract location to remain in the context:

    $$\mathit{locToContext}(\langle o_i, \mathit{cons}(o_j, l) \rangle) = \mathit{cons}(o_i, \mathit{cons}(\mathit{enclosingClass}(o_j), l)) \qquad (5)$$

Again, k-limiting is required for scalability of either of these schemes. Smaragdakis et al. [58] show how k-limited versions of these type-sensitive analyses provide much of the precision of standard object-sensitive analysis with significantly less cost.

Rather than limiting attention to the receiver argument, the cartesian product algorithm (CPA) [1] distinguishes contexts based on the objects passed in all argument positions. We can define the selector function for CPA as follows:

    $$\mathit{cartProd}(\mathit{argvals}) = \prod_{i=0}^{n} \mathit{argvals}[i]$$
    $$\mathit{selector}(\_, \_, \_, \mathit{argvals}) = \bigcup_{l \in \mathit{cartProd}(\mathit{argvals})} \mathit{locListToContext}(l) \qquad (6)$$

The cartProd function computes the (generalized) cartesian product of all entries in argvals, yielding a set of lists of abstract locations. Each such list l is converted to a context using locListToContext, analogous to the use of locToContext for object sensitivity. In fact, the object-sensitive analysis variants described above can also be formulated as a special case of CPA, by
only using values for the receiver argument in locListToContext. The locToContext used in Equation 3 for standard object sensitivity can be generalized to handle all argument positions:

    $$\mathit{locListToContext}([\langle o_{i_0}, c_0 \rangle, \ldots, \langle o_{i_n}, c_n \rangle]) = (\mathit{cons}(o_{i_0}, c_0), \ldots, \mathit{cons}(o_{i_n}, c_n)) \qquad (7)$$

As formulated above, CPA creates many more contexts per method than the equivalent object-sensitive analysis, a significant scalability barrier. In its original formulation [1], CPA was used for type inference, and the heap abstraction consisted of types rather than allocation sites, making scalability more feasible. While full CPA based on allocation sites may not scale, we believe that contexts based on arguments other than the receiver may still prove useful.

Unification-Based Approaches. Some previous approaches to context-sensitive points-to analysis have employed equality constraints for assignments [16,31,44], which cannot be expressed as a context-sensitivity policy in the formulation of Table 2.^12 In this approach, statement x = y is modeled with constraint $pt(y) = pt(x)$ instead of $pt(y) \subseteq pt(x)$, enabling the use of fast union-find data structures to represent equal points-to sets. While this approach has been shown to scale for C++ programs [31], we are unaware of a scalable implementation for Java-like languages. In particular, the increased use of virtual dispatch in Java negatively affects the scalability of the equality-based approach [44].

Footnote 12: For languages like Java and C#, a context-insensitive equality-based approach like Steensgaard's analysis [65] does not work: since all objects are passed as the this parameter to the constructor of the root object type, the analysis would conclude that all points-to sets are equal.

4 Implementing Points-To Analysis

Here we present techniques for efficiently implementing the points-to analyses formulated in Section 3. Over the past two decades, advances in implementation techniques (and hardware advances) have shown that some of these variants can scale to relatively large programs (papers reporting analysis of millions of lines of code are now commonplace). We present basic techniques for implementing an Andersen's-style analysis, and then briefly review some of the most prominent advanced techniques which have appeared in the literature.

Unfortunately, the pointer analysis literature contains several different formalisms for describing analyses and implementations. Presentations use various mathematical frameworks, including set constraints [17], context-free-language reachability [49], and Datalog [72]. Each framework elucidates certain issues most clearly, and the choice of framework depends on the best match between the input language, the analysis variant, and the author's taste.

This section discusses implementation techniques based on old-fashioned algorithmic description of imperative code based on fixed-point iteration. A previous paper [62] presented an algorithmic analysis of this algorithm, which sheds some light on the performance issues which arise in practice. We restate some of the key points from that work [62] here. The WALA pointer analysis implementation [69] follows this algorithm directly.

4.1 Algorithm

Here we present an algorithm for Andersen's analysis for Java, as specified in Table 1 in Section 3. The algorithm is most similar to Pearce et al.'s algorithm for C [47] and also resembles existing algorithms for Java (e.g., that of Lhoták and Hendren [32]). We do not give detailed pseudocode for implementing a context-sensitive analysis with on-the-fly call-graph construction, as formulated in Table 2, but we discuss some of the key implementation issues later in the section.

The algorithm constructs a flow graph G representing the pointer flow for a program and computes its (partial) transitive closure, a standard points-to analysis technique (e.g., see [17,25,27]). G has nodes for variables, abstract locations, and fields of abstract locations. At algorithm termination, G has an edge $n \to n'$ iff one of the following two conditions holds:

  1. n is an abstract location $o_i$ representing a statement x =
new T(), and $n'$ is x.
  2. $pt(n) \subseteq pt(n')$ according to some rule in Table 1.

Given a graph G satisfying these conditions, it is clear that $o_i \in pt(x)$ iff x is reachable from $o_i$ in G. Hence, the transitive closure of G (where only abstract location nodes are considered sources) yields the desired points-to analysis result. Since flow relationships for abstract-location fields depend on the points-to sets of base pointers for the corresponding field accesses (see the Load and Store rules referencing $pt(o_i.f)$ in Table 1), certain edges in G can only be inserted after some reachability has been determined, yielding a dynamic transitive closure (DTC) problem.

Pseudocode for the analysis algorithm appears in Figure 3. The DoAnalysis routine takes a set of program statements of the forms shown in Table 1 as input. (We assume suitable data structures that, given a variable x, yield all load statements y = x.f and store statements x.f = y in constant time per statement.) The algorithm maintains a flow graph G as just described and computes a points-to set pt(x) for each variable x, representing the transitive closure in G from abstract locations. Note that abstract location nodes are eschewed, and instead the relevant points-to sets are initialized appropriately (line 2).

    DoAnalysis()
     1  for each statement i: x = new T() do
     2      pt_Δ(x) ← pt_Δ(x) ∪ {o_i}, o_i fresh
     3      add x to worklist
     4  for each statement x = y do
     5      add edge y → x to G
     6  while worklist ≠ ∅ do
     7      remove n from worklist
     8      for each edge n → n' ∈ G do
     9          DiffProp(pt_Δ(n), n')
    10      if n represents a local x
    11          then for each statement x.f = y do
    12                   for each o_i ∈ pt_Δ(n) do
    13                       if y → o_i.f ∉ G
    14                           then add edge y → o_i.f to G
    15                                DiffProp(pt(y), o_i.f)
    16               for each statement y = x.f do
    17                   for each o_i ∈ pt_Δ(n) do
    18                       if o_i.f → y ∉ G
    19                           then add edge o_i.f → y to G
    20                                DiffProp(pt(o_i.f), y)
    21      pt(n) ← pt(n) ∪ pt_Δ(n)
    22      pt_Δ(n) ← ∅

    DiffProp(srcSet, n)
     1  pt_Δ(n) ← pt_Δ(n) ∪ (srcSet − pt(n))
     2  if pt_Δ(n) changed then add n to worklist

Fig. 3. Pseudocode for the points-to analysis algorithm.

The algorithm employs difference propagation
[18,32,47] to reduce the work of propagating reachability facts. For each node n in G, $pt_\Delta(n)$ holds those abstract locations $o_i$ such that (1) the algorithm has discovered that n is reachable from $o_i$ and (2) this reachability information has not yet propagated to n's successors in G. $pt(n)$ holds those abstract locations for which (1) holds and propagation to successors of n is complete. The DiffProp routine updates a difference set $pt_\Delta(n)$ with those values from srcSet not already contained in $pt(n)$. After a node n has been removed from the worklist and processed, all current reachability information has been propagated to n's successors, so $pt_\Delta(n)$ is added to $pt(n)$ and emptied (lines 21 and 22).

Theorem 1. DoAnalysis terminates and computes the points-to analysis result specified in Table 1.

Proof. (Sketch) DoAnalysis terminates since (1) the constructed graph is finite and (2) a node n is only added to the worklist when $pt_\Delta(n)$ changes (line 2 of DiffProp), which can only occur a finite number of times. For the most part, the correspondence of the computed result to the rules of Table 1 is straightforward. One subtlety is the handling of the addition of new graph edges due to field accesses. When an edge $y \to o_i.f$ is added to G to handle a putfield statement (line 14), only $pt(y)$ is propagated across the edge, not $pt_\Delta(y)$ (line 15). This operation is correct because if $pt_\Delta(y) \neq \emptyset$, then y must be on the worklist, and hence $pt_\Delta(y)$ will be propagated across the edge when y is removed from the worklist. A similar argument holds for the propagation of $pt(o_i.f)$ at line 20. □

4.2 Complexity

A simple algorithmic analysis shows that the algorithm in Figure 3 has worst-case cubic complexity. Note that difference propagation is required to ensure the cubic complexity bound for this worklist-style algorithm [46].

In practice, many papers have reported scaling behavior significantly better than cubic. Two of the authors have published an analysis [62] that explains why this pointer analysis usually runs in quadratic time on
strongly-typed languages such as Java. The key insight is that Java's strong type system restricts the structure of the graph G to be relatively sparse for most pointer assignments. By bounding the sparsity of this graph, we can show that the algorithm usually runs in quadratic time. We refer the reader to the previous paper [62] for more details.

The previous work [62] also compares the expected behavior with the observed behavior in the WALA pointer analysis implementation. The paper reports results that show that the WALA implementation scales roughly quadratically with program size on Java programs, as predicted. As a rough characterization of overall scalability, [62] reports that the WALA implementation can usually perform this analysis on programs with a few hundred thousand lines of code in a few minutes. However, this scalability can vary widely, in particular depending on implementation details inside the standard library, to be discussed further in Section 6.

4.3 Optimizations

In practice, an implementation can use several techniques in conjunction with the code in Figure 3 to improve performance by significant constant factors.

Type Filters. In strongly-typed languages, type filters provide a simple but highly effective optimization which improves both precision and (usually) performance [21,32]. Consider, for example, the following Java code:

    Integer i = new Integer(0);
    Double d = new Double(0.0);
    Object o = new Random().nextBoolean() ? i : d;
    Object p = (o instanceof Integer) ? (Integer) o : null;
    o.toString();

The basic algorithm of Figure 3, which ignores the cast statement, would conclude imprecisely that p may alias d.

Slightly less obviously, consider the alias relation for the receiver (this pointers) in the methods Integer.toString() and Double.toString(). Due to the o.toString() invocation, the basic algorithm would conclude that since o may alias either i or d, then so may the receiver for each toString() method. However, this is imprecise, since the semantics of virtual dispatch ensure that
the receiver of Integer.toString() cannot point to an object of type Double.

Type filters provide a simple technique to build these language constraints into the points-to analysis, in order to improve precision. We describe the technique informally as follows. In Figure 3, we add labels to the edges in the graph G. Each label represents a type in the source language: a label T can indicate either a "cone type" (any subtype of T) or a "point type" (only objects of concrete type T).

We modify the algorithm to add labels to the graph based on the source code. For example, for an assignment x = (T) y, we label the edge $y \to x$ with (cone type) T. We add similar labels for edges that arise from assignments from actual parameters to formal parameters, to capture type constraints imposed by virtual dispatch. Finally, we would modify the DiffProp routine to only add appropriately typed objects to points-to sets. This can be accomplished with a bit-vector intersection, updating the bit vector for each type as allocation sites are discovered.

Cycle elimination. Consider the following Java code snippet:

    Object a = ...
    Object b = ...
    Object c = ...
    while (...) {
      if (?) a = b;
      else if (?)
        b = c;
      else c = b;
    }

It should be clear that a points-to analysis will compute the same points-to set for a, b, and c. This arises from a cycle in the flow graph.

Cycles arise relatively frequently in flow graphs for flow-insensitive points-to analysis, especially for weakly-typed languages like C. When a cycle arises in the flow graph, a points-to analysis implementation can collapse the cycle in the flow graph and use a single representative points-to set for all variables in the cycle. This optimization can drastically reduce both space consumption (fewer points-to sets and constraints) and time (less propagation to a fixed point). A key challenge with cycle elimination is identifying cycles as they arise dynamically (due to flow graph edge additions), and a large body of work studies efficient cycle detection for C points-to analysis [17,23,27].

To our knowledge, the results with cycle elimination for Java points-to analysis have been much less impressive than those for C. We personally experimented with implementing cycle elimination in WALA and found it to provide little benefit. Paradoxically, cycle elimination works best for cases where the points-to analysis is often unable to distinguish between related points-to sets. Recall that when analyzing Java, we can use type filters to achieve a more precise solution than typical for untyped C programs. Effectively, type filters "break cycles," since a labeled edge breaks the invariant that all variables in a cycle have the same points-to set. It seems that for analyses with richer abstractions, cycle elimination becomes less effective, since the existence of huge cycles relies on a coarse abstraction that fails to distinguish locations.

Method-Local State. In WALA, if a variable's points-to set is determined entirely by statements in the enclosing method, the points-to set is computed on-demand rather than via the global constraint system. Consider the following example:

    void m(T x) {
      Object y = new Object();
      Object z = y;
      Object w = x.f;
    }

For this case, pt(y) and pt(z) would
be computed only when required, while the constraint system would be used to compute pt(w) (since it depends on a field of parameter x). Though it complicates the implementation, we have found this optimization to yield significant space savings in practice. Separate handling of local state has been employed in other previous work [71,73].

4.4 Handling Method Calls

On-the-fly call-graph construction (see discussion in Section 3.2) has a significant impact on real-world points-to analysis performance. If constraint generation costs are ignored, on-the-fly call graph reasoning can slow down analysis, as more iterations are required to reach a fixed point [72]. However, if the costs of constraint generation are considered (which we believe is a more realistic model), on-the-fly call graph building improves performance, since constraints need not be generated for unreachable library code. Also, on-the-fly call graph reasoning can make the flow graph for a program more sparse, improving performance.

As discussed in Section 3.2, context-sensitive points-to analysis can often give much more precise results for object-oriented programs than context-insensitive analysis. The most straightforward strategy for implementing context sensitivity is via cloning. Recall from Section 3.2 that context-sensitive analysis computes a different solution (points-to set) for local variables that arise in different contexts. With cloning, the implementation simply creates a distinct copy of the relevant program structures for each context distinguished.

For example, consider a context-sensitivity policy employing k-limited call-string contexts with k = 1. For this policy, a cloning-based analysis would clone each method for each possible call site, and compute a separate solution for each clone. Intuitively, this can effectively blow up the program size by a quadratic factor: if there are N methods, each might be cloned N times, resulting in $N^2$ clones.

Data structures. Cloning for context-sensitivity exacerbates the demand for both time and space. Much
work over the last decade has improved techniques to exploit redundancy in the pointer analysis structures to mitigate these factors.

The algorithm of Figure 3 must maintain two data structures, each of which grows super-linearly with program size:

  - the set of constraints that represent the flow graph (G), and
  - the points-to sets for each program variable.

A straightforward analysis implementation would represent the points-to sets using bit vectors, as commonly presented in textbooks for dataflow analysis [2]. The implementation can map each abstract object (e.g., allocation site) to a natural number, and then use these as identifiers in bit vector indices. At first glance, this seems like a compact representation, since it appears to devote roughly one bit of space to each unique piece of information in the output.

However, better solutions have been developed. Several key advances in pointer analysis implementation have relied on clever data structures to reduce the space costs of constraints and points-to sets by exploiting redundancy. The key insight is that many points-to sets are similar, due to the patterns by which values flow between variables in real programs. So, several works have proposed data structures to exploit these redundancies.

The WALA implementation uses a clever "shared-bit vector" representation presented by Heintze [24]. This implementation exploits the commonalities in bit vector contents, resulting in a bit vector representation that shares large common subsets. Each bit vector is represented as the union of a shared common base and a relatively small delta.

In our experience, the Heintze shared bit vectors can dramatically reduce the space costs of a cloning-based pointer analysis implementation, and allow some limited context-sensitivity policies to scale to relatively large programs. However, these techniques cannot suffice for aggressive context-sensitivity policies, such as full call-string context sensitivity for variables [72]. For these policies, the number of clones grows
exponentially with program size, to the point where even one bit per clone would demand more memory than there are flip-flops in the universe.

Several groups have presented solutions based on exploiting binary decision diagrams (BDDs), which potentially allow a system to explore an exponential space using a tractable implicit representation. This technique has been used extensively in explicit-state model checking [10], and several papers indicate that similar techniques can work for certain flavors of context-sensitive pointer analysis [6,7,33,72,76]. Compared with shared-bit vectors, BDDs have the advantage of employing the same compact representation for both input constraints and the output points-to relation. Compact constraint representation makes aggressive policies like full call-string sensitivity for variables possible [72]. On the other hand, performance of BDD-based analyses can be fragile with respect to variable orderings [70], and using BDDs requires representing all relevant analysis state in BDD relations, making integration with other systems more difficult (though work has been done to ease this integration [35]).

Employing difference propagation exhaustively as in Figure 3 may double space requirements and hence represent an unattractive space-time tradeoff. A set implementation that enables propagation of abstract locations in parallel, like shared-bit vectors, lessens the need for exhaustive difference propagation in practice. In our experience, the key benefit of difference propagation lies in operations performed for each abstract location in a points-to set, e.g., edge adding (see lines 12 and 17 in Figure 3). To save space, WALA [69] only uses difference propagation for edge adding and for handling virtual call receivers (since with on-the-fly call graph construction, each receiver abstract location may yield a new call target). Also note that the best data structure for the $pt_\Delta(x)$ sets may differ from that for the $pt(x)$ sets, in order to support smaller sets and iteration efficiently; see [32,46] for further
discussion.

4.5 Demand-Driven Analysis

The previous discussion focused on computing an exhaustive points-to analysis solution, i.e., computing all points-to sets for a program. However, recall that the primary motivation for pointer analysis is to enable some client, which performs some higher-level analysis such as for program understanding, verification, or optimization. For many such clients, computing the full solution is not required. The client will demand information for only a few program variables, and so it makes sense to compute the information requested on-demand.^13

Heintze and Tardieu presented a highly influential paper describing a demand-driven version of context-insensitive Andersen's analysis, showing performance benefits for a client resolving C function pointers [26]. A demand-driven analysis formulation can also be obtained from a context-free-language reachability formulation of Andersen's analysis [49,64] via the magic-sets transformation [48].^14

Additional precision benefits can be obtained via refinement of the analysis abstraction where relevant to client queries. Guyer and Lin [22] showed the benefits of such an approach for various C points-to analysis clients. Sridharan et al. [60,64] gave a refinement-based points-to analysis that exhibited significant precision and scalability improvements for several Java points-to analysis clients. Finally, recent work by Liang et al. [36,37,38] has shown that precision for clients can be improved with local improvements to the heap abstraction of a points-to analysis.

Footnote 13: On-demand computation of purely-local points-to sets was discussed in Section 4.3; here we extend the discussion to on-demand computation of any points-to set.

Footnote 14: For C, applying the magic-sets transformation to the Melski-Reps formulation [49] yields an equivalent analysis to that of Heintze and Tardieu [26].

5 Must-Alias Analysis

Heretofore, we have concentrated on flow-insensitive alias analyses. These analyses produce a statically bounded (abstract) representation of the
program's runtime heap. The pointer analysis solution indicates which abstract objects each pointer-valued expression in the program may denote.

Unfortunately, these scalable analyses have a serious disadvantage when used for verification: they can answer only may-alias questions, that is, whether two variables may potentially refer to the same object. They cannot in general answer must-alias questions, that is, whether two variables must always refer to the same object.

May-alias information requires a verifier to model any operation performed through a pointer dereference as an operation that may or may not be performed on the possible target abstract objects identified by the pointer analysis; this is popularly known as a "weak update" as opposed to a "strong update" [8].

In this section, we present a flow-sensitive must-alias analysis that is based on dynamic partition of the heap, and show how its greater precision is used to verify typestate properties.

5.1 On the Importance of Strong Updates

     1 File makeFile() {
     2   return new File();   // ⟨o1, init⟩, ⟨o2, init⟩
     3 }
     4 File f = makeFile();   // ⟨o1, init⟩, ⟨o2, init⟩
     5 File g = makeFile();   // ⟨o1, init⟩, ⟨o2, init⟩
     6 if (?)
     7   f.open();            // ⟨o1, open⟩, ⟨o2, init⟩
     8 else
     9   g.open();
    10 f.read();              // ⟨o1, open⟩, ⟨o2, init⟩
    11 g.read();              // ⟨o1, open⟩, ⟨o2, err⟩
    12 }

Fig. 4. Concrete states for a program reading from two File objects allocated at the same allocation site. The example shows states for an execution in which the condition evaluates to true.

     1 File makeFile() {
     2   return new File();   // ⟨A, init⟩
     3 }
     4 File f = makeFile();   // ⟨A, init⟩
     5 File g = makeFile();   // ⟨A, init⟩
     6 if (?)
     7   f.open();            // ⟨A, open⟩
     8 else
     9   g.open();            // ⟨A, open⟩
    10 f.read();              // ⟨A, open⟩
    11 g.read();              // ⟨A, open⟩
    12 }

Fig. 5. Unsound update of abstract states for the example of Fig. 4.

Consider a File type which requires invoking open() on a File object before invoking read(), and consider the simple example program of Fig. 4. The allocation statement in Line 2 allocates two File objects in some initial state init. In the figure, we write $\langle o, st \rangle$ to denote that an
object o is in state st.

In this example, a typical points-to analysis will represent both objects allocated at Line 2 by a single abstract object A. The abstract state at Line 2 would therefore be $\langle A, init \rangle$, representing an arbitrary number of File objects that have been allocated at this point, all of which are in their initial state.

Now, consider the operation f.open() invoked on the abstract object $\langle A, init \rangle$. What should be the effect of this operation? The abstract object $\langle A, init \rangle$ represents all objects allocated at Line 2. Assuming that the invocation of f.open() yields the state $\langle A, open \rangle$ is equivalent to assuming that the state of all files represented by A has turned to open, which is unsound in general. For example, Fig. 5 shows the unsoundness of this scheme for the example from Fig. 4: the possible err state at Line 11 is not represented by the abstract states.

To guarantee soundness, the effect of f.open() would have to represent the possibility that some concrete objects represented by A remain in their initial state. As a result, the abstract state after f.open() should reflect both possibilities: $\langle A, open \rangle$, where the object is in its open state, and $\langle A, init \rangle$, where the object remains in its initial state. Such an update is referred to as a weak update, as it maintains the old state ($\langle A, init \rangle$) as part of the updated state. Using weak updates, however, would fail to verify even a simple program such as the one shown in Fig. 6.

Addressing this issue requires knowing must-alias information. For Fig. 6, the analysis must prove that at Line 2, f must point to the object allocated by Line 1; with this knowledge, the analysis can show that the object can only be in the open state after the call, enabling verification. While computing this must-alias information would be straightforward for Fig. 6, in general the problem is much more challenging, due to language features like loops and method calls.

The literature contains many approaches for must-alias analysis, ranging from relatively simple abstractions such as the recency
abstraction [5] and random isolation [29], to full-fledged shape analysis [53]. We next review a particular abstraction framework to combine may and must-alias analysis information, developed for typestate verification [20,40,57,75]. The framework is based on maintaining must and must-not points-to information based on access paths.

    1 File f = new File();   // ⟨A, init⟩
    2 f.open();              // ⟨A, init⟩, ⟨A, open⟩
    3 f.read();              // ⟨A, err⟩, ⟨A, open⟩

Fig. 6. Simple correct example that cannot be verified directly using weak updates.

5.2 Access Paths

Abstractions based on allocation sites impose a fixed partition on memory locations. We next present an abstraction to allow the name of an abstract object to change dynamically based on the program variables (and paths) that point to it. Specifically, we define the notion of an access path, a sequence of references that points to a heap allocated object, and name an abstract object by the set of access paths that may/must refer to it.

Concrete Semantics. We assume a standard concrete semantics which defines a program state and evaluation of an expression in a program state. The semantic domains are defined in a standard way as follows:

    $$L^\natural \in 2^{objects^\natural}$$
    $$v^\natural \in Val = objects^\natural \cup \{null\}$$
    $$\rho^\natural \in Env = VarId \to Val$$
    $$h^\natural \in Heap = objects^\natural \times FieldId \to Val$$
    $$\sigma^\natural = \langle L^\natural, \rho^\natural, h^\natural \rangle \in States = 2^{objects^\natural} \times Env \times Heap$$

where $objects^\natural$ is an unbounded set of dynamically allocated objects, VarId is a set of local variable identifiers, and FieldId is a set of field identifiers. We generally use the $\natural$ superscript to denote concrete entities.

A program state keeps track of the set of allocated objects ($L^\natural$), an environment mapping local variables to values ($\rho^\natural$), and a mapping from fields of allocated objects to values ($h^\natural$).

We also define the notion of an access path as follows: A pointer path $\gamma \in \Gamma = FieldId^*$ is a (possibly empty) sequence of field identifiers. The empty sequence is denoted by $\epsilon$. We use the shorthand $f^k$ where $f \in FieldId$ to mean a sequence of length k of accesses along a field f. An access path $p = x.\gamma \in VarId \times \Gamma$ is a pair consisting of a local variable
x and a pointer path $\gamma$.

We denote by APs all possible access paths in a program. The l-value of access path p, denoted by $\sigma^\natural[p]$, is recursively defined using the environment and heap mappings, in the standard manner. Given a concrete object $o^\natural$ in a state $\sigma^\natural$, we denote by $AP_{\sigma^\natural}(o^\natural)$ the set of access paths that point to $o^\natural$.

Maintaining Must Points-to Information. To describe our abstraction, we first assume that a preliminary flow-insensitive points-to analysis has run. This analysis generates an abstract points-to graph based on a static set of abstract memory locations. For this discussion, we call each abstract memory location from the preliminary points-to analysis an instance key.

The more precise analysis performs a flow-sensitive, context-sensitive interprocedural propagation of abstract states. Each abstract state represents a set of concrete states that may arise during execution, and encodes information regarding certain aliasing relationships which these concrete states share. We represent aliasing relationships with tuples of the form

    $$\langle o, unique, AP_m, May, AP_{mn} \rangle$$

where:

  - o is an instance key.
  - unique indicates whether the corresponding allocation site has a single concrete live object.
  - $AP_m$ is a set of access paths that must point to o.
  - May is a boolean that indicates whether there are access paths (not in the must set) that may point to o.
  - $AP_{mn}$ is a set of access paths that do not point to o.

This parameterized abstract representation has four dimensions, for the length and width of each access path set (must and must-not). The length of an access path set indicates the maximal length of an access path in the set, similar to the parameter k in k-limited context-sensitivity policies. The width of an access path set limits the number of access paths in this set.

An abstract state is a set of tuples. We observe that a conservative representation of the concrete program state must obey the following properties:

  1. An instance key can be indicated as unique if it represents a single object for this program
state.
  2. The access path sets (the must and the must-not) do not need to be complete. This does not compromise the soundness of the abstraction, since other elements in the tuple can indicate the existence of other possible aliases.
  3. The must and must-not access path sets can be regarded as another heap partitioning which partitions an instance key into the two sets of access paths: those that a) must alias this abstract object, and b) definitely do not alias this abstract object. If the must-alias set is non-empty, the must-alias partition represents a single concrete object.
  4. If May = false, the must access path set is complete; it contains all access paths to this object.

This can be formally stated as follows:

Definition 1. A tuple $\langle o, unique, AP_m, May, AP_{mn} \rangle$ is a sound representation of object $o^\natural$ at program state $\sigma^\natural$ when the following conditions hold:

  - $o = ik(o^\natural)$
  - $unique \Rightarrow \{x^\natural \in live(\sigma^\natural) \mid ik(x^\natural) = o\} = \{o^\natural\}$
  - $AP_m \subseteq AP_{\sigma^\natural}(o^\natural)$
  - $(\neg May \Rightarrow (AP_m = AP_{\sigma^\natural}(o^\natural)))$
  - $AP_{mn} \cap AP_{\sigma^\natural}(o^\natural) = \emptyset$

where ik is an abstraction mapping a concrete object to the instance key that represents it, and $live(\sigma^\natural)$ is defined to be $\{x^\natural \mid AP_{\sigma^\natural}(x^\natural) \neq \emptyset\}$.

Definition 2. An abstract state $\sigma$ is a sound representation of a concrete state $\sigma^\natural = \langle L^\natural, \rho^\natural, h^\natural \rangle$ if for every object $o^\natural \in L^\natural$ there exists a tuple in $\sigma$ that provides a sound representation of $o^\natural$.

Abstract Transformers. Table 3 shows how a tuple is transformed by the interpretation of various statements. The effect of a transfer function on a given abstract state is defined by taking the union of applying the tuple transfer functions of Table 3 to each tuple in the abstract state.

The interpretation of an allocation statement "v = new T()" with instance key o will generate a tuple $\langle o, true, \{v\}, false, \emptyset \rangle$ representing the newly allocated object. When May is false, the $AP_{mn}$ component is redundant and, hence, initialized to be empty.

When a tuple reaches the allocation site that created it, we generate two tuples, one representing the newly created object, and one representing the incoming tuple. We change the uniqueness flag to
false for reasons explained earlier. For assignment statements, we update $AP_m$ and $AP_{mn}$ as appropriate.

Note that since we place a finite bound on access path lengths, there are a finite number of possible abstract states, so fixed-point iteration terminates. The number of possible abstract states is exponential in the access path bound.

To use this aliasing information in a client analysis, we can extend the abstract transformers of Table 3 to also maintain the abstract state of an object being tracked.

For example, consider a simple typestate analysis to verify that only open files are read. We extend the tuple to track the abstract state of a File object, which can be init (just initialized), open, or closed.

    1 Collection files = ...
    2 while (...) {
    3   File f = new File();  // (<A, true, {f}, false, {}>, init); (<A, false, {}, true, {f}>, open)
    4   files.add(f);         // (<A, true, {f}, true, {}>, init); (<A, false, {}, true, {f}>, open)
    5   f.open();             // (<A, true, {f}, true, {}>, open); (<A, false, {}, true, {f}>, open)
    6   f.read();             // (<A, true, {f}, true, {}>, open); (<A, false, {}, true, {f}>, open)
    7 }

Fig. 7. Illustration of a strong update to the state of a File object using access paths.

Consider the example of Fig. 7. In this example, the abstraction is able to capture the fact that at Line 5, f must point to the object allocated by the most recent execution of Line 3, and its state can therefore be updated to open. This means that read() can be safely invoked on the object pointed to by f at Line 6.

Table 3. Transfer functions for statements indicating how an incoming tuple $\langle o, \mathit{unique}, AP_m, \mathit{May}, AP_{mn} \rangle$ is transformed, where pt(e) is the set of instance keys pointed-to by e in the flow-insensitive solution, and $v \in \mathit{VarId}$. $\mathit{mayAlias}(e_1, e_2)$ holds iff pointer analysis indicates $e_1$ and $e_2$ may point to the same instance key.

Stmt S: v = new T(), where o = Stmt S
    Resulting abstract tuples:
        $\langle o, \mathit{false}, AP_m \setminus \mathit{startsWith}(v, AP_m), \mathit{May}, AP_{mn} \cup \{v\} \rangle$
        $\langle o, \mathit{true}, \{v\}, \mathit{false}, \emptyset \rangle$

Stmt S: v = null
    Resulting abstract tuple: $\langle o, \mathit{unique}, AP'_m, \mathit{May}, AP'_{mn} \rangle$
        $AP'_m := AP_m \setminus \mathit{startsWith}(v, AP_m)$
        $AP'_{mn} := AP_{mn} \cup \{v\}$

Stmt S: v.f = null
    Resulting abstract tuple: $\langle o, \mathit{unique}, AP'_m, \mathit{May}, AP'_{mn} \rangle$
        $AP'_m := AP_m \setminus \{e'.f.\gamma \mid \mathit{mayAlias}(e', v), \gamma \in \Gamma\}$
        $AP'_{mn} := AP_{mn} \cup \{v.f\}$

Stmt S: v = e
    Resulting abstract tuple: $\langle o, \mathit{unique}, AP'_m, \mathit{May}, AP'_{mn} \rangle$
        $AP'_m := AP_m \cup \{v.\gamma \mid e.\gamma \in AP_m\}$
        $AP'_{mn} := AP_{mn} \setminus \{v \mid e \notin AP_{mn}\}$

Stmt S: v.f = e
    Resulting abstract tuple: $\langle o, \mathit{unique}, AP'_m, \mathit{May}', AP'_{mn} \rangle$
        $AP'_m := AP_m \cup \{v.f.\gamma \mid e.\gamma \in AP_m\}$
        $\mathit{May}' := \mathit{May} \lor \exists v.f.\gamma \in AP'_m .\ \exists p \in AP .\ \mathit{mayAlias}(v, p) \land p.f.\gamma \notin AP'_m$
        $AP'_{mn} := AP_{mn} \setminus \{v.f \mid e \notin AP_{mn}\}$

$\mathit{startsWith}(v, P) = \{v.\gamma \mid \gamma \in P\}$

When a typestate method is invoked, we can (1) use the $AP_{mn}$ information to avoid changing the typestate of the tuple where possible, (2) use the $AP_m$ information to perform strong updates on the tuple where possible, and (3) use the uniqueness information also to perform strong updates where possible.

There are several more powerful tricks which can use this information to improve precision, notably a focus operation which performs a limited form of case splitting to improve abstraction precision. We refer the reader to [20] for further discussion of techniques using these abstractions.

As explained earlier, we enforce limits on the length and the number of access paths allowed in the $AP_m$ and $AP_{mn}$ components to keep the number of tuples generated finite. We designed this abstract domain specifically to discard access-path information soundly, allowing heuristics that trade precision for performance but do not sacrifice soundness. This feature is crucial for scalability; the analysis would suffer an unreasonable explosion of dataflow facts if it soundly tracked every possible access path, as in much prior work [9,14,15,30].

However, the abstraction just presented still relies on a preliminary sound points-to analysis. The abstraction introduces machinery designed to exploit the points-to analysis, in order to maintain a sound (over-approximate) representation of the set of possible concrete states.

In practice, modern large Java programs introduce substantial barriers to this style of sound verification. As we discuss next, certain features of these programs introduce prohibitive obstacles to running a preliminary, sound points-to analysis. However, we show that access-path tracking in the style described here is still useful in
the context of under-approximate analyses, which do not guarantee coverage of all program states.

6 Analyzing Modern Java Programs

In our most recent work, we have found that large libraries and reflection usage in modern Java programs and libraries have made points-to analysis (as described in Sections 3 and 4) a poor basis for alias reasoning. In this section, we describe in more detail why points-to analysis does not work well for modern Java programs and libraries, and how we have worked around this issue with under-approximate techniques based on type-based call graph construction and access-path tracking.

We note that though we have not found points-to analysis to work well for modern desktop and server Java applications, it remains relevant in other domains. Scalability issues with points-to analysis may be less severe in cases where applications tend to have less code and use smaller libraries, e.g., mobile phone applications. Furthermore, for languages where a type-based approach does not yield a reasonable call graph (e.g., JavaScript), points-to analysis remains the most scalable technique for reasoning about indirect function calls [61].

6.1 Points-to Analysis Difficulties

In Java-like languages, reflection allows for meta-programming based on string names of program constructs like classes or methods. For example, the following code reflectively allocates an object of the type named by x:

    class Factory {
        // Class.forName and newInstance throw checked exceptions,
        // so the method must declare them to compile.
        Object make(String x) throws Exception {
            return Class.forName(x).newInstance();
        }
    }

Analyzing this code with the assumption that x may name any type yields extremely imprecise results, as in most cases only a few types may be allocated by code like the above. In some cases, tracking string constants flowing into x and only considering those types can help. However, many common idioms make this difficult or ineffective, such as use of string concatenation or reading the string from a configuration file.

Previous work has suggested handling code like the above by exploiting uses of the allocated object [7,39], since the object is
often cast to a more specific type before it is used. By tracking dataflow of reflectively-created objects to casts and optimistically assuming that the casts succeed, the set of allocated types can often be narrowed significantly. For example, consider the following use of the previously-shown Factory class:

    Factory f = new Factory();
    String widgetClass = properties.get("widgetClassName");
    IWidget w = (IWidget) f.make(widgetClass);

In this case, the analysis treats the reflective code as allocating any subtype of IWidget.

Unfortunately, techniques like the above cannot save points-to analysis from reflection in general. Sometimes, reflective creations can flow to interfaces like java.io.Serializable, which is implemented by many types. In other cases, reflection is used without any downcasts, e.g., reflective method calls via Java's Method.invoke(). As standard libraries and frameworks have grown, the cost of imprecise reflection handling has increased dramatically, since many more types and methods may be (imprecisely) deemed reachable. In some cases, certain parts of libraries may be manually excluded based on knowledge of the application; e.g., GUI libraries can be excluded when analyzing a program with no graphical interface (this approach is used when running WALA regression tests). However, for cases like server applications that are themselves packaged with many libraries (we have observed more than 75 .jar files in some cases), manual exclusions are not suitable. We are unaware of any automatic technique that is able to handle reflection across large Java applications with sufficient precision, and others have also observed this problem [58, Section 5.1].

6.2 Under-Approximate Techniques

Given the aforementioned difficulties with over-approximate alias analysis, under-approximate techniques present an attractive alternative in cases where sound analysis is not required (e.g., in a bug detection tool). When first exploring this area, we tried to base our approach on a points-to analysis modified to be under-approximate, either via
reduced reflection handling or use of a partial result computed in some time bound. However, we found this approach to be unsatisfactory: ignoring reflection often led to missed, important behaviors (particularly in framework-based applications [59]), and time bounds required complex heuristics to ensure that the points-to analysis explored desired parts of the program early. Instead, we have turned to an approach of (i) using a variant of the access-path tracking described in Section 5 to track must-alias information under-approximately, and (ii) employing domain-specific modeling of reflection as needed. We describe these techniques in turn.

Under-Approximation by using Only Must Information. Section 5 described an access-path analysis based on states of $\langle o, \mathit{unique}, AP_m, \mathit{May}, AP_{mn} \rangle$ tuples, designed to achieve high precision while relying on a pre-computed points-to analysis for soundness. In the under-approximate setting, we can define a simpler analysis with tuples of the form $\langle o, AP_m \rangle$, carrying only must-alias information. As earlier, each time the transformation of the must-access-path set is computed, we must limit the size of the resulting set. It is necessary to do so for two reasons: (i) in the presence of loops (or recursion), access paths can grow without bound, and (ii) even loop-free code might inflate the sets to needlessly large sizes, compromising efficiency. The transformers over $\langle o, AP_m \rangle$ tuples follow the update for $AP_m$ as described in Table 3.

In the following examples, we demonstrate how this abstraction can be used for identifying resource leaks (see Section 2), and consider tuples of the form $\langle o, R, AP_m \rangle$ where R is a resource type. R can be acquired via statement type p = acquire R and released via release R r (where r points to the resource). At a high level, we use the must-alias access-path abstraction to detect resource leaks as follows (see Torlak and Chandra [67] for full details):

- At statement p = acquire R, the analysis generates tuple $\langle p, R, \{p\} \rangle$ (we name resource objects by the
variable to which they are first assigned).
- Must aliases are updated as shown in Table 3.
- If statement release R r is reached with tuple $t = \langle p, R, a \rangle$ and $r \in a$, then the analysis kills $t$.
- If method exit is reached with a tuple $\langle p, R, \{\} \rangle$, then a leak is reported, as no aliases exist to release the resource.
- For additional precision, conditionals performing null checks are interpreted. If a conditional checking $v = \mathit{null}$ is reached with tuple $t = \langle p, R, a \rangle$ and $v \in a$, $t$ is killed on the true branch, as a must-alias for a resource cannot be null. $v \neq \mathit{null}$ is handled similarly.

Example 1. Consider the code fragment shown below. We show the facts accumulated by our analysis after each statement to the right.

    p = acquire R             <p, R, {p}>
    q.f = p                   <p, R, {p, q.f}>
    r = q.f                   <p, R, {r, p, q.f}>
    branch (r == null) L1     T: none, F: <p, R, {r, p, q.f}>
    release R r               none
    L1:                       none

At the branch statement, the analysis concludes that only the fall-through successor is feasible: r, being a must-alias to a resource, cannot be a null pointer. At the release statement, the analysis uses the $AP_m$ set to establish that r must-alias the resource. Consequently, no fact makes it to L1, and no error is reported.

Had the analysis not interpreted the branch statement, fact $\langle p, R, \{r, p, q.f\} \rangle$ would have reached L1. Local variables p, q, and r would then be dropped from the state, giving the fact $\langle p, R, \{\} \rangle$ at the exit. This fact would lead the analysis to conclude that resource p is unreachable at exit, resulting in a false positive.

Example 2. Consider the leaky code fragment shown below. It allocates a resource in a loop, but frees only the last allocated instance. The branch * L2 has a non-deterministic condition which cannot be interpreted by the analysis.

    p1 = null
    L1:                       <p3, R, {p3}>, <p3, R, {p2}>
    p2 = φ(p1, p3)            <p3, R, {p2, p3}>, <p3, R, {}>
    branch * L2
    p3 = acquire R            <p3, R, {p3}>, <p3, R, {p2}>
    branch true L1
    L2:                       <p3, R, {p2, p3}>, <p3, R, {}>
    release R p2              <p3, R, {}>

This fragment also illustrates the treatment of φ nodes. Consider the path taken through the loop two times and then exiting to L2. The analysis
generates $\langle p_3, R, \{p_3\} \rangle$ after the acquire. The generated fact flows to L1, and the analysis generates $\langle p_3, R, \{p_2, p_3\} \rangle$ after the φ, using the effect of $p_2 = p_3$. This, in turn, flows out to L2 via the branch, where it is killed by the release.

In the next loop iteration, the acquire statement overwrites $p_3$, so the analysis kills the occurrence of $p_3$ in $\{p_2, p_3\}$, generating the new fact $\langle p_3, R, \{p_2\} \rangle$. After propagation on the back edge, this last fact is transformed by the φ statement to $\langle p_3, R, \{\} \rangle$. Finally, when the transformed fact flows out to L2, it cannot be killed by release since the must-alias set is empty. The fact reaches method exit, and a leak is reported.

Method calls. We have not yet addressed how an under-approximate analysis like the resource leak detector reasons about method calls; as discussed in Sections 2 and 3, call graph construction and alias analysis are inter-dependent. We have found that an under-approximate call graph based on the class hierarchy is sufficient for bug-finding tools like the leak detector. In using the class hierarchy, all available code is considered, so certain issues related to insufficient reflection handling are avoided. For call sites with a very large number of possible targets, a subset is chosen heuristically, with the heuristics tunable by the client analysis. For example, in the leak detector [67], the heuristics were tuned to prefer code performing resource allocation.

Domain-specific Reflection Modeling. In certain cases, key application behaviors are implemented using reflection, necessitating modeling of those reflective behaviors. For example, server-side web applications written in Java are typically built atop Java EE (footnote 15) and other frameworks, and the application code is only invoked via reflective calls from the framework. To effectively detect security vulnerabilities in such applications using taint analysis [68], the analysis must have visibility into how these reflective calls invoke application code (e.g., to see how untrusted data is passed). For taint analysis of web
applications, recent work describes Framework for Frameworks (F4F) [59], a system that eases modeling the security-relevant behaviors of web-application frameworks. We expect similar modeling to be required in other domains where complex, reflection-heavy frameworks are employed.

Footnote 15: http://www.oracle.com/technetwork/java/javaee/index.html

7 Conclusions and Future Work

We have presented a high-level overview of state-of-the-art may- and must-alias analyses for object-oriented programs, based on our experiences implementing production-quality static analyses for Java. The sound alias-analysis techniques presented here work well for medium-sized programs, while for large-scale Java programs, an under-approximate alias analysis based on access-path tracking currently yields the most useful results.

We see several potentially fruitful directions for future work, for example:

Reflection. Improved reflection handling could significantly increase the effectiveness of various alias-analysis techniques. Approaches based on analyzing non-code artifacts like configuration files [59] or introducing more analyzable language constructs [28] seem particularly promising.

Dynamically-Typed Languages. As scripting languages like JavaScript gain in popularity, there is an increasing need for effective alias analyses for such languages. Analyzing such languages poses significant challenges, as use of reflective code constructs is even more pervasive than in Java, and optimizations based on the type system (see Section 4.3) may no longer be effective in improving scalability [61].

Developer Tool Integration. Some initial work has been done on developer tools that make significant use of alias analysis [19,54,63], but we believe there is significant further scope for tools to help developers reason about dataflow and aliasing in their programs. Better tools for reasoning about aliasing are particularly important since trends indicate increasing usage of dynamically-typed languages and large frameworks, both of which can obscure aliasing
relationships in programs.

Bibliography

[1] Ole Agesen. The cartesian product algorithm: Simple and precise type inference of parametric polymorphism. In ECOOP, 1995.
[2] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, & Tools with Gradiance. Addison-Wesley Publishing Company, USA, 2nd edition, 2007.
[3] Lars O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, University of Copenhagen, DIKU, 1994.
[4] David Bacon and Peter Sweeney. Fast static analysis of C++ virtual function calls. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), San Jose, CA, October 1996.
[5] Gogul Balakrishnan and Thomas Reps. Recency-abstraction for heap-allocated storage. In SAS, pages 221–239, 2006.
[6] Marc Berndl, Ondřej Lhoták, Feng Qian, Laurie Hendren, and Navindra Umanee. Points-to analysis using BDDs. In Conference on Programming Language Design and Implementation (PLDI), June 2003.
[7] Martin Bravenboer and Yannis Smaragdakis. Strictly declarative specification of sophisticated points-to analyses. In Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, OOPSLA '09, pages 243–262, New York, NY, USA, 2009. ACM.
[8] D. R. Chase, M. Wegman, and F. Zadeck. Analysis of pointers and structures. In Conference on Programming Language Design and Implementation (PLDI), pages 296–310, New York, NY, 1990. ACM Press.
[9] Jong-Deok Choi, Michael Burke, and Paul Carini. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In POPL, pages 232–245, 1993.
[10] Edmund Clarke. Model checking. In S. Ramesh and G. Sivakumar, editors, Foundations of Software Technology and Theoretical Computer Science, volume 1346 of Lecture Notes in Computer Science, pages 54–56. Springer Berlin/Heidelberg, 1997. 10.1007/BFb0058022.
[11] P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In ACM Symposium on Principles of Programming Languages (POPL), pages
269–282, New York, NY, 1979. ACM Press.
[12] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: path-sensitive program verification in polynomial time. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, PLDI '02, pages 57–68, New York, NY, USA, 2002. ACM.
[13] Jeffrey Dean, David Grove, and Craig Chambers. Optimization of object-oriented programs using static class hierarchy analysis. In European Conference on Object-Oriented Programming (ECOOP), August 1995.
[14] N. Dor, S. Adams, M. Das, and Z. Yang. Software validation via scalable path-sensitive value flow analysis. In ISSTA, pages 12–22, 2004.
[15] Maryam Emami, Rakesh Ghiya, and Laurie J. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In PLDI '94: Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, pages 242–256, New York, NY, USA, 1994. ACM Press.
[16] Manuel Fähndrich, Jakob Rehof, and Manuvir Das. Scalable context-sensitive flow analysis using instantiation constraints. In Conference on Programming Language Design and Implementation (PLDI), 2000.
[17] Manuel Fähndrich, Jeffrey S. Foster, Zhendong Su, and Alex Aiken. Partial online cycle elimination in inclusion constraint graphs. In Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada, June 1998.
[18] Christian Fecht and Helmut Seidl. Propagating differences: an efficient new fixpoint algorithm for distributive constraint systems. Nordic J. of Computing, 5(4):304–329, 1998.
[19] Asger Feldthaus, Todd Millstein, Anders Møller, Max Schäfer, and Frank Tip. Tool-supported refactoring for JavaScript. In Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, OOPSLA '11, pages 119–138, New York, NY, USA, 2011. ACM.
[20] Stephen J. Fink, Eran Yahav, Nurit Dor, G. Ramalingam, and Emmanuel Geay. Effective typestate verification in the presence of aliasing. ACM Transactions on Software Engineering and Methodology, 17(2):1–34, 2008.
[21] David
Grove and Craig Chambers. A framework for call graph construction algorithms. ACM Trans. Program. Lang. Syst., 23(6):685–746, 2001.
[22] Samuel Z. Guyer and Calvin Lin. Client-driven pointer analysis. In International Static Analysis Symposium (SAS), San Diego, CA, June 2003.
[23] Ben Hardekopf and Calvin Lin. The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. In PLDI, pages 290–299, 2007.
[24] Nevin Heintze. Analysis of Large Code Bases: The Compile-Link-Analyze Model. Draft of November 12, 1999.
[25] Nevin Heintze and David McAllester. Linear-time subtransitive control flow analysis. SIGPLAN Not., 32(5):261–272, 1997.
[26] Nevin Heintze and Olivier Tardieu. Demand-driven pointer analysis. In Conference on Programming Language Design and Implementation (PLDI), Snowbird, Utah, June 2001.
[27] Nevin Heintze and Olivier Tardieu. Ultra-fast aliasing analysis using CLA: A million lines of C code in a second. In Conference on Programming Language Design and Implementation (PLDI), June 2001.
[28] Shan Shan Huang and Yannis Smaragdakis. Morphing: Structurally shaping a class by reflecting on others. ACM Trans. Program. Lang. Syst., 33:6:1–6:44, February 2011.
[29] Nicholas Kidd, Thomas W. Reps, Julian Dolby, and Mandana Vaziri. Finding concurrency-related bugs using random isolation. STTT, 13(6):495–518, 2011.
[30] William Landi and Barbara G. Ryder. A safe approximate algorithm for interprocedural aliasing. In PLDI '92: Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation, pages 235–248, New York, NY, USA, 1992. ACM Press.
[31] Chris Lattner, Andrew Lenharth, and Vikram Adve. Making context-sensitive points-to analysis with heap cloning practical for the real world. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, PLDI '07, pages 278–289, New York, NY, USA, 2007. ACM.
[32] Ondřej Lhoták and Laurie Hendren. Scaling Java points-to analysis using Spark. In International Conference on Compiler Construction
(CC), Warsaw, Poland, April 2003.
[33] Ondřej Lhoták and Laurie Hendren. Jedd: a BDD-based relational extension of Java. In Conference on Programming Language Design and Implementation (PLDI), 2004.
[34] Ondřej Lhoták and Laurie Hendren. Evaluating the benefits of context-sensitive points-to analysis using a BDD-based implementation. ACM Trans. Softw. Eng. Methodol., 18:3:1–3:53, October 2008.
[35] Ondřej Lhoták and Laurie Hendren. Relations as an abstraction for BDD-based program analysis. ACM Trans. Program. Lang. Syst., 30:19:1–19:63, August 2008.
[36] Percy Liang and Mayur Naik. Scaling abstraction refinement via pruning. In Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation, PLDI '11, pages 590–601, New York, NY, USA, 2011. ACM.
[37] Percy Liang, Omer Tripp, and Mayur Naik. Learning minimal abstractions. In Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '11, pages 31–42, New York, NY, USA, 2011. ACM.
[38] Percy Liang, Omer Tripp, Mayur Naik, and Mooly Sagiv. A dynamic evaluation of the precision of static heap abstractions. In Proceedings of the ACM international conference on Object oriented programming systems languages and applications, OOPSLA '10, pages 411–427, New York, NY, USA, 2010. ACM.
[39] V. Benjamin Livshits, John Whaley, and Monica S. Lam. Reflection analysis for Java. In APLAS, pages 139–160, 2005.
[40] Alexey Loginov, Eran Yahav, Satish Chandra, Stephen Fink, Noam Rinetzky, and M. G. Nanda. Verifying dereference safety via expanding-scope analysis. In ISSTA '08: International Symposium on Software Testing and Analysis, 2008.
[41] Matthew Might, Yannis Smaragdakis, and David Van Horn. Resolving and exploiting the k-CFA paradox: illuminating functional vs. object-oriented program analysis. In Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation, PLDI '10, pages 305–315, New York, NY, USA, 2010. ACM.
[42] Ana Milanova, Atanas Rountev, and Barbara G. Ryder. Parameterized object sensitivity for points-to analysis
for Java. ACM Trans. Softw. Eng. Methodol., 14(1):1–41, 2005.
[43] Mayur Naik, Alex Aiken, and John Whaley. Effective static race detection for Java. In PLDI, pages 308–319, 2006.
[44] Robert O'Callahan. Generalized Aliasing as a Basis for Program Analysis Tools. PhD thesis, Carnegie Mellon University, November 2000.
[45] Jens Palsberg and Michael I. Schwartzbach. Object-oriented type inference. In Conference proceedings on Object-oriented programming systems, languages, and applications, OOPSLA '91, pages 146–161, New York, NY, USA, 1991. ACM.
[46] David J. Pearce. Some directed graph algorithms and their application to pointer analysis. PhD thesis, Imperial College of Science, Technology and Medicine, University of London, 2005.
[47] David J. Pearce, Paul H. J. Kelly, and Chris Hankin. Online cycle detection and difference propagation for pointer analysis. In Proceedings of the third international IEEE Workshop on Source Code Analysis and Manipulation, 2003.
[48] Thomas Reps. Solving demand versions of interprocedural analysis problems. In International Conference on Compiler Construction (CC), Edinburgh, Scotland, April 1994.
[49] Thomas Reps. Program analysis via graph reachability. Information and Software Technology, 40(11-12):701–726, November/December 1998.
[50] Thomas Reps. Undecidability of context-sensitive data-dependence analysis. ACM Trans. Program. Lang. Syst., 22(1):162–186, 2000.
[51] Thomas Reps, Susan Horwitz, and Mooly Sagiv. Precise interprocedural dataflow analysis via graph reachability. In ACM Symposium on Principles of Programming Languages (POPL), 1995.
[52] Atanas Rountev, Ana Milanova, and Barbara G. Ryder. Points-to analysis for Java using annotated constraints. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Tampa Bay, Florida, October 2001.
[53] Mooly Sagiv, Thomas Reps, and Reinhard Wilhelm. Parametric shape analysis via 3-valued logic. ACM Trans. Program. Lang. Syst., 24:217–298, May 2002.
[54] Max Schäfer, Manu Sridharan, Julian Dolby, and Frank Tip. Refactoring Java programs for flexible
locking. In Proceeding of the 33rd International Conference on Software Engineering, ICSE '11, pages 71–80, New York, NY, USA, 2011. ACM.
[55] M. Sharir and A. Pnueli. Two approaches to interprocedural data flow analysis, chapter 7, pages 189–233. Prentice-Hall, 1981.
[56] O. Shivers. Control flow analysis in Scheme. In Conference on Programming Language Design and Implementation (PLDI), 1988.
[57] Sharon Shoham, Eran Yahav, Stephen Fink, and Marco Pistoia. Static specification mining using automata-based abstractions. In Proceedings of the 2007 international symposium on Software testing and analysis, ISSTA '07, pages 174–184, New York, NY, USA, 2007. ACM.
[58] Yannis Smaragdakis, Martin Bravenboer, and Ondřej Lhoták. Pick your contexts well: understanding object-sensitivity. In POPL, pages 17–30, 2011.
[59] Manu Sridharan, Shay Artzi, Marco Pistoia, Salvatore Guarnieri, Omer Tripp, and Ryan Berg. F4F: taint analysis of framework-based web applications. In Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, OOPSLA '11, pages 1053–1068, New York, NY, USA, 2011. ACM.
[60] Manu Sridharan and Rastislav Bodík. Refinement-based context-sensitive points-to analysis for Java. In Conference on Programming Language Design and Implementation (PLDI), 2006.
[61] Manu Sridharan, Julian Dolby, Satish Chandra, Max Schäfer, and Frank Tip. Correlation tracking for points-to analysis of JavaScript. In ECOOP, 2012.
[62] Manu Sridharan and Stephen J. Fink. The complexity of Andersen's analysis in practice. In Proceedings of the 16th International Symposium on Static Analysis, SAS '09, pages 205–221, Berlin, Heidelberg, 2009. Springer-Verlag.
[63] Manu Sridharan, Stephen J. Fink, and Rastislav Bodík. Thin slicing. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, PLDI '07, pages 112–122, New York, NY, USA, 2007. ACM.
[64] Manu Sridharan, Denis Gopan, Lexin Shan, and Rastislav Bodík. Demand-driven points-to analysis for Java. In Conference on Object-Oriented
Programming, Systems, Languages, and Applications (OOPSLA), 2005.
[65] Bjarne Steensgaard. Points-to analysis in almost linear time. In ACM Symposium on Principles of Programming Languages (POPL), 1996.
[66] Frank Tip and Jens Palsberg. Scalable propagation-based call graph construction algorithms. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Minneapolis, MN, October 2000.
[67] Emina Torlak and Satish Chandra. Effective interprocedural resource leak detection. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE '10, pages 535–544, New York, NY, USA, 2010. ACM.
[68] Omer Tripp, Marco Pistoia, Stephen J. Fink, Manu Sridharan, and Omri Weisman. TAJ: effective taint analysis of web applications. In PLDI, 2009.
[69] T.J. Watson Libraries for Analysis (WALA). http://wala.sf.net.
[70] John Whaley, Dzintars Avots, Michael Carbin, and Monica S. Lam. Using Datalog with binary decision diagrams for program analysis. In The Third Asian Symposium on Programming Languages and Systems, 2005.
[71] John Whaley and Monica S. Lam. An efficient inclusion-based points-to analysis for strictly-typed languages. In International Static Analysis Symposium (SAS), Madrid, Spain, September 2002.
[72] John Whaley and Monica S. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In Conference on Programming Language Design and Implementation (PLDI), 2004.
[73] John Whaley and Martin Rinard. Compositional pointer and escape analysis for Java programs. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), November 1999.
[74] Robert P. Wilson and Monica S. Lam. Efficient context-sensitive pointer analysis for C programs. In Conference on Programming Language Design and Implementation (PLDI), 1995.
[75] Eran Yahav and Stephen Fink. The SAFE experience. In Engineering of Software, pages 17–33. Springer Berlin Heidelberg, 2011.
[76] Jianwen Zhu and Silvian Calman. Symbolic pointer analysis revisited. In Conference on
Programming Language Design and Implementation (PLDI), 2004.