Trace Partitioning in Abstract Interpretation Based Static Analyzers



Laurent Mauborgne and Xavier Rival
DI, École Normale Supérieure, 45 rue d'Ulm, 75 230 Paris cedex 05, France
Emails: Laurent.Mauborgne@ens.fr and Xavier.Rival@ens.fr

Abstract. When designing a tractable static analysis, one usually needs to approximate the trace semantics. This paper proposes a systematic way of regaining some knowledge about the traces by performing the abstraction over a partition of the set of traces instead of the set itself. This systematic refinement is not only theoretical but tractable: we give automatic procedures to build pertinent partitions of the traces and show the efficiency on an implementation integrated in the Astrée static analyzer, a tool capable of dealing with industrial-size software.

1 Introduction

Usually, concrete program executions can be described with traces; yet, most static analyses abstract them and focus on proving properties of the set of reachable states. For instance, checking the absence of runtime errors in C programs can be done by computing an over-approximation of the reachable states of the program and then checking that none of these states is erroneous. When computing a set of reachable states, any information about the execution order and the concrete flow paths is lost.

However, this reachable states abstraction might lead to too harsh an approximation of the program behavior, resulting in a failure of the analyzer to prove the desired property. For instance, let us consider the following program:

    if (x < 0) {
        sgn = -1;
    } else {
        sgn = 1;
    }

Clearly sgn is either equal to $1$ or $-1$ at the end of this piece of code; in particular sgn cannot be equal to $0$. As a consequence, dividing by sgn is safe. However, a simple interval analysis [7] would not discover it, since the lub (least upper bound) of the intervals $[-1, -1]$ and $[1, 1]$ is the interval $[-1, 1]$ and $0 \in [-1, 1]$. A simple fix would be to use a more expressive abstract domain.
For instance, the disjunctive completion [8] of the interval domain would allow the property to be proved: an abstract value would be a finite union of intervals; hence, the analysis would report sgn to be in $[-1, -1] \cup [1, 1]$ at the end of the above program. Yet, the cost of disjunctive completion is prohibitive. Other domains could be considered as an alternative to disjunctive completion; yet, they may also be costly in practice and their design may be involved. For instance, common relational domains like octagons [15] or polyhedra [10] would not help here, since they describe convex sets of values, so the abstract union operator is an imprecise over-approximation of the concrete union. A reduced product of the domain of intervals with a congruence domain [12] succeeds in proving the property, since $-1$ and $1$ are both in $\{1 + 2k \mid k \in \mathbb{Z}\}$.

However, a more intuitive way to solve the difficulty would be to relate the value of sgn to the way it is computed. Indeed, if the true branch of the conditional was executed, then sgn $= -1$; otherwise, sgn $= 1$. This amounts to keeping some disjunctions based on control criteria. Each element of the disjunction is related to some property about the history of concrete computations, such as "which branch of the conditional was taken". This approach was first suggested by [16]; yet, it was presented in a rather limited framework and no implementation result was provided. The same idea was already present in the context of data-flow analysis in [13], where the history of computation is traced using an automaton chosen before the analysis.

Choosing the relevant partitioning (which explicit disjunctions to keep during the static analysis) is a rather difficult and crucial point. In practice, it can be necessary to make this choice at analysis time.
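The sgn example can be replayed in a few lines. The following Python sketch is only an illustration under stated assumptions: the `Interval` class, its `join` method and the partition keys `"true_branch"` and `"false_branch"` are hypothetical names introduced here, not part of any analyzer. It contrasts merging the two branches with keeping one interval per branch taken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical interval abstract value [lo, hi] over the integers."""
    lo: int
    hi: int

    def join(self, other: "Interval") -> "Interval":
        # Least upper bound of two intervals: their convex hull.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def contains(self, v: int) -> bool:
        return self.lo <= v <= self.hi


# Abstract value of sgn at the end of each branch of `if (x < 0)`.
sgn_true = Interval(-1, -1)   # true branch: sgn = -1
sgn_false = Interval(1, 1)    # false branch: sgn = 1

# Plain interval analysis: both branches are merged at the join point.
merged = sgn_true.join(sgn_false)

# Partitioned analysis: one interval is kept per branch taken.
partitioned = {"true_branch": sgn_true, "false_branch": sgn_false}


def may_be_zero_merged() -> bool:
    """Can sgn be 0 according to the merged interval?"""
    return merged.contains(0)


def may_be_zero_partitioned() -> bool:
    """Can sgn be 0 according to any element of the disjunction?"""
    return any(iv.contains(0) for iv in partitioned.values())
```

With the merged value the division by sgn cannot be proved safe, whereas no element of the partitioned disjunction contains 0. The price is one abstract value per partition, which is why the choice of partitions matters.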
Another possibility, presented in [1], is to use profiling to determine the partitions, but this approach is relevant in optimization problems only.

The contribution of the paper is both theoretical and practical:
- We introduce a theoretical framework for trace partitioning that can be instantiated in a broad series of cases. More partitioning configurations are supported than in [16] and the framework also supports dynamic partitioning (choice of the partitions during the abstract computation);
- We provide detailed practical information about the use of the trace partitioning domain. First, we describe the implementation of the domain; second, we review some strategies for partition creation during the analysis.

All the results presented in the paper are supported by the experience of the design, implementation and practical use of the Astrée static analyzer [2, 14]. This analyzer aims at certifying the absence of run-time errors (and user-defined non-desirable behaviors) in very large synchronous embedded applications such as avionics software. Trace partitioning turned out to be a very important tool to reach that goal; yet, this technique is not specific to the families of software addressed here and can be applied to almost any kind of software.

In Sect. 2, we set up a general theoretical framework for trace partitioning. The main choices for the implementation of the partitioning domain are outlined in Sect. 3; we discuss strategies for partitioning together with some practical examples in Sect. 4. Finally, we conclude in Sect. 5.

2 Theoretical Framework

This section supposes basic knowledge of the abstract interpretation framework [5]. For an introduction, the reader is referred to [9].

2.1 Definitions

Programs: We define a program $P$ as a transition system $(S, \rightarrow, S_\iota)$ where $S$ is the set of states of the program; $\rightarrow\ \subseteq S \times S$ is the transition relation describing the possible elementary execution steps and $S_\iota$ denotes the set of initial states.

Traces: We write $S^\star$ for the set of all finite non-empty sequences of states.
If $\sigma$ is a finite sequence of states, $\sigma_i$ will denote the $(i+1)$th state of the sequence, $\sigma_0$ the first state and $\sigma_\dashv$ the last state. We define $\varsigma(\sigma)$ as the set of all the states in $\sigma$. We extend this notation to sets of sequences: $\varsigma(\Sigma) \stackrel{\text{def}}{=} \bigcup_{\sigma \in \Sigma} \varsigma(\sigma)$. If $\tau$ is a prefix of $\sigma$, we write $\tau \preceq \sigma$. A trace of the program $P$ is defined as an element of $[\![P]\!] \stackrel{\text{def}}{=} \{\sigma \in S^\star \mid \sigma_0 \in S_\iota \wedge \forall i,\ \sigma_i \rightarrow \sigma_{i+1}\}$. Note that the set $[\![P]\!]$ is prefix-closed. An execution of the program is a possibly infinite sequence starting from an initial state and such that there is no possible transition from the final state, if any. Executions are represented by the set of their prefixes, thus avoiding the need to deal with infinite sequences.

2.2 Reachability Analysis

In order to prove safety properties about programs, one needs to approximate the set of reachable states of the programs. This is usually done in one step by the design of an abstract domain $D^\sharp$ representing sets of states and a concretization function that maps a representation of a set of states to the set of all traces containing these states only. In order to be able to refine that abstraction, we decompose it in two steps. The first step is the reachability abstraction, the second one the set of states abstraction.

We start from the most precise description of the behaviors of program $P$, given by the concrete semantics $[\![P]\!]$ of $P$, i.e., the set of finite traces of $P$, so the concrete domain is defined as $\mathcal{P}_{\preceq}(S^\star) \stackrel{\text{def}}{=} \{\Sigma \subseteq S^\star \mid \Sigma \text{ is prefix-closed}\}$.

Reachability Abstraction: The set of reachable states of $\Sigma$ can be defined by the abstraction $\alpha_R(\Sigma) \stackrel{\text{def}}{=} \{\sigma_\dashv \mid \sigma \in \Sigma\}$. Considering the concretization $\gamma_R(T) \stackrel{\text{def}}{=} \{\sigma \in S^\star \mid \forall i,\ \sigma_i \in T\}$, we get a Galois connection $\mathcal{P}_{\preceq}(S^\star) \overset{\gamma_R}{\underset{\alpha_R}{\leftrightarrows}} \mathcal{P}(S)$. This Galois connection will allow us to describe the relative precision of the refinements defined in the sequel of this section.

Set of States Abstraction: In the rest of the section, we will assume an abstract domain $D^\sharp$ representing sets of states and a concretization function$^1$ $\gamma : D^\sharp \to \mathcal{P}(S)$.
Basically, $\gamma(I)$ represents the biggest set of states safely approximated by the (local) abstract invariant $I$. The goal of this abstraction is to compute an approximation of the set of states effectively.

$^1$ Abstract domains don't necessarily come with an abstraction function.

2.3 Trace Discrimination

Definition 1 (Covering). A function $\delta : E \to \mathcal{P}(F)$ is said to be a covering of $F$ if and only if $\bigcup_{x \in E} \delta(x) = F$.

Definition 2 (Partition). A function $\delta : E \to \mathcal{P}(F)$ is said to be a partition of $F$ if and only if $\delta$ is a covering of $F$ and $\forall x, y \in E,\ x \neq y \Rightarrow \delta(x) \cap \delta(y) = \emptyset$.

Trace Discriminating Reachability Domain: Using a well-chosen function $\delta$ of $E \to \mathcal{P}(S^\star)$, one can keep more information about the traces. We define the trace discriminating reachability domain $D_{R\delta}$ as the set of functions from $E$ to $\mathcal{P}(S)$, ordered pointwise. The trace discriminating reachability abstraction is $\alpha_{R\delta} : \mathcal{P}_{\preceq}(S^\star) \to D_{R\delta}$, $\alpha_{R\delta}(\Sigma)(x) \stackrel{\text{def}}{=} \{\sigma_\dashv \mid \sigma \in \Sigma \cap \delta(x)\}$. The concretization is then $\gamma_{R\delta}(f) = \{\sigma \mid \forall \tau \preceq \sigma,\ \forall x,\ \tau \in \delta(x) \Rightarrow \tau_\dashv \in f(x)\}$ ($(\alpha_{R\delta}, \gamma_{R\delta})$ form a Galois connection).

Comparing Trace Discriminating and Standard Reachability: Following [8], we compare the abstractions using the associated upper closure operators (the closure operator associated to an abstraction $(\alpha, \gamma)$ is $\gamma \circ \alpha$). The simple reachability upper closure maps any set of traces $\Sigma$ to the set $\{\sigma \mid \forall i,\ \exists \tau \in \Sigma,\ \sigma_i = \tau_\dashv\}$ of traces composed of states in $\Sigma$. Thus, in order to give a better approximation, the new upper closure must not map any $\Sigma$ to a set containing a state which was not in $\Sigma$. If $\delta$ is not a covering, then there is a sequence which is not in $\bigcup_{x \in E} \delta(x)$, and by definition of $\gamma_{R\delta}$, that sequence can be in any $\gamma_{R\delta}(f)$, so it is very likely that $D_{R\delta}$ is not as precise as the simple reachability domain. On the other hand, if $\bigcup_{x \in E} \delta(x) = S^\star$, $\gamma_{R\delta} \circ \alpha_{R\delta}$ is always at least as precise as $\gamma_R \circ \alpha_R$.

A function $\delta : E \to \mathcal{P}(S^\star)$ can distinguish a set of traces $\Sigma_1$ from a set $\Sigma_2$ if there exists $x$ in $E$ such that $\Sigma_1 \subseteq \delta(x)$ and $\Sigma_2 \cap \delta(x) = \emptyset$.
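The three notions just defined (covering, partition, distinguishing) can be checked mechanically on finite sets. The Python sketch below is an illustration only: it represents a hypothetical function from $E$ to subsets of $F$ as a dictionary, and the helper names `is_covering`, `is_partition` and `distinguishes` are assumptions introduced here.

```python
def is_covering(delta, F):
    """delta maps each x in E to a set; it covers F if the union of the blocks is F."""
    return set().union(*delta.values()) == set(F)


def is_partition(delta, F):
    """A partition is a covering whose blocks are pairwise disjoint."""
    keys = list(delta)
    disjoint = all(delta[x].isdisjoint(delta[y])
                   for i, x in enumerate(keys) for y in keys[i + 1:])
    return is_covering(delta, F) and disjoint


def distinguishes(delta, sigma1, sigma2):
    """True if some block contains all of sigma1 and nothing of sigma2."""
    return any(sigma1 <= block and sigma2.isdisjoint(block)
               for block in delta.values())


# Toy data: F stands for a finite set of traces, abstracted here as integers.
F = {1, 2, 3, 4}
overlapping = {"a": {1, 2, 3}, "b": {3, 4}}   # covering, but not a partition
disjoint_blocks = {"a": {1, 2}, "b": {3, 4}}  # partition
```

In the paper the set being covered is $S^\star$, which is infinite; the finite sets above only serve to exercise the definitions.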
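The following theorem states that, if the covering $\delta$ can distinguish at least two executions with a state in common, then the abstraction based on $\delta$ is more precise than standard reachability. Moreover, the abstraction based on $\delta$ is always at least as precise as the standard reachability abstraction.

Theorem 1. Let $\delta$ be a covering of $S^\star$. Then, $(D_{R\delta}, \gamma_{R\delta})$ is a more precise abstraction of $\mathcal{P}_{\preceq}(S^\star)$ than $(S, \gamma_R)$. Moreover, if there are two elements of $\mathcal{P}_{\preceq}(S^\star)$ which share a state and are distinguished by $\delta$, then the abstraction $(D_{R\delta}, \gamma_{R\delta})$ of $\mathcal{P}_{\preceq}(S^\star)$ is strictly more precise than $(S, \gamma_R)$.

Proof. By definition, $\gamma_{R\delta} \circ \alpha_{R\delta}(\Sigma)$ is the set of traces $\sigma$ such that $\forall \tau \preceq \sigma,\ \forall x,\ (\tau \in \delta(x) \Rightarrow \exists \rho \in \Sigma \cap \delta(x),\ \tau_\dashv = \rho_\dashv)$. Now, $\exists \rho \in \Sigma \cap \delta(x),\ \tau_\dashv = \rho_\dashv$ implies $\exists \rho \in \Sigma,\ \sigma_i = \rho_\dashv$. If $\delta$ is a covering, then for all $\tau$, there is at least one $x$ such that $\tau \in \delta(x)$. So $\gamma_{R\delta} \circ \alpha_{R\delta} \subseteq \gamma_R \circ \alpha_R$, meaning that the abstraction $(D_{R\delta}, \gamma_{R\delta})$ of $\mathcal{P}_{\preceq}(S^\star)$ is more precise than $(S, \gamma_R)$.

To prove that we have a strictly more precise abstraction, we exhibit a set of traces $\Sigma$ such that $\gamma_{R\delta} \circ \alpha_{R\delta}(\Sigma)$ is strictly smaller than $\gamma_R \circ \alpha_R(\Sigma)$. Following the hypothesis, let $\Sigma_1$, $\Sigma_2$, $s$ and $x$ be such that $s$ is a state in $\varsigma(\Sigma_1) \cap \varsigma(\Sigma_2)$, and $\Sigma_1 \subseteq \delta(x)$ and $\Sigma_2 \cap \delta(x) = \emptyset$. Let $\sigma$ be a sequence of $\Sigma_1$ such that $\sigma_\dashv = s$ (this is always possible because $\Sigma_1$ is an element of $\mathcal{P}_{\preceq}(S^\star)$, and as such prefix-closed). Let $\Sigma = (\varsigma(\delta(x)) \setminus \{s\})^\star \cup \Sigma_2$. Then $\varsigma(\sigma) \subseteq \varsigma(\Sigma)$, so $\sigma$ is in $\gamma_R \circ \alpha_R(\Sigma)$. But whatever $\tau \in \Sigma \cap \delta(x)$, $\tau$ does not contain $s$, so it cannot end with $s$, hence $\sigma \notin \gamma_{R\delta} \circ \alpha_{R\delta}(\Sigma)$. $\square$

Corollary 1. If $\delta$ is a non trivial partition of $S^\star$ (no $\delta(x)$ is $S^\star$), then the abstraction $(D_{R\delta}, \gamma_{R\delta})$ of $\mathcal{P}_{\preceq}(S^\star)$ is strictly more precise than $(S, \gamma_R)$.

Proof. Suppose that for an $x$, $\forall s \in \varsigma(\delta(x)),\ \forall y \neq x,\ s \notin \varsigma(\delta(y))$. Then, because $\delta$ is a covering, all sequences containing a state of $\delta(x)$ are in $\delta(x)$, which means $\delta(x) = (\varsigma(\delta(x)))^\star$. Since $\delta$ is a non trivial partition of $S^\star$, not all $\delta(x)$ can be of this form. So there is an $x$ and a $y$ such that $\delta$ distinguishes between $\delta(x)$ and $\delta(y)$, which have a state in common. $\square$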
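The gain stated by Theorem 1 can be observed on a toy transition system. The Python sketch below rests on several assumptions that are not in the paper: states are plain strings, $S^\star$ is truncated to sequences of length at most 4 so that the closures can be enumerated, and the partition is given by a hypothetical key function `branch_taken` (each trace belongs to the block named by its key). It computes both upper closures for the two traces of a conditional whose branches meet again in a shared state `join`.

```python
from itertools import product


def prefixes(trace):
    """All non-empty prefixes of a trace (a tuple of states)."""
    return [trace[:i] for i in range(1, len(trace) + 1)]


def prefix_closure(traces):
    return {p for t in traces for p in prefixes(t)}


def alpha_R(traces):
    """Reachability abstraction: the set of final states."""
    return {t[-1] for t in traces}


def gamma_R(states, universe):
    """Traces of the (bounded) universe made only of the given states."""
    return {t for t in universe if all(s in states for s in t)}


def alpha_Rd(traces, key):
    """Trace discriminating abstraction: final states, per partition block."""
    f = {}
    for t in traces:
        f.setdefault(key(t), set()).add(t[-1])
    return f


def gamma_Rd(f, key, universe):
    """Traces whose every prefix ends in a state allowed for its block."""
    return {t for t in universe
            if all(p[-1] in f.get(key(p), set()) for p in prefixes(t))}


def branch_taken(trace):
    """Hypothetical partition key: the first branch state met in the trace."""
    for s in trace:
        if s in ("then", "else"):
            return s
    return "none"


STATES = ["neg", "pos", "then", "else", "join", "sgn=-1", "sgn=1"]
# Bounded stand-in for S*: all non-empty sequences of length <= 4.
UNIVERSE = {t for n in range(1, 5) for t in product(STATES, repeat=n)}

SIGMA = prefix_closure({("neg", "then", "join", "sgn=-1"),
                        ("pos", "else", "join", "sgn=1")})

standard = gamma_R(alpha_R(SIGMA), UNIVERSE)
partitioned = gamma_Rd(alpha_Rd(SIGMA, branch_taken), branch_taken, UNIVERSE)

# A trace that mixes the true branch with the result of the false branch.
spurious = ("neg", "then", "join", "sgn=1")
```

Standard reachability only remembers which states occur, so its closure contains `spurious`; the partitioned closure rejects it because `sgn=1` is never a final state of a trace in the `then` block. Both closures still contain every trace of `SIGMA`.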
In practice so far, only partitions will be considered, so the results of Theorem 1 apply.

2.4 Some Trace Partitioning Abstractions

In this paragraph, we instantiate the framework to various kinds of partitions. In this instantiation we suppose a state can be decomposed into a control state in $L$ and a memory state in $M$. Thus $S = L \times M$. We also assume that the abstract domain $D^\sharp$ forgets about the control state, just keeping an approximation of the memory states.

We illustrate some partitions with a simple abstract program containing a conditional in Fig. 1.

Final Control State Partition: Let $\delta_L : L \to \mathcal{P}(S^\star)$ be the partition of $S^\star$ based on the final control state: $\delta_L(l) \stackrel{\text{def}}{=} \{\sigma \in S^\star \mid \exists \rho,\ \sigma_\dashv = (l, \rho)\}$. This partition is very common and usually done silently when designing the abstract semantics. It leads to the abstraction $(D^\sharp_l, \gamma_l)$ of $D_{R\delta_L}$, where $D^\sharp_l \stackrel{\text{def}}{=} L \to D^\sharp$ and $\gamma_l(I) \stackrel{\text{def}}{=} \{\sigma \in S^\star \mid \forall i,\ \sigma_i = (l_i, \rho_i) \wedge \rho_i \in \gamma(I(l_i))\}$.

Control Flow Based Partition: In [16], Tzolovski and Handjieva introduced trace-based partitioning using control flow. To simplify, they proposed to extend the control states with a history of the control flow in the form of lists of tags $t_i$ or $f_i$ (meaning that the test number $i$ was true or false). Then, they perform a final control state partition on this new set of control states. In order to keep the set of control states finite, they associate with each while loop an integer limiting the number of $t_i$ to be considered.

Formally, let $B \subseteq L$ be the set of control points introducing a branching (e.g. conditionals, while loops...). We define $C \stackrel{\text{def}}{=} \{(b, l) \in B \times L \mid \exists \rho, \rho' \in M,\ (b, \rho) \rightarrow (l, \rho')\}$ as the set of possible branch choices in the program. Note that in a branch choice $(b, l)$, $l$ is necessarily directly accessible from $b$. In order to define the trace partition used in [16], we define the control flow abstraction of a trace $\sigma$ as the sequence $cf(\sigma) \in C^\star$ made of the maximal sequence of branch choices taken in the trace. Then, the control flow based partition is defined as the partition $\delta_{cf} : L \times C^\star \to \mathcal{P}(S^\star)$, $\delta_{cf}(l, \beta) \stackrel{\text{def}}{=} \{\sigma \in \delta_L(l) \mid cf(\sigma) = \beta\}$.

In order to keep the partition finite, [16] limits the number of partitions per branching control point. They use a parameter $\kappa : B \to \mathbb{N}$ in the abstraction function. The $\kappa$-limiting abstraction is defined as $\lambda_\kappa(\beta)$, which is the subsequence of $\beta$ obtained by deleting the branching choices