Haifa Verification Conference 2008

October 27-30, 2008
Organized by IBM Haifa Research Lab

Keynote Talk: Hazards of Verification

Daniel Jackson, MIT

Verification technology has taken great strides recently, but there are hazards in the use of this technology -- in particular by encouraging unwarranted confidence. I'll explain where some of these hazards lie, and some approaches that address them.

For the verification of complex designs, one often needs to encode verification conditions containing non-linear integer constraints. Due to the undecidability of the problem, one usually considers bounded integers and then either linearizes the formula into an SMT(LIA) problem (i.e., the theory of linear integer arithmetic with Boolean constraints) or bit-blasts it into a SAT problem. We present a novel way of linearizing those constraints and then show how this encoding to SMT(LIA) can be integrated into an incremental lazy bounding and refinement (LBR) procedure that leverages the success of state-of-the-art SMT(LIA) solvers. The most important feature of our LBR procedure is that the formula does not need to be re-encoded at every step of the procedure; only bounds on variables need to be asserted or retracted, which is supported very efficiently by recent SMT(LIA) solvers. In a series of controlled experiments, we show the effectiveness of our linearization encoding and LBR procedure in reducing the SMT solve time. We observe similar effectiveness of the LBR procedure when it is used in a software verification framework applied to industry benchmarks.
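As an illustration of the underlying idea (not the paper's exact encoding), a non-linear term x*y over a bounded y can be rewritten as a sum of conditional linear terms, one per bit of y; each 0/1 bit variable multiplies the linear term 2^i * x, which an SMT(LIA) solver can handle. The sketch below mirrors that expansion numerically:

```python
# Illustrative sketch of bit-expansion linearization: x*y with y bounded
# by 2**bit_width is computed using only shifts, additions, and 0/1
# coefficients -- the same shape as the linear constraints an SMT(LIA)
# encoding would introduce (one fresh 0/1 variable per bit of y).

def linearize_product(x, y, bit_width):
    """Compute x*y via the shift-and-add expansion over y's bits."""
    assert 0 <= y < (1 << bit_width), "y must respect its declared bound"
    total = 0
    for i in range(bit_width):
        bit = (y >> i) & 1        # the 0/1 'fresh variable' for bit i of y
        total += bit * (x << i)   # linear term: bit_i * (2**i * x)
    return total
```

In a real encoding each `bit` would be a fresh Boolean variable constrained to sum to y, rather than computed from a concrete value.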

Synthesizing Test Models from Test Cases

In this paper we describe a methodology for synthesizing test models from test cases. The context of our approach is model-based graphical user interface (GUI) testing of smartphone applications. To facilitate the deployment of model-based testing practices, existing assets in test automation should be utilized. While companies are interested in the benefits of new approaches, they may have already invested heavily in conventional test suites. The approach presented in this paper enables using such suites for creating complex test models that should have better defect detection capability. The synthesis is illustrated with examples from two small case studies conducted using real test cases from industry. Our approach is semi-automatic, requiring user interaction. We also outline planned tool support to enable an efficient synthesis process.

A Meta Heuristic for Effectively Detecting Concurrency Errors

Neha Rungta and Eric Mercer

Mainstream programming is migrating to concurrent architectures to improve performance and facilitate more complex computation. The state-of-the-art static analysis tools for detecting concurrency errors are imprecise, generate a large number of false error warnings, and require manual verification of each warning. In this paper we present a meta-heuristic to help reduce the manual effort required in the verification of warnings generated by static analysis tools. We manually generate a small sequence of program locations that represent points of interest in checking the feasibility of a particular static analysis warning; then we use a meta-heuristic to automatically control scheduling decisions while guiding the program, in a greedy depth-first search, along the input sequence to test the feasibility of the warning. The meta-heuristic automatically controls thread schedules based on a two-tier ranking system that first considers the number of program locations already observed from the input sequence and then the perceived closeness to the next location in the input sequence. The error traces generated by using this technique are real and require no further manual verification. We show the effectiveness of our approach by detecting feasible concurrency errors in benchmarked concurrent programs and the JDK 1.4 concurrent libraries, based on warnings generated by the Jlint static analysis tool.
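The two-tier ranking can be sketched as a sort key (the field names below are ours, not the authors'): candidates are ordered first by how many locations of the input sequence they have already covered, with ties broken by perceived distance to the next location:

```python
# Minimal sketch of a two-tier ranking over an exploration frontier:
# tier 1 prefers more input-sequence locations already observed,
# tier 2 breaks ties by smaller perceived distance to the next location.

def rank_key(state):
    """Sort key; the best candidate has the smallest key."""
    return (-state["locations_observed"], state["distance_to_next"])

frontier = [
    {"id": "s1", "locations_observed": 2, "distance_to_next": 7},
    {"id": "s2", "locations_observed": 3, "distance_to_next": 9},
    {"id": "s3", "locations_observed": 3, "distance_to_next": 4},
]
best = min(frontier, key=rank_key)  # s2 and s3 tie on tier 1; s3 is closer
```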

Evaluating Workloads Using Comparative Functional Coverage

Yoram Adler, Dale Blue, Thomas Conti, Richard Prewitt and Shmuel Ur

In this paper we introduce comparative functional coverage - a technique for comparing the coverage of multiple workloads - and the tool in which it was implemented. Neither the need to compare workloads nor the use of functional coverage as a technique for exploring data is new. However, this long-felt need to compare workloads using functional coverage had not been addressed. We show how to augment the functional coverage tool to handle multiple data sources, how to present the data, and an experiment that demonstrates its usefulness.
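A toy construction of our own (not the tool's implementation) shows the kind of question comparative coverage answers: given the events each workload hit, which events are unique to one workload, and which are missed by all of them?

```python
# Illustrative comparison of functional coverage across workloads:
# reports events covered by no workload and events unique to each one.

def compare_coverage(workloads, coverage_space):
    """workloads: {name: iterable of coverage events hit by that workload}."""
    covered = {name: set(events) for name, events in workloads.items()}
    union = set().union(*covered.values())
    unique = {}
    for name, hits in covered.items():
        others = set()
        for other, other_hits in covered.items():
            if other != name:
                others |= other_hits
        unique[name] = hits - others       # hit only by this workload
    return {"uncovered": set(coverage_space) - union, "unique": unique}
```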

Iterative Delta Debugging

Cyrille Valentin Artho

Automated debugging attempts to locate the reason for a failure. Delta debugging minimizes the difference between two inputs, where one input is processed correctly while the other input causes a failure, using a series of test runs to determine the outcome of applied changes. Delta debugging is applicable to inputs or to the program itself, as long as a correct version of the program exists. However, complex errors are often masked by other program defects, making it impossible to obtain a correct version of the program through delta debugging in such cases. Iterative delta debugging extends delta debugging and removes series of defects step by step, until the final unresolved defect alone is isolated. The method is fully automated and successfully localized bugs in several real-life examples.
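The minimization step that delta debugging builds on can be sketched as a compact (simplified) variant of the classic ddmin algorithm; the iterative extension described above repeats such minimization across program versions. Here `test` returns True if the candidate input still triggers the failure:

```python
# Simplified ddmin sketch: repeatedly try to remove chunks of the failing
# input; keep any smaller input that still fails, refining the chunk
# granularity when no chunk can be removed.

def ddmin(failing_input, test):
    """Minimize `failing_input` (a list) while `test` still reports failure."""
    n = 2  # number of chunks to split into
    while len(failing_input) >= 2:
        chunk = len(failing_input) // n
        reduced = False
        for i in range(n):
            complement = failing_input[:i * chunk] + failing_input[(i + 1) * chunk:]
            if complement and test(complement):
                failing_input = complement   # failure persists: keep smaller input
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(failing_input):
                break                        # cannot split any finer
            n = min(n * 2, len(failing_input))
    return failing_input
```

For example, if a failure is caused by the presence of a single element, ddmin shrinks the input down to just that element.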

Statistical Model Checking of Mixed-Analog Circuits with an Application to a Third Order Delta-Sigma Modulator

Edmund Clarke, Alexandre Donzé, and Axel Legay

In this paper, we consider verifying properties of mixed-signal circuits, i.e., circuits for which there is an interaction between analog (continuous) and digital (discrete) quantities. We follow a statistical model checking approach that consists of evaluating the property on a representative subset of behaviors, generated by simulation, and answering the question of whether the circuit satisfies the property with a probability greater than or equal to some value. The answer is correct up to a certain probability of error, which can be pre-specified. The method automatically determines the minimal number of simulations needed to achieve the desired accuracy, thus providing a convenient way to control the trade-off between precision and computational cost. We propose a logic adapted to the specification of properties of mixed-signal circuits, in the temporal domain as well as in the frequency domain. Our logic is unique in that it allows us to compare the Fourier transform of two signals. We also demonstrate the applicability of the method on a model of a third-order Delta-Sigma modulator for which previous formal verification attempts were too conservative and required excessive computation time.
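One standard way to realize the "minimal number of simulations" idea is a Chernoff-Hoeffding bound (the paper may instead use a sequential test; this is an illustrative assumption): to estimate the satisfaction probability within ±eps with confidence 1-delta, a fixed number of simulations suffices.

```python
import math

# Chernoff-Hoeffding sample-size bound for statistical model checking:
# N >= ln(2/delta) / (2 * eps**2) simulations guarantee that the estimated
# satisfaction probability is within eps of the true one with
# probability at least 1 - delta.

def required_simulations(eps, delta):
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
```

This makes the precision/cost trade-off explicit: halving eps quadruples the number of required simulations, while tightening delta costs only logarithmically.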

Tool Presentation: SeeCode - A Code Review Plug-in for Eclipse

Moran Shochat, Orna Raz and Eitan Farchi

It is well known that code reviews are among the most effective techniques for finding bugs [2, 3, 4]. In this paper, we describe a code review tool, SeeCode, which supports the code review process. SeeCode is an Eclipse plug-in and thus naturally integrates into the developer's working environment. It supports a distributed review environment and the various roles used in a review meeting. Reviewers can review the code at the same time, through either a virtual or a face-to-face meeting, or at different times. Review comments and author navigation through the code are visible to all reviewers. Review comments are associated with line numbers, and the association is maintained when the code is changed by the developer. Integration with the Eclipse [8] Integrated Development Environment (IDE) enables easy code navigation, which is especially required when object-oriented code is reviewed. SeeCode also supports a quantitative feedback mechanism that reports the effectiveness of the ongoing review effort. This feedback is updated as the review progresses, and can be utilized by the review moderator to keep the review process on track. SeeCode has been piloted by several IBM groups with good feedback. The distributed review feature and integration with the IDE are especially highlighted by users as key features.
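Maintaining the association between a review comment and its line as the code changes can be sketched as follows (a hypothetical construction of our own, not SeeCode's implementation): shift each stored anchor by the net number of lines inserted or removed above it.

```python
# Hypothetical sketch of anchor maintenance for line-based review
# comments: after an edit that replaces `lines_removed` lines starting at
# `edit_start` with `lines_added` lines, recompute a comment's line number.

def adjust_anchor(comment_line, edit_start, lines_removed, lines_added):
    if comment_line < edit_start:
        return comment_line                        # edit is below the comment
    if comment_line < edit_start + lines_removed:
        return edit_start                          # comment was inside the edited span
    return comment_line + (lines_added - lines_removed)
```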

Tool Presentation: Progress in Automated Software Defect Prediction

Elaine Weyuker and Thomas Ostrand

We have designed and implemented a tool that predicts files most likely to have defects in a future release of a large software system. The tool builds a statistical regression model based on the version and defect history of the system, and produces a list of the next release's most probable fault-prone files, sorted in decreasing order of the number of predicted defects. Testers can use this information to decide where to focus the most resources, and to help determine how much effort to allocate to various parts of the system. Developers can use the tool's output to help decide whether files should be rewritten rather than patched. A prototype version of the tool has been integrated with AT&T's internal software change management system, providing the tool with seamless access to the system's version and defect information, and giving users a simple interface to the tool's output.

This paper presents a novel technique for counterexample generation in probabilistic model checking of Markov Chains and Markov Decision Processes. (Finite) paths in counterexamples are grouped together in witnesses that are likely to provide similar debugging information to the user. We list five properties that witnesses should satisfy in order to be useful as debugging aids: similarity, accuracy, originality, significance, and finiteness. Our witnesses contain paths that behave similarly outside strongly connected components. We then show how to compute these witnesses by reducing the problem of generating counterexamples for general properties over Markov Decision Processes, in several steps, to the easy problem of generating counterexamples for reachability properties over acyclic Markov Chains.

A Uniform Approach to Three-Valued Semantics for µ-Calculus on Abstractions of Hybrid Automata

Kerstin Bauer, Raffaella Gentilini and Klaus Schneider

Abstraction/refinement methods play a central role in the analysis of hybrid automata, which are rarely decidable. Due to the introduction of unrealistic behaviors, soundness (of evaluated properties) is a major challenge for these methods. In this paper, we consider the definition of a three-valued semantics for the µ-calculus on abstractions of hybrid automata. Our approach relies on two steps: First, we develop a framework that is general in the sense that it provides a preservation result that holds for several possible semantics of the modal operators. In a second step, we instantiate our framework to two particular abstractions. To this end, a key issue is the consideration of both over- and under-approximated reachability analysis, whereas classic simulation-based abstractions for hybrid automata rely only on over-approximations and limit the preservation to the universal fragment of the µ-calculus. To specialize our general result, we consider (1) modal abstractions, based on the approximation of hybrid dynamics via may/must transitions, and (2) so-called discrete bounded bisimulations, suitably designed for the abstraction of hybrid automata.

HVC Award Winner: Proofs, Interpolants, and Relevance Heuristics

Ken McMillan

The technique of Craig interpolation provides a means of extracting from a proof about a bounded execution of a system the necessary information to construct a proof about unbounded executions. This allows us to exploit the relevance heuristics that are built into modern SAT solvers and SAT-based decision procedures to scale model checking to larger systems.

This talk will cover applications of Craig interpolation in various domains, including hardware verification using propositional logic, and software verification using first-order logic.

Invited Talk: Is Verification Getting Too Complex?

Yoav Hollander, Cadence

Verification of HW and HW/SW systems is becoming more and more complex. This presentation will look at the reasons for the increased complexity and what can be done about it (sneak preview: quite a lot).

The semiconductor industry relies on coverage metrics as its primary means of gauging both quality and readiness of a chip for production, and yet the metrics in use today neither measure quality nor provide an objective measure of completeness. This talk will explore the problems with existing metrics and why they are not proper measures of verification. Mutation analysis looks like a promising technology to help bridge the divide between what we have and what we need in terms of metrics, and may also be able to help bridge the divide between static and dynamic verification. The talk will conclude with some of the remaining challenges that have to be overcome, such as the correct fit of mutation analysis within a verification methodology and the standardization of a fault model.

Automatic Boosting of Cross-Product Coverage Using Bayesian Networks

Dorit Baras, Avi Ziv, and Laurent Fournier

Closing the feedback loop from coverage data to the stimuli generator is one of the main challenges in the verification process. Typically, verification engineers with deep domain knowledge manually prepare a set of stimuli generation directives for that purpose. Coverage directed generation (CDG) systems based on Bayesian networks have been successfully used to assist the process by automatically closing this feedback loop. However, constructing these CDG systems requires manual effort and a certain amount of domain knowledge from a machine learning specialist. We propose a new method that boosts coverage at early stages of the verification process with minimal effort, namely a fully automatic construction of a CDG system that requires no domain knowledge. Experimental results on a real-life cross-product coverage model demonstrate the efficiency of the proposed method.

70% of the entire design effort for System-on-Chip (SoC) designs is spent on functional verification to verify that the design fulfills the specification. Whereas simulation-based testcase execution is done rather fast, manually finding the "right" testcases to check that the design corresponds to the usually informal specification document is a time-intensive and error-prone process. In this work, we present a novel methodology to reduce the time for functional verification significantly by automatically generating testcases from the specification document. The specification is expressed as a set of semi-formal textual use cases, which is a widely accepted and easy-to-use document-based description format for requirements specification. During simulation-based verification we use advanced verification mechanisms such as the constrained-random verification technique to select the generated testcases randomly and apply them to the System-under-Verification. We use a sample RFID SoC to demonstrate the benefits of our methodology. We show that it significantly reduces the time for functional verification, removes errors in the specification, and detects a number of discrepancies between the RFID SoC and the RFID protocol specification.

Keynote Talk: Automata-Theoretic Model Checking Revisited

Moshe Vardi, Rice University

In automata-theoretic model checking we compose the design under verification with a Büchi automaton that accepts traces violating the specification. We then use graph algorithms to search for a counterexample trace. The theory of this approach originated in the 1980s, and the basic algorithms were developed during the 1990s. Both explicit and symbolic implementations, such as SPIN and SMV, are widely used.

A Framework for Inherent Vacuity

Orna Kupferman, Dana Fisman, Moshe Vardi and Sarai Sheinvald

Vacuity checking is traditionally performed after model checking has terminated successfully. It ensures that all the elements of the specification have played a role in its satisfaction by the design. Vacuity checking gets both design and specification as input, and is based on an in-depth investigation of the relation between them. Vacuity checking has been proven to be very useful in detecting errors in the modeling of the design or the specification. The need to check the quality of specifications is even more acute in property-based design, where the specification is the only input, serving as a basis for the development of the system. Current work on property assurance suggests various sanity checks, mostly based on satisfiability, non-validity, and realizability, but lacks a general framework for reasoning about the quality of specifications.

Contemporary Post-Silicon Verification Mechanisms

Warren Hunt, UT Austin

We present a general description of the post-silicon verification problem. We differentiate the post-silicon verification problem from pre-silicon and manufacturing test verification. We describe the in-use mechanisms for performing post-silicon verification, such as internal tracing and external interfaces to record internal events. We demonstrate the scale at which this problem needs to be addressed for commercial microprocessors by using the Centaur 64-bit, X86-compatible microprocessor as an example.

A Mechanized Framework for Applying Formal Analysis in Post-silicon Verification

Warren Hunt, UT Austin

We present a framework to facilitate post-silicon verification of hardware designs. We make use of execution trace monitors to transfer pre-silicon verification results to the post-silicon verification problem. Our framework makes it possible to bring formal reasoning techniques, namely theorem proving, bounded model checking, and SAT solving, to bear upon the post-silicon verification problem. The framework also makes explicit the bottlenecks induced by limited observability at the post-silicon stage, and the trade-offs between the quality of logical guarantee and the overhead of additional hardware.

The framework is being mechanized in the ACL2 theorem prover and we illustrate its use in analysis of a cache system.

Application of Formal Technology in Post Silicon Verification and Debugging

Jamil Mazzawi, Jasper

The exponential design growth and complexity of modern SoC and chip design products is a major concern in the electronic design and manufacturing market. Verification of these complex systems is always one of the most daunting and costly challenges. Pre-silicon verification traditionally takes the lion's share of the verification effort; however, it is not always sufficiently effective or complete. Therefore, unfortunately, post-silicon bugs sometimes slip into silicon products and impact time to market. Furthermore, one design has various manufacturing configurations; therefore, verifying each post-silicon configuration is time- and effort-consuming. Formal verification methods are traditionally applied in the pre-silicon verification phase and are effective in complementing dynamic verification. This raises the questions: Can formal verification also be applied in the post-silicon verification and debugging phase with a similar impact? What are the challenges? And what methodologies are needed to adopt formal methods in the post-silicon phase? In this talk, we address these questions, outline the ingredients of a successful application of formal verification, and showcase two real-life examples using a state-of-the-art formal verification system. We demonstrate the impact of formal verification in cases where it was relatively easy to find several logic bugs much faster than with other simulation-based methods, to delineate the logic area where a fix was needed, and to prove the fix, increasing confidence in the design.

Multi-threading Post Silicon Exerciser - ThreadMill

Amir Nahir, IBM HRL

Pre-silicon functional verification techniques provide excellent dynamic observability, and static analysis of design models can dramatically increase coverage in limited areas. Yet, only a tiny fraction of the huge reachable state-space can be sampled and verified. Although substantial effort is invested in controlling and intelligently directing the verification resources, state-of-the-art pre-silicon techniques cannot cope with the increasing complexity of modern high-end designs. More bugs escape into the silicon. Should we reset our quality expectation from the pre-silicon design? Can we better exploit silicon-casting samples as yet another verification platform? Which pre-silicon verification methodologies can be adapted to silicon?

In this talk we will discuss trade-offs between the different platforms and will point out opportunities to bridge methodologies. We will describe an architecture of a pre-silicon test generator that can be successfully adapted to silicon exercisers, to enable systematic implementation of functional-coverage oriented verification plans.

We will discuss several years' experience with commercial HL-to-RTL equivalence checking with the Hector technology. We will also discuss several considerations based upon the reality that our company is an EDA vendor. This is quite different from the position of a semiconductor company, which can concentrate on a very specific methodology and design type.

Our observations will include some case studies from customers about the methodology and designs on which they are using Hector. Most of the development of Hector was based upon solutions to problems presented by our customers. We will also discuss the general architecture of Hector and some technological information about the engines that underlie Hector.

Model checking is known to be computationally hard, meaning that no single algorithm can efficiently solve all problems. A possible approach is to run many algorithms in parallel until one of them finds a solution. This approach is sometimes called state-of-the-art (SOTA) model checker. However, hardware resources are often limited, forcing some selection. In this paper we present an automatic decision system, called Whisperer, which generates an optimized set of configured algorithms for a given model-checking problem. The system weights the advice of advisors, each predicting the fitness of a different algorithm for the problem. Advisors also monitor the progress of currently running algorithms, allowing the replacement of ineffective algorithms. Whisperer is built into the formal verification platform, RuleBase/PE, and allows novice users to skip the delicate task of algorithm selection. Our experiments show that Whisperer, after some training, performs nearly as well as SOTA.
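The weighting of advice can be sketched as follows (a minimal illustration with identifiers of our own, not Whisperer's internals): each advisor predicts a fitness score per algorithm, and the portfolio runs the algorithms with the highest weighted sums.

```python
# Illustrative weighted-advice selection for an algorithm portfolio:
# combine each advisor's fitness predictions using per-advisor weights
# and pick the k best-scoring algorithms.

def select_algorithms(advice, weights, k):
    """advice: {advisor: {algorithm: fitness}}; weights: {advisor: weight}."""
    scores = {}
    for advisor, predictions in advice.items():
        for algo, fitness in predictions.items():
            scores[algo] = scores.get(algo, 0.0) + weights[advisor] * fitness
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Monitoring would then periodically re-run this selection with updated fitness estimates, replacing algorithms that have made no progress.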

We present a tool, D-TSR, for parallelizing SMT-based BMC over a distributed environment, targeted at checking safety properties in low-level embedded (sequential) software. We use a tunneling and slicing-based reduction (TSR) approach combined with a verification-friendly model transformation to decompose a BMC instance disjunctively (at a given depth) into simpler and independent sub-problems. We exploit such a decomposition to cut down the communication cost and the idle time of the CPUs during synchronization while solving BMC instances. Our approach scales almost linearly with the number of CPUs, as demonstrated in our experimental results.

Linear-time Reductions of Resolution Proofs

DPLL-based SAT solvers progress by implicitly applying binary resolution. The resolution proofs that they generate are used, after the SAT solver's run has terminated, for various purposes. The most notable uses in formal verification are: extracting an unsatisfiable core, extracting an interpolant, and detecting clauses that can be reused in an incremental satisfiability setting (the latter uses the proof only implicitly, during the run of the SAT solver). Making the resolution proof smaller can benefit all of these goals. We suggest two methods for doing so that are linear in the size of the proof. Our first technique, called Recycle-units, uses each learned constant (unit clause) x for simplifying resolution steps in which x was the pivot, prior to when it was learned. Our second technique, called Recycle-pivots, simplifies proofs in which several nodes in the resolution graph, one of which dominates the others, correspond to the same pivot. Our experiments with industrial instances show that these simplifications reduce the core by ~5% and the proof by ~13%. They reduce the core less than competing methods such as run-till-fix, but whereas our algorithms are linear in the size of the proof, the latter and other competing techniques are all exponential, as they are based on SAT runs. If we consider the size of the proof graph as being polynomial in the number of variables (which is not necessarily the case in general), this gives our method an exponential time reduction compared to existing tools for small core extraction. Our experiments show that this result is evident in practice, more so for the second method: it rarely takes more than a few seconds, even when competing tools time out, and hence it can be used as a cheap post-processing procedure for proofs.

Structural Contradictions

Cindy Eisner and Dana Fisman

We study the relation between logical contradictions such as (p AND (not p)) and structural contradictions such as (p intersects (p concatenated with p)). Intuitively, we expect logical and structural contradictions to be treated similarly, but it turns out that this does not hold for PSL (or for SVA). We examine the problem, and provide a solution in which logical and structural contradictions are treated in a consistent manner. The solution reveals that not all structural contradictions are created equal: we need to distinguish between them to preserve important characteristics of the logic. A happy result of our solution is that it provides the semantics over the natural alphabet of all subsets of P, as opposed to the current semantics of PSL/SVA, which give the semantics over an inflated alphabet including the cryptic letters "top" and "bottom". We show that the complexity of model checking is not affected by the changes with respect to the existing semantics of PSL.