Analyzers can be pieced together from S 2E-provided analyzers, or can be written from scratch using the S 2E API

Feature

Properties

Scalability

Going from single-path analysis to multi-path analysis turns a linear problem into an exponential one, because the number of paths through a program increases exponentially in the number of branches (path explosion problem)

Able to directly analyze binaries, being able to analyze even proprietary software

Dynamic Binary Instrumentation

Directly interpret x86 machine code

Perform analyses in-vivo within real software stack instead of using abstract models of these layers

Virtualization

Removes the need for the stubs or abstract models required by most state-of-the-art symbolic execution engines and model checkers

e.g.

drivers

kernel

libraries

user programs

Simultaneously analyze entire families of execution paths

Predictive analyses often must reason about entire families of paths through the target system, not just one path

Execution consistency models

Models

Defining a model, we think in terms of which paths it includes vs. excludes

Globally feasible

Execution is additionally consistent with control flow restrictions imposed by data-related constraints in the environment

Subset of the locally feasible paths

Locally feasible

Execution is consistent with both the system’s CFG and with the restrictions on control flow imposed by the data-related constraints within the unit

Subset of the statically feasible paths

Statically feasible

There exists a path in the system’s inter-procedural control flow graph (CFG) corresponding to the execution in question

Strict Consistency

Strictly Consistent System-level Execution

Implementation of SC-SE is limited by the path explosion problem: the number of globally feasible paths is roughly exponential in the size of the whole system

Provided that the engine can solve all constraints it encounters, every path through the unit that is possible under a concrete execution of the system will be eventually found by SC-SE exploration

Exploration engine gathers and uses information about all parts of the system, to explore new paths through the unit.

Strictly Consistent Unit-level Execution

To implement this model, the SSE converts all symbolic data to concrete values when the unit calls the environment

The conversion is consistent with the current set of path constraints in the unit. No other conversion is performed. The environment is treated as a black box, and no symbolic data can flow into it

Strictly Consistent Concrete Execution

New paths can only be explored by blindly guessing new inputs

The only explored paths are the paths that the system follows when executed with the sample input provided by the analysis

The entire system is treated as a black box: no internal information is used to explore new paths

Relaxed Consistency

CFG Consistency

Exploration is fast, because branch feasibility need not be checked with a constraint solver

Implement RC-CC by pursuing all outcomes of every branch, regardless of path constraints, thus following all edges in the unit’s inter-procedural CFG

SSE converts, at the unit/environment boundary, the concrete values generated by the environment into symbolic values that satisfy the constraints of the environment’s API

Consistent model

Every path through the unit admissible by the model, there exists a corresponding globally feasible path through the system (i.e., the system can run concretely in that way)

Complete model

Every path through the unit that corresponds to some globally feasible path through the system will eventually be discovered by exploration done under that model.

Distinction

Which execution paths each model admits

The traditional assumption about system execution is that the state at any point in time is consistent, i.e., there exists a feasible path from the start state to the current state

Making principled peformance/accuracy trade-offs in complex analyses

Selective Symbolic Execution

Transisition

Symbolic → Concrete Transition

S2E employs lazy concretization

This is an important optimization when doing in-vivo symbolic execution,because a lot of data can be carried through the layers of thesoftware stack without conversion

Converts the value of x from symbolic to concrete on-demand, only when concretely running code actually reads the value of x

Concrete → Symbolic Transition

The concrete domain is unaware of libFn being executed in multi-path mode. All paths execute independently, and it is up to the S2Eanalyzer plugins to decide whether, besides observing the concrete path, they also wish to look at the symbolic paths

Simplest conversion is to use an S2E selector to change the concrete arguments into symbolic ones, e.g., instead of libFn(10) call libFn(λ)

Once this transition occurs, S2E executes libFn symbolically and simultaneously executes libFn with the original concrete argument(s) as well

Once exploration of libFn completes,S2E returns to appFn the concrete return value resulting from the concrete execution, but libFn will have been explored symbolically as well.

The execution of app is consistent, while at the same time S2E exposes to the analyzer plugins those paths inlib that are rooted at libFn’s entry point

S2E is based on the key observation that often only some families of paths are of interest

This superposition of paths into a symbolic execution tree, in which each possible execution corresponds to a path from the root of the tree to a leaf corresponding to a terminal state

Program is treated as a superposition of possible execution paths. if (x>0) then else can be viewed as a superposition of two possible paths: one for x>0 and another one for x≤0

Automatically minimize amount of code that has to be executed symbolically given a target analysis

Ability to scale large real system, e.g. full Windows stack

System Analysis with S2E

Selection interface

Used to guide the exploration of execution paths

First step in using S2E is deciding on a policy for which part of a program to execute in multi-path (symbolic) mode vs. single-path (concrete) mode

Data-based selection

Provides a way to expand an execution path into a multi-path execution by introducing symbolic values into the system

Any time S2E encounters a branch predicate involving a symbolic value, it will fork the execution accordingly.

Code-based selection

Enables/disables multi-path execution depending on whether the program counter is or not within a target code area

Priority-based selection

Used to define the order in which paths are explored within the family of paths defined with data-based and code-based selectors

Analysis interface

Used to collect events or check properties of execution paths

Once the selectors define a family of paths,S2E executes these paths and exposes each one of them to the analyzer plugins

Analyzer class

Bug finder

DataRaceDetector

MemoryChecker

Look for the corresponding bug conditions and output an execution path leading to7the encountered bug

ExecutionTracer

Selectively records the instructions executed along a path ,along with the memory accesses, register values, and hardware I/O