Improvement of Fundamentals

The dual areas of Verification and Synthesis share common
foundations in how logic functions are stored and manipulated on the
computer. Logic functions and relations can be stored in one of
basically four different ways: as a circuit, as a binary decision
diagram, as a truth table, or as a set of satisfiability
clauses. Improvements in these capabilities enable everything else:
synthesis, optimization, model checking, equivalence checking, bug
finding, etc. It is therefore a fundamental tenet of the Berkeley
Verification and Synthesis Research Center (BVSRC) that work on
improving these capabilities should be a continuous and ongoing
research and development effort. Improvements come in the form of
reduced memory footprints, fewer cache misses, exploration of
alternate methods, such as clause-based versus circuit-based
satisfiability (SAT) solvers, and specialization of methods to
particular applications, to name a few. These improvements can
combine to make an algorithm orders of magnitude faster, thus
enabling its use in an application or enabling it to be iterated many
times.

1. AIG Packages

And-Inverter Graphs (AIGs) are networks of two-input AND gates and
inverters. In ABC, AIGs are used as the data-structure of choice for
solving almost all problems in synthesis as well as verification. AIGs
as implemented in ABC are quite different from two-input gate
networks, such as those produced in SIS by command “tech_decomp
-a 2 -o 2”. Several key differences from other logic
representations are:

Inverters are not separate nodes but complemented attributes on
the edges

Uniform representation of both ANDs and ORs (as ANDs) using
De Morgan's rule
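These two conventions can be illustrated with a minimal sketch (hypothetical code, not ABC's implementation): a literal encodes a node index plus a complement bit, OR is expressed through AND by De Morgan's rule, and a structural hashing table reuses structurally identical nodes.

```python
# Minimal AIG sketch with complemented edges and structural hashing.
# A literal encodes a node index and a complement bit: lit = 2*index + c.
# Names (Aig, and_gate, or_gate) are illustrative only.

class Aig:
    def __init__(self):
        self.nodes = [(0, 0)]        # node 0 is the constant node
        self.strash = {}             # structural hashing: (lit0, lit1) -> node index

    def new_input(self):
        self.nodes.append(None)      # primary inputs have no fanins
        return 2 * (len(self.nodes) - 1)

    def and_gate(self, a, b):
        if a > b:
            a, b = b, a              # canonical fanin order for structural hashing
        key = (a, b)
        if key in self.strash:       # reuse an existing structurally identical node
            return 2 * self.strash[key]
        self.nodes.append(key)
        self.strash[key] = len(self.nodes) - 1
        return 2 * (len(self.nodes) - 1)

def neg(lit):
    return lit ^ 1                   # inversion is just a complemented edge attribute

def or_gate(aig, a, b):
    # De Morgan: a OR b = NOT(NOT a AND NOT b) -- no separate OR node needed
    return neg(aig.and_gate(neg(a), neg(b)))
```

Because of the canonical fanin order, OR(x, y) and OR(y, x) hash to the same AND node and return the same complemented literal.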

In applications with large AIGs, the performance of AIG-based code
in several algorithms greatly depends on the amount of memory
used. For example, sequential AIG simulation is 100x faster when
programmed with several principles in mind:

DFS ordering of the AIG nodes from primary inputs

Minimalist representation of AIGs using 8 bytes per AIG node
(only two fanins are stored)

Modifying the AIG representation to be more CPU-cache friendly,
by going from an absolute representation (each node stores its
fanin locations) to the relative one (each node stores the
relative position of its fanins in the traversal frontier)

These observations led to a rethinking of the foundations of AIG
manipulation. Our earlier AIG package requires 32 bytes of memory per
AIG node, not counting the optional fanout representation and
structural hashing table. In contrast, much faster and more scalable
AIG-based computations have been developed using a package with only 8
bytes per node, which adds temporary data members (such as reference
counters, level counters, cuts, fanouts, etc.) on demand and frees
them as soon as possible. The largest speedups are achieved for AIGs
with millions of nodes; on smaller AIGs (around 10K-50K nodes) the new
package is only about 1-2x faster.
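The 8-bytes-per-node idea can be sketched as follows (illustrative code, not the actual package): all fanin literals live in one flat array of 32-bit entries, two per node, and temporary data such as levels is allocated in separate on-demand arrays and dropped when no longer needed.

```python
from array import array

# Compact AIG: one flat array of 32-bit fanin literals, two per node,
# i.e. 8 bytes per node. By convention, the first num_inputs nodes are
# primary inputs and store dummy (0, 0) fanins.
class CompactAig:
    def __init__(self):
        self.fanins = array("I")                 # 2 x uint32 per node = 8 bytes/node

    def add_node(self, lit0, lit1):
        self.fanins.append(lit0)
        self.fanins.append(lit1)
        return len(self.fanins) // 2 - 1         # node index

    def fanin(self, node, i):
        return self.fanins[2 * node + i]

    def compute_levels(self, num_inputs):
        # temporary data member: a separate array allocated on demand;
        # the caller drops the reference when done, freeing the memory
        levels = array("I", [0] * (len(self.fanins) // 2))
        for n in range(num_inputs, len(self.fanins) // 2):
            levels[n] = 1 + max(levels[self.fanin(n, 0) >> 1],
                                levels[self.fanin(n, 1) >> 1])
        return levels
```

Nodes must be added in topological (DFS) order for the single forward pass in compute_levels to be valid, matching the DFS-ordering principle above.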

2. SAT Methods

SAT has grown to become the dominant technique for formal
verification, largely replacing BDD-based engines as the scalable
alternative that can conquer industrial-sized designs. It has also
become the workhorse of many logic synthesis algorithms, such as clock
gating and redundancy removal.

A main theme in the application of modern SAT-solvers is
incrementality: how to best craft the SAT-based algorithm to
reuse the information derived during one problem to improve the
performance of successive and related SAT queries. It is also
important to keep an eye on the overhead incurred in going from one
query to the next. For example, equivalence checking typically
generates hundreds of thousands of very easy and highly related SAT
queries, mixed together with a few thousand really hard problems. The
solver must have minimal overhead for the easy problems, but still
work efficiently on the few hard ones.
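The incremental pattern can be illustrated with a toy solver supporting MiniSat-style assumptions: clauses persist across queries, while each query carries its own temporary assumptions. Real solvers use CDCL, not the brute-force enumeration used in this sketch.

```python
from itertools import product

# Toy incremental SAT solver. The clause database persists across solve()
# calls; assumptions are per-query unit constraints that leave the database
# untouched. Literals follow the DIMACS convention: +v / -v.
class ToySolver:
    def __init__(self):
        self.clauses = []
        self.nvars = 0

    def add_clause(self, clause):
        self.clauses.append(clause)
        self.nvars = max(self.nvars, max(abs(l) for l in clause))

    def solve(self, assumptions=()):
        # brute force: try all assignments consistent with the assumptions
        for bits in product([False, True], repeat=self.nvars):
            model = {v: bits[v - 1] for v in range(1, self.nvars + 1)}
            if all(model[abs(l)] == (l > 0) for l in assumptions) and \
               all(any(model[abs(l)] == (l > 0) for l in c) for c in self.clauses):
                return True           # SAT under these assumptions
        return False                  # UNSAT under these assumptions

s = ToySolver()
s.add_clause([1, 2])                  # x1 OR x2
s.add_clause([-1, 3])                 # x1 implies x3
print(s.solve(assumptions=[1, -3]))   # False: x1 forces x3, contradiction
print(s.solve(assumptions=[2]))       # True: clause database reused as-is
```

The point is the interface, not the search: successive related queries share the loaded (and, in a real solver, learned) clauses, so easy queries pay almost no setup cost.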

Part of our research relates to improving this interaction between
the top-level algorithm and the underlying SAT solver. This may
require reformulating algorithms to take full advantage of the
incremental SAT interface of a modern solver, but may also require
extensions and improvements to the SAT-solver itself.

We are also looking into new ways of integrating an AIG
representation with a SAT solver natively to avoid the overhead of
translating circuits into clauses. To truly reap the benefit of this
approach, the SAT solver has to support some circuit-specific
operations, such as transferring the fanouts of one node to another
equivalent node, or restricting Boolean propagation to the
transitive fanin cones of a set of nodes. By working out the details
of how to do this and other things efficiently, we hope to get a
significant speedup in this application domain of SAT.
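For contrast, the standard clause translation that a native circuit-based solver avoids is the Tseitin encoding, which spends one fresh variable and three clauses on every AND node:

```python
# Tseitin encoding of one AIG AND node y = a AND b into CNF clauses
# (DIMACS convention: negation is the minus sign). A large AIG incurs
# this cost for every internal node before a clause-based solver can run.
def tseitin_and(y, a, b):
    return [[-y, a],          # y implies a
            [-y, b],          # y implies b
            [y, -a, -b]]      # (a AND b) implies y

print(tseitin_and(3, 1, 2))   # clauses for x3 = x1 AND x2
```

A circuit-based solver works directly on the AIG instead, so no fresh variables or clauses are materialized for internal nodes.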

3. BDD package

BDDs are one of only a few ways in which a final definitive answer
can be given to a model checking or equivalence checking
question. They can answer SAT, UNSAT, or exceed a memory or time
bound. Alternate methods are interpolation, induction, bounded model
checking (BMC), and synthesis, each of which has its areas of
strength. Often a problem can be trimmed down to a reasonable size,
say a few hundred flip-flops and inputs, in which case BDDs are a good
choice to try. New and more powerful abstraction methods combined with
synthesis and retiming have been developed in the past few
years. These are capable of reducing a problem significantly while
spending relatively little time doing so. It is therefore well
motivated to continue to invest in improved BDD techniques, even
though much has been done in the past. We aim to put together the best
BDD package we can, exploring the best methods from past research, and
to explore further improvements based on experience with industrial
applications.
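The core mechanisms of such a package can be sketched in a few lines (hypothetical code): a unique table enforces canonicity, and all Boolean operations reduce to ITE (if-then-else) via Shannon expansion. This toy omits complemented edges, operation caching, variable reordering, and garbage collection, all of which a production package needs.

```python
# Minimal reduced-ordered BDD sketch. Nodes are (var, high, low) tuples;
# the unique table plus the reduction rule make representations canonical,
# so equivalence checking becomes pointer (here: tuple) comparison.
ZERO, ONE = "0", "1"
unique = {}                                  # (var, high, low) -> node

def mk(var, high, low):
    if high == low:                          # reduction rule: redundant test
        return high
    key = (var, high, low)
    if key not in unique:                    # unique table enforces canonicity
        unique[key] = key                    # in this sketch a node is its own key
    return unique[key]

def var_node(v):
    return mk(v, ONE, ZERO)

def ite(f, g, h):                            # if f then g else h
    if f == ONE: return g
    if f == ZERO: return h
    if g == ONE and h == ZERO: return f
    # split on the top (smallest) variable among f, g, h
    top = min(x[0] for x in (f, g, h) if x not in (ZERO, ONE))
    def cof(x, val):                         # cofactor of x w.r.t. top = val
        if x in (ZERO, ONE) or x[0] != top: return x
        return x[1] if val else x[2]
    return mk(top,
              ite(cof(f, 1), cof(g, 1), cof(h, 1)),
              ite(cof(f, 0), cof(g, 0), cof(h, 0)))

def band(f, g): return ite(f, g, ZERO)
def bor(f, g):  return ite(f, ONE, g)
def bnot(f):    return ite(f, ZERO, ONE)
```

Because the representation is canonical, identities such as De Morgan's rule hold by construction: bnot(band(a, b)) and bor(bnot(a), bnot(b)) yield the identical node.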

Synthesis

4. Technology-Independent Combinational Synthesis

Combinational logic synthesis restructures and optimizes a
technology-independent representation of a logic
network. Traditionally, logic functions of the nodes have been
represented using sums-of-products (SOPs) [33]. In ABC, the networks
are represented as And-Inverter-Graphs (AIGs). AIGs can be stored
compactly in files or in memory, taking 3.5 bytes per node on average
[4]. AIGs have no fixed node boundaries [7], allowing optimization
algorithms to scan through all boundaries and choose the ones that
lead to the largest improvement during synthesis.

In the last few years, we developed several efficient AIG
optimization algorithms. The main workhorse has been AIG rewriting
[5][22], which looks at all 4-input cuts of each AND-node, compares
its structure with a set of pre-computed AIG subgraphs, and chooses
the subgraph with the fewest AIG nodes, possibly subject to
constraints, such as on AIG levels. Other algorithms for AIG
minimization have been developed, such as redundancy removal and
resubstitution [24], which use SAT to compute and exploit local don't
cares. Synthesis with structural choices [10] records and
stores multiple snapshots of the same design. This improves the
quality of technology mapping [23] and can lead to scalable
verification, which also extends into the sequential domain [28].
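The cut evaluation step of rewriting can be illustrated as follows: the logic cone of a 4-input cut is simulated with 16-bit truth tables (one bit per input combination), giving a canonical function that can then be looked up in the precomputed subgraph library. The library lookup itself is not shown, and the names are illustrative.

```python
MASK = 0xFFFF                                 # 16 bits = all 2^4 input combinations
ELEM = [0xAAAA, 0xCCCC, 0xF0F0, 0xFF00]       # truth tables of the 4 cut leaves

def cut_truth(node, leaves, fanin, compl):
    """Truth table of the cone rooted at node over the given 4 cut leaves.
    fanin[n] = (a, b) are the two fanins; compl[n] = (ca, cb) are the
    complement flags of the two fanin edges of AND node n."""
    if node in leaves:
        return ELEM[leaves.index(node)]
    a, b = fanin[node]
    ca, cb = compl[node]
    ta = cut_truth(a, leaves, fanin, compl) ^ (MASK if ca else 0)
    tb = cut_truth(b, leaves, fanin, compl) ^ (MASK if cb else 0)
    return ta & tb                            # AND node: bitwise AND of fanin tables
```

For example, (a AND b) OR (c AND d) built with De Morgan (an AND of two complemented edges, with a complemented output) evaluates to exactly the bitwise OR of the two leaf products.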

Clock gating has become of great interest recently. This method
prevents unnecessary switching of registers and reduces dynamic power
consumption. Based on a simulation and satisfiability paradigm [20],
we have developed a method [13] that can be made scalable and can
save, on average, 14% of all register clock transitions. Further, the
extraction of a gating cover from pre-existing signals allows don't
cares to be exploited to reduce logic size by 7%. Even more power can
be saved using sequential methods. However, this often leads to
extremely hard sequential equivalence checking problems. In some
cases, we have shown how these problems can be solved much more easily
by reducing them to combinational equivalence checking problems.

Often logic synthesis methods do not do well on XOR-rich circuits,
such as cryptographic applications and cyclic redundancy
checkers. This motivates the development of special methods for such
circuits: detecting them and implementing synthesis methods that
take advantage of the XOR structure.

One method that has been very successful in the last few years has
been sequential signal correspondence where two or more signals
are identified as being equivalent on the set of states that can be
reached from the initial state. Such state equivalence classes occur
quite often and are not always obvious to the designer, since their
equivalence depends on the set of reachable states. One saving grace
is that often there are large subsets of these that form an inductive
invariant. When equivalence classes have been identified and proved,
they can be used to simplify the circuit by eliminating all but one
and transferring all the fanouts to the class representative. On the
other hand, many equivalences are true but can't be proved
inductively. One way of enhancing this method is to find other
invariants and prove them all at once. Another way is to speculate on
equivalences and prove them by formal verification techniques.
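The candidate-generation phase of signal correspondence can be sketched as follows: random simulation groups nodes by signature, and only multi-member classes survive as candidate equivalences for the inductive prover. This is a hypothetical sketch; eval_node stands in for actual AIG simulation, and the SAT-based induction step is not shown.

```python
import random

# Phase 1 of signal correspondence: partition nodes into candidate
# equivalence classes by their simulation signatures. Induction with SAT
# would then prove the candidates or refine the classes on counterexamples.
def candidate_classes(nodes, eval_node, num_patterns=64, seed=1):
    """nodes: node ids; eval_node(n, pattern) -> bool evaluates node n
    under one random input pattern (an integer whose bits encode inputs)."""
    rng = random.Random(seed)
    patterns = [rng.getrandbits(32) for _ in range(num_patterns)]
    classes = {}
    for n in nodes:
        sig = tuple(eval_node(n, p) for p in patterns)   # simulation signature
        classes.setdefault(sig, []).append(n)
    # only classes with 2+ members are candidate equivalences
    return [c for c in classes.values() if len(c) > 1]
```

Two nodes computing the same function always share a signature; functionally different nodes are separated with overwhelming probability after a few dozen random patterns, which keeps the candidate classes small before the expensive proof phase.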

Signal correspondence is one application where a specialized SAT
solver can be employed. This is because during the induction proving
process, many small problems are proved using SAT. We found that a
circuit-based solver in this application can speed up the computation
several times and allows it to be applied to much larger circuits,
not only because it is faster but also because it requires less
memory. This is a good example of how an algorithm can be specialized
to an application, yielding significant speedups.

5. Technology mapping

Technology mapping transforms an AIG into a network of logic
nodes. In standard-cell mapping, each node is represented by a gate
from a given library. In FPGA mapping, each logic node is represented
using a K-input look-up table (LUT) implementing any Boolean function
up to K inputs. The targets and goals for a mapper can vary: standard
cells, various styles of FPGAs, minimization of delay, area, power,
and wires. Mappers can be cut-based, structure-based, or Boolean
matchers.
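A minimal sketch of cut-based LUT mapping under a unit LUT-delay model (illustrative code, far simpler than a real mapper): K-feasible cuts are enumerated bottom-up by merging fanin cuts, and each node chooses the cut that minimizes its mapped depth.

```python
# Depth-oriented cut-based mapping sketch for K-input LUTs.
# gates must be a topologically ordered dict: node -> (fanin0, fanin1);
# Python dicts preserve insertion order, so insertion order is the topo order.
def map_luts(inputs, gates, K=4):
    cuts = {i: [frozenset([i])] for i in inputs}   # trivial cuts of the PIs
    depth = {i: 0 for i in inputs}
    best_cut = {}
    for n, (a, b) in gates.items():
        # merge fanin cuts; keep only K-feasible results
        feasible = [ca | cb for ca in cuts[a] for cb in cuts[b]
                    if len(ca | cb) <= K]
        # implementing n as one LUT with these leaves costs 1 + deepest leaf
        best = min(feasible, key=lambda c: max(depth[x] for x in c))
        depth[n] = 1 + max(depth[x] for x in best)
        best_cut[n] = best
        cuts[n] = feasible + [frozenset([n])]      # n itself can be a leaf later
    return depth, best_cut
```

For (a AND b) AND (c AND d) with K=4, the mapper absorbs all three AND nodes into one 4-input cut, giving depth 1 instead of 2. Real mappers bound the cut sets (e.g. priority cuts [25]) and also weigh area via area flow, which this sketch ignores.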

A recent development was a mapper for standard cells based on
structural matching and disjoint-support decomposition. The approach
uses structural choices, uniform handling of XORs and MUXes, improved
phase assignment, and reduced runtime and memory of structural
matching using priority matches, similar to priority cuts [25]. It was
motivated by requests from industrial collaborators who wanted a
mapper that gave small area, leaving the problem of meeting timing
requirements mostly to resizing and buffering algorithms. However, we
also developed a new timing-aware resynthesis method that can be used
in conjunction with this to improve the delay if required [30]. The
method uses mapping with choices, developed earlier as a regular part
of any of our mappers. Unlike previous methods, the algorithm is fast,
is applicable to very large networks, and incurs little area penalty.

6. Resynthesis

Resynthesis works on mapped networks and tries to improve the
results by re-decomposing groups of mapped cells or by rewiring or
resubstitution. Recent techniques, [26] and [27], are used for
re-synthesizing a mapped circuit to further minimize the number of
gates or LUTs. The first is based on resubstitution using don't cares
and the second does gate or LUT packing. By iterating these two
Boolean techniques, it is not uncommon to reduce the LUT count by 10%
or more, compared to a mapping produced by a highly optimized
structural mapper.

7. Use of “boxes”

In many applications, the user does not want synthesis to touch
designated parts of a design. This feature is extensively used in FPGA
companies where boxes represent parts where there is a known highly
optimal implementation, or a box may designate a specialized type of
memory element. In the past, these have been dealt with by making them
“black,” where their outputs become primary inputs to the rest of the
logic and their inputs become primary outputs of the logic. We have
just completed an extensive development which elevates all the
algorithms of ABC to use “white boxes” and “black boxes.” This allows
application to industrial designs with internal memories, DSP blocks,
and arithmetic circuits [15]. White boxes are logic components
(combinational or sequential blocks) which are not to be touched, but
contain information about their delays and logic structures, which can
be very useful for synthesizing the logic outside the boxes or even
for deciding how some of the boxes or hierarchy can be collapsed to
improve synthesis. As opposed to black boxes, white boxes expose all
their logic to synthesis, but synthesis is only constrained to not
change the interior of the boxes.

8. Sequential synthesis

Sequential synthesis is still rarely used in commercial
software. In contrast to combinational synthesis, sequential synthesis
can arbitrarily modify circuit behavior on the unreachable
states. Since it is common that reachable states constitute a small
fraction of all states, sequential synthesis can produce vastly
superior results over combinational synthesis. Sequential synthesis
may involve the following optimizations: retiming, re-encoding,
minimization with sequential don't-cares, merging sequentially
equivalent nodes [11], etc. Some of these are quite scalable and still
produce impressive results. Sequential AIGs are used to extend
combinational AIGs by including memory elements [1].

One scalable sequential synthesis method uses induction to detect
and merge sequentially-equivalent nodes, that is, nodes that are
equivalent on reachable states, but possibly not equivalent otherwise
[28]. We have developed an inductive prover that is extremely fast,
using a number of new techniques, including speculative reduction
[31][17], circuit-based SAT, etc. This type of sequential synthesis is
not only scalable, but also is much more acceptable to industry
because state-encoding, scan chains, and test vectors are
preserved. Moreover, the synthesis results are guaranteed to be
sequentially verifiable using an independent inductive prover similar
to that used for synthesis. Experiments show impressive effectiveness;
when applied to a set of 20 industrial benchmarks ranging up to 26K
registers and up to 53K 6-LUTs, the average reductions in the number
of registers and LUT count were 12.9% and 13.1% respectively while
delay was reduced by 1.4%. These improvements were done on logic after
it was synthesized by a commercial FPGA tool.

9. Synthesis with placement metrics

Especially in FPGAs, connecting wires becomes the most critical
factor in determining chip area, critical path length, switching power
consumption, and so on. It is therefore important to consider
connection costs early within a design flow. However, design tools for
logic synthesis and technology mapping do not accurately estimate the
connection cost, which is available only after placement and routing. A
unit delay model that does not reflect wire delay is commonly used in
stages prior to placement and routing. Consequently, results are
suboptimal and biased. A better way is to initially do synthesis using
the best estimates one can get for the wire delays and then apply
incremental methods which alternate between placement and
synthesis. The idea is that placement feeds back actual wire delays
for a synthesized, placed, and routed design. Then synthesis uses these to
re-synthesize critical parts of a design, trying to keep the changes
small. Incremental layout is then invoked which works only on the
changed logic, leaving the rest of the placement and routing
undisturbed. This can be iterated for improved results. This technique
was recently implemented using retiming as the synthesis
technique.

Formal Verification

10. Sequential equivalence checking

Sequential equivalence checking (SEC) tests the correctness of
designs after sequential synthesis (SS). Efficient SEC is important
because it enables wider adoption of innovative SS methods. Recently
we developed an integrated engine for general-case SEC based on
structural analysis, simulation, combinational synthesis, retiming,
and other types of transforms [28]. In particular, our induction-based
sequential synthesis, which merges sequentially equivalent registers
and nodes, is used to reduce the size of the problem. This was tested
on industrial benchmarks and found capable of proving equivalence of
the original design to that obtained by using most of the synthesis
scenarios involving sequential transforms, particularly retiming and
signal correspondence.

11. Model checking

Model checking (MC) attempts to prove properties for one copy of a
design. It subsumes SEC where the property is the equivalence to the
original circuit. SEC often benefits from structural similarities
between the two copies in the miter produced; this is not available for
MC. Since k-step induction is one of the most widely used but
incomplete proof methods in MC [32], several techniques were developed
(for example, [6]) to strengthen induction so that MC can converge
on those problems where induction fails without strengthening. In the
past, we explored several ways of strengthening induction
as a proof method for the target property in model checking by
developing efficient frameworks for considering implications as
candidate invariants [8] and by selecting invariants using
counter-examples [9]. The latter method systematically explores the
space of candidate inductive invariants, which allows us to find and
prove invariants that are few in number and immediately help the
problem at hand. This method is applied to interpolation where
invariants are used to refute an error trace and help discard spurious
counterexamples. This work of finding additional invariants and using
them continues since induction is one of the key methods available for
proving properties.

We have implemented and experimented with three sophisticated
abstraction/refinement algorithms in ABC. The “abstract” command
starts by replacing all registers by primary inputs and executing a
BMC procedure. This abstraction is then refined using counter-examples,
reintroducing some of the registers. This is called counter-example-based
abstraction (CBA). Another method is proof-based abstraction (PBA). We
have recently combined CBA and PBA into a single method for
abstraction which additionally uses techniques where a single instance
of a SAT problem is used during the entire abstraction process. This
speeds up the process greatly, and the combination of CBA/PBA seems to
be able to produce smaller abstractions on average. The method is very
scalable and can solve hard industrial benchmarks with several million
AIG nodes, well beyond the scope of other verification methods.

Speculative reduction starts by deriving and refining as many
speculated equivalences as it can within a certain resource limit. At
that point it assumes the remaining equivalences and creates an
abstracted model to which other verification transformations can be
applied. If UNSAT is proved on the speculatively reduced model, then
the original problem has been proved; otherwise, a counterexample is
produced, which can be used to refine the remaining equivalence
classes. This method has been shown to be effective on some hard
verification problems by reducing the model to a manageable size,
allowing other methods, such as BDD reachability, to succeed.

Programming environment

R&D of new CAD algorithms is very limited without prototyping them in
a software system and rigorously testing the prototypes on industrial
designs. In addition, efficient and well-structured source code is
necessary to enable others to experiment with new ideas and adapt
them to their own environments and problems. ABC is a growing CAD
tool for synthesis and verification of binary sequential logic
circuits appearing in synchronous hardware designs. It combines
scalable logic optimization based on And-Inverter Graphs (AIGs),
optimal-delay DAG-based technology mapping for look-up tables and
standard cells, and innovative algorithms for sequential synthesis and
verification. ABC provides a base of very efficient and scalable
algorithms and implementations of fundamental algorithms in synthesis
and verification, including SAT, AIG packages, BMC, retiming,
induction, etc. It provides experimental implementations of
verification and synthesis algorithms and a programming environment
for building new applications. The full source code and documentation
on the use of ABC is publicly available [3].

Recently we saw the need for a more sophisticated scripting language
to be tied into ABC. We chose Python and now have a set of Python
functions that can better control a sequence of verification
commands. The basic idea is to try something based on the current
size of the circuit, interrogate the results, and use this to decide
on the next thing to try. We have incorporated all of our ideas for
verification into a single Python function called prove. It uses
sequential synthesis via signal correspondence and retiming, phase
abstraction, trimming and reparameterization, all the abstraction
methods (CBA/PBA and speculative reduction), induction, reachability
via interpolation or BDDs, and extraction of implicit constraints and
their use in signal correspondence. All of these techniques are being
continually refined and improved via better and/or specialized SAT and
AIG packages, improved memory use and fewer cache misses, etc. By
applying prove to real industrial examples that we have access to
through our industrial partners, we can see where the bottlenecks are
and which methods need more work. This allows us to set our
priorities for new research based on real applications.
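The control idea behind prove can be caricatured in a few lines. This is a hypothetical sketch: the command names echo the techniques listed above, but the API shown is not ABC's actual Python interface, and the size thresholds are invented for illustration.

```python
# Hypothetical driver in the style of prove: simplify first, inspect the
# size the circuit reaches, pick the heavy engine accordingly, and stop as
# soon as any engine returns a definite answer.
def prove(aig, size, run):
    """run(command, aig) -> (status, new_aig, new_size),
    where status is "SAT", "UNSAT", or "UNDECIDED"."""
    for cmd in ["scorr", "retime", "trim", "reparam"]:   # simplification passes
        status, aig, size = run(cmd, aig)
        if status != "UNDECIDED":                        # solved along the way
            return status
    # choose the next engine based on how small simplification got the circuit
    if size < 500:
        engine = "bdd_reach"        # small enough for BDD reachability
    elif size < 50000:
        engine = "abstract"         # mid-size: try CBA/PBA abstraction
    else:
        engine = "bmc"              # huge: settle for bounded bug-hunting
    status, aig, size = run(engine, aig)
    return status
```

The real function interleaves many more engines and iterates, but the structure is the same: every decision is conditioned on the measured result of the previous step rather than a fixed script.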