Project ASY: Design of Synthesizable Asynchronous Circuits

This project is part of the examination for the
VLSI System Design course at the University of Twente.
The goals of this project are:

To apply your knowledge of asynchronous logic to design small circuits
on your own.

To set up your design in such a way that it can be synthesized by
a conventional synthesis tool for digital design, viz.
the Synopsys Design Compiler.

It is assumed that the student has completed project POW on low-power design and power simulations
before starting this project.

The description below refers to various file names. Once you have logged in, execute the command:

get-module asy asy

to get them in a subdirectory asy, or copy them manually from
the directory
/home/practice/soc/exercise/modules/asy to directory
asy (do not forget the "invisible" file
.synopsys_dc.setup).

Asynchronous Design Framework

The exercises are based on the four-phase bundled-data design
style for asynchronous design.

The design methodology used in this exercise differs from what you are
used to.
In order to be able to use a synchronous synthesis tool such as the
Synopsys Design Compiler, the design is partitioned into modules that
are synthesized separately.
The modules are then assembled to build the entire design. In this
way, one can break feedback loops and prevent logic cells
in e.g. matched-delay lines from being eliminated as a result of
optimization.
Note also that a tool like LEDA cannot be used for asynchronous
designs (the existing rule set at least imposes a strictly synchronous
design style).

Exercise ASY-1: Getting Started

The exercise uses a variant of the familiar SISO (serial-in
serial-out) hardware block. The entity with name siso_asy
is declared in file
siso_asy_ent.vhd. It has an input channel and an output
channel. The input channel constitutes the passive side of a push
channel; so, the environment (testbench)
generates the data and the request signal
and the siso_asy block should acknowledge the reception of
the data. The output channel is the active side of a push channel. So,
siso_asy generates the data and request and waits for an
acknowledge from the environment. Apart from these two channels, there
is a reset signal to force the hardware into a known state before actual
operation.
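The four-phase protocol on a push channel can be illustrated as a fixed sequence of signal events per data item. The Python sketch below is only a behavioural illustration; the function name four_phase_push and the signal labels req, ack and data are chosen here and are not taken from the provided VHDL files.

```python
def four_phase_push(items):
    """Return the event trace of a four-phase bundled-data push channel.

    Each event is a tuple (actor, signal, value). The sender is the
    active side of the channel, the receiver the passive side.
    """
    trace = []
    received = []
    for item in items:
        trace.append(("sender", "data", item))  # data must be valid before the request
        trace.append(("sender", "req", 1))      # phase 1: request rises
        received.append(item)                   # receiver captures the bundled data
        trace.append(("receiver", "ack", 1))    # phase 2: acknowledge rises
        trace.append(("sender", "req", 0))      # phase 3: request returns to zero
        trace.append(("receiver", "ack", 0))    # phase 4: acknowledge returns to zero
    return trace, received


trace, received = four_phase_push([3, 7])
for event in trace:
    print(event)
```

For the input channel of siso_asy the testbench plays the sender role; for the output channel siso_asy itself is the sender.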

The test-vector controller (TVC)
can be found in file tvc_siso_asy.vhd.
It has an internal clock which is not connected to the hardware. It is
used to enforce a specific data rate such that comparisons of power
consumption with
synchronous equivalents of the design are possible. As the TVC is
complementary to the design under verification (DUV), it drives the
input channel as the active side of a push channel, reading data from a file. In principle,
first a reset pulse is generated and then data items from the input
file are pushed into the DUV until the end of file is reached and the
simulation is stopped. Request signals are synchronized with the
internal clock. The output channel is passive and waits for request
signals from the DUV. At each request the data is written to file and
an acknowledge synchronized with the internal clock is generated. The
TVC contains some time-out checks in order to stop the simulation when
the DUV takes too long to react.

TVC and DUV are connected together in the testbench, as given in
file tb_siso_asy.vhd. As in its synchronous counterpart, this
setup has two hierarchical layers so that
the generic word length can be set in a VHDL configuration.

The simplest DUV just connects its input channel to its output
channel. The architecture of this feedthrough implementation of
siso_asy can be found in file siso_asy_ft_arch.vhd.
Compile entity and architecture for this design along with the TVC,
testbench and the configuration given in file conf_ft.vhd.
Simulate this design in order to become familiar with the setup. What
is the time interval between data items at input and output?

The next DUV consists of a pipeline in the style of Figure 2.9 of
the book by Sparsoe and Furber. A pipeline stage is given in the file
pl_stage.vhd. It consists of two processes, one describing
the Muller-C element taking care of the handshaking control and the
other describing the latch with generic word length
in the data path. By the way, when
synthesized from this description,
the Muller-C element will also be implemented by a latch. Note that
latches are synthesized when if statements are used in which
a signal is not assigned a value in all branches. Such constructs are
typically considered errors in synchronous designs. Unlike the
RTL-style code used earlier, the signal assignments here specify a delay. These delays
are ignored by the synthesis tool but help in modelling
asynchronous circuitry. Without them, the modelling of matched delays
would become difficult.
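The hold behaviour that leads to latch inference can be seen in a small behavioural model. The Python sketch below is an illustration under assumptions: the function names c_element and latch are chosen here, the latch is assumed to be transparent when its enable is high, and the code does not reproduce the contents of pl_stage.vhd.

```python
def c_element(a, b, prev):
    """Muller C-element: the output changes only when both inputs agree."""
    if a == 1 and b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    # Inputs disagree: no new value is assigned, so the old one is kept.
    # This "missing branch" is what makes the element a storage element.
    return prev


def latch(enable, d, prev):
    """Transparent latch (assumed active-high enable): pass d or hold prev."""
    return d if enable else prev


# Apply a sequence of input combinations to the C-element.
inputs = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
out = 0
outputs = []
for a, b in inputs:
    out = c_element(a, b, out)
    outputs.append(out)
print(outputs)  # [0, 0, 1, 1, 0]
```

The output rises only after both inputs are high and falls only after both are low; in between it keeps its previous value.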

A four-stage pipeline is built from pipeline-stage elements in
siso_asy_pl4_arch.vhd. The description uses a VHDL construct
that you may not yet know: the generate statement. It is used
to instantiate subentities iteratively. In this way regular structures
can be built. A generate statement can be preceded by a condition
(this is not the case in this example) which is e.g. useful to
instantiate different subentities along the borders of regular
structures.

Compile the files pl_stage.vhd and
siso_asy_pl4_arch.vhd as well as the configuration
conf_pl4.vhd and simulate the 4-stage pipeline. Does the
pipeline behave as a first-in first-out (FIFO) buffer? Explain
what you see.

Exercise ASY-2: Synthesis Step by Step

The topic of this exercise is the architecture add2 of
siso_asy. It computes a "two-input pipelined addition",
which means that:

data_out(n) = data_in(n-1) + data_in(n)
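A software reference model of this relation can help when checking simulation output files. The Python sketch below makes three assumptions that are not stated in the text: the register holds 0 after reset, the word length is 8 bits, and the sum wraps around on overflow. The name add2_reference is chosen here for illustration.

```python
def add2_reference(data_in, word_length=8, initial=0):
    """Compute data_out(n) = data_in(n-1) + data_in(n) for a list of inputs.

    Assumptions: data_in(-1) equals `initial` (register value after reset)
    and the result is truncated to `word_length` bits.
    """
    mask = (1 << word_length) - 1
    previous = initial
    data_out = []
    for value in data_in:
        data_out.append((previous + value) & mask)
        previous = value
    return data_out


print(add2_reference([1, 2, 3, 250]))  # [1, 3, 5, 253]
print(add2_reference([200, 100]))      # [200, 44], since 300 wraps around in 8 bits
```

If the actual adder has a carry output or a wider result, the truncation step has to be adapted accordingly.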

It is implemented according to the schematic shown below, where
data signals are blue and handshaking control signals are red:

[Figure: schematic of the add2 architecture]

The circuit consists of two main components:

An edge-triggered register. Reading the register is done without
handshaking. Writing the register has a handshake interface,
implemented in a very straightforward way, viz. by using the request
signal as the clock as well as the acknowledge. This simple approach
works assuming that the time needed by the input channel to change the
data after receiving an acknowledge is longer than the hold time of
the register.

A handshake adder. Its data path is trivial and just contains a
combinational adder.
Its control circuitry consists of a matched delay that should delay
the request by at least as much as the delay of the longest path in the adder.
Because it is more convenient, the actual description instantiates the
matched delay and the adder as separate components.

The circuit can be understood as follows:

The testbench prepares new data at the input channel and raises
the request signal req_in.

Because the register's request does not change, the adder can
perform the intended addition.

After sufficient time has passed for the addition to finish, the
output channel's request signal req_out rises.

The testbench saves the result of the addition and then sends an
acknowledge.

This acknowledge is used as the clock for the register and signals
the register to store the current value of data_in.

The same acknowledge also signals the input channel that
the data has been consumed and new data can be fetched. The cycle can
start anew.
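The cycle described above can be traced step by step in software. The Python sketch below is a purely behavioural illustration without timing; the function name simulate_add2, the 8-bit word length and the register reset value of 0 are assumptions made here.

```python
def simulate_add2(items, word_length=8, reset_value=0):
    """Trace the add2 handshake cycle for each input item (no timing)."""
    mask = (1 << word_length) - 1
    register = reset_value  # assumed register content after reset
    results = []
    log = []
    for data_in in items:
        log.append(f"req_in rises with data_in={data_in}")
        total = (register + data_in) & mask  # register output is still the old value
        log.append(f"adder output settles at {total}")
        log.append("req_out rises after the matched delay")
        results.append(total)
        log.append("testbench saves the result and sends an acknowledge")
        register = data_in  # the acknowledge clocks the register
        log.append(f"register stores {register}")
        log.append("input channel may fetch new data")
    return results, log


results, log = simulate_add2([5, 9, 2])
print(results)  # [5, 14, 11]
for line in log[:6]:
    print(line)
```

Note that the register is updated only after the result has been saved, which is why the adder sees the previous input during the addition.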

The add2 architecture is given in the file
siso_asy_add2_arch.vhd. This is a structural description that
instantiates the edge-triggered register described in the file
et_register.vhd, the adder described in the file
adder.vhd and the matched delay given in the file
matched_delay.vhd. The latter is a structural description
itself: it instantiates a number of unit delays described in
unit_delay.vhd.

Compile the files just mentioned together with the configuration
conf_add2.vhd and simulate. Verify that the circuit shows the
intended behavior.

As mentioned earlier, the synthesis is done in small steps instead
of reading all source files at once. The script syn-single is
meant to synthesize a leaf component (a component at the lowest level
of the hierarchy). In principle you should not edit this script, but
only modify the file single-config to specify the synthesis
parameters.

The provided version of single-config already contains the
right set of parameters for the three leaf components
et_register, adder and unit_delay.
Synthesize them by commenting and uncommenting the appropriate lines
and running syn-single. If everything goes well, each run
produces a log file log_syn_single<run_id> and
hierarchical and flattened versions of the design in VHDL and
.db
format. An SDF file is also written.
As opposed to earlier exercises, the .db files are
essential and need to exist when assembling the design. Always check
your log files to make sure that the synthesis went well.

The next step is to assemble the higher levels of the hierarchy.
This is handled by the script syn-assemble and the associated
configuration file assemble-config. The script reads the
source files needed to build the entire hierarchy ending at the
declared top-level entity, but does not synthesize. Instead, sublevels
are replaced by instances of components that have already been
synthesized. The variable MAP_LIST in the configuration
script specifies the mapping.
Once all components have been replaced by netlists (the
script always uses the "flat" version), a log file, new hierarchical
and flattened netlists and an SDF file are produced.

Use syn-assemble to assemble the matched delay and the
gate-level description of the two-input pipelined adder.
Important: when mapping one instance onto another, the interfaces
(inputs and outputs) must match exactly, as must the values
assigned to the generics.

Study the log files obtained thus far. How long is the delay in the
adder? How long is the delay in the matched delay? In order to avoid
race conditions, one should dimension the matched delay with a 50 to
100 percent margin. Does this hold in this design?
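The margin check is simple arithmetic. The Python sketch below reads "a 50 to 100 percent margin" as: the matched delay should be 1.5 to 2 times the adder delay. This reading, the function names and the delay values used are assumptions for illustration; the real values have to be taken from your own log files.

```python
def matched_delay_margin(adder_delay_ns, matched_delay_ns):
    """Margin of the matched delay over the adder delay, in percent."""
    return 100.0 * (matched_delay_ns - adder_delay_ns) / adder_delay_ns


def margin_is_sufficient(adder_delay_ns, matched_delay_ns, minimum_percent=50.0):
    """True if the matched delay exceeds the adder delay by the minimum margin."""
    return matched_delay_margin(adder_delay_ns, matched_delay_ns) >= minimum_percent


# Hypothetical delays, not results from the actual synthesis runs.
print(matched_delay_margin(2.0, 3.5))   # 75.0
print(margin_is_sufficient(2.0, 3.5))   # True
print(margin_is_sufficient(2.0, 2.5))   # False, the margin is only 25 percent
```

A negative margin means that the request arrives before the adder output is stable, so the output data cannot be trusted.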

Compile the gate-level netlist of add2 as well as the
configuration conf_add2_gate.vhd and simulate. Do not forget
the SDF file. Does the hardware perform correctly?

The scripts sim-gate and syn-gate that you know
from the POW project are also available here. They make it possible to
estimate the power consumed. Configuration is done from the file
assemble-config.

Estimate the power consumed by add2 using stimuli from the
file add2_random.in.

Now make a synchronous version of add2 in the directory
that you have used for project POW. Make sure that you operate at the
same data rate as the asynchronous circuit. First verify the
functionality of your circuit interactively and then perform a power
analysis. How does the power consumption compare to the asynchronous
circuit? Comment on your results.
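When both circuits run at the same data rate, their average power figures can be compared directly. Dividing power by data rate gives the energy per data item, which is a convenient figure to report. The Python sketch below uses made-up numbers and a function name chosen here; it does not predict the outcome of the comparison.

```python
def energy_per_item(average_power_w, data_rate_hz):
    """Energy spent per data item in joules: average power divided by item rate."""
    if data_rate_hz <= 0:
        raise ValueError("data rate must be positive")
    return average_power_w / data_rate_hz


# Hypothetical figures for two designs at the same data rate of 10 MHz.
design_a = energy_per_item(average_power_w=1.0e-3, data_rate_hz=10e6)
design_b = energy_per_item(average_power_w=2.0e-3, data_rate_hz=10e6)
print(design_a, design_b, design_b / design_a)
```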

Exercise ASY-3: Main Project

You are free to make one or more asynchronous designs on your own
until you have used the hours reserved for this project (some 40
hours). Use
a modular approach as in the example above. Make sure that the design
is first verified by means of pre-synthesis simulations. Then try to
synthesize the design. If you encounter problems, you may consider
hybrid simulations in which part of the design uses pre-synthesis code
and other parts use post-synthesis models (you have to write
appropriate configurations). If you have time, also build a
synchronous equivalent circuit; perform power simulations on both and
compare the results.

Note: The VHDL code provided for the Muller-C element and the
pipeline stage is likely not to work after synthesis. If you run into
too much trouble to make your post-synthesis simulation work, you may
skip the synthesis and concentrate on pre-synthesis designs.

Suggestions for designs:

Improve the add2 circuit.

Blocked 2-input addition, in which two inputs are consumed, one
after the other and a single output per pair is produced.

Blocked 2-input or 3-input sort: read two/three inputs and sort
them from small to large and output them in sorted order.

A GCD circuit.
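For any of these designs, a small software reference model makes it easier to check the output file written by the testbench. The Python sketch below gives possible models for three of the suggestions. The function names, the 8-bit word length, the choice to drop an incomplete trailing block, and the assumption that the GCD circuit consumes its operands as consecutive pairs are all illustrative and not prescribed by the assignment.

```python
from math import gcd


def blocked_add2(items, word_length=8):
    """One sum per consecutive pair of inputs, truncated to word_length bits."""
    mask = (1 << word_length) - 1
    return [(a + b) & mask for a, b in zip(items[0::2], items[1::2])]


def blocked_sort(items, block=3):
    """Sort each complete block of `block` inputs from small to large."""
    out = []
    complete = len(items) - len(items) % block  # ignore an incomplete last block
    for start in range(0, complete, block):
        out.extend(sorted(items[start:start + block]))
    return out


def gcd_pairs(items):
    """One greatest common divisor per consecutive pair of inputs."""
    return [gcd(a, b) for a, b in zip(items[0::2], items[1::2])]


print(blocked_add2([1, 2, 3, 4]))          # [3, 7]
print(blocked_sort([3, 1, 2, 9, 8, 7]))    # [1, 2, 3, 7, 8, 9]
print(gcd_pairs([12, 18, 7, 5]))           # [6, 1]
```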

Deliverables

For ASY-1: relevant waveforms of simulations and answers to the
questions in the text.

For ASY-2: relevant waveforms of simulations and answers to the
questions in the text.

For ASY-3: diagrams and explanation of your designs, motivation
for design choices, results (waveforms, area and power figures).