Short Paper

Summary

The CTOV compiler converts almost arbitrary C/C++ code to a
hardware netlist. It operates by building the datapath for a
VLIW-like processor and a sequencer, which may or may not be
microprogrammed, to control it. The datapath may consist of any number
of RAMs and ALUs and also holding registers for data that cannot be
transferred to/from RAM at the current time owing to RAM port
bandwidth constraints. There are many possible datapaths that will
serve to execute a given program, trading time for space. The compiler
will generate a datapath using an internal heuristic that places each
array present in the source code in a different RAM and converts all
other state-bearing variables to flip-flops. However, the user can
override this default to control the datapath shape by specifying,
for every variable, whether it should be placed in a RAM and, if so,
which RAM. The number of RAMs used
and the style and number of ports on each RAM can also be fully
specified by the user. This allows complete flexibility within a
design space bounded by two extremes: one where all state is held
in flip-flops and another that places all programming-model state
in a single, single-ported RAM.
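
As a rough illustration of the default placement heuristic, consider
the fragment below. The comments restate the mapping described above
(one RAM per array, flip-flops for other state); they are not compiler
output. The helper min_ram_cycles is hypothetical and not part of CTOV:
it is a back-of-envelope lower bound that assumes each RAM port
performs one access per clock cycle.

```c
#include <assert.h>

#define N 8

/* Under the default heuristic described above, each array would be
   placed in its own RAM. */
int coeff[N];
int sample[N];

/* Scalar state: would become flip-flops by default. */
int acc;

/* Dot product: every iteration reads one word from each array. */
int dot_product(void)
{
    acc = 0;
    for (int i = 0; i < N; i++)
        acc += coeff[i] * sample[i];
    return acc;
}

/* Hypothetical helper (not part of CTOV): lower bound on the clock
   cycles needed to perform `accesses` accesses through `ports` RAM
   ports, assuming one access per port per cycle. */
unsigned min_ram_cycles(unsigned accesses, unsigned ports)
{
    assert(ports > 0);
    return (accesses + ports - 1) / ports;   /* ceiling division */
}
```

With the arrays in separate RAMs, the two reads of an iteration do not
compete for a port. If both arrays were instead placed in one
single-ported RAM, the 16 reads of the loop would need at least
min_ram_cycles(16, 1) = 16 cycles under the stated assumption, which
is the time-for-space trade mentioned above.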

The compiler partitions the behaviour of the input program into
macrocycles. A macrocycle corresponds to a number of successive steps
taken by one thread in the input program. The number of steps implemented
by a macrocycle is controlled by a combination of three things: a size
heuristic, decidability of name aliases and user 'barrier' statements
inserted in the source code. Each macrocycle may take a number of
clock cycles to execute on the hardware, depending on the availability
of ALUs and RAM ports to compute, source and sink the data. Either
run-time or compile-time arbitration can be used to resolve such
structural hazards. Again, a default heuristic is provided: the
compiler performs compile-time arbitration for competition between
events caused by a common thread and run-time arbitration for
inter-thread competition. Undecidability ends a macrocycle:
generation stops when an array subscript comparison or a loop exit
condition cannot be evaluated at compile time.
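
The role of name aliases can be seen in a small sketch. The function
names below are illustrative only, and how CTOV actually splits such
code into macrocycles is not shown here. In the first function the
subscripts are inputs, so whether the read depends on the preceding
write is known only at run time; in the second the subscripts are
constants and the comparison 0 != 1 is decidable at compile time.

```c
#include <assert.h>

#define LEN 4

int buf[LEN];

/* Writes buf[i], then reads buf[j]. The read depends on the write
   exactly when i == j, which cannot be decided at compile time when
   i and j are inputs. */
int write_then_read(unsigned i, unsigned j, int value)
{
    assert(i < LEN && j < LEN);
    buf[i] = value;
    return buf[j];
}

/* Same pattern with constant subscripts: the two accesses are known
   at compile time to touch different elements. */
int write_then_read_const(int value)
{
    buf[0] = value;
    return buf[1];
}
```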

Features

CTOV is a package that enables hardware to be generated from
C. It is based on the TT CY compiler core. For performance analysis
and fast system simulation, the C can also be run as software by
linking with a threads library. We provide this threads library and
a set of standard Tenos C libraries for compiling to hardware.

Compiling C and C++ to hardware is a broad
subject that ranges from simple processing that is almost like
macro-expansion to complex processing involving a great deal of
compile-time manipulation and automatic mapping.
CTOV sits at the complex processing end of this spectrum and, in our
view (as of 2000), it will be some years before this sort of tool
is in widespread use in industry.

CTOV can handle a very large subset of C, but not arbitrary
recursive functions. With the current version of CTOV, it
is unrealistic to take and use a section of C not written
specifically for hardware compilation. However, variables can be
updated by more than one thread and fork, join and mutexes are
provided, which is a major step forward compared with Verilog.
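
The pattern of one variable updated by several threads can be
sketched in software. The CTOV fork, join and mutex primitives are not
reproduced here; POSIX threads are used purely as a stand-in, and all
names in the fragment are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

#define INCREMENTS 1000

static int shared_count;   /* updated by both threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the shared counter under the mutex. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&lock);
        shared_count++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Fork two workers, join both, and return the final count. */
int run_two_workers(void)
{
    pthread_t t1, t2;
    int rc;

    shared_count = 0;
    rc = pthread_create(&t1, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_count;
}
```

Because every increment is protected by the mutex, the result is
2 * INCREMENTS regardless of how the two threads interleave.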

Miniature Tutorial Example From The Manual

AND gate

The first example is a section of C which makes a single AND gate.
It is important for practical hardware design that an engineer can
instruct the tools to produce gate-level features where required.

void and2gate(u1 *y, u1 a, u1 b)
{
    *y = a && b;
}

In this example, the AND output is delivered through a
call-by-reference parameter instead of a returned value. An
alternative would be as follows:

u1 and2gate(u1 a, u1 b)
{
    return a & b;
}

The switch from logical AND (&&) to bitwise AND (&) makes no difference
in this example. The output from compiling the first version of this
simple gate is as follows. The second version does not make sense to
compile as a top-level routine since our compiler discards the value
returned from the top-level routine.
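
The claim that && and & are interchangeable here can be checked in
software over all four input combinations. This is a sketch under an
assumption: u1 is taken to be a 1-bit unsigned type and is modelled
with an unsigned char that only ever holds 0 or 1. The names
and2gate_ref and and2gate_val are hypothetical, introduced so that
both versions can coexist in one file.

```c
#include <assert.h>

/* Assumption: u1 is a 1-bit unsigned type; modelled here with an
   unsigned char restricted to the values 0 and 1. */
typedef unsigned char u1;

/* First version: output through a call-by-reference parameter. */
void and2gate_ref(u1 *y, u1 a, u1 b)
{
    *y = a && b;
}

/* Second version: output as the returned value. */
u1 and2gate_val(u1 a, u1 b)
{
    return a & b;
}

/* Returns 1 if both versions agree on every 1-bit input pair. */
int and_versions_agree(void)
{
    for (u1 a = 0; a <= 1; a++) {
        for (u1 b = 0; b <= 1; b++) {
            u1 y;
            and2gate_ref(&y, a, b);
            if (y != and2gate_val(a, b))
                return 0;
        }
    }
    return 1;
}
```

The equivalence holds only because the operands are single bits. For
wider operands the two operators differ: 2 && 1 evaluates to 1, while
2 & 1 evaluates to 0.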

Documentation

The CY compiler is an advanced behavioural elaborator that provides
the internal processing for CTOV. The front end elaborates the
block-structured imperative source language into a set of control and
data units which can be implemented largely in parallel.

The backend performs logic minimisation, folding and allocation of
processing units, such as adders and multipliers, to implement the
required data flow graph. Unlike typical DSP compilers, CTOV does not
artificially distinguish between control and data and thereby allows
arbitrary data-dependence in the algorithms.
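
A small example of the kind of data-dependent algorithm meant here is
Euclid's algorithm by repeated subtraction, written below as ordinary
C. The function name is illustrative, and how CTOV would schedule or
allocate units for it is not shown. Both the branch taken and the
number of loop iterations depend on the input values, so control
cannot be separated from data.

```c
#include <assert.h>

/* Greatest common divisor by repeated subtraction. The loop exit
   condition and the branch direction both depend on the data. */
unsigned gcd(unsigned a, unsigned b)
{
    assert(a > 0 && b > 0);
    while (a != b) {
        if (a > b)
            a -= b;
        else
            b -= a;
    }
    return a;
}
```

Only one of the two subtractions executes in any iteration, so this is
the sort of code where a single subtractor could in principle be
shared between the branches.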

Example Runs

These example runs mostly use the old netlist format. The MT mode format
allocates variables representing nets on the heap instead of on the stack.
An extension to CTOV also supports most SystemC constructs, but this has
not been tested in depth.