2.6.2 Adders

We can view addition in terms of generate, G[i], and propagate, P[i],
signals.

method 1                         method 2

G[i] = A[i]·B[i]                 G[i] = A[i]·B[i]                 (2.42)

P[i] = A[i] ⊕ B[i]               P[i] = A[i] + B[i]               (2.43)

C[i] = G[i] + P[i]·C[i-1]        C[i] = G[i] + P[i]·C[i-1]        (2.44)

S[i] = P[i] ⊕ C[i-1]             S[i] = A[i] ⊕ B[i] ⊕ C[i-1]      (2.45)

where C[i] is the carry-out signal from stage i, equal to the carry in of
stage (i+1). Thus, C[i] = COUT[i] = CIN[i+1].
We need to be careful because C[0] might represent either the carry in or
the carry out of the LSB stage. For an adder we set the carry in to the
first stage (stage zero), C[-1] or CIN[0], to '0'. Some people use delete
(D) or kill (K) in various ways for the complements of G[i] and P[i],
but unfortunately others use C for COUT and D for CIN, so I avoid using any
of these. Do not confuse the two different methods (both of which are used)
in Eqs. 2.42-2.45 when forming the sum, since the propagate signal, P[i],
is different for each method.
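
The two methods can be compared with a short Python sketch. The names `ripple_add`, `to_bits`, and `from_bits`, and the LSB-first list-of-bits representation, are illustrative assumptions rather than anything defined in the text; the carry in to stage zero is held at 0.

```python
def ripple_add(a_bits, b_bits, method=1):
    """Add two equal-length bit lists (LSB first) using Eqs. 2.42-2.45."""
    carry = 0  # carry in to stage zero
    s_bits = []
    for a, b in zip(a_bits, b_bits):
        g = a & b                                # Eq. 2.42 (same for both methods)
        p = (a ^ b) if method == 1 else (a | b)  # Eq. 2.43
        if method == 1:
            s = p ^ carry                        # Eq. 2.45, method 1
        else:
            s = a ^ b ^ carry                    # Eq. 2.45, method 2
        carry = g | (p & carry)                  # Eq. 2.44
        s_bits.append(s)
    return s_bits, carry


def to_bits(x, n):
    """Integer to a list of n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

Both methods produce the same carries and sums; only the sum expression has to match the definition of P[i] that is in use.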

Figure 2.22(a) shows a conventional RCA. The delay of an n-bit RCA is
proportional to n and
is limited by the propagation of the carry signal through all of the stages.
We can reduce delay by using pairs of "go-faster" bubbles to change
AND and OR gates to fast two-input NAND gates as shown in Figure 2.22(a).
Alternatively, we can write the equations for the carry signal in two different
ways:

either  C[i] = A[i]·B[i] + P[i]·C[i-1]                 (2.46)

or      C[i] = (A[i] + B[i])·(P[i]' + C[i-1]),         (2.47)

where P[i]' = NOT(P[i]). Equations 2.46 and 2.47 allow us to build the
carry chain from two-input NAND gates, one per cell, using different logic
in even and odd stages (Figure 2.22b):

even stages                          odd stages

C1[i]' = P[i]·C3[i-1]·C4[i-1]        C3[i]' = P[i]·C1[i-1]·C2[i-1]        (2.48)

C2[i] = A[i] + B[i]                  C4[i]' = A[i]·B[i]                   (2.49)

C[i] = C1[i]·C2[i]                   C[i] = C3[i]' + C4[i]'               (2.50)

FIGURE 2.22 The
ripple-carry adder (RCA). (a) A conventional RCA. The delay may be
reduced slightly by adding pairs of bubbles as shown to use two-input NAND
gates. (b) An alternative RCA circuit topology using different cells
for odd and even stages and an extra connection between cells. The carry
chain is a fast string of NAND gates (shown in bold).

(The carry inputs to stage zero are C3[-1] = C4[-1] = '0'.)
We can use the RCA of Figure 2.22(b) in a datapath, with standard cells,
or on a gate array.
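
As a sanity check on the alternating-stage equations, the following Python sketch models only the Boolean relations of Eqs. 2.46-2.50, not the gate-level structure or wiring of Figure 2.22(b). The function names are hypothetical, and P[i] is taken as A[i] XOR B[i] (method 1).

```python
def carry_2_46(a, b, c_prev):
    """Carry out from Eq. 2.46."""
    p = a ^ b
    return (a & b) | (p & c_prev)


def carry_2_47(a, b, c_prev):
    """Carry out from Eq. 2.47, which should agree with Eq. 2.46."""
    p = a ^ b
    return (a | b) & ((1 - p) | c_prev)


def odd_stage(a, b, c1_prev, c2_prev):
    """Odd stage: returns (C3', C4', C) given C1 and C2 of the previous stage."""
    p = a ^ b
    c3_n = p & c1_prev & c2_prev      # Eq. 2.48, odd stages
    c4_n = a & b                      # Eq. 2.49, odd stages
    return c3_n, c4_n, c3_n | c4_n    # Eq. 2.50, odd stages


def even_stage(a, b, c3_prev, c4_prev):
    """Even stage: returns (C1, C2, C) given unprimed C3 and C4 of the previous stage."""
    p = a ^ b
    c1 = 1 - (p & c3_prev & c4_prev)  # Eq. 2.48, even stages
    c2 = a | b                        # Eq. 2.49, even stages
    return c1, c2, c1 & c2            # Eq. 2.50, even stages
```

In this model an odd stage sees the incoming carry as C1·C2, and an even stage sees it as C3' + C4'; in both cases the carry out equals the ordinary carry of Eq. 2.46.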

Instead of propagating the carries through each stage of an RCA,
Figure 2.23 shows a different approach. A carry-save adder (CSA) cell
CSA(A1[i], A2[i], A3[i], CIN, S1[i], S2[i], COUT) has three outputs:

S1[i] = CIN                                                            (2.51)

S2[i] = A1[i] ⊕ A2[i] ⊕ A3[i] = PARITY(A1[i], A2[i], A3[i])            (2.52)

COUT = A1[i]·A2[i] + [(A1[i] + A2[i])·A3[i]] = MAJ(A1[i], A2[i], A3[i])  (2.53)

The inputs, A1, A2, and A3; and outputs, S1 and S2, are buses. The input,
CIN, is the carry from stage (i-1). The carry in, CIN, is connected
directly to the output bus S1, as indicated by the schematic symbol
(Figure 2.23a). We connect CIN[0] to VSS. The output, COUT, is the carry
out to stage (i+1).

A 4-bit CSA is shown in Figure 2.23(b).
The arithmetic overflow signal for ones' complement or two's complement
arithmetic, OV, is XOR(COUT[MSB], COUT[MSB-1])
as shown in Figure 2.23(c). In a CSA the carries are "saved"
at each stage and shifted left onto the bus S1. There is thus no carry propagation
and the delay of a CSA is constant. At the output of a CSA we still need
to add the S1 bus (all the saved carries) and the S2 bus (all the sums)
to get an n-bit result using a
final stage that is not shown in Figure 2.23(c). We might regard the
n-bit sum as being encoded in the
two buses, S1 and S2, in the form of the parity and majority functions.
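
The way the sum is encoded in the two buses can be illustrated with a minimal Python sketch. The function name `carry_save_add` is an assumption, and Python integers stand in for the buses, so bus width and overflow are not modeled.

```python
def carry_save_add(a1, a2, a3):
    """Reduce three non-negative integers to the two buses (S1, S2)."""
    s2 = a1 ^ a2 ^ a3                     # Eq. 2.52: parity, bit by bit
    cout = (a1 & a2) | ((a1 | a2) & a3)   # Eq. 2.53: majority, bit by bit
    s1 = cout << 1                        # Eq. 2.51: saved carries shifted left
    return s1, s2
```

Every bit position is computed independently of the others, which is why no carry propagates; adding S1 and S2 in a final stage recovers A1 + A2 + A3.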

We can use a CSA to add multiple inputs; as an example, an adder with four
4-bit inputs is shown in Figure 2.23(d).
The last stage sums two input buses using a carry-propagate adder
(CPA). We have used an RCA as the CPA in Figure 2.23(d) and
(e), but we can use any type of adder. Notice in Figure 2.23(e) how
the two CSA cells and the RCA cell abut together horizontally to form a
bit slice (or slice) and then the slices are stacked vertically to
form the datapath.
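
A four-input adder of this kind can be sketched in Python as two carry-save levels followed by a ripple-carry CPA. The function names and the 6-bit result width (enough for four 4-bit operands) are assumptions made for this illustration.

```python
def csa(x, y, z):
    """One carry-save level: (saved carries shifted left, bitwise sums)."""
    return ((x & y) | ((x | y) & z)) << 1, x ^ y ^ z


def cpa(x, y, width):
    """Ripple-carry adder used as the final carry-propagate stage."""
    carry, total = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        total |= (a ^ b ^ carry) << i
        carry = (a & b) | ((a | b) & carry)
    return total | (carry << width)


def add_four(a, b, c, d, width=6):
    """Sum four small operands: two CSA levels, then one CPA."""
    s1, s2 = csa(a, b, c)       # first CSA level
    t1, t2 = csa(s1, s2, d)     # second CSA level
    return cpa(t1, t2, width)   # carries propagate only here
```

Only the last stage has a delay that grows with the word length; each CSA level adds a fixed delay.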

FIGURE 2.23 The
carry-save adder (CSA). (a) A CSA cell. (b) A 4-bit CSA. (c) Symbol
for a CSA. (d) A four-input CSA. (e) The datapath for a four-input,
4-bit adder using CSAs with a ripple-carry adder (RCA) as the final stage.
(f) A pipelined adder. (g) The datapath for the pipelined version
showing the pipeline registers as well as the clock control lines that use
m2.

We can register the CSA stages
by adding vectors of flip-flops as shown in Figure 2.23(f). This reduces
the adder delay to that of the slowest adder stage, usually the CPA. By
using registers between stages of combinational logic we use pipelining
to increase the speed; the price is increased area (for the registers)
and added latency. It takes a few clock cycles (the latency,
equal to n clock cycles for an
n-stage pipeline) to fill the pipeline, but once it is filled, the
answers emerge every clock cycle. Ferris wheels work much the same way.
When the fair opens it takes a while (latency) to fill the wheel, but once
it is full the people can get on and off every few seconds. (We can also
pipeline the RCA of Figure 2.20. We add i registers on the A and B inputs
before ADD[i] and add (n-i) registers after the output S[i], with a
single register before each C[i].)
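
Latency and throughput can be illustrated with a small behavioral model in Python. This is only a sketch: `run_pipeline` and the three example stages are hypothetical, each stage is assumed to have a register at its output, and the final CPA is modeled with the `+` operator.

```python
def run_pipeline(stages, inputs):
    """Clock inputs through registered stages; returns the output after each clock edge."""
    regs = [None] * len(stages)                   # one register per stage output
    outputs = []
    feed = list(inputs) + [None] * len(stages)    # extra cycles to drain the pipeline
    for x in feed:
        new_regs = []
        prev = x
        for stage, held in zip(stages, regs):
            new_regs.append(None if prev is None else stage(prev))
            prev = held                           # next stage sees the old register value
        regs = new_regs
        outputs.append(regs[-1])
    return outputs


def csa(x, y, z):
    """One carry-save level: (saved carries shifted left, bitwise sums)."""
    return ((x & y) | ((x | y) & z)) << 1, x ^ y ^ z


# Three registered stages that add four operands (a, b, c, d).
stages = [
    lambda v: csa(v[0], v[1], v[2]) + (v[3],),    # first CSA level, passes d along
    lambda v: csa(v[0], v[1], v[2]),              # second CSA level
    lambda v: v[0] + v[1],                        # CPA, modeled with +
]
```

Feeding a list of operand tuples to `run_pipeline(stages, ...)` leaves the first two entries of the result empty (`None`) while the pipeline fills; after that one sum emerges on every clock edge.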

The problem with an RCA is
that every stage has to wait to make its carry decision, C[i], until the
previous stage has calculated C[i-1].
If we examine the propagate signals we can bypass this critical path. Thus,
for example, to bypass the carries for bits 4-7 (stages 5-8) of an adder we
can compute BYPASS = P[4]·P[5]·P[6]·P[7]
and then use a MUX as follows:

C[7] = (G[7] + P[7]·C[6])·BYPASS' + C[3]·BYPASS.       (2.54)

Adders based on this principle
are called carry-bypass adders (CBA) [Sato et al., 1992].
Large, custom adders employ Manchester-carry chains to compute the
carries and the bypass operation using TGs or just pass transistors [Weste
and Eshraghian, 1993, pp. 530-531]. These types of carry chains may
be part of a predesigned ASIC adder cell, but are not used by ASIC designers.
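
The bypass MUX of Eq. 2.54 can be checked against a plain ripple carry with a short Python sketch. The helper names are assumptions, and the propagate signal is taken as P[i] = A[i] XOR B[i] (method 1), which the bypass condition relies on.

```python
def bit(x, i):
    """Bit i of the integer x."""
    return (x >> i) & 1


def carries(a, b, n=8, cin=0):
    """Ripple carries C[0..n-1] from Eq. 2.44 with P[i] = A[i] XOR B[i]."""
    c, out = cin, []
    for i in range(n):
        g, p = bit(a, i) & bit(b, i), bit(a, i) ^ bit(b, i)
        c = g | (p & c)
        out.append(c)
    return out


def c7_bypass(a, b, cin=0):
    """C[7] computed with the bypass MUX of Eq. 2.54."""
    c = carries(a, b, 8, cin)
    p = [bit(a, i) ^ bit(b, i) for i in range(8)]
    g7 = bit(a, 7) & bit(b, 7)
    bypass = p[4] & p[5] & p[6] & p[7]
    return ((g7 | (p[7] & c[6])) & (1 - bypass)) | (c[3] & bypass)
```

When all four bits propagate, C[7] must equal C[3], so the MUX output agrees with the rippled carry in every case; the benefit is in timing, which this logical model does not capture.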

Instead of checking the propagate
signals we can check the inputs. For example we can compute
SKIP = (A[i-1] ⊕ B[i-1])·(A[i] ⊕ B[i]), which is '1' only when both stages
propagate, and then use a 2:1 MUX to select C[i]. Thus,

CSKIP[i] = (G[i] + P[i]·C[i-1])·SKIP' + C[i-2]·SKIP.       (2.55)

This is a carry-skip
adder [Keutzer, Malik, and Saldanha, 1991; Lehman, 1961]. Carry-bypass
and carry-skip adders may include redundant logic (since the carry is computed
in two different ways; we just take the first signal to arrive). We must be
careful that the redundant logic is not optimized away during logic synthesis.
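
The redundancy can be seen in a minimal Python sketch of Eq. 2.55. Here SKIP is taken as the AND of the two XOR terms, so that it is asserted only when both stages propagate; the function names are hypothetical.

```python
def carry(a, b, c_prev):
    """One stage of Eq. 2.44 with P = A XOR B."""
    return (a & b) | ((a ^ b) & c_prev)


def cskip(a_i, b_i, a_im1, b_im1, c_im2):
    """CSKIP[i] from Eq. 2.55 for stages i-1 and i with incoming carry C[i-2]."""
    skip = (a_im1 ^ b_im1) & (a_i ^ b_i)       # both stages propagate
    c_im1 = carry(a_im1, b_im1, c_im2)         # normal path, stage i-1
    normal = carry(a_i, b_i, c_im1)            # normal path, stage i
    return (normal & (1 - skip)) | (c_im2 & skip)
```

The skip path and the normal path always give the same logic value, which is exactly why a synthesis tool may regard one of them as redundant and remove it.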

If we evaluate Eq. 2.44 recursively for i = 1, and use C[-1] = '0',
we get the following:

C[1] = G[1] + P[1]·C[0] = G[1] + P[1]·(G[0] + P[0]·C[-1])

     = G[1] + P[1]·G[0].                                   (2.56)

This result means that we can
"look ahead" by two stages and calculate the carry into the third
stage (bit 2), which is C[1], using only the first-stage inputs (to calculate
G[0]) and the second-stage inputs. This is a carry-lookahead adder
(CLA) [MacSorley, 1961]. If we continue expanding Eq. 2.44,
we find:

C[2] = G[2] + P[2]·G[1] + P[2]·P[1]·G[0],

C[3] = G[3] + P[3]·G[2] + P[3]·P[2]·G[1] + P[3]·P[2]·P[1]·G[0].       (2.57)
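
The expanded carries can be checked against a rippled carry with a short Python sketch. Each generate term G[j] is multiplied by every propagate signal between stage j and the stage whose carry is being formed. The function names are assumptions, P[i] is taken as A[i] XOR B[i], and the carry in to stage zero is 0.

```python
def lookahead_carries(a, b):
    """C[0..3] for 4-bit operands from the expanded lookahead equations."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    c0 = g[0]
    c1 = g[1] | (p[1] & g[0])
    c2 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
    c3 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]))
    return [c0, c1, c2, c3]


def ripple_carries(a, b):
    """Reference: the same carries obtained stage by stage from Eq. 2.44."""
    c, out = 0, []
    for i in range(4):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c = (ai & bi) | ((ai ^ bi) & c)
        out.append(c)
    return out
```

The two functions agree for all 4-bit operands; the lookahead form trades wider gates for a shorter logic depth.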

As we look ahead further these
equations become more complex, take longer to calculate, and the logic becomes
less regular when implemented using cells with a limited number of inputs.
Datapath layout must fit in a bit slice, so the physical and logical structure
of each bit must be similar. In a standard cell or gate array we are not
so concerned about a regular physical structure, but a regular logical structure
simplifies design. The Brent-Kung adder reduces the delay and increases the
regularity of the carry-lookahead scheme [Brent and Kung, 1982]. Figure 2.24(a)
shows a regular 4-bit CLA, using the carry-lookahead generator cell (CLG)
shown in Figure 2.24(b).
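
One common way to describe the regular tree is as a parallel-prefix computation on (G, P) pairs. The Python sketch below follows that standard formulation rather than the specific cells of Figure 2.24; the names `combine` and `brent_kung_carries` are assumptions, the word length n must be a power of two, and the carry in is 0.

```python
def combine(hi, lo):
    """Merge the (G, P) pair of a more significant group with a less significant one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def brent_kung_carries(a, b, n=8):
    """Carries C[0..n-1] from a Brent-Kung style prefix tree (n a power of two)."""
    x = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    d = 1
    while d < n:                          # up-sweep: group terms of size 2, 4, 8, ...
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d *= 2
    d //= 2
    while d >= 1:                         # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d //= 2
    return [g for g, _ in x]              # with carry in = 0, C[i] is the group generate
```

Each level of the tree uses the same two-input `combine` cell, which is the source of the logical regularity; the number of levels grows with log2(n) rather than n.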

FIGURE 2.24 The
Brent-Kung carry-lookahead adder (CLA). (a) Carry generation in a 4-bit
CLA. (b) A cell to generate the lookahead terms, C[0]-C[3]. (c) Cells
L1, L2, and L3 are rearranged into a tree that has less delay. Cell L4 is
added to calculate C[2] that is lost in the translation. (d) and (e) Simplified
representations of parts a and c. (f) The lookahead logic for an 8-bit
adder. The inputs, 0-7, are the propagate and carry terms formed from the
inputs to the adder. (g) An 8-bit Brent-Kung CLA. The outputs of the
lookahead logic are the carry bits that (together with the inputs) form
the sum. One advantage of this adder is that delays from the inputs to the
outputs are more nearly equal than in other adders. This tends to reduce
the number of unwanted and unnecessary switching events and thus reduces
power dissipation.

In a carry-select adder we
use two copies of each small adder (usually 4-bit or 8-bit adders, often
CLAs), one for the case CIN='0' and one for the case CIN='1',
and then use a MUX to select the case that we need. This is wasteful, but
fast [Bedrij, 1962]. A carry-select adder is often used as the fast adder
in a datapath library because its layout is regular.
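
A behavioral Python sketch of the idea follows. The function names, the 16-bit word length, and the 4-bit block size are assumptions; each small adder block is modeled with the `+` operator instead of gates, and for simplicity the first block is duplicated as well.

```python
def block_add(a, b, cin, width):
    """Small adder block (modeled with +); returns (sum, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n=16, block=4):
    """Add two n-bit numbers block by block, selecting precomputed results."""
    mask = (1 << block) - 1
    result, carry = 0, 0
    for lo in range(0, n, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        sum0, cout0 = block_add(a_blk, b_blk, 0, block)   # copy assuming CIN = '0'
        sum1, cout1 = block_add(a_blk, b_blk, 1, block)   # copy assuming CIN = '1'
        s, carry = (sum1, cout1) if carry else (sum0, cout0)  # the MUX
        result |= s << lo
    return result, carry
```

In hardware both copies of every block work in parallel, so the carry only has to pass through one MUX per block.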

We can use the carry-select,
carry-bypass, and carry-skip architectures to split a 12-bit adder, for
example, into three blocks. The delay of the adder is then partly dependent
on the delays of the MUX between each block. Suppose the delay due to 1 bit
in an adder block (we shall call this a bit delay) is approximately equal
to the MUX delay. In this case it may be faster to make the blocks 3, 4, and
5 bits long instead of being equal in size. Now the delays into the final
MUX are equal: 3 bit-delays plus 2 MUX delays for the carry signal from
bits 0-6 and 5 bit-delays for the carry from bits 7-11. Adjusting the block
size reduces the delay of large adders (more than 16 bits).
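
The arithmetic behind this example can be reproduced with a very simple delay model in Python. The model is an assumption made for illustration: every block except the last passes its carry through one MUX, a block's carry is ready only when both its own bits and the incoming carry have settled, and one bit delay equals one MUX delay. The function name is hypothetical.

```python
def final_mux_arrivals(blocks, bit_delay=1, mux_delay=1):
    """Return (carry arrival, local arrival) at the final MUX, in delay units."""
    carry = 0
    for size in blocks[:-1]:
        # the carry leaves a block after the later of its own bit delays and
        # the incoming carry, plus one MUX delay
        carry = max(carry, size * bit_delay) + mux_delay
    local = blocks[-1] * bit_delay      # the last block's own bit delays
    return carry, local
```

Under these assumptions, blocks of 3, 4, and 5 bits give matched arrivals of 5 units at the final MUX, whereas three 4-bit blocks give 6 units for the carry and 4 for the local result, so the carry path sets the delay.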

We can extend the idea behind
a carry-select adder as follows. Suppose we have an
n-bit adder that generates two sums: One sum assumes a carry-in
condition of '0', the other sum assumes a carry-in condition of '1'. We
can split this n-bit adder into an i-bit adder for the i LSBs and an
(n-i)-bit adder for the n-i MSBs. Both of the smaller adders generate two
conditional sums as well as true and complement carry signals. The two
(true and complement) carry signals from the LSB adder are used to select
between the two (n-i+1)-bit conditional sums from the MSB adder using
2(n-i+1) two-input MUXes. This is a conditional-sum adder (also often
abbreviated to CSA) [Sklansky, 1960]. We can recursively apply this
technique. For example, we can split a 16-bit adder using i = 8 and
n-i = 8; then we can split one or both 8-bit adders again, and so on.
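
The recursive splitting can be sketched in Python as follows. The function name `conditional_sum` and the choice to split at i = n // 2 are assumptions made for the illustration, and only the true carries are modeled (the complement carries are implied).

```python
def conditional_sum(a, b, n):
    """Return ((sum0, cout0), (sum1, cout1)) for n-bit a and b with carry in 0 and 1."""
    if n == 1:
        # single-bit conditional adder: both sums and both carries
        return ((a ^ b, a & b), (1 - (a ^ b), a | b))
    i = n // 2                                    # i LSBs and n-i MSBs
    lo_mask = (1 << i) - 1
    lo = conditional_sum(a & lo_mask, b & lo_mask, i)
    hi = conditional_sum(a >> i, b >> i, n - i)
    results = []
    for lo_sum, lo_carry in lo:
        hi_sum, hi_carry = hi[lo_carry]           # MUX: LSB carry selects the MSB result
        results.append((lo_sum | (hi_sum << i), hi_carry))
    return tuple(results)
```

For example, `conditional_sum(a, b, 16)[0]` gives the 16-bit sum and carry out for a carry in of '0'. Both halves are evaluated independently, so in hardware the only serial path is through the selecting MUXes.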

Figure 2.25 shows the
simplest form of an n-bit conditional-sum
adder that uses n single-bit conditional
adders, H (each with four outputs: two conditional sums, true carry, and
complement carry), together with a tree of 2:1 MUXes (Qi_j). The conditional-sum
adder is usually the fastest of all the adders we have discussed (it is
the fastest when logic cell delay increases with the number of inputs; this
is true for all ASICs except FPGAs).

FIGURE 2.25 The
conditional-sum adder. (a) A 1-bit conditional adder that calculates
the sum and carry out assuming the carry in is either '1' or '0'. (b) The
multiplexer that selects between sums and carries. (c) A 4-bit conditional-sum
adder with carry input, C[0].