PICo Digital Signal Processor Design Project

TEAM

ADD

1. INTRODUCTION

In this paper, we present our design of an embedded digital signal processor in the FreePDK 45nm technology. In the hope of winning the contract from the Portable Instruments Company (PICo), we designed and implemented a signal processing ALU with the required functionalities (Table 1) and the best performance we could achieve. The transistor-level hierarchical netlist of the entire DSP and the Cadence simulations demonstrating proper operation of all functions are attached for review.

2. DESIGN DESCRIPTION

The Digital Signal Processor consists of an ALU with 8 available functions (Table 1) defined by our 3-bit control value, three registers placed after the 16-bit inputs and before the output, and buffers after the inputs and after the registers. Shown below is the top-level design of our ALU with its inputs/outputs going through the 3 registers.

Table 2. ALU Functions and Descriptions

ALU Function   Description                                Control
ADD            Out = A + B                                000
SUB            Out = A - B                                001
NOP            No change at Out                           010
SHIFT          Out = A << B                               011
AND            Out = A & B                                100
OR             Out = A | B                                101
PASS A         Out = A                                    110
MULTIPLIER     Out = A (first 8 bits) * B (first 8 bits)  111
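The function table can be read as a small behavioral reference model. The sketch below is only an illustration in Python, not the transistor-level design: the function name `alu`, the `prev_out` argument used to model NOP, the truncation of results to 16 bits, and the choice to return 0 for shift amounts of 16 or more are all assumptions that the table does not specify.

```python
MASK16 = 0xFFFF  # assumed: results are truncated to the 16-bit output width
MASK8 = 0xFF     # "first 8 bits" taken as bits 7..0 (A7-A0, B7-B0)


def alu(control: int, a: int, b: int, prev_out: int = 0) -> int:
    """Behavioral model of the 8 ALU functions selected by a 3-bit control."""
    a &= MASK16
    b &= MASK16
    if control == 0b000:  # ADD
        return (a + b) & MASK16
    if control == 0b001:  # SUB
        return (a - b) & MASK16
    if control == 0b010:  # NOP: output keeps its previous value
        return prev_out & MASK16
    if control == 0b011:  # SHIFT: assumed to give 0 once all bits are shifted out
        return (a << b) & MASK16 if b < 16 else 0
    if control == 0b100:  # AND
        return a & b
    if control == 0b101:  # OR
        return a | b
    if control == 0b110:  # PASS A
        return a
    if control == 0b111:  # MULTIPLIER on the low 8 bits of each input
        return (a & MASK8) * (b & MASK8)
    raise ValueError("control must be a 3-bit value")
```

A model like this is mainly useful for generating expected outputs to compare against simulation waveforms.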

Critical design decisions are discussed in the subsections below.

2.1 Combining ADD and SUB

After testing each path with a different function block, we concluded that the ADD/SUB path has the longest propagation delay and thus forms the critical path. We implemented ADD and SUB as one combined block: SUB is ADD with input B inverted and Carry-in = 1.
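The identity behind this combination is the two's-complement relation A - B = A + (~B) + 1. The Python sketch below checks it at the bit level. The name `add_sub`, the `sub` flag, and the use of an XOR with the flag to invert B are illustrative assumptions; the actual inversion circuit is in the attached netlist.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def add_sub(a: int, b: int, sub: bool) -> int:
    """One adder serves both functions: SUB inverts B and sets carry-in to 1."""
    # Assumed inversion scheme: XOR every bit of B with the sub flag.
    b_eff = (b ^ (MASK if sub else 0)) & MASK
    carry_in = 1 if sub else 0
    total = (a & MASK) + b_eff + carry_in
    return total & MASK  # carry-out beyond 16 bits is dropped


# Usage: the same function covers both operations.
print(add_sub(5, 3, sub=False))
print(add_sub(5, 3, sub=True))
```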

2.2 Modified Carry Look-Ahead Adder

For the ADD/SUB function block, we utilized the Modified Carry Look-ahead Adder (MCLA) topology [1]. The simplest binary adder is the ripple carry adder [2]. It is easy to understand and implement. A more complex binary adder is the carry lookahead adder (CLA) [3]. “It uses the same carry lookahead circuits to construct the higher-bit CLA

Justification for using this register topology is discussed in Part 4.

2.4 Vdd Value

We tested the design using 0.95V, 1V and 1.1V as the supply voltage (Vdd). Of these, 0.95V gives the smallest metric while the design still works properly.
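As a rough intuition for the power side of this choice, the sketch below uses the standard first-order model in which dynamic power scales with Vdd squared at fixed capacitance and frequency. This model is an assumption introduced here, not a result from our simulations; it ignores leakage and the accompanying increase in delay, and the function name `relative_dynamic_power` is hypothetical.

```python
def relative_dynamic_power(vdd: float, vdd_ref: float = 1.0) -> float:
    """Dynamic power relative to vdd_ref, assuming P_dyn is proportional to Vdd^2."""
    return (vdd / vdd_ref) ** 2


# The three supply voltages tested above, relative to 1 V.
for vdd in (0.95, 1.0, 1.1):
    print(f"Vdd = {vdd:.2f} V: relative dynamic power = {relative_dynamic_power(vdd):.4f}")
```

Under this assumption alone, 0.95V would draw roughly 10% less dynamic power than 1V; the measured metric also depends on delay and area.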

2.5 Sizing

In order to minimize the total area, we set all gates not on the critical path to minimum size (wn = wp = 90n). Transistors on the critical path are sized for equal pull-up and pull-down strength. Transistors driving larger loads are sized larger. Sizes are shown in the attached Netlist.

3. INNOVATION

In order to attain the best performance and the minimum metric (Delay^2*Power*Area), we did the following:

1. Change/optimize component topologies

a. Combine ADD and SUB into one function block with inverted inputs. It saves area and power consumption, but adds a little more delay.

to clock skew. It uses a minimum number of transistors and thus area, it consumes significantly less power, and it has the minimum delay. However, the tradeoff is that it is less robust and requires buffers at the output to avoid being affected by changes elsewhere in the circuit.

2. Size the elements on the critical path

a. Minimize sizing for elements not on the critical path. This reduces total area.

b. Upsize elements on the critical path to obtain the best delay. We sized the transistors to have equal pull-up and pull-down strength. Then, for elements driving a big load (fanout), we upsized them to a point where the area does not increase too much but the delay is minimized. Because of the complexity of this processor, we did not do hand calculations for optimal sizing. Instead, by running multiple simulations, we obtained the sizes that give us the best metric. See the attached Netlist for sizing details. This reduces the worst-case delay, but increases area.

Figure 3. Subcircuit inside a MPFA Block

Figure 4. 4-bit MCLA

Figure 5. 16-bit MCLA

Figure 6. C^2MOS Master-slave Positive Edge-triggered Register

3. Reduce the supply voltage Vdd to obtain the best metric while having all functions work properly. A lower Vdd gives less power but greater delay. Also, if it is too low, the circuit does not work properly.

Note that we did not include the Full Adder here because the Mirror Adder clearly has better performance in terms of the specified metric.

3. Register Topology, tested with an ideal clock in a separate test circuit (not as part of the processor)

Table 4. Metrics of Registers with Different Topologies

REG             Static CMOS   Dynamic Passgate   C^2MOS    TSPC
# transistors   22            8                  8         11
Power (W)       6.28E-5       N/A                2.45E-5   3.37E-5
Delay_wc (s)    2.9E-11       N/A                1.5E-11   2E-11

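As shown in the table, C^2MOS has the lowest power and the shortest worst-case delay of the topologies measured, and it ties Dynamic Passgate for the fewest transistors. Note that we did not test Dynamic Passgate's power and delay here because it is sensitive to clock skew and produced bad output.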

We chose C^2MOS as the register topology, and then tested it more rigorously. With a non-ideal buffered clock, it still outperformed Static CMOS in every metric.
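To make the comparison concrete, the register data from the table can be plugged into the project metric (Delay^2*Power*Area). The sketch below is only an approximation: it substitutes the transistor count for area, which is an assumption made here because the table does not list layout area, and it omits Dynamic Passgate because its power and delay were not measured. The names `registers` and `metric_proxy` are illustrative.

```python
# (transistor count, power in W, worst-case delay in s) from the register table
registers = {
    "Static CMOS": (22, 6.28e-5, 2.9e-11),
    "C^2MOS": (8, 2.45e-5, 1.5e-11),
    "TSPC": (11, 3.37e-5, 2e-11),
}


def metric_proxy(transistors: int, power_w: float, delay_s: float) -> float:
    """Delay^2 * Power * Area, with transistor count standing in for area."""
    return delay_s ** 2 * power_w * transistors


scores = {name: metric_proxy(*values) for name, values in registers.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.3e}")
print("Lowest proxy metric:", best)
```

Because delay enters squared, the shorter worst-case delay of C^2MOS dominates the ranking under this proxy.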

3.2 Arbitrary Function

Our arbitrary function is an 8-bit multiplier. It takes in two 16-bit inputs, multiplies their first 8 bits and outputs a value up to 16 bits.

We chose regular full adders and AND gates (with minimum sizes wp = wn = 90n) to implement the multiplier, since this is the most convenient implementation and there is no requirement on the delay metric. The minimum sizing also saves area.

Due to the output bit limitation (16 bits in total) of the ALU, the maximum number of bits of each input is set to 8. The multiplier then takes the first 8 bits of both inputs (A7-A0, B7-B0) and outputs the multiplication result in 16 bits.
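The arithmetic can be sketched in Python as a sum of AND-gated partial products, which is the role the AND gates and full adders play in a multiplier of this kind. This is a behavioral illustration only: the function name `multiply8` and the row-by-row accumulation order are assumptions and do not describe the actual adder arrangement in the netlist.

```python
def multiply8(a: int, b: int) -> int:
    """Multiply the low 8 bits of two 16-bit inputs; the product fits in 16 bits."""
    a_low = a & 0xFF  # A7-A0
    b_low = b & 0xFF  # B7-B0
    product = 0
    for i in range(8):
        b_bit = (b_low >> i) & 1
        # Partial product: every bit of A ANDed with bit i of B.
        partial = a_low if b_bit else 0
        # Adders accumulate the partial product at its bit weight.
        product += partial << i
    return product


# The largest possible result, 255 * 255 = 65025, is below 2^16 = 65536,
# which is why 8-bit operands fit the 16-bit ALU output.
print(multiply8(0x00FF, 0x00FF))
```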

Delay, power, and area results of the multiplier are shown in Table 5 in 4.2. Simulation results are attached.

As shown by the top section of the delay breakdown, we chose the correct critical path and successfully minimized its delay. We also worked through the tradeoff between area and delay for different ADD/SUB topologies and found the one with the best performance. Moreover, our design uses the minimum Vdd we could use, which saves power. Overall, our design meets all the requirements proposed by PICo. We wish to work further with the company under the contract.