
Journal of Computer Science 6 (10): , 2010 ISSN Science Publications

Low Power Multiplier Design Using Latches and Flip-Flops

1 C.N. Marimuthu and 2 P. Thangaraj
1 Department of Electronics and Communication Engineering, Maharaja Engineering College, Avinashi,
2 Department of Computer Technology, Kongu Engineering College, Perundurai, Anna University, Tamil Nadu, India

Abstract: Problem statement: Power dissipation is a critical parameter in modern VLSI design. Low power design is necessary to keep pace with Moore's law and to produce consumer electronics with longer battery backup and less weight. Dynamic power is the major part of power dissipation, so reducing it is an effective way to save power in a VLSI design. Multiplication occurs frequently in finite impulse response filters, fast Fourier transforms, discrete cosine transforms and other important DSP and multimedia kernels. Because multipliers are functional components of many digital systems, their power dissipation should be reduced as much as possible. Approach: In this study a low power structure called Bypass Zero Feed A Directly (BZFAD) for shift and add multipliers was proposed for reducing the switching activity. Results: Simulation results were obtained for conventional and proposed BZFAD 8 bit multipliers. Conclusion: From these results, BZFAD can attain considerable power reduction and area saving when compared to conventional shift and add multipliers.

Key words: Hot block ring counter, low power multiplier, low power ring counter, reduction in switching activity

INTRODUCTION

The golden formula for calculating dynamic power dissipation is $P = C_L V^2 f$. Power can therefore be reduced in three ways: by reducing the output load capacitance $C_L$, by reducing the power supply voltage $V$, or by reducing the clock frequency $f$.
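To make the scaling in this formula concrete, the following minimal Python sketch evaluates $P = C_L V^2 f$ for a few operating points. The function name `dynamic_power` and all numeric values are illustrative assumptions and are not taken from the paper.

```python
def dynamic_power(c_load_farads, v_supply_volts, f_clock_hz):
    """Return dynamic power in watts from P = C_L * V^2 * f."""
    return c_load_farads * v_supply_volts ** 2 * f_clock_hz


# Illustrative operating points only (assumed values, not measurements).
base = dynamic_power(10e-12, 3.3, 50e6)      # reference design
half_v = dynamic_power(10e-12, 1.65, 50e6)   # supply voltage halved
half_f = dynamic_power(10e-12, 3.3, 25e6)    # clock frequency halved

print(f"base   : {base * 1e3:.3f} mW")
print(f"half V : {half_v * 1e3:.3f} mW ({half_v / base:.2f}x)")
print(f"half f : {half_f * 1e3:.3f} mW ({half_f / base:.2f}x)")
```

Because the voltage term is squared, halving V cuts the modelled power to one quarter, whereas halving C_L or f only halves it.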
Many research efforts have been devoted to reducing the power dissipation of different multipliers (Chandrakasan et al., 1992; Shen and Chen, 2002; Chen et al., 2005). Among multipliers, tree multipliers are used in high speed applications such as filters, but they require a large area (Chen and Chu, 2007; Huang and Ercegovac, 2005). The Carry-Select-Adder (CSA)-based radix multipliers have lower area overhead, but they employ a greater number of active transistors for the multiplication operation and hence consume more power. Among other multipliers, shift-and-add multipliers have been used in many applications for their simplicity and relatively small area requirement (Nelson and Nagle, 1995; Wang et al., 2004).

Steps leading to a low power multiplier: The architecture of a conventional shift and add multiplier with multiplicand A and multiplier B is shown in Fig. 1.

Fig. 1: Architecture of conventional shift and add multiplier with major source of switching activity

Corresponding Author: C.N. Marimuthu, Department of Electronics and Communication Engineering, Maharaja Engineering College, Avinashi, Anna University, Tamil Nadu, India

In the conventional shift and add multiplier, the multiplier B must be shifted in every cycle. The LSB obtained by shifting B is connected to the select pin of the multiplexer mux_a. It decides whether the multiplicand A or 0 is to be added to the partial

product obtained in the previous cycle to form the new partial product. This operation shows that the conventional architecture contains many sources of switching activity. There are 6 major sources of switching activity in the multiplier, which are marked with dashed ovals in Fig. 1. They are (a) shifting of the B register, (b) activity in the counter, (c) activity in the adder, (d) switching between 0 and A in the multiplexer, (e) activity in the multiplexer select and (f) shifting of the partial product register. Power consumption can be lowered by minimizing or removing any of these switching activities. More power can be saved by reducing the switching activity of nodes with higher capacitance. For example, B(0) is the selector line of the multiplexer and is connected to K gates in a K bit multiplier. If this node is eliminated, a noticeable power saving can be achieved.

The BZFAD architecture: A low power multiplier architecture can be derived by eliminating or reducing the sources of switching activity described above. The proposed BZFAD architecture with multiplicand A and multiplier B is shown in Fig. 2. In the BZFAD architecture (Mottaghi et al., 2009), the power reduction concentrates on the sources of switching activity marked in Fig. 1.

Shifting of the B register (multiplier) using hot block ring counter: In the conventional architecture, register B must be shifted to the right in every cycle. Its rightmost bit appears as B(0), which is used to select between A (the multiplicand) and 0. If B(0) is one, A is added to the previous partial product; if B(0) is zero, 0 is added to the previous partial product. The right shifting of B in each cycle gives rise to switching activity. To avoid this, a low power ring counter is used to select the required bit of B without shifting it in each cycle.
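As a point of reference for the shifting that the ring counter avoids, the conventional datapath can be mimicked with a short behavioural model. This is a minimal Python sketch, not the authors' HDL: the function name `shift_add_multiply` and the two activity counters are assumptions introduced for illustration, and operands are treated as unsigned K bit values.

```python
def shift_add_multiply(a, b, k=8):
    """Behavioural model of a conventional K-bit shift-and-add multiplier.

    Returns (product, b_shifts, adder_ops), where the last two values count
    how often register B is shifted and how often the adder is exercised.
    """
    mask = (1 << k) - 1
    a &= mask
    b_reg = b & mask
    partial = 0
    b_shifts = 0
    adder_ops = 0
    for _ in range(k):
        # mux_a: B(0) selects between the multiplicand A and 0.
        addend = a if (b_reg & 1) else 0
        # The adder runs in every cycle, even when the addend is 0.
        partial += addend << k
        adder_ops += 1
        # Shift the partial product and register B right by one bit.
        partial >>= 1
        b_reg >>= 1
        b_shifts += 1
    return partial, b_shifts, adder_ops


product, b_shifts, adder_ops = shift_add_multiply(13, 11)
print(product, b_shifts, adder_ops)
```

In this model both the adder and the B register are active in all K cycles regardless of the operand values, which is the behaviour the low power architecture aims to avoid.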
The BZFAD architecture uses a multiplexer with a one hot encoded bus selector that chooses the hot bit of B in each cycle.

Operation in the adder with feeder and bypass registers: In the conventional architecture, if the LSB of B is zero the current partial product is added to zero, and if the LSB of B is one the current partial product is added to A. Adding zero causes unnecessary transitions in the adder. The adder can therefore be bypassed when 0 is to be added, in which case the partial product only needs to be shifted right by one bit. The BZFAD architecture makes this modification with the Feeder and Bypass registers, which optimize the operation of the adder by bypassing it in cycles in which the LSB of B is zero. In each cycle the hot bit of the next cycle is checked and the following operations are performed:

Feeder is clocked if the LSB of B in the next cycle is 1
Bypass is clocked if the LSB of B in the next cycle is 0

Thus the current partial product is stored either in the Feeder or in the Bypass register. NAND and NOR gates are used to clock the Feeder and Bypass registers. Since these gates are inverting logic, an inverted clock is fed to them. It is shown as ~Clock in Fig. 2.

Fig. 2: Architecture of low power multiplier (BZFAD)

Thus the reduction of switching activity in the adder is mainly due to the following reasons.

The right input of the adder is A, which is constant during multiplication. This makes it possible to remove the multiplexer and feed A directly to the adder, resulting in a noticeable power saving.
In each cycle in which the next hot bit is zero, the Feeder is not clocked and the current partial product is stored in the Bypass register. There is then no transition at the adder input, which also saves power.

Shifting of the partial product register using P Low latch: The computation process of a multiplier manipulates two input data to generate many partial products for subsequent addition operations, which requires a lot of switching activity. Switching activity within the functional units of a multiplier thus accounts for the majority of its power dissipation, as described by the following equation:

$P_{switching} = \alpha C V_{dd}^{2} f_{clk}$ (1)

Where:
$\alpha$ = The switching activity parameter
$C$ = The loading capacitance
$V_{dd}$ = The operating voltage
$f_{clk}$ = The operating frequency

Fig. 3: Conventional synchronous ring counter

Minimizing switching activity can efficiently lower power dissipation without affecting the performance of the circuit (Chen et al., 2003). The least significant bits of the partial products need not be shifted to complete the multiplication. This observation means that the multiplication can be completed by processing only the most significant bits of the partial products. The BZFAD architecture uses the P Low latch to store the lower half of the partial product. It has K latches for a K bit multiplier. In the first cycle, the LSB of the partial product becomes final and is stored in the rightmost latch of P Low. In the subsequent cycles the next LSBs become final and are stored in the corresponding latches. The ring counter output is used to open the proper latch. With this method the lower half of the partial product is never shifted; shifting is required only for the upper half of the partial product.
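The combined effect of the one hot bit selection, the adder bypass and the P Low latch can be illustrated with a behavioural model. The following Python sketch is an assumption-laden simplification and not the authors' circuit: the Feeder and Bypass registers are collapsed into a single conditional, the names `bzfad_multiply`, `p_high`, `p_low` and `hot` are hypothetical, and operands are unsigned.

```python
def bzfad_multiply(a, b, k=8):
    """Simplified behavioural model of a BZFAD-style K-bit multiplication.

    Returns (product, adder_ops), where adder_ops counts the cycles in which
    the adder input actually changes (hot bit of B equal to 1).
    """
    mask = (1 << k) - 1
    a &= mask
    b &= mask
    p_high = 0      # upper half of the partial product (shifted every cycle)
    p_low = 0       # lower half, held in K latches and never shifted
    adder_ops = 0
    hot = 1         # one hot ring counter output selecting a bit of B
    for _ in range(k):
        if b & hot:
            # Hot bit is 1: A is fed directly to the adder.
            p_high += a
            adder_ops += 1
        # Hot bit is 0: the adder is bypassed and p_high is left unchanged.
        # The LSB of the upper half is now final, so it is stored in the
        # P Low latch opened by the ring counter.
        if p_high & 1:
            p_low |= hot
        p_high >>= 1
        hot <<= 1   # B itself is never shifted; only the hot bit moves
    return (p_high << k) | p_low, adder_ops


product, adder_ops = bzfad_multiply(13, 11)
print(product, adder_ops)
```

For 13 x 11 the model uses the adder only three times, once per 1 bit in the multiplier, instead of once per cycle, which is the source of the adder activity reduction described above.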
Hot block ring counter: The ring counter used in the BZFAD multiplier is noticeably wider than the binary counter used in the conventional architecture. An ordinary ring counter, if used in BZFAD, would therefore cause more transitions than its binary counterpart in the conventional architecture. To minimize the switching activity of the counter, a low power ring counter is used.

Unnecessary transitions in the conventional ring counter: An n-bit synchronous ring counter is built by cascading n D flip-flops in a chain, as shown in Fig. 3. In this architecture all flip-flops share a common clock signal, so each clock pulse is applied to all flip-flops. Inspecting the movement of the 1 through the counter chain reveals, however, that each clock pulse needs to be applied to only 2 flip-flops. Therefore each clock pulse causes (n-2)s unnecessary transitions, where s is the total number of transitions caused in a single flip-flop and n is the number of flip-flops.

Fig. 4: Clock gator logic with flip-flop's input and output

Steps towards hot block ring counter: According to the previous discussion, some flip-flops can be clock gated, leading to less switching activity. A flip-flop in a ring counter must be clocked if and only if either its input or its output is 1 immediately before the triggering clock edge arrives. Therefore only 2 flip-flops must be clocked in each cycle. The clock gating logic in Fig. 4 ORs the values of the flip-flop's input and output and, on positive clock edges, stores the result in a latch. The output of the latch determines whether or not the clock signal is gated. This clock gator is positive edge triggered. To avoid all the unnecessary transitions caused by the clock signal, each flip-flop could be given the clock gating circuitry of Fig. 4. This solution, however, has a large area overhead, and because of the transitions in the clock gators themselves the resulting ring counter would not have less switching activity.
A better solution is used in the BZFAD architecture.
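The clocking condition stated above (a flip-flop needs a clock edge only if its input or its output is 1) can be checked on a small model of the ring counter. This is an illustrative Python sketch: the list representation of the counter state and the helper names `flip_flops_needing_clock` and `step` are assumptions made for the example.

```python
def flip_flops_needing_clock(state):
    """Return indices of flip-flops whose input D or output Q is 1.

    state[i] is the output Q of flip-flop i; its input D is the output of
    the previous flip-flop in the ring, state[i - 1] (index -1 wraps around).
    """
    n = len(state)
    return [i for i in range(n) if state[i] == 1 or state[i - 1] == 1]


def step(state):
    """Advance the ring counter by one clock: every Q takes the value of its D."""
    return [state[-1]] + state[:-1]


n = 8
state = [1] + [0] * (n - 1)   # one hot initial state
for cycle in range(n):
    needed = flip_flops_needing_clock(state)
    print(cycle, state, "clock needed for", needed,
          "unnecessarily clocked:", n - len(needed))
    state = step(state)
```

In every cycle exactly two flip-flops satisfy the condition, so an ungated n-bit counter clocks n - 2 flip-flops unnecessarily per clock pulse.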

One of the important properties of the ring counter is that its output is one hot encoded, as in Fig. 5. This property makes the output of the ring counter wide, especially as the counter size increases. As an example, consider a 5 bit binary counter which counts from 0 to 31. A ring counter with the same counting range is 32 bits wide. To reduce the switching activity of the counter, it is partitioned into a number of blocks which are clock gated with a special clock gating structure whose power and area overheads are independent of the block size. The clock gating structure is shown in Fig. 6. In the partitioned ring counter exactly one block needs to be clocked: the block that contains the 1 at that time (except when the 1 is leaving one block and entering another). The block which is clocked is called the hot block. For each block, the Clock Gating structure (CG) therefore only needs to know whether the 1 has entered the block (from the right) and has not yet left it (from the left). The CG starts passing clock pulses to the block once the 1 appears at the input of the first flip-flop of the block. It shuts off the clock pulses after the 1 leaves the leftmost flip-flop of the block. In Fig. 6 the signals entrance and exit come from the neighboring right and left blocks. They have the following meanings. When entrance is 1, the 1 is about to enter the block in the next cycle. This line is connected to the input of the leftmost flip-flop in the right hand block. The exit signal indicates that the 1 has left the block and hence the block should no longer be clocked. In Fig. 2 the multiplexers M1 and M2 are controlled by the same hot block ring counter, so the low power ring counter has a main role in the BZFAD architecture.

MATERIALS AND METHODS

In this study, we propose a low-power, low-area multiplier using the BZFAD architecture. The BZFAD architecture avoids unwanted additions and thus minimizes the switching power dissipation.
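As a rough illustration of why partitioning the ring counter helps, the hot block scheme described earlier can be approximated with a small counting model. This Python sketch rests on simplifying assumptions that are not from the paper: a block is treated as clocked whenever it holds the 1 or is about to receive it, the overhead of the clock gating structure itself is ignored, and the function name `clocked_flip_flops` is hypothetical.

```python
def clocked_flip_flops(n, block_size, cycles):
    """Count flip-flop clock events for an n-bit one hot ring counter.

    The counter is split into blocks of block_size flip-flops. In each cycle
    only the block holding the 1 and the block the 1 is about to enter
    receive the clock (these coincide unless the 1 crosses a block boundary).
    """
    if n % block_size != 0:
        raise ValueError("n must be a multiple of block_size")
    pos = 0          # current position of the 1
    total = 0
    for _ in range(cycles):
        nxt = (pos + 1) % n
        hot_blocks = {pos // block_size, nxt // block_size}
        total += len(hot_blocks) * block_size
        pos = nxt
    return total


n = 32
ungated = clocked_flip_flops(n, n, n)     # one block: every flip-flop clocked
for size in (2, 4, 8, 16):
    gated = clocked_flip_flops(n, size, n)
    print(f"block size {size:2d}: {gated:4d} clock events vs {ungated} ungated")
```

Under these assumptions a 32-bit counter with blocks of size 4 sees 160 flip-flop clock events per full rotation instead of 1024. Smaller blocks reduce this count further but require more clock gating structures, whose cost this model does not include.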
The functionality of the conventional (Chen et al., 2005; 2006) and proposed 8 bit multiplier designs is verified using ModelSim software, and the designs are synthesized with the Xilinx software back end tool to obtain power and delay reports.

RESULTS

The simulation results of the conventional and proposed BZFAD 8 bit multipliers are shown in Fig. 7 and 8. Fig. 9 and 10 show the power reports of both multipliers.

Fig. 5: Hot block architecture for ring counter with block of size 4

Fig. 6: Clock gating structure for the Hot Block ring counter

Area report for conventional shift and add multiplier is given below:

Design Information
Command Line: Map -p xc2s100-fg cm area -k 4 -c 100 -tx off eight_conventional.ngd
Target Device: x2s100
Target Package: fg456
Target Speed: -5
Mapper Version: spartan2 -- $Revision: 1.58 $
Mapped Date: Mon Dec 21 17:14:

Design Summary
Number of errors: 0
Number of warnings: 0
Number of Slices: 662 out of 1,200 55%

Area report for BZFAD multiplier is given below:

Design Information
Command Line: map -p xc2s100-fg cm area -k 4 -c 100 -tx off eight_bzfad.ngd
