Transcription

2 Single-Cycle Design Problems
Assuming a fixed-period clock, every instruction uses the datapath for exactly one clock cycle. This implies:
- CPI = 1
- The cycle time is determined by the length of the longest instruction path (load).
  - Several instructions could run in a shorter clock cycle, so time is wasted.
  - Consider what happens with more complicated instructions, such as floating point!
- Resources used more than once in the same cycle need to be duplicated, which wastes hardware and chip area.
[Figure: stages IF, ID, IE, MEM, WB; units IM, Reg, ALU, DM, Reg]
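To make the "longest path sets the clock" point concrete, the sketch below adds up per-unit latencies along the path of each instruction class. The latency values (200 ps for memories and the ALU, 100 ps for register-file access) and all names are assumptions chosen for illustration; they do not come from the slides.

```python
# Assumed unit latencies in picoseconds (illustrative, not from the slides).
LATENCY_PS = {"IM": 200, "RegRead": 100, "ALU": 200, "DM": 200, "RegWrite": 100}

# Functional units each instruction class passes through in a single cycle.
PATHS = {
    "load":     ["IM", "RegRead", "ALU", "DM", "RegWrite"],
    "store":    ["IM", "RegRead", "ALU", "DM"],
    "R-format": ["IM", "RegRead", "ALU", "RegWrite"],
    "branch":   ["IM", "RegRead", "ALU"],
}

def path_delay(units):
    """Total delay of one instruction path."""
    return sum(LATENCY_PS[u] for u in units)

delays = {name: path_delay(units) for name, units in PATHS.items()}

# A fixed-period clock must accommodate the slowest instruction.
cycle_time = max(delays.values())
slowest = max(delays, key=delays.get)

# Time each class idles per cycle because the clock is sized for the slowest.
wasted = {name: cycle_time - d for name, d in delays.items()}

for name in PATHS:
    print(f"{name:9s} needs {delays[name]} ps, wastes {wasted[name]} ps")
print(f"cycle time = {cycle_time} ps (set by {slowest})")
```

Under these assumed latencies the load path fixes the clock period, and every shorter instruction leaves part of each cycle unused.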

5 Fixing the problem with single-cycle designs
I- One solution: a variable-period clock with a different cycle time for each instruction class
- Infeasible, as implementing a variable-speed clock is technically difficult.
Another solution:
- Use a smaller cycle time.
- Have different instructions take different numbers of cycles, by breaking instructions into steps and fitting each step into one cycle.
II- Multicycle approach:
- Break up the instructions into steps; each step takes one clock cycle.
- At the end of each cycle, store data to be used in later cycles of the same instruction.
- Balance the amount of work done in each step/cycle so that the steps are about equal.
- Restrict each cycle to use each major functional unit at most once, so that such units do not have to be replicated.
- Functional units can be shared between different cycles within one instruction.
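A small calculation can show how the multicycle approach trades a shorter clock for a CPI above 1. The cycle counts per class (5/4/4/3), the instruction mix, and both clock periods below are assumptions for illustration only; the slides give no numbers.

```python
# Assumed number of steps (cycles) per instruction class in a multicycle design.
CYCLES = {"load": 5, "store": 4, "R-format": 4, "branch": 3}

# Assumed dynamic instruction mix, in percent (hypothetical workload).
MIX_PERCENT = {"load": 25, "store": 10, "R-format": 50, "branch": 15}

MULTI_CYCLE_PS = 200   # assumed: clock sized for the longest single step
SINGLE_CYCLE_PS = 800  # assumed: clock sized for the whole load path

def average_cpi(cycles, mix_percent):
    """Mix-weighted average cycles per instruction."""
    assert sum(mix_percent.values()) == 100
    return sum(cycles[k] * mix_percent[k] for k in cycles) / 100

avg_cpi = average_cpi(CYCLES, MIX_PERCENT)
multi_time_ps = avg_cpi * MULTI_CYCLE_PS   # average time per instruction
single_time_ps = 1 * SINGLE_CYCLE_PS       # CPI = 1 in the single-cycle design

print(f"multicycle:   CPI = {avg_cpi:.2f}, {multi_time_ps:.0f} ps per instruction")
print(f"single-cycle: CPI = 1.00, {single_time_ps} ps per instruction")
```

With these particular assumptions the two designs come out close in average time per instruction. The multicycle design gains when some instruction classes are much longer than the rest, and it also lets functional units be shared across steps.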

8 Pipelining
Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
Start work ASAP!! Do not waste time!
Assume 30 min. for each task: wash, dry, fold, store.
- Separate tasks use separate hardware, so they can be overlapped.
[Figure: task order A, B, C, D against a time axis running from 6 PM into the AM; one panel not pipelined, one panel pipelined]
Why is pipelining easy with MIPS?
1) All instructions are the same length
   - fetch and decode stages are similar for all instructions
2) Few instruction formats
   - simplifies instruction decode and makes it possible in one stage
3) Memory operands appear only in loads/stores
   - so memory access can be deferred to exactly one later stage
4) Operands are aligned in memory
   - one data transfer instruction requires one memory access stage
What about x86? (1- to 17-byte instructions)
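The timing of the four-task example (A to D, four 30-minute steps each) can be worked out with a short sketch. The function names are hypothetical, and the model assumes ideal overlap with no stalls.

```python
def sequential_time(n_tasks, n_stages, stage_min):
    """Each task finishes all of its stages before the next task starts."""
    return n_tasks * n_stages * stage_min

def pipelined_time(n_tasks, n_stages, stage_min):
    """The first task needs n_stages slots; every later task
    finishes one slot after the task in front of it."""
    return (n_stages + n_tasks - 1) * stage_min

TASKS, STAGES, STAGE_MIN = 4, 4, 30   # A-D; wash, dry, fold, store; 30 min each

not_pipelined = sequential_time(TASKS, STAGES, STAGE_MIN)
pipelined = pipelined_time(TASKS, STAGES, STAGE_MIN)

print(f"not pipelined: {not_pipelined} min")
print(f"pipelined:     {pipelined} min")
print(f"speedup:       {not_pipelined / pipelined:.2f}x")
```

Starting at 6 PM, the non-pipelined schedule takes 480 minutes and the pipelined one 210 minutes. The latency of a single task is unchanged; only throughput improves, and the speedup approaches the number of stages as the number of tasks grows.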

11 Hazards
What makes it hard?
- Structural hazards: different instructions, at different stages in the pipeline, want to use the same hardware resource.
- Control hazards: deciding on a control action depends on a previous instruction.
- Data hazards: an instruction in the pipeline requires data that is still being computed by a previous instruction in the pipeline.
We first briefly examine these potential hazards individually.
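As an illustration of the data-hazard definition above, the sketch below scans a short instruction sequence for read-after-write dependences on recent producers. The `Instr` type, the `find_raw_hazards` helper, and the two-instruction window are assumptions: the window models a five-stage pipeline without forwarding, in which the register file is written in the first half of a cycle and read in the second half.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read

def find_raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) triples where the
    consumer reads a register written by one of the `window` instructions
    immediately before it."""
    hazards = []
    for i, consumer in enumerate(program):
        for j in range(max(0, i - window), i):
            producer = program[j]
            if producer.dest is not None and producer.dest in consumer.srcs:
                hazards.append((j, i, producer.dest))
    return hazards

program = [
    Instr("add $s0, $t0, $t1", "$s0", ("$t0", "$t1")),
    Instr("sub $t2, $s0, $t3", "$t2", ("$s0", "$t3")),
    Instr("and $t4, $t5, $t6", "$t4", ("$t5", "$t6")),
    Instr("or  $t7, $s0, $t2", "$t7", ("$s0", "$t2")),
]

hazards = find_raw_hazards(program)
for j, i, reg in hazards:
    print(f"{program[i].text!r} needs {reg} from {program[j].text!r}")
```

Here `sub` depends on the `add` directly before it, and `or` depends on `sub` two instructions earlier. The `or` also reads `$s0`, but its producer is three instructions back and falls outside the assumed window.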

33 Notes
- No write control is needed for the pipeline registers or the PC, since they are updated at every clock cycle.
- To specify the control for the pipeline, set the control values during each pipeline stage.
- Control lines can be divided into 5 groups:
    IF    NONE
    ID    NONE
    ALU   RegDst, ALUOp, ALUSrc
    MEM   Branch, MemRead, MemWrite
    WB    MemtoReg, RegWrite
- Group these nine control lines (ALUOp is two bits wide) into 3 subsets: ALUControl, MEMControl, WBControl.
- Control signals are generated at the ID stage; how are they passed to the other stages?
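One common answer to the closing question, used in the textbook pipelined MIPS design, is to carry the control values along in the pipeline registers and drop each group once its stage has used it. The sketch below models that idea with plain dictionaries. The function names are hypothetical, the group keys follow the slide's naming, and the signal values for `lw` and R-format are taken from the usual MIPS control table, which the slides do not list, so treat them as assumptions.

```python
# Control values per instruction, split into the slide's three subsets.
# Values follow the usual MIPS control table (assumed; not given on the slide).
CONTROL = {
    "lw": {
        "ALUControl": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
        "MEMControl": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WBControl":  {"MemtoReg": 1, "RegWrite": 1},
    },
    "R-format": {
        "ALUControl": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
        "MEMControl": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WBControl":  {"MemtoReg": 0, "RegWrite": 1},
    },
}

def id_stage(opcode):
    """Generate all control values in ID and latch them into ID/EX."""
    return dict(CONTROL[opcode])

def ex_stage(id_ex):
    """Use the ALU group; pass the other two groups on in EX/MEM."""
    used = id_ex["ALUControl"]
    ex_mem = {k: id_ex[k] for k in ("MEMControl", "WBControl")}
    return used, ex_mem

def mem_stage(ex_mem):
    """Use the MEM group; pass only the WB group on in MEM/WB."""
    return ex_mem["MEMControl"], {"WBControl": ex_mem["WBControl"]}

def wb_stage(mem_wb):
    """Use the last remaining group."""
    return mem_wb["WBControl"]

id_ex = id_stage("lw")
ex_ctl, ex_mem = ex_stage(id_ex)
mem_ctl, mem_wb = mem_stage(ex_mem)
wb_ctl = wb_stage(mem_wb)
print(ex_ctl, mem_ctl, wb_ctl, sep="\n")
```

Each pipeline register is narrower than the one before it, since control bits that have already been consumed need not be carried further.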

37 How ILP Works
- Issuing multiple instructions per cycle requires fetching multiple instructions from memory per cycle. The number of instructions issued per cycle is called the superscalar degree or issue width.
- To find independent instructions, we must have a big pool of instructions to choose from, called the instruction buffer (IB). As the IB length increases, the complexity of the decoder (control) increases, which in turn increases the datapath cycle time.
- Instructions are prefetched sequentially by an instruction fetch unit (IFU) that operates independently from datapath control. It fetches the instruction at (PC)+L, where L is the IB size, or as directed by the branch predictor.
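The role of the instruction buffer can be sketched as a search for instructions that do not conflict with anything ahead of them in the buffer. The `pick_issue_group` helper and the `(dest, srcs)` encoding are hypothetical, and the dependence check is deliberately conservative (it rejects RAW, WAW, and WAR conflicts against every earlier buffer entry, with no register renaming).

```python
def pick_issue_group(buffer, issue_width):
    """Choose up to `issue_width` buffer entries that are independent of
    every earlier entry. Each entry is (dest_register, source_registers)."""
    group = []
    earlier_writes, earlier_reads = set(), set()
    for idx, (dest, srcs) in enumerate(buffer):
        raw = any(s in earlier_writes for s in srcs)  # reads a pending result
        waw = dest in earlier_writes                  # overwrites a pending result
        war = dest in earlier_reads                   # overwrites a pending operand
        if not (raw or waw or war) and len(group) < issue_width:
            group.append(idx)
        # Whether issued or not, this entry constrains everything after it.
        earlier_writes.add(dest)
        earlier_reads.update(srcs)
    return group

instruction_buffer = [
    ("r1",  ("r2", "r3")),    # 0: independent
    ("r4",  ("r1", "r5")),    # 1: needs r1 from entry 0
    ("r6",  ("r7", "r8")),    # 2: independent
    ("r9",  ("r6", "r2")),    # 3: needs r6 from entry 2
    ("r10", ("r11", "r12")),  # 4: independent
]

print(pick_issue_group(instruction_buffer, issue_width=2))
print(pick_issue_group(instruction_buffer, issue_width=4))
```

A longer buffer exposes more independent candidates, but every entry must be compared against all entries ahead of it, which hints at why control complexity grows with IB length.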

38 Compiler/Hardware Speculation
Compiler can reorder instructions
- Static multiple issue: the compiler groups instructions into issue packets.
  - An issue packet is a group of instructions that can be issued in a single cycle, determined by the pipeline resources required.
- Think of an issue packet as a very long instruction that specifies multiple concurrent operations: Very Long Instruction Word (VLIW).
- The compiler must remove some/all hazards.
  - Reorder instructions into issue packets, with no dependencies within a packet.
  - Varies between ISAs; compiler must know!
  - Pad with nop if necessary.
Hardware can look ahead for instructions to execute
- Buffer results until it determines they are actually needed.
- Flush buffers on incorrect speculation.
Explicitly Parallel Instruction Computing (EPIC).
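The packet-forming step can be sketched as a greedy, in-order packer. Everything here is an assumption for illustration: a two-slot packet with one ALU/branch slot and one load/store slot, the `pack_packets` helper, and a dependence check limited to RAW and WAW within a packet. Load-use latency between packets is ignored, and no reordering is attempted.

```python
NOP = "nop"

def pack_packets(instrs):
    """Pack (text, slot, dest, srcs) tuples, in program order, into
    (alu_slot, mem_slot) packets, padding unused slots with nop."""
    packets = []
    packet = {"alu": NOP, "mem": NOP}
    writes = set()  # registers written by instructions already in the packet
    for text, slot, dest, srcs in instrs:
        depends = any(s in writes for s in srcs) or (dest is not None and dest in writes)
        if packet[slot] != NOP or depends:
            # Slot taken or dependence inside the packet: close it, start another.
            packets.append((packet["alu"], packet["mem"]))
            packet = {"alu": NOP, "mem": NOP}
            writes = set()
        packet[slot] = text
        if dest is not None:
            writes.add(dest)
    if packet["alu"] != NOP or packet["mem"] != NOP:
        packets.append((packet["alu"], packet["mem"]))
    return packets

loop_body = [
    ("lw   $t0, 0($s1)",   "mem", "$t0", ("$s1",)),
    ("addi $s1, $s1, -4",  "alu", "$s1", ("$s1",)),
    ("addu $t0, $t0, $s2", "alu", "$t0", ("$t0", "$s2")),
    ("sw   $t0, 4($s1)",   "mem", None,  ("$t0", "$s1")),
]

packets = pack_packets(loop_body)
for alu_op, mem_op in packets:
    print(f"{alu_op:22s} | {mem_op}")
```

The four instructions fit into three packets, two of which carry a nop. A real compiler would also reorder and unroll to fill those slots.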




