
2. What You Will Learn in This Set of Lectures

- Memory hierarchy
- Memory technologies
- Cache memory design
- Virtual memory

3. Memory Hierarchy

- Memory is always a performance bottleneck in any computer
- Observations:
  - The technology for large memories (DRAM) is slow but inexpensive
  - The technology for small memories (SRAM) is fast but more expensive
- Goal:
  - Present the user with a large memory at the lowest cost, while providing access at a speed comparable to the fastest technology
- Technique:
  - Use a hierarchy of memory technologies

(Diagram: a small, fast memory sits above a large, slow memory in the hierarchy.)

14. DRAM Technology

- Conventional DRAM:
  - Two-dimensional organization; each access needs a Row Address Strobe (RAS) and a Column Address Strobe (CAS)
- Fast Page Mode DRAM:
  - Provide a DRAM row address first, then access any series of column addresses within the specified row
- Extended Data Out (EDO) DRAM:
  - The specified row of data is saved to a register
  - Gives easy access to localized blocks of data (within a row)
- Synchronous DRAM:
  - Clocked; random access at rates on the order of 100 MHz
- Cached DRAM:
  - DRAM chips with a built-in small SRAM cache
- RAMBUS DRAM:
  - Bandwidth on the order of 600 MB/s when transferring large blocks of data

16. Why Does the Memory Hierarchy Work?

- The principle of locality:
  - A program accesses a relatively small portion of the address space at any instant of time. Example: 90% of execution time is spent in 10% of the code
  - Keep all data in the large slow memory, and bring the portion of the address space currently being accessed into the small fast memory
- Two different types of locality (both visible in the sketch below):
  - Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon
  - Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon
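Both locality types are easy to see in ordinary code. A minimal C sketch; the array size and loop counts are arbitrary illustrative choices:

```c
#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];  /* zero-initialized array */
    long sum = 0;

    /* Spatial locality: consecutive iterations touch adjacent addresses,
       so a cache block fetched for a[i] also serves a[i+1], a[i+2], ... */
    for (int i = 0; i < N; i++)
        sum += a[i];

    /* Temporal locality: the same locations (sum, i, and this loop's own
       instructions) are re-referenced on every iteration. */
    for (int rep = 0; rep < 100; rep++)
        for (int i = 0; i < N; i++)
            sum += a[i];

    printf("%ld\n", sum);
    return 0;
}
```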

17. Memory Hierarchy: Principles of Operation

- At any given time, data is copied between only two adjacent levels:
  - Upper level (cache): the one closer to the processor
    - Smaller, faster, and uses more expensive technology
  - Lower level (memory): the one further from the processor
    - Bigger, slower, and uses less expensive technology
- Block:
  - The minimum unit of information that can either be present or not present in the two-level hierarchy (see the sketch below)

(Diagram: blocks X and Y move between the upper-level memory, which exchanges data with the processor, and the lower-level memory.)
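A block is the unit of transfer plus the bookkeeping needed to identify it. A minimal sketch of one upper-level block in C, assuming an illustrative 32-byte block size:

```c
#include <stdint.h>

#define BLOCK_SIZE 32  /* bytes per block: the unit moved between levels */

/* One cache block: the data plus the state needed to identify it. */
struct cache_block {
    uint32_t tag;              /* which memory block is currently held */
    uint8_t  valid;            /* 1 if the block holds live data */
    uint8_t  data[BLOCK_SIZE]; /* the block itself, copied from the lower level */
};
```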

18. Factors Affecting the Effectiveness of the Memory Hierarchy

- Hit: the data appears in some block in the upper level
  - Hit rate: the fraction of memory accesses found in the upper level
  - Hit time: the time to access the upper level, which consists of (RAM access time) + (time to determine hit/miss)
- Miss: the data must be retrieved from a block in the lower level
  - Miss rate = 1 - (hit rate)
  - Miss penalty: the additional time needed to retrieve the block from the lower-level memory after a miss has occurred
- For an effective memory hierarchy (see the worked example below):
  - Hit rate >> miss rate
  - Hit time << miss penalty
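These quantities combine into the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty. A worked example with illustrative numbers (not from the lecture):

```c
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;    /* cycles to access the upper level */
    double miss_rate    = 0.05;   /* 1 - hit rate */
    double miss_penalty = 100.0;  /* extra cycles to fetch from the lower level */

    /* AMAT = hit time + miss rate * miss penalty */
    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);  /* 1 + 0.05 * 100 = 6 cycles */
    return 0;
}
```

Note how a fast hit time helps only while the miss rate stays small: at a 100-cycle miss penalty, each additional 1% of misses adds a full cycle to the average.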

29. Cache Block

- Cache block: the cache data referenced by a single cache tag
- Our previous "extreme" example:
  - A 4-byte direct-mapped cache with block size = 1 word
  - Takes advantage of temporal locality: if a byte is referenced, it will tend to be referenced again soon
  - Does not take advantage of spatial locality: if a byte is referenced, its adjacent bytes will tend to be referenced soon
- To take advantage of spatial locality: increase the block size (i.e., the number of bytes in a block), as the address-split sketch below shows
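Block size shows up directly in how the cache splits an address. A sketch assuming an illustrative geometry (32-bit addresses, 16-byte blocks, 256 cache entries):

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4  /* 16-byte blocks */
#define INDEX_BITS  8  /* 256 cache entries */

int main(void) {
    uint32_t addr   = 0x12345678;  /* arbitrary example address */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);                 /* byte within the block */
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* which cache entry */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);               /* identity check */

    /* More offset bits = a bigger block = more neighboring bytes fetched
       per miss, which is exactly how block size exploits spatial locality. */
    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```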

34. How Does Memory Interleaving Work?

- Observation: memory access time < memory cycle time
  - Memory access time: the time from when the address is presented until the data is available
  - Memory cycle time: the minimum time between the starts of two successive accesses (the memory needs recovery time between accesses)
- Memory interleaving divides the memory into banks and overlaps the memory cycles of accesses to different banks (see the sketch below)

(Diagram: staggered bank access timing; bank 0 can be accessed again once its memory cycle completes.)
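A sketch of low-order interleaving in C, assuming four banks: consecutive block addresses land in consecutive banks, so their memory cycles can overlap.

```c
#include <stdio.h>

#define NUM_BANKS 4  /* illustrative bank count */

int main(void) {
    for (unsigned block = 0; block < 8; block++) {
        unsigned bank = block % NUM_BANKS;  /* low-order address bits pick the bank */
        /* While bank 0 is still completing its cycle, banks 1-3 can start
           theirs; by the time block 4 maps back to bank 0, it has recovered. */
        printf("block %u -> bank %u\n", block, bank);
    }
    return 0;
}
```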

37. Another "Extreme" Example

- Imagine a cache with size = 4 bytes and block size = 4 bytes
  - Only ONE entry in the cache
- By the principle of temporal locality, if a cache block is accessed once it will likely be accessed again soon, so this one-block cache should work in principle
  - But in reality it is unlikely that the same block will be accessed again immediately!
  - Therefore the next access will likely be a miss again
    - Data is continually loaded into the cache but discarded (forced out) before it is used again
    - The worst nightmare of a cache designer: the ping-pong effect (simulated in the sketch below)
- Conflict misses are misses caused by:
  - Different memory locations mapped to the same cache index
    - Solution 1: make the cache bigger
    - Solution 2: provide multiple entries for the same cache index
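A minimal simulation of that ping-pong effect, using the slide's one-entry 4-byte cache and two arbitrarily chosen conflicting addresses:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t cached_tag = 0;  /* the single cache entry's tag */
    int valid = 0, misses = 0;

    /* Two words that differ only in the tag bits: with one entry there are
       no index bits, so both map to the same cache location. */
    uint32_t trace[] = { 0x1000, 0x2000, 0x1000, 0x2000, 0x1000, 0x2000 };

    for (int i = 0; i < 6; i++) {
        uint32_t tag = trace[i] >> 2;       /* 4-byte block: 2 offset bits, no index */
        if (!valid || cached_tag != tag) {  /* every access evicts the other block */
            misses++;
            cached_tag = tag;
            valid = 1;
        }
    }
    printf("misses: %d of 6 accesses\n", misses);  /* prints 6: every access misses */
    return 0;
}
```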

40. Implementation of a Set Associative Cache

- N-way set associative: N entries for each cache index
  - N direct-mapped caches operate in parallel
  - Additional logic examines the tags to decide which entry is accessed
- Example: a two-way set associative cache (sketched below)
  - The cache index selects a "set" from the cache
  - The two tags in the set are compared in parallel
  - The data is selected based on the tag comparison result

(Diagram: the index selects a set; the address tag is compared against tag #1 and tag #2 to choose entry #1 or entry #2.)
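A software sketch of the two-way lookup. In hardware the two tag comparisons happen in parallel; here a loop stands in for them. The geometry (64 sets of 16-byte blocks) is an illustrative assumption:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define WAYS        2
#define OFFSET_BITS 4                /* 16-byte blocks */
#define INDEX_BITS  6                /* 64 sets */
#define NUM_SETS    (1 << INDEX_BITS)

struct way { uint32_t tag; bool valid; };
static struct way cache[NUM_SETS][WAYS];

/* Returns true on a hit in either way of the selected set. */
bool lookup(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);  /* selects the set */
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);

    for (int w = 0; w < WAYS; w++)   /* hardware compares both tags at once */
        if (cache[index][w].valid && cache[index][w].tag == tag)
            return true;             /* hit: this way's data is selected */
    return false;                    /* miss in both ways */
}

int main(void) {
    cache[3][1] = (struct way){ .tag = 0x1234, .valid = true };
    uint32_t addr = (0x1234u << (OFFSET_BITS + INDEX_BITS)) | (3u << OFFSET_BITS);
    printf("hit: %d\n", lookup(addr));  /* prints 1 */
    return 0;
}
```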

45. A Summary of Sources of Cache Misses

- Compulsory (cold start, first reference): the first access to a block
  - "Cold" fact of life: not a whole lot you can do about it
- Conflict (collision):
  - Multiple memory locations mapped to the same cache location
  - Solution 1: increase the cache size
  - Solution 2: increase associativity
- Capacity:
  - The cache cannot contain all the blocks accessed by the program
  - Solution: increase the cache size
- Invalidation: another process (e.g., I/O) updates memory
  - This occurs more often in multiprocessor systems in which each processor has its own cache; when any processor updates data in its own cache, it may invalidate copies of that data in the other caches

46. Cache Replacement

- Issue: since many memory blocks map to a small number of cache blocks, when a new block is brought into the cache an old block has to be thrown out to make room. Which block should be thrown out?
- Direct-mapped cache:
  - Each memory location can be mapped to only 1 cache location
  - No need to make any decision :-)
  - The current item replaces the previous item in that cache location
- N-way set associative cache:
  - Each memory location has a choice of N cache locations
  - Need to decide which block to throw out (one common policy, LRU, is sketched below)
- Fully associative cache:
  - Each memory location can be placed in ANY cache location
  - Need to decide which block to throw out!
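The slide does not mandate a policy, but least recently used (LRU) is a common choice, and for a two-way set a single bit per set suffices to track it. A minimal sketch:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_SETS 64
#define WAYS     2

struct way { uint32_t tag; bool valid; };
static struct way cache[NUM_SETS][WAYS];
static int lru_way[NUM_SETS];    /* per set: which way to evict next */

/* On a miss, fill the least recently used way and flip the LRU bit. */
void fill_on_miss(uint32_t set, uint32_t tag) {
    int victim = lru_way[set];   /* the block to throw out */
    cache[set][victim].tag   = tag;
    cache[set][victim].valid = true;
    lru_way[set] = 1 - victim;   /* the other way is now least recent */
}

/* On a hit in way w, the other way becomes the eviction candidate. */
void touch_on_hit(uint32_t set, int w) {
    lru_way[set] = 1 - w;
}

int main(void) {
    fill_on_miss(5, 0xAAAA);     /* fills way 0 */
    fill_on_miss(5, 0xBBBB);     /* fills way 1 */
    touch_on_hit(5, 0);          /* way 0 used: way 1 is now LRU */
    fill_on_miss(5, 0xCCCC);     /* evicts way 1 (tag 0xBBBB) */
    printf("way0=%x way1=%x\n", cache[5][0].tag, cache[5][1].tag);
    return 0;
}
```

The one-bit-per-set trick is why two-way associativity is cheap to manage; exact LRU at higher associativities needs much more state, which is why hardware often uses approximations such as pseudo-LRU or random replacement.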

48. Cache Write Policy: Write Through Versus Write Back

- A cache read is much easier to handle than a cache write:
  - An instruction cache is much easier to design than a data cache
- Cache write:
  - How do we keep the data in the cache and in memory consistent?
- Two options (contrasted in the sketch below):
  - Write back: write to the cache only, and write the cache block to memory when that block is replaced on a cache miss
    - Needs a "dirty" bit for each cache block
    - Greatly reduces the memory bandwidth requirement
    - Control can be complex
  - Write through: write to the cache and to memory at the same time
    - Isn't memory too slow for this?
    - Use a write buffer
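A side-by-side sketch of the two policies on a write hit. The buffer interface (write_buffer_push) is a hypothetical placeholder, declared but not defined here, not a real API:

```c
#include <stdint.h>
#include <stdbool.h>

struct line { uint32_t tag; bool valid, dirty; uint8_t data[32]; };

/* Hypothetical interface to the write buffer that drains to memory. */
void write_buffer_push(uint32_t addr, uint8_t byte);

/* Write back: update the cache only and mark the line dirty; memory is
   brought up to date later, when the line is replaced. */
void write_back_hit(struct line *l, uint32_t offset, uint8_t byte) {
    l->data[offset] = byte;
    l->dirty = true;  /* the line now differs from memory */
}

/* Write through: update cache and memory together; the write buffer
   hides the slow memory write from the CPU. */
void write_through_hit(struct line *l, uint32_t addr, uint32_t offset, uint8_t byte) {
    l->data[offset] = byte;
    write_buffer_push(addr, byte);  /* CPU proceeds while the buffer drains */
}
```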

50. Write Buffer Saturation

- Saturation occurs when store frequency > 1 / (DRAM write cycle time)
  - If this condition persists for a long period (the CPU cycle time is too short and/or too many store instructions arrive in a row), the write buffer will overflow no matter how big you make it, because CPU cycle time << DRAM write cycle time
  - (The sketch below works through the arithmetic with sample numbers)
- Solutions for write buffer saturation:
  - Use a write back cache
  - Install a second-level (L2) cache
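A worked check of the saturation condition, with illustrative numbers: a 1 ns CPU cycle, one store per 10 cycles, and a 100 ns DRAM write cycle.

```c
#include <stdio.h>

int main(void) {
    double cpu_cycle_ns   = 1.0;    /* CPU cycle time */
    double stores_per_cyc = 0.10;   /* store frequency: 1 store per 10 cycles */
    double dram_write_ns  = 100.0;  /* DRAM write cycle time */

    double arrival_rate = stores_per_cyc / cpu_cycle_ns;  /* 0.10 stores per ns */
    double drain_rate   = 1.0 / dram_write_ns;            /* 0.01 writes per ns */

    /* Stores arrive 10x faster than they drain: any finite buffer
       eventually overflows and the CPU must stall on stores. */
    printf("saturated: %s\n", arrival_rate > drain_rate ? "yes" : "no");
    return 0;
}
```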

51. Misses in a Write Back Cache

- On a cache miss, the replaced block has to be written back to memory before the new block can be brought into the cache
- Techniques to reduce the penalty of writing back to memory:
  - Write back buffer:
    - The replaced block is first written to a fast buffer rather than directly back to memory
  - Dirty bit:
    - A dirty bit indicates whether any changes have been made to a block; if the block has not been changed, there is no need to write it back to memory
  - Sub-block (sketched below):
    - A unit within a block that has its own valid bit; when a miss occurs, only the bytes in that sub-block are brought in from memory
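A sketch of the sub-block idea, assuming a 32-byte block split into four 8-byte sub-blocks, each with its own valid bit:

```c
#include <stdint.h>
#include <stdbool.h>

#define SUB_BLOCKS     4
#define SUB_BLOCK_SIZE 8

struct sub_blocked_line {
    uint32_t tag;
    bool     valid[SUB_BLOCKS];  /* one valid bit per sub-block */
    uint8_t  data[SUB_BLOCKS][SUB_BLOCK_SIZE];
};

/* On a miss, fetch only the referenced sub-block rather than the whole
   32-byte block, shrinking the transfer from memory. */
void fill_sub_block(struct sub_blocked_line *l, int sb, const uint8_t *mem) {
    for (int i = 0; i < SUB_BLOCK_SIZE; i++)
        l->data[sb][i] = mem[i];
    l->valid[sb] = true;
}
```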