Multicore 1

Multi-core systems
System Architecture COMP25212
Daniel Goodman
Advanced Processor Technologies Group
Multi-Cores are Coming (here?)
 Many processors in normal desktops/laptops are ‘dual
core’ or ‘quad core’
• What does this mean?
• Why is it happening?
• How are they different?
• Where are they going?
• Do they change anything?
Moore’s Law
45nm Fun Facts
 A human hair = 90000nm
 Bacteria = 2000nm
 Silicon atom = 0.24nm
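To put these figures in proportion, the minimal sketch below divides each length by the 45nm feature size. The constant and function names are illustrative and do not come from the slides.

```python
# Lengths in nanometres, taken from the facts listed above.
FEATURE_NM = 45
HUMAN_HAIR_NM = 90_000
BACTERIUM_NM = 2_000
SILICON_ATOM_NM = 0.24


def features_across(length_nm, feature_nm=FEATURE_NM):
    """Return how many features of width feature_nm fit across length_nm."""
    return length_nm / feature_nm


hair_ratio = features_across(HUMAN_HAIR_NM)        # 2000.0
bacterium_ratio = features_across(BACTERIUM_NM)    # roughly 44.4
atoms_per_feature = FEATURE_NM / SILICON_ATOM_NM   # roughly 187.5
```

So about 2000 features fit across a human hair, and a single 45nm feature is fewer than 200 silicon atoms wide.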
The need for Multi-Core
 For over 30 years the performance of processors has
doubled every 2 years
 Driven mainly by shrinkage of circuits
 Smaller circuits
• more transistors per chip
• shorter connections
• lower capacitance
 Smaller circuits go faster
 In the early 2000s the rate started to decrease
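As a rough illustration of what "doubling every 2 years" compounds to, the sketch below computes the cumulative growth factor. The function name and the choice of exactly 30 years are assumptions made for the example.

```python
def growth_factor(years, doubling_period_years=2):
    """Cumulative performance growth if performance doubles every period."""
    return 2 ** (years / doubling_period_years)


factor_30_years = growth_factor(30)   # 2 ** 15 = 32768.0
```

Thirty years of doubling every 2 years is 15 doublings, a factor of roughly 33,000.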
Motivation
Is cooling a problem?
 Intel Nehalem: if not all cores are in use, the unused
cores can be shut down, allowing the remaining cores to
use the spare resources and speed up
The Memory Wall
Memory Speed is failing to keep up with processor speed. Why?
 Processor utilization (15%-25%)
The End of “Good Times”
 Slowdown for several reasons
• Power density increasing (more watts per unit area): cooling is a serious problem
• Small transistors have less predictable characteristics
• Architectural innovation hitting design complexity problems
(limited ILP)
• Memory does not get faster at the same rate as processors
A solution is replication
 Put multiple CPUs (cores) on a single integrated circuit
(chip)
 Use them in parallel to achieve higher performance
 Simpler to design than a more complex single
processor
 Need more computing power – just add more cores?
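A minimal sketch of "use them in parallel": split the work into one slice per worker, compute partial results independently, then combine them. The helper names `chunked` and `parallel_sum` are hypothetical. The sketch uses threads so that it runs anywhere; in standard CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so a real speedup would need `ProcessPoolExecutor` passed as `executor_cls` (and a `__main__` guard on some platforms).

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, parts):
    """Split data into `parts` contiguous slices of near-equal length."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` slices take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, workers=4, executor_cls=ThreadPoolExecutor):
    """Sum `data` by giving each worker one slice, then adding the partial sums."""
    with executor_cls(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunked(data, workers)))
    return sum(partial_sums)


total = parallel_sum(list(range(1_000)))   # 499500
```

Summation splits cleanly because the partial results are independent. Most general-purpose programs do not decompose this easily.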
How to Connect Them?
 Could have independent processor/store pairs with an
interconnection network
 At the software level the majority opinion is that
shared memory is the right answer for a general
purpose processor
 But, when we consider more than a few cores, shared
memory becomes more difficult to implement
Can We Use Multiple Cores?
 Small numbers of cores can be used for separate tasks
– e.g. run a virus checker on one core and Word on
another
 If we want increased performance on a single
application we need to move to parallel programming
 General purpose parallel programming is known to be
hard – consensus is that new approaches are needed
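One small illustration of why parallel programming is hard: even incrementing a shared counter needs explicit synchronisation, because the update is a read-modify-write that threads can interleave. The sketch below is an assumed example, not from the slides, and the function name `count_with_lock` is hypothetical.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # `counter += 1` reads, adds, and writes back; the lock ensures
            # no other thread updates the counter between those steps.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


final_count = count_with_lock()   # 40000
```

Without the lock the result may be lower than expected, and the error appears only intermittently, which makes such bugs difficult to find.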
There Are Problems
 We don’t know how to engineer extensible memory
systems
 We don’t know how to write general purpose parallel
programs
 If we develop new approaches to parallel
programming do they fit with existing serial processor
designs?
Intel Core i7 (Nehalem)
2 simultaneous multi-threading (SMT) threads per core
Traditional Structure – "Historical View"
(Processor, Front Side Bus, North Bridge, South Bridge)
[Figure: the processor and cache (single die/chip, SRAM) connect over the front side bus to the north bridge chip, which holds the memory controller and links to main memory (DRAM) and the graphics card; the south bridge chip on the motherboard drives the I/O buses (PCIe, USB, Ethernet, SATA HD).]
Typical Multi-core Structure
[Figure: on chip, two cores, each with L1 instruction and L1 data caches and an L2 cache, share an L3 cache and a memory controller connected to main memory (DRAM); a QPI or HT link connects the chip to an input/output hub with PCIe links, including one to the graphics card; an input/output controller on the motherboard drives the I/O buses (PCIe, USB, Ethernet, SATA HD).]
Simplified Multi-Core Structure
[Figure: on chip, four cores, each with its own L1 instruction and L1 data caches, connect over a shared bus to a level 2 cache; main memory is off chip.]
Nehalem Caches
 Private L1: split D$ & I$, 32KB each, 4-way I$ & 8-way D$ set associative, approx. LRU,
block size 64 bytes, write-back & write-allocate
 Private L2: 8-way set associative, other parameters as for L1
 Shared L3: 16-way set associative, other parameters as for L1
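The L1 parameters above fix the number of sets and how an address is split into offset and index bits. The sketch below works this out, assuming 1KB = 1024 bytes and power-of-two sizes; the function name `cache_geometry` is hypothetical. L2 and L3 are omitted because their capacities are not given here.

```python
def cache_geometry(size_bytes, ways, block_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * block_bytes)
    # For powers of two, bit_length() - 1 equals log2.
    offset_bits = block_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


l1_inst = cache_geometry(32 * 1024, ways=4, block_bytes=64)   # (128, 6, 7)
l1_data = cache_geometry(32 * 1024, ways=8, block_bytes=64)   # (64, 6, 6)
```

Both L1 caches hold 512 blocks of 64 bytes. The 4-way instruction cache arranges them as 128 sets and the 8-way data cache as 64 sets.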
Cache Coherence?
Summary
 Multi-core systems are here to stay
• Physical limitations
• Design costs
 The industry did not want to come here, but there is no
current alternative
 One of the biggest changes for our field
• General Purpose Parallel Programming must be made
tractable
 For further reading: Patterson and Hennessy, 4th Edition,
Chapter 1