The graphics architecture that is Kepler

As you can understand, the massive memory partitions, the bus width and the GDDR5 memory (quad data rate) combine to let the GPU work with a very high effective framebuffer bandwidth. Let's again put most of the data in a chart to get a better overview of the changes:

Graphics card | GeForce GTX 660 | GeForce GTX 660 Ti | GeForce GTX 670 | GeForce GTX 680 | GeForce GTX 690
Fabrication node | 28nm | 28nm | 28nm | 28nm | 28nm
Shader processors | 960 | 1344 | 1344 | 1536 | 3072
Streaming Multiprocessors (SM) | 5 | 7 | 7 | 8 | 16
Texture Units | 80 | 112 | 112 | 128 | 128x2
ROP units | 24 | 24 | 32 | 32 | 32x2
Graphics Clock (Core) | 980/1033 MHz | 915/980 MHz | 915/980 MHz | 1006/1058 MHz | 915/1019 MHz
Shader Processor Clock | 980/1033 MHz | 915/980 MHz | 915/980 MHz | 1006/1058 MHz | 915/1019 MHz
Memory Clock / Data rate | 1502/6008 MHz | 1502/6008 MHz | 1502/6008 MHz | 1502/6008 MHz | 1502/6008 MHz
Graphics memory | 2048 MB | 2048 MB | 2048 MB | 2048 MB | 4096 MB
Memory interface | 192-bit | 192-bit | 256-bit | 256-bit | 256-bit
Memory bandwidth | 144 GB/s | 144 GB/s | 192 GB/s | 192 GB/s | 192 GB/s
Power connectors | 1x6-pin PEG | 2x6-pin PEG | 2x6-pin PEG | 2x6-pin PEG | 2x8-pin PEG
Max board power (TDP) | 140 Watts | 150 Watts | 170 Watts | 170 Watts | 300 Watts
Recommended Power supply | 450 Watts | 450 Watts | 500 Watts | 550 Watts | 750 Watts
GPU Thermal Threshold | 98 degrees C | 98 degrees C | 98 degrees C | 98 degrees C | 98 degrees C
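For those who like to check the numbers, the memory bandwidth row follows directly from the memory interface width and the effective data rate. Below is a minimal Python sketch of that arithmetic. The function name and the choice of 1 GB = 10^9 bytes are our own assumptions for illustration, not something NVIDIA publishes.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_mhz: float) -> float:
    """Peak theoretical memory bandwidth in GB/s.

    Assumes 1 GB = 10^9 bytes and that every pin moves one bit
    per effective (quad data rate) transfer.
    """
    bytes_per_transfer = bus_width_bits / 8        # bus width in bytes
    transfers_per_second = data_rate_mhz * 1e6     # effective GDDR5 data rate
    return bytes_per_transfer * transfers_per_second / 1e9


if __name__ == "__main__":
    # Effective data rate from the chart: 1502 MHz memory clock x 4 = 6008 MHz
    for bus_width in (192, 256):
        bandwidth = memory_bandwidth_gbs(bus_width, 6008)
        print(f"{bus_width}-bit @ 6008 MHz: {bandwidth:.1f} GB/s")
```

Rounded down, the 192-bit cards land on the 144 GB/s listed above and the 256-bit cards on 192 GB/s.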

So we talked about the core clocks, specifications and memory partitions. Obviously there's a lot more to talk through, the GPU architecture for example. To understand a graphics processor you simply need to break it down into pieces.

Let's first look at the raw data that most of you can understand and grasp. This bit will be about the Kepler architecture; if you're not interested in g33k talk, by all means please browse to the next page.

So above we see the GK106 block diagram that entails the Kepler architecture. Let's break it down into bits and pieces.

A fully operating GK106 will have:

960 CUDA processors (Shader cores)

192 CUDA cores per SM cluster

5 geometry units

3 raster Units

80 Texture Units

24 ROP engines

192-bit GDDR5 memory bus

DirectX 11.1

That, then, is a fully operating GK106 as used on the GTX 660. The more important thing to focus on is the SM (block of shader processors) cluster, or SMX as NVIDIA likes to call it, which on the GTX 660 has 192 shader processors. That's radically different from Fermi; the GeForce GTX 580, for example, had 32 shader processors per SM cluster. 960 : 192 = 5 shader clusters (SMs). Let's blow up one such cluster:

Above is the block diagram for a single shader processor cluster, aka SM, or SMX as NVIDIA now calls it. The SMX has quite a bit more bite in terms of shader, texture and geometry processing. With 192 CUDA cores, that's six times the number of cores per SM compared to Fermi. Now, at the end of the pipeline we run into the ROP (Raster Operation) engines. The GTX 660 has 24 of these engines for features like pixel blending and AA, and the GTX 660 Ti has 24 of them activated as well.

There's a total of 80 texture filtering units available for the GK106. The math is simple here: each SM has 16 texture units tied to it.

GeForce GTX 580 has 16 SMs X 4 Texture units = 64

GeForce GTX 660 has 5 SMs X 16 Texture units = 80

GeForce GTX 660 Ti has 7 SMs X 16 Texture units = 112

GeForce GTX 670 has 7 SMs X 16 Texture units = 112

GeForce GTX 680 has 8 SMs X 16 Texture units = 128
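The same math can be put in a few lines of Python for anyone who wants to play with the numbers. This is just a sketch: the constants come from the figures quoted in this article (192 shader processors and 16 texture units per Kepler SM), and the function names are made up for illustration.

```python
CORES_PER_SMX = 192   # shader processors per Kepler SM, as quoted above
TEX_PER_SMX = 16      # texture units per Kepler SM, as quoted above


def smx_count(shader_processors: int, cores_per_sm: int = CORES_PER_SMX) -> int:
    """Derive the number of SM clusters from the shader processor count."""
    count, remainder = divmod(shader_processors, cores_per_sm)
    if remainder:
        raise ValueError("shader count is not a whole number of SM clusters")
    return count


def texture_units(sm_count: int, per_sm: int = TEX_PER_SMX) -> int:
    """Total texture units = SM clusters x texture units per SM."""
    return sm_count * per_sm


if __name__ == "__main__":
    # Shader processor counts taken from the chart earlier on this page
    kepler_cards = {
        "GeForce GTX 660": 960,
        "GeForce GTX 660 Ti": 1344,
        "GeForce GTX 670": 1344,
        "GeForce GTX 680": 1536,
    }
    for name, shaders in kepler_cards.items():
        sms = smx_count(shaders)
        print(f"{name}: {sms} SMs x {TEX_PER_SMX} = {texture_units(sms)} texture units")

    # Fermi for comparison: 16 SMs with 4 texture units each
    print(f"GeForce GTX 580: 16 SMs x 4 = {texture_units(16, per_sm=4)} texture units")
```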

Above we see the GK106 host interface, the GigaThread engine, three GPCs, three memory controllers, the ROP partitions and a 384 KB L2 cache. The ROP partitions sit close to the L2 cache, and each shader cluster is tied to its L1 cache and the shared L2 cache. Shading performance is going to be increased quite a bit, and geometry performance will get a nice boost as well.
