Looking back at two years of Graphics Core Next

ATI, and later AMD, went through several major evolutions in its graphics architecture, which mirrored major developments in the PC and graphics industry. At times Radeon graphics hardware was used by Microsoft for the development of its DirectX API. All roads lead to GCN, which embodies AMD's vision of a graphics core that can scale from top to bottom: high-end gaming graphics cards, consoles and tablets.

Fixed Function

3dfx Voodoo Graphics series

ATI Rage series

ATI Radeon 7000 series

MATROX G-Series

NVIDIA RIVA 128

NVIDIA TNT, TNT 2

NVIDIA GeForce, GeForce 2

Simple Shaders

ATI Radeon 8000-9000 series

ATI Radeon X1000 series

MATROX P-Series

NVIDIA GeForce 3 - 7

Graphics Parallel Core

ATI/AMD Radeon HD 2000-8000 series

AMD Radeon R7 series

AMD Radeon R9 series

NVIDIA GeForce 8000-9000 series

NVIDIA GeForce 200-700 series

The design goals for the original GCN at launch sound remarkably similar to those for R9. The exceptions are heterogeneous computing, which is still somewhat undeveloped, and Fusion, which has since been rebranded as HSA.

Tahiti resembles a 'stack of Lego bricks' which, compared to Hawaii

Hawaii's GCN '2.0' hardware resources are organised into four units called "Shader Engines", which allows resources to be scaled and shared more effectively. This effectively mirrors NVIDIA's approach with Kepler, except that their name for the units in this topology is SMX.

This allows GPUs to be scaled down more easily by disabling an entire Shader Engine (or SMX unit), without re-spinning the chip to reduce the number of cores or fusing off clusters of cores. Some resources, such as renderers and caches, are still shared within each Shader Engine.

Each Shader Engine contains one rasteriser and one Geometry Unit, each of which can load balance. A single Shader Engine is sufficient to operate the entire GPU.

Geometry is set up and tessellated in the Geometry Processors. Data can be exchanged with the Compute Units if needed, or sent to the rasteriser directly.

The Compute Units execute pixel shaders or perform GPU computing on the scene.

Pixel data is then passed on to the rasterisers, which handle the assignment or partitioning of pixels on the screen as well as Hierarchical Z sorting, i.e. each pixel's depth in the scene.

Finally, the Render Back Ends handle pixel depth testing as well as stencilling and colour operations.

To do the actual processing, each of Hawaii's Shader Engines contains 11 Compute Units. The Compute Unit is the smallest physical processing block of the GPU, containing all of the low-level building blocks that a compute processor needs to fetch, decode and execute instructions.
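As a rough check on how these building blocks add up, the sketch below multiplies out the Shader Engine and Compute Unit counts given above. The 64 ALU lanes per Compute Unit, the 2 FLOPs per lane per clock and the 1.0 GHz engine clock are assumptions that do not appear in this article; with them, the result lands on the 5.6 TFLOPS quoted for the 290X in the spec comparison. The `engines` parameter is purely illustrative of scaling by whole Shader Engines and does not describe any actual product.

```python
# Figures taken from the text: 4 Shader Engines, 11 Compute Units in each.
SHADER_ENGINES = 4
CUS_PER_ENGINE = 11

# Assumptions (not stated in the text): 64 ALU lanes per Compute Unit,
# 2 FLOPs per lane per clock (one fused multiply-add), 1.0 GHz engine clock.
LANES_PER_CU = 64
FLOPS_PER_LANE_PER_CLOCK = 2
CLOCK_GHZ = 1.0


def peak_tflops(engines=SHADER_ENGINES):
    """Peak single-precision TFLOPS with the given number of Shader Engines enabled."""
    compute_units = engines * CUS_PER_ENGINE
    lanes = compute_units * LANES_PER_CU
    gflops = lanes * FLOPS_PER_LANE_PER_CLOCK * CLOCK_GHZ
    return gflops / 1000.0


total_cus = SHADER_ENGINES * CUS_PER_ENGINE
print("Compute Units:", total_cus)
print("Peak TFLOPS, all engines:", round(peak_tflops(), 1))
print("Peak TFLOPS, hypothetical 2-engine part:", round(peak_tflops(2), 1))
```

Because every quantity scales linearly with the number of enabled Shader Engines, halving the engines in this model simply halves the peak figure.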

The final stage is the Render Back Ends, which handle operations relating to the scene's Z (depth), stencilling and colour.

That covers the graphics and compute processing pipelines, but a GPU has many processors running in parallel, which need to be fed tasks and directed.

We need a means of scheduling and dispatching so that the GPU can multi-task across its parallel compute units. This is where the Asynchronous Compute Engines (ACEs) come in. Hawaii has 8 of them, and they are independent of the Shader Engines. The ACEs queue, store and share data for use in GPU computing across the entire GPU. Graphics-specific commands are issued by a separate command unit.
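To make the idea of several independent queues feeding a shared pool of Compute Units more concrete, here is a toy model. It is only a sketch: the round-robin policy, the `dispatch` function and the task names are all hypothetical and say nothing about how the hardware actually arbitrates. Only the counts (8 ACEs, 4 x 11 Compute Units) come from the text.

```python
from collections import deque

NUM_ACES = 8             # from the text
NUM_COMPUTE_UNITS = 44   # 4 Shader Engines x 11 Compute Units, from the text


def dispatch(queues, num_cus=NUM_COMPUTE_UNITS):
    """Toy round-robin dispatcher (hypothetical policy, not the real hardware).

    Visits each queue in turn, takes one task from it if it has any, and
    assigns that task to the next Compute Unit in sequence.
    Returns a list of (ace_index, task, cu_index) tuples in dispatch order.
    """
    pending = [deque(q) for q in queues]
    schedule = []
    next_cu = 0
    while any(pending):  # an empty deque is falsy
        for ace_index, queue in enumerate(pending):
            if queue:
                task = queue.popleft()
                schedule.append((ace_index, task, next_cu))
                next_cu = (next_cu + 1) % num_cus
    return schedule


# Two illustrative compute tasks queued on each of the 8 ACEs.
queues = [[f"ace{a}-task{t}" for t in range(2)] for a in range(NUM_ACES)]
schedule = dispatch(queues)
for ace_index, task, cu_index in schedule[:4]:
    print(f"ACE {ace_index} sends {task} to CU {cu_index}")
```

The point of the model is simply that no single queue has to drain before another gets serviced, which is what lets compute work from different sources share the GPU.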

So, in summary, the layout of the Graphics Core Next architecture 'version 2' used in the 290X is essentially a scaled-up version of Tahiti.

In addition to the increased GPU resources, Hawaii adds updated display controllers for Eyefinity, AMD TrueAudio and a new version of CrossFire.

GCN v Kepler Architecture Performance & Efficiency – spec comparison

Metric               AMD Radeon HD 7970          AMD Radeon R9 290X                  NVIDIA GeForce GTX 780    NVIDIA GeForce GTX TITAN
                     GHz Edition ‘Tahiti’        ‘Hawaii’                  Increase  ‘Kepler’                  ‘Kepler’

Geometry Processing  2.1 billion primitives/sec  4 billion primitives/sec  1.9x      -                         -
Compute              4.3 TFLOPS                  5.6 TFLOPS                1.3x      4.0 TFLOPS                4.5 TFLOPS
Texture Fill Rate    134.4 Gtexels/sec           176 Gtexels/sec           1.3x      166 Gtexels/sec           188 Gtexels/sec
Pixel Fill Rate      33.6 Gpixels/sec            64 Gpixels/sec            1.9x      41.4 Gpixels/sec          40.2 Gpixels/sec
Peak Bandwidth       264 GB/sec                  320 GB/sec                1.2x      288 GB/sec                288 GB/sec
Die Area             352 mm^2                    438 mm^2                  1.24x     561 mm^2                  561 mm^2
Peak GFLOPS/mm^2     12.2                        12.8                      1.05x     7.1                       8
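The derived figures in the comparison can be reproduced from the raw numbers. The short sketch below does that arithmetic for the "Increase" column and the Peak GFLOPS/mm^2 row, using only values from the comparison itself; the dictionary layout and function names are illustrative.

```python
# Compute throughput (TFLOPS) and die area (mm^2) as listed in the comparison.
specs = {
    "HD 7970 GHz Edition": {"tflops": 4.3, "die_mm2": 352},
    "R9 290X": {"tflops": 5.6, "die_mm2": 438},
    "GTX 780": {"tflops": 4.0, "die_mm2": 561},
    "GTX TITAN": {"tflops": 4.5, "die_mm2": 561},
}


def gflops_per_mm2(card):
    """Peak compute density: GFLOPS divided by die area."""
    entry = specs[card]
    return entry["tflops"] * 1000.0 / entry["die_mm2"]


def increase(old, new):
    """Generation-on-generation ratio, as shown in the 'Increase' column."""
    return new / old


for card in specs:
    print(f"{card}: {gflops_per_mm2(card):.1f} GFLOPS/mm^2")

# Tahiti -> Hawaii ratios for geometry rate, pixel fill rate and die area.
print(f"Geometry: {increase(2.1, 4.0):.1f}x")
print(f"Pixel fill: {increase(33.6, 64.0):.1f}x")
print(f"Die area: {increase(352, 438):.2f}x")
```

Dividing peak compute by die area is a crude efficiency measure, since it ignores clock speed, power and how much of the die is spent on non-compute logic, but it is the measure the comparison uses.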

While peak raw compute has not increased dramatically, the GPU’s engines are much stronger, with almost 2x the throughput available for 3D-graphics-intensive work such as pixel and geometry processing at only a roughly 25% increase in die size. On paper the 290X is also more efficient than the Kepler GK110-based GTX 780, delivering higher peak compute from a smaller die.

On paper, the 290X provides a good step up from the previous-generation HD 7970. The lower compute performance of the NVIDIA GeForce cards is expected, as this is a hallmark of their consumer-oriented GPUs.