Here is the first set of specifications for AMD's next high-end GPU silicon, from which the company will no doubt carve out several SKUs. Codenamed "Hawaii," and slated for unveiling on the 26th in, well, Hawaii, the 28 nm chip is what AMD will use to take on NVIDIA's GK110 silicon head-on. It is based on AMD's second-generation Graphics Core Next micro-architecture.

With an estimated die area of 430 mm² (18% bigger than "Tahiti"), the chip physically features 2,816 stream processors (SPs) spread across 44 clusters with 64 SPs each (a 37.5% increase over "Tahiti"). The chip features four independent raster engines, compared to two on "Tahiti." This could translate into double the geometry processing muscle of "Tahiti," with four independent tessellation units. The memory interface of the chip is expected to be 384-bit wide, based on the GDDR5 specification. Given the way TMUs are arranged on chips based on this architecture (four per cluster), one can deduce 176 TMUs on the chip. The ROP count could be 32 or 48. The chip will feature hardware support for DirectX 11.2, including the much-hyped shared resources (mega-texture) feature.

Source: 3DCenter.org

97 Comments on AMD "Hawaii" R9 290X GPU Specifications Revealed

Wow, 2816 SP was a lot more than I was expecting. I guess they weren't exaggerating about die area efficiency. Early specs had 2304 and the TPU GPU database (which seems to be ahead of the curve) only had 2560 predicted.

That means Hawaii is about 17% more die area efficient than Tahiti. Granted, most of the auxiliary logic like the memory controller and PCI express controller don't increase in size with the number of SP, but it's still a nice improvement.

Hopefully the 900MHz core clock means that the chips are power limited (like GK110) and can be clocked higher with proper power delivery and cooling, not that AMD is having problems with achieving higher clock speeds.

The Forbes interview indicated that they could achieve higher clock speeds at 28nm than 20nm, so that makes me wonder if either this 900mhz clock is too low or if the 20nm process is that bad. I tend to believe the latter.

37.5% increase in core count for a 22% (w/o packaging. 17.8% as package) increase in die size...on the same process node?
Even if AMD managed a transistor density ~Pitcairn, that seems a little out of balance.

by: The Von Matrices Wow, 2816 SP was a lot more than I was expecting. I guess they weren't exaggerating about die area efficiency. Early specs had 2304 and the TPU GPU database (which seems to be ahead of the curve) only had 2560 predicted.
Just for comparison:
Tahiti has 2048 SP in 365mm^2 -> 5.61 SP/mm^2
Hawaii has 2816 SP in 430mm^2 -> 6.55 SP/mm^2

That means Hawaii is about 17% more die area efficient than Tahiti. Granted, most of the auxiliary logic like the memory controller and PCI express controller don't increase in size with the number of SP, but it's still a nice improvement.

Kind of puts Pitcairn's 6.04 SP/mm^2 to shame if that is indeed the case.
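The SP-per-area figures quoted above are easy to re-derive. The snippet below is only a sketch of that arithmetic: the Tahiti and Hawaii numbers are the ones from the thread, while Pitcairn's 1280 SPs and 212 mm² are assumed figures that were not stated in this thread, and the function names are made up for illustration.

```python
# (stream processors, die area in mm^2)
# Tahiti and Hawaii values are the ones quoted in the thread.
# Pitcairn's 1280 SP / 212 mm^2 are ASSUMED here, not taken from the thread.
CHIPS = {
    "Tahiti": (2048, 365.0),
    "Hawaii": (2816, 430.0),
    "Pitcairn": (1280, 212.0),
}


def sp_density(name):
    """Return stream processors per mm^2 of total die area."""
    sps, area_mm2 = CHIPS[name]
    return sps / area_mm2


def density_gain(new, old):
    """Return the fractional gain in SP density of `new` over `old`."""
    return sp_density(new) / sp_density(old) - 1.0


if __name__ == "__main__":
    for chip in CHIPS:
        print(f"{chip}: {sp_density(chip):.2f} SP/mm^2")
    print(f"Hawaii vs Tahiti: {density_gain('Hawaii', 'Tahiti'):+.1%}")
```

Note that this divides by total die area, so it says nothing about how much of each die is actually shader logic.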

by: The Von Matrices The Forbes interview indicated that they could achieve higher clock speeds at 28nm than 20nm, so that makes me wonder if either this 900mhz clock is too low or if the 20nm process is that bad. I tend to believe the latter.

Might be a case of transistor density translating into higher localised heat (i.e. the Ivy Bridge and Haswell effect)

by: HumanSmoke 37.5% increase in core count for a 22% (w/o packaging. 17.8% as package) increase in die size...on the same process node?
Even if AMD managed a transistor density ~Pitcairn, that seems a little out of balance.

Kind of puts Pitcairn's 6.04 SP/mm^2 to shame if that is indeed the case.

The SP section of the Hawaii die may have the same density as Pitcairn; it's just that since the size of the other logic on the die is relatively fixed, as you add more SP then the efficiency increases.

by: The Von Matrices The SP section of the Hawaii die may have the same density as Pitcairn; it's just that since the size of the other logic on the die is relatively fixed, as you add more SP then the efficiency increases.

Wouldn't the uncore (memory controllers and GDDR interface, I/O, thread dispatch, etc.) be more in line with Tahiti than Pitcairn, the latter after all requiring a smaller amount of die real estate from its 256-bit bus width? AFAIW, memory control/interface still makes up the largest part of the uncore.

by: HumanSmoke Wouldn't the uncore (memory controllers and GDDR interface, I/O, thread dispatch, etc.) be more in line with Tahiti than Pitcairn, the latter after all requiring a smaller amount of die real estate from its 256-bit bus width? AFAIW, memory control/interface still makes up the largest part of the uncore.

You're right - I just did the math and my original statement couldn't be correct (that most of the increase in area efficiency could just be due to the increased number of SP). If I assume that the die area occupied by everything other than the SPs is identical between Hawaii and Tahiti, then a set of linear equations can be formed:

2816x + y = 430
2048x + y = 365

where x is the die area per SP and y is the die area occupied by everything other than the SPs.

This works out to be y (uncore die area) = 192 mm^2 and x (die area per SP) = 0.0846 mm^2. This would mean that if Tahiti's SPs were identical to the ones in Hawaii, then Tahiti would be 52% uncore. But just looking at die pictures of Tahiti shows that this is obviously untrue.
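For anyone who wants to check the numbers, here is a minimal sketch of that two-equation solve. It assumes, as the post does, that the non-SP area y is identical on both dies; the function name is just illustrative.

```python
def solve_area_model(sp_a, area_a, sp_b, area_b):
    """Solve sp * x + y = area for two chips.

    x is the die area per SP (mm^2) and y is the area of everything else
    (mm^2), both ASSUMED to be the same on the two chips.
    """
    # Subtracting the two equations eliminates y.
    x = (area_a - area_b) / (sp_a - sp_b)
    # Back-substitute into the first equation to recover y.
    y = area_a - sp_a * x
    return x, y


# Hawaii: 2816 SP in 430 mm^2, Tahiti: 2048 SP in 365 mm^2 (figures from the thread).
x, y = solve_area_model(2816, 430.0, 2048, 365.0)
tahiti_uncore_share = y / 365.0

print(f"area per SP: {x:.4f} mm^2")
print(f"uncore area: {y:.1f} mm^2")
print(f"Tahiti uncore share: {tahiti_uncore_share:.1%}")
```

The model is only as good as its assumption: if Hawaii's uncore is larger or smaller than Tahiti's, both x and y shift.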

So some serious optimizations must have been done. Credit is due to AMD for making such a die area efficiency gain without reducing the process size. Where performance lines up is yet to be told though.

EDIT: this increase in efficiency may not be true, see my post below

by: erocker I'd love to see a pre-release (sale) review... and a price. Though, if I'm not mistaken I thought I heard something that they'll be under $600. So that probably means $599 but you never know.

The R9 290X is clocked 16% lower than the 7970 GHz Edition but has 37.5% more SP. If the performance of Hawaii scales linearly with SP and clock speed with respect to Tahiti, then based on the TPU performance chart the R9 290X should be 97% as fast as GTX 780. If a dynamic boost mode is implemented (ala 7790) as has been rumored then they should be neck and neck in performance, possibly with the R9 290X being a little faster if it can boost over 1GHz consistently. However, as with any product the price is what will really determine a winner. But even if the R9 290X is priced the same as GTX 780, the R9 290X should be the better buy.
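The linear-scaling estimate above can be written out as a rough sketch. It assumes performance scales exactly with SP count times clock speed, takes the 16% clock deficit and the 97% figure straight from the post, and uses 900 MHz as the base clock; the TPU chart itself is not reproduced here, and the function names are hypothetical.

```python
def linear_scaling(sp_gain, clock_change):
    """Relative throughput if performance scales with SP count x clock.

    sp_gain and clock_change are fractional changes, e.g. 0.375 and -0.16.
    """
    return (1.0 + sp_gain) * (1.0 + clock_change)


def rescale_for_clock(relative_perf, base_mhz, boost_mhz):
    """Scale a relative performance figure linearly with core clock."""
    return relative_perf * boost_mhz / base_mhz


# R9 290X vs 7970 GHz Edition: 37.5% more SP, 16% lower clock (from the post).
vs_7970_ghz = linear_scaling(0.375, -0.16)

# Post's estimate: 97% of a GTX 780 at 900 MHz; what a 1 GHz boost would imply.
vs_780_at_1ghz = rescale_for_clock(0.97, 900.0, 1000.0)

print(f"vs 7970 GHz Edition: {vs_7970_ghz:.3f}x")
print(f"vs GTX 780 at 1 GHz: {vs_780_at_1ghz:.3f}x")
```

Real cards rarely scale this cleanly, so treat both outputs as upper-bound guesses rather than predictions.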

by: The Von Matrices The R9 290X is clocked 16% lower than the 7970 GHz Edition but has 37.5% more SP. If the performance of Hawaii scales linearly with SPs and clock speed with respect to Tahiti, then based on the TPU performance chart the R9 290X should be 97% as fast as GTX 780. If a dynamic boost mode is implemented (ala 7790) as has been rumored then they should be neck and neck in performance, possibly with the R9 290X being a little faster if it can boost over 1GHz consistently.

by: The Von Matrices However, the price is what will really determine a winner.

Hopefully:
1. AMD and Nvidia don't reach some kind of unwritten (and non-verifiable) understanding, and
2. Yields and production ramp are sufficient to make 1. unnecessary.

BTW: I Googled the Tahiti die shot image you posted, and a poster at B3D (fellix) cleaned up the image (or one very similar) which should make calculations easier....assuming of course that there isn't a radical difference in architecture which I think would be unlikely

Thanks for that - it makes things a lot clearer. By some rough cropping I just calculated by pixels that the entire die in that image is 1283058 pixels and the shaders are 622400 pixels. That brings up a startling revelation: the non-shader logic would be 51.5% of the die, which is exactly in line with what I said earlier. So maybe what AMD did is just tack 768 more shaders onto Tahiti without any major reworking and call it a day. The only way that die efficiency could possibly increase is if there were 48 ROPs squeezed onto the Hawaii die instead of the 32 in Tahiti, which would mean everything else would have to be smaller.
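The pixel-count estimate is a one-line ratio, sketched below with the two pixel totals from the post. The function name is illustrative, and the result depends entirely on how the shader region was cropped.

```python
def non_shader_fraction(total_px, shader_px):
    """Fraction of the die image not covered by the shader arrays."""
    return 1.0 - shader_px / total_px


# Pixel counts from the rough crop of the Tahiti die shot described in the post.
fraction = non_shader_fraction(1283058, 622400)
print(f"non-shader share of die: {fraction:.1%}")
```

A slightly different crop boundary would move this by a point or two, so agreement with the linear-equation estimate is approximate at best.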

by: hardcore_gamer I don't think AMD will release a flagship card that is slower than 780. If they can't beat their competitor's previous generation card, it will be a fail.

It all depends on the boost algorithm and how it is implemented. As I said, if it can consistently boost past 1GHz then it should be faster than GTX 780 (by about 10% in that case).

by: Roph It's a newer GCN revision too though, so just comparing numbers of shaders isn't good enough to judge performance.

If anything, the real estate taken up by the cores (and by extension the Compute Units) should either be relatively static, or increase for a given node where performance increases.

What precisely would AMD cull to decrease size? Cache? Vector and texture units? The number of load/store units?
The other alternatives are AMD have made some breakthrough in transistor density...or Tahiti was a badly laid out chip.

This gets my hopes up for something on par with the 780. If so, and if it doesn't cost stupid amounts, I'm going to upgrade from my HD 6950 2GB that I bought on release. I missed the 7000 series due to the huge bump in price at release (much more sensible now at £270 for an HD 7970 GHz), so let's hope they undercut Nvidia with the same performance.

by: Roph It's a newer GCN revision too though, so just comparing numbers of shaders isn't good enough to judge performance.

Remember, the 7790 was a newer revision of GCN as well, but that meant nothing in terms of increased performance. To quote Anandtech on the 7790:

In this new microarchitecture there are some changes – among other things the new microarchitecture implements some new instructions that will be useful for HSA, support for a larger number of compute work queues (also good for HSA) and it also implements a new version of AMD’s PowerTune technology (which we’ll get to in a bit) – but otherwise the differences from Southern Islands are very few. There are no notable changes in shader/CU efficiency, ROP efficiency, graphics features, etc. Unless you’re writing compute code for AMD GPUs, from what we know about this microarchitecture it’s likely you’d never notice a difference.

So if this is the same revised GCN architecture as the 7790, which it probably is, I wouldn't expect anything noticeable in efficiency improvements. The biggest improvement would be the boost algorithm, which is what would put this card over the GTX 780's performance instead of matching it.