Yes, it's possible to air-cool 300W just fine in a PCIe slot form factor.
Yes, WC might do it with less surface area and a lower maximum temperature under load, but at a higher build cost and a higher failure rate.

The reason you see WC on a 1070 is that AIOs are all the rage for some, and the midrange can be sold at a premium. They sell. The reason you saw it on the Fury X is that AMD considered it the best option given the tradeoffs of an air cooler (bulk- and weight-wise, even despite the higher cost) - so that does indicate that WC was related to high power consumption. Then, with the Radeon VII, they showed us that they can do the air option just fine as well with a large triple-fan setup - and it's likely cost played a major role here, because the margin on that card isn't great.

The reason we don't see WC on the low end is that there is no way you will sell budget cards at a premium price.

Not to sound overly snide, but you do know there are wattages in between 50 and 275, right? As mentioned above, there have been water cooled GTX 1070 cards (150W), and there are plenty of water cooled RTX 2070 cards (175W). In other words, partner cards with AIOs are in no way necessarily proof of high power consumption, just that the card is in a high enough price bracket that "premium cooling" allows AIB partners to demand premium pricing.

And as @Vayra86 pointed out above: low-end cards don't sell if they're too expensive. Sticking a $70 AIO on a $200 RX 580 doesn't make sense, but it does on a $500 RTX 2070, even though they're roughly the same wattage, as the cost of the cooler would then represent a much smaller percentage of the total price, and that market segment is generally more open to "premium cooling".

So there is a God-given rule that they need to name it in a specific way? GCN 5 is drastically different from GCN 1 in pretty much every way; they are worlds apart in both feature set and microarchitectural details that change the clocks, power, etc. It's a label that they may choose to keep using or not; it doesn't mean anything in particular if they do.

No, but it does make sense to not change the fundamentals of a chip architecture and keep the same name - that would be very confusing for everyone involved, particularly the people writing drivers for the hardware. And as pointed out above, GCN has not been fundamentally changed since its inception, it has been iterated upon, tweaked and upgraded, expanded, had features added - but the base architecture is still roughly the same, and works within the same framework - unlike, say, Nvidia's transition from Kepler to Maxwell, where driver compatibility fell off a cliff due to major architectural differences.


I'm guessing we could spin this a million different ways.
What I know right now is:
1. Cards with water cooling sport above average power draw.
2. Until now, GCN didn't do TBR, so its power draw was well above Nvidia's.
3. The first glimpse we have at Navi is apparently water cooled.

People keep hoping for a GPU's Zen, I keep seeing Bulldozer iterations...


Uhm, no. It's a core architecture, which AMD has iterated on since they abandoned the previous TeraScale architecture. There are many variants, but they share a core framework and a lot of functionality. No GCN variant is fundamentally different from any other - just improved upon, added features to, etc. That's why AMD's early GCN cards have had such excellent longevity.

It's more of an ISA than "core architecture". GCN is a quad-SIMD design utilizing 1 instruction for the set, usually tasked in 64-thread groups. AMD's "next-gen" architecture still looks similar to GCN and is even executed similarly to current ISA, but has moved to VLIW2 (Super SIMD) and has drastically reworked CU clusters and caches. It probably won't be called GCN though, simply because AMD wants to retire that nomenclature. Vega was the largest change to GCN to date. Previously, ROPs used their own local cache, but the new tiling rasterizers need the ROPs connected to L2 to keep track of occluded primitives within pixels to cull them and reuse data for immediate mode tiling (hybrid raster). 2xFP16 is also useful in certain scenarios.
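The quad-SIMD, 64-thread grouping described above can be sketched numerically. This is a minimal illustration (not hardware code) of how one GCN wavefront maps onto a CU, using the widely documented figures of 16 lanes per SIMD and four SIMDs per CU:

```python
# Minimal sketch of GCN's execution grouping: a CU has four 16-lane
# SIMDs; one wavefront = 64 threads ("wave64"), so a single SIMD needs
# 64 / 16 = 4 cycles to issue one instruction for the whole wavefront.
WAVEFRONT = 64      # threads per wavefront
SIMD_LANES = 16     # lanes per SIMD unit
SIMDS_PER_CU = 4    # quad-SIMD design

cycles_per_instruction = WAVEFRONT // SIMD_LANES           # 4 cycles
threads_issued_per_cu_cycle = SIMD_LANES * SIMDS_PER_CU    # 64 threads

print(cycles_per_instruction)        # cycles to issue one wave64 op
print(threads_issued_per_cu_cycle)   # threads in flight per CU per cycle
```

This is why "1 instruction for the set" amortizes well over 64-thread groups, but also why partially filled wavefronts waste lanes.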

Vega and Turing both have new small geometry shaders that replace certain tessellation stages. In Vega, they're called primitive shaders, and in Turing, simply, mesh shaders. AMD is waiting for standardization in major APIs, while Nvidia seems fine with using a proprietary API extension to call them. Both types will further speed small geometry creation to enhance game realism, while AMD can also use them to speed geometry culling using their shader arrays to help their geometry front-ends.

Nvidia's basic GPC design (mini-GPUs within an interconnect fabric) dates back to Fermi. Kepler fixed many of Fermi's shortcomings, but Maxwell was the one to really propel it forward in perf/watt - and not just from moving to immediate-mode tiling rasterizers. Nvidia has also iterated on their GPC architecture, but in a much more aggressive manner (it helps to have a large R&D budget). Turing is still a VLIW2 GPC design*, using up to 6 GPCs in TU102. 7nm can extend that up to 8 GPCs when Nvidia moves to Ampere, but with RT taking priority now, Nvidia may just dedicate more die space to accelerating BVH traversal and intersection, trying to reduce ray tracing's very random hits to VRAM, and of course, making hybrid rendering, as a whole, more efficient and performant.

But, both AMD's GCN (2011) and Nvidia's GPC (2010) designs have been around for quite some time.

* Turing has to execute 2 SMs concurrently due to INT32 taking up 64 of 128 cores within an SM. So, using 2 SMs, 128 FP32/CUDA cores are tasked (warp is still 32 threads), similarly to Pascal and prior and thereby retains compatibility.

Polaris is a prime example of AMD trying to clock a GPU way past its efficiency curve.
The original RX 400 series was okay on performance/watt, but after the 1060 released, AMD tried to get that little bit of extra performance for a rather large TDP increase with their RX 580 refresh.


Vega is efficient; it's just not as fast as NV's parts, so they had to compensate with clock speed and thus got out of the efficiency curve - the same issue as Intel's parts pushing clocks towards 5GHz as "95W" TDP parts with an actual power draw of 150W+.

Vega is as well, but Vega was designed to reach a higher clock speed than Polaris in the first place.
Therefore it (at least for Vega 56) wasn't as far off the efficiency curve as Polaris ended up.
But you do see the same crazy power draw happening with the AIO version of Vega 64, where performance/watt dropped off a cliff.
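The "dropped off a cliff" behaviour follows from a simple toy model (illustrative only, not measured data): dynamic power scales roughly with f·V², and if voltage must rise roughly in step with frequency past the sweet spot, power grows like f³ while performance grows only like f:

```python
# Toy efficiency-curve model (assumptions: perf ~ f, power ~ f * V^2,
# and V must scale ~linearly with f once past the sweet spot).
def relative_perf_per_watt(f, f_base=1.0):
    perf = f / f_base                  # performance ~ frequency
    power = (f / f_base) ** 3          # power ~ f * V^2, with V ~ f
    return perf / power                # falls as ~ (f_base / f)^2

# Perf/W relative to a baseline clock of 1.0:
for f in (0.9, 1.0, 1.1, 1.2):
    print(f, round(relative_perf_per_watt(f), 2))
```

Under these assumptions, a 20% overclock past the sweet spot costs roughly 30% of the card's perf/W - which is the shape of the Vega 64 AIO situation described above.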


And do you know Polaris' or Vega's actual efficiency curve? You can't really say it was the clocks the RX 470 or RX 480 had, because if you underclocked those chips, they would most likely have a better performance/power ratio than at their default clocks. Then I could also claim they are past their efficiency curve at their default clocks.

BTW, comparing Polaris to Vega is unfair to begin with. Vega has more efficient HBM2 memory, but is also a more powerful GPU. Vega 64 (4096 SPs, 256 TMUs, 64 ROPs) has 10215-12665 GFLOPs vs the RX 570 (2048 SPs, 128 TMUs, 32 ROPs), which has 4784-5095 GFLOPs. Vega 64 is 114-149% more powerful on paper, but in reality it's only 97.5% faster than the RX 570 at 4K resolution.
If we really wanted to compare which one is more efficient, we would need a 32-36 CU version of Vega without HBM2.
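The GFLOPs ranges quoted above follow from the standard FP32 throughput formula (shaders × 2 FLOPs per clock for FMA × clock). The base/boost clocks below are the values implied by the quoted numbers:

```python
# FP32 throughput = shader count * 2 (FMA = 2 FLOPs/clock) * clock in GHz.
def gflops(shaders, clock_ghz):
    return shaders * 2 * clock_ghz

# Vega 64: 4096 SPs at ~1.247 GHz base / ~1.546 GHz boost
print(gflops(4096, 1.247), gflops(4096, 1.546))   # ~10215, ~12665
# RX 570: 2048 SPs at ~1.168 GHz base / ~1.244 GHz boost
print(gflops(2048, 1.168), gflops(2048, 1.244))   # ~4784, ~5095
```

The 114-149% "on paper" advantage is just these ranges divided against each other; the ~97.5% real-world gap at 4K is the utilization shortfall the thread keeps coming back to.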

The $399 Navi "Pro" is probably being designed with a performance target somewhere between the RTX 2060 and RTX 2070, so you typically pay $50 more than you would for an RTX 2060, for noticeably higher performance.

Come on, let's not pretend that 90% of such channels cater to anything more than one or the other fanbase exclusively. "This video is nothing new from what you've already seen a hundred times" doesn't earn clicks. Look at the PCGH test above, or the one that computerbase.de recently updated too.
Worthless videos. But you go ahead and believe what they tell you - and don't forget to like and subscribe.

They're gonna have to throw in one hell of a game bundle for people to defend this.


As I've been saying for a while now, AMD got stuck with a rather serious problem when they maxed out the CU config of GCN with Fiji - it was competitive at the time, but left zero room to grow by adding CUs, so further improvements required pushing clocks past their sweet spot (in the meantime, Nvidia has increased their CUDA core count by a whopping 55% at the high end). Which gave us Vega. Not a bad arch update or bad GPUs overall, but they delivered a rather poor efficiency improvement considering the move from 28nm to 14nm - again, because GCN stopped AMD from adding more CUs, forcing them to squeeze clocks as high as possible out of the chips. Not to mention that this made it look like they were chasing Nvidia's clock speeds for no good reason, while both failing to match them and losing efficiency. A bit of a pile-up of bad consequences of an inherent architectural trait, sadly.

I would imagine an 80-CU Vega at ~1200MHz would perform amazingly, and do a decent job at perf/W too. If AMD had matched Nvidia's core count increase since 2015 (980 Ti/Fury X), we'd now have a 100 CU/6400 SP Vega card - which, it's not hard to imagine, would compete quite well with Nvidia's top-end cards even at low clocks and on 14nm. The die would be large, just like the Fury X's, so a compromise around 80-90 CUs and clocks in the 1300-1450MHz range might be better, but all in all, AMD is bottlenecked by being incapable of widening their chip designs, and this is what has truly been holding them back since 2015. Fingers crossed that the NG arch takes this into account by allowing ~unlimited core count scaling.

It wasn't until very recently that AMD could make a 64 CU GPU in under 500 mm². People weren't exactly thrilled with Vega as it was; making it even more expensive would have served no purpose. AMD's performance problems can't and shouldn't be solved by adding more CUs. Besides, I don't even think they could have made such a GPU feasible on GloFo's 14nm node, and TSMC's 7nm probably doesn't allow for huge dies at the moment either.

I've seen this claim over and over again, but has it been explicitly stated from AMD that GCN can't do more than 64 CUs/4096 SPs?
To my knowledge there is no architectural reason why it wouldn't be possible, but there is a very good reason why they don't do it; adding e.g. 50% SPs would increase the energy consumption by ~50% but only increase the performance by ~20-30% at best, because a GPU with more clusters would need more powerful scheduling, and to maintain higher efficiency than the predecessor it would require more than 50% better scheduling. The problem for GCN have always been management of resources, and this is the reason why GCN has fallen behind Nvidia. GCN have plenty of computational power, just not the means to harness it.


They haven't confirmed it, no (why would they? That'd be pretty much the same as saying "we can't compete in the high end until our next arch, no matter what!" - and that's bad business strategy), but three subsequent generations with ~the same specs save for clocks, cache and other minor tweaks (in terms of real-world performance) does tell us something. What you're saying is not an argument for not scaling out the die - after all, pushing clocks puts just as much demand on scheduling as making a wider design. It might of course be that AMD would gain more by "rebalancing" their architecture by increasing the number of other components than SPs alone, but that's beside the point. Also, Nvidia has demonstrated pretty well that your scaling numbers are on the pessimistic side. With a similar node shrink (28nm to 12nm) and two small-to-medium architecture updates, they've increased the CUDA core count by 55%, increased clocks by ~60% (at least, depending on whether you look at real-world boost or not), and kept power draw at the same level. Of course there are highly complex technical reasons why this works for Nvidia and not AMD, but claiming that AMD has deliberately chosen not to increase their CU count while their main competitor has increased theirs by 55% - and at the same time run off with the high-end GPU segment - sounds a bit like wishful thinking.
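Those two scaling factors compound multiplicatively, which makes the comparison starker. A quick sanity check (using the post's own ~55% and ~60% estimates, not official figures):

```python
core_increase = 1.55    # ~55% more CUDA cores since 2015 (post's estimate)
clock_increase = 1.60   # ~60% higher clocks (post's estimate)

# Theoretical throughput scales with cores * clock:
theoretical_throughput = core_increase * clock_increase
print(round(theoretical_throughput, 2))  # ~2.48x at roughly flat power
```

Roughly 2.5x the theoretical throughput at the same power envelope is the bar AMD would have needed to clear over the same period.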

With all this being said, my Fury X is getting long enough in the tooth that I might still get one of these if they match or beat the 2070. But I'd really like for Arcturus(?) to arrive sooner rather than later.

…but claiming that AMD has deliberately chosen not to increase their CU count while their main competitor has increased theirs by 55% - and at the same time run off with the high-end GPU segment - sounds a bit like wishful thinking.

The fact remains that AMD has plenty of computational performance, while Nvidia manages to squeeze more gaming performance out of less theoretical performance, because AMD chose to focus on "brute force" throughput rather than efficiency.