The only thing you've brought up so far is plain old process maturity (which should have benefited GCN just as much.)

What else do you have?

Click to expand...

That could have benefited GCN as well... but don't focus so much on the margins. What has worked so well with Maxwell might not have worked at all with GCN. The two architectures are deeply different. One is aimed at fully independent granularity, parallel computing and threading; the other (Maxwell) is extremely simplified. One is like ARM tech and the other is a bomber in terms of capacity. Maxwell has half the capacity of GCN3, but it runs well in games that ask for just that (a lesson from Tegra?).

Is GCN underused? Yes. But that has more or less always been the case for ATI and AMD: their architectures are never used to their full potential.

I just hope the next GCN will not take this road, as it is really limited in terms of innovation.

Maxwell is really not a good example for Nvidia (in fact, it is surely the worst and most simplified architecture Nvidia has ever delivered). Four years ago, if you had presented this to Nvidia PR, you would have been fired within the minute, but the 28nm story changed that.
It is really a limited example. Nvidia can do better than this, and I hope they will show it with Pascal.

VeteranNewcomer

Nothing to do with size (I mean on a physical level). Why has it not been done before? That's quite illogical.

And 32GB on FirePro, 24GB and 16GB on Nvidia Tesla...

With your last sentence, you have just arrived at what I was saying: memory capacity will have more impact than peak bandwidth. Doubling the memory (4 to 8) has never had a big impact so far on gaming GPUs, but by doubling the access width to 64-bit, GDDR5X solves this problem, hence why I speak about memory size instead of bandwidth.

With the same bus and the same GPUs, having 4 or 8GB made no difference before, but now with GDDR5X, doubling the memory size of a 4GB GPU will make a difference, even with a small bus.

And the question remains: if it was so easy, why has it not been done before? There's no physical constraint, and we are far from the problems that can come with using something like TSVs and an interposer with HBM.

More seriously, I'm really impatient to test it with OpenCL raytracing to see the real impact (as memory bandwidth, size and speed have a really deep impact on performance there).

Click to expand...

It is interesting to look at the performance trend of, say, the NVIDIA 980 compared to the AMD 390/390X and how they perform going up the resolutions from 1080p to 4K.
That 256-bit bus (I appreciate we were talking about the 32-bit/64-bit IO data path, which is separate) does seem to cripple the 980 to some extent; unfortunately, without a directly comparable 384-bit model it is not possible to tell just how much.
So bandwidth does seem to come into play (the context being tests that use less than 4GB of VRAM, so any issues pertaining to that are excluded).
And yeah, I appreciate that comparing the current 256-bit bus to GDDR5X is not apples-to-apples, and we still do not know just how much benefit we are going to see from GDDR5X until it truly matures and is implemented and used in the real world with real games/applications.
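For a rough sense of scale, theoretical peak bandwidth is just bus width times per-pin data rate. A minimal sketch in Python; the 7 Gbps and 6 Gbps data rates are assumptions taken from the cards' public spec sheets rather than from this thread, and the function name is made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: total memory bus width in bits.
    data_rate_gbps: effective per-pin data rate in Gbit/s (assumed spec value).
    """
    # Each pin moves data_rate_gbps gigabits per second; divide by 8 for bytes.
    return bus_width_bits / 8 * data_rate_gbps


# Assumed spec-sheet values, not measurements.
gtx_980 = peak_bandwidth_gbs(256, 7.0)   # 256-bit GDDR5
r9_390x = peak_bandwidth_gbs(512, 6.0)   # 512-bit GDDR5

print(f"GTX 980: {gtx_980:.0f} GB/s")
print(f"R9 390X: {r9_390x:.0f} GB/s")
print(f"ratio:   {r9_390x / gtx_980:.2f}x")
```

This only gives the raw ceiling; it says nothing about color compression or caches, which is exactly why the real-world gap is hard to pin down.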
Cheers

Newcomer

How dumb of me to think that you were talking about 1.2GHz for Maxwell.

Click to expand...

For assuming that the "magic number" was the max frequency, perhaps.

The only thing you've brought up so far is plain old process maturity (which should have benefited GCN just as much.)

What else do you have?

Click to expand...

How about you tell us about this "should" for GCN benefitting from process maturity first of all? You've already dismissed leakage and variability even though I linked a video showing AMD engineers talking about it and gave links showing how massively GCN undervolts.

Is it really so hard to believe that AMD just had a torrid time on 28nm? That they abandoned the real progression due to 20nm cancellation?

Don't you think it's possible that Tonga is AMD's absolute best shot at 28nm, or is it more likely to be just another middling product designed as a test vehicle?

VeteranSubscriber

Maxwell is really not a good example for Nvidia (in fact, it is surely the worst and most simplified architecture Nvidia has ever delivered). Four years ago, if you had presented this to Nvidia PR, you would have been fired within the minute, but the 28nm story changed that.

Click to expand...

Other than 1/24 FP64 vs 1/32 FP64, both irrelevant, show me one, just one, regression going from Kepler to Maxwell. I honestly can't think of a single one.

I do agree with a more simplified architecture, but given that it gives much better results across the board, I don't see how that can be considered worse.

I very much hope (and expect) that Pascal will inherit most of Maxwell's features and add a few missing/new ones.

Newcomer

It's good, but you can see that the increase in performance did come at an increase in area too. I was just pointing out that Nvidia did have additional costs for the extra performance, and many people don't seem to realise that was a factor in Maxwell's large improvement.

Thing is, in order to continue with that performance improvement you also need to continue with (relatively) larger dies. Try doing that on a new process and what happens?

Of course due to the maturity of the 28nm node, the bigger die and higher ASP was the smart option overall by far and that's why we'll see it again at 14/16.

VeteranSubscriber

GTX 980 at 400mm2 is faster than a GTX 780 Ti at 530mm2. A GTX 960 at 225mm2 is only 10% slower than a GTX 770 at 300mm2. Where's the area increase here?

Nvidia added some area for the gm204 compared to a gk104, but that increase allowed them to shift into a higher performance class.

The only reason I didn't compare a gm204 to a gk110 to calculate area efficiency is because it would be unfair to gk110 with FP64.
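To put numbers on the area argument, here is a quick performance-per-area calculation in Python using only the die sizes and the "10% slower" figure quoted in this post. The helper names are hypothetical, and since the post only says the GTX 980 is "faster" than the 780 Ti, equal performance is assumed there to get a lower bound.

```python
import math


def perf_per_area(relative_perf: float, die_area_mm2: float) -> float:
    """Relative performance delivered per mm^2 of die."""
    return relative_perf / die_area_mm2


def area_efficiency_gain(perf_new, area_new, perf_old, area_old) -> float:
    """How many times more performance per mm^2 the new chip delivers."""
    return perf_per_area(perf_new, area_new) / perf_per_area(perf_old, area_old)


# GTX 960 (225 mm^2) is quoted as 10% slower than GTX 770 (300 mm^2).
gain_960_vs_770 = area_efficiency_gain(0.90, 225, 1.00, 300)

# GTX 980 (400 mm^2) vs GTX 780 Ti (530 mm^2): assume equal performance,
# which makes this a lower bound because the post says the 980 is faster.
floor_980_vs_780ti = area_efficiency_gain(1.00, 400, 1.00, 530)

print(f"GTX 960 vs GTX 770:    {gain_960_vs_770:.2f}x perf/mm^2")
print(f"GTX 980 vs GTX 780 Ti: at least {floor_980_vs_780ti:.3f}x perf/mm^2")
```

With the quoted figures the GTX 960 comes out at 1.2x the GTX 770 per mm2, and the GTX 980 at no less than about 1.3x the 780 Ti (a comparison that, as noted, is unfair to GK110 because of its FP64 hardware).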

Newcomer

But we need to factor in delta color compression as well. Kepler and AMD didn't have it at first; then AMD got it with Tonga but crippled the chip by leaving a third of its memory bus unused LOL. Then Fiji, with delta compression, must have had so much bandwidth that it just couldn't get close to saturating it...

Maxwell is the perfect 1080p gaming GPU. Nvidia saw the market opportunity and grabbed it with both hands. I expect both companies to have mastered this on 14/16, though I was slow to realise just how much of a (non)factor memory bus size has become.

Veteran

Hmm, you have to split color compression and bandwidth out as a separate question: when is it a bottleneck and when is it not? Kinda hard to say at the moment without the context. But overall Maxwell delivers much more performance per mm2, performance per watt, etc. Technology is always dynamic, and each generation of hardware gives you extra features that weren't available in the previous gen; you can't overlook that.

Also, never assume anything; we have seen both companies make their follies lol. It's always a comparison between the two, so looking at their previous gens they might not have F'ed up, but compared to each other, one might be the victor.

VeteranSubscriber

Factor it where? In comparing Maxwell to Kepler? I'm just trying to understand why Lanek is calling Maxwell Nvidia's worst architecture ever while I can't think of a single relevant thing where Kepler is better.

NVIDIA LightSpeed frame buffer compression technology has been developed and refined over the years on NVIDIA desktop and mobile GPUs, and is very effective in reducing memory bandwidth and decreasing power utilization by saving power-hungry, off-chip memory accesses.

Click to expand...

The fact that Maxwell's is called third generation, together with this quote, suggests that even Fermi may have had some amount of color compression.

Veteran

Well, simple fill-rate tests don't suggest that pre-Maxwell GPUs used delta compression for general 3D rendering. Maybe they are referring to (delta) color compression for MSAA or some other specific usage (2D graphics?).
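For anyone unsure what "delta compression" means here, the general idea is to store one anchor value per block plus small differences, and fall back to raw storage when the differences are too big. A toy sketch in Python; the block size, bit widths and function names are all assumptions for illustration, and this is not how any actual NVIDIA or AMD hardware encodes its framebuffer.

```python
def delta_encode(block, delta_bits=4):
    """Return (anchor, deltas) if every delta fits in a signed delta_bits
    field, else None. Assumes a non-empty block of integer channel values."""
    anchor = block[0]
    deltas = [value - anchor for value in block[1:]]
    lo = -(1 << (delta_bits - 1))
    hi = (1 << (delta_bits - 1)) - 1
    if all(lo <= d <= hi for d in deltas):
        return anchor, deltas
    return None  # block is too noisy, store it uncompressed


def delta_decode(anchor, deltas):
    """Rebuild the original block from the anchor and its deltas."""
    return [anchor] + [anchor + d for d in deltas]


def stored_bits(block, delta_bits=4, base_bits=8):
    """Bits needed for the block under this toy scheme."""
    if delta_encode(block, delta_bits) is None:
        return base_bits * len(block)
    return base_bits + delta_bits * (len(block) - 1)


smooth = [100, 101, 102, 103, 101, 100, 99, 98]   # gentle gradient
noisy = [0, 255, 3, 200, 17, 90, 250, 1]          # high-frequency detail

print(stored_bits(smooth), "bits for the smooth block")
print(stored_bits(noisy), "bits for the noisy block")
```

Smooth content shrinks and noisy content does not, which is why a fill-rate test with different texture content can reveal whether a GPU is doing this sort of thing.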

LegendAlpha

NV30 (and at some point, AMD) wouldn't be using delta compression for MSAA. The trick with MSAA is that it produces the same color multiple times, so the common value is stored once.
The Beyond3D compression test was introduced too recently to run it for Fermi, or Tonga for that matter.
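The MSAA trick described above (store the common value once when all samples of a pixel agree) can be sketched like this in Python. It is a toy illustration with made-up names and a made-up tagged-tuple layout, not a description of any real hardware format.

```python
def compress_pixel(samples):
    """Collapse a pixel's MSAA samples to one value when they are identical.

    samples: non-empty list of per-sample colors, e.g. (r, g, b) tuples.
    """
    first = samples[0]
    if all(sample == first for sample in samples):
        # Interior pixel: every sample has the same color, store it once.
        return ("uniform", first)
    # Edge pixel: samples differ, keep all of them.
    return ("per_sample", tuple(samples))


def decompress_pixel(packed, sample_count):
    """Expand a packed pixel back into sample_count samples."""
    kind, payload = packed
    if kind == "uniform":
        return [payload] * sample_count
    return list(payload)


red, blue = (255, 0, 0), (0, 0, 255)
interior = [red, red, red, red]      # 4x MSAA, fully covered by one triangle
edge = [red, red, blue, blue]        # 4x MSAA, pixel straddles an edge

print(compress_pixel(interior))
print(compress_pixel(edge))
```

No deltas are involved: the saving comes purely from samples being exact duplicates, which is the common case away from triangle edges.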
