New Article: Compute Efficiency 2012

forestlaughing (forestlaughing.delete@this.yahoo.com) on July 25, 2012 8:51 am wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on July 25, 2012 1:37 am wrote:
> > New computational efficiency data shows GPUs with a clear edge over CPUs, but the gap is narrowing as CPUs adopt wide vectors (e.g. AVX). Surprisingly, a throughput CPU is the most energy efficient processor, offering hope for future architectures. Our data also shows some advantages of AMD's Bulldozer, and the overhead associated with highly scalable server CPUs.
> >
> > Comments and feedback welcome!
> >
> > David
>
> So what's the point of looking at peak DP floating point? That's not really a good measure, well, of anything. One can pack an FPGA full of vector pipes and get far more peak throughput than these processors, if you don't need reasonable memory bandwidth and control structures to go with the ALUs. Granted, the fact that all of these are successful commercial products indicates that they have all maintained some reasonable ratio of compute to their other functionality.
>
> Are we assessing these processors based on their applicability to the HPC marketplace? Not that Linpack is a good benchmark (in fact it's a pretty terrible benchmark), but even that is better than simply looking at peak throughput. If we assess these processors for other applications, then why look at floating point at all? Almost nothing else is FPU bound.
>
> I guess what I'm saying is, these graphs don't really show anything of interest. Do they?
I think they do, if only because they directly compare two approaches to the same problem: extending the CPU for vector processing versus doing the same calculations in a co-processor (and this will be a major point of conflict in the HPC and gaming businesses, to be decided within the next 10 years). Real-world effects on efficiency arising from I/O bandwidth, latency, branching, caching etc. are of course always relevant, but in my opinion they should be treated as separate problems appearing further down the pipeline, particularly since they depend on implementation, and thus on factors such as process technology and a company's R&D and engineering resources and skills, not just on architectural features.

So for a purely theoretical analysis of the base architectures and the viability of their high-level design choices, it can actually be an advantage to set aside every factor other than peak DP performance. And the charts show, which I find very interesting (although not completely surprising), that the gap between CPUs and GPUs has actually narrowed since 2009, not widened (although David Kanter is of course correct that the gap will widen again as Nvidia and AMD move to 28nm and Intel MIC to 22nm). Any other way of looking at it would miss the fact that not only has the real-world performance gap between CPUs and GPUs shrunk (which could easily be attributed to just the difficulties of GPGPU programming models etc.), but the GPUs' advantage even in raw peak performance has actually decreased, contrary to everything that has been predicted and promised by the GPGPU camp.
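To make the notion of "peak DP performance" concrete, here is a back-of-the-envelope sketch in Python. The part numbers and per-cycle issue rates are my own illustrative examples of 2012-era hardware (a Sandy Bridge-EP Xeon and a Fermi Tesla), not figures taken from the article:

```python
def peak_dp_gflops(cores, flops_per_cycle, clock_ghz):
    """Theoretical peak double-precision throughput:
    cores x DP flops issued per core per cycle x clock (GHz)."""
    return cores * flops_per_cycle * clock_ghz

# Xeon E5-2690 (Sandy Bridge-EP): 8 cores; AVX issues one 4-wide DP add
# plus one 4-wide DP multiply per cycle = 8 DP flops/cycle/core.
cpu = peak_dp_gflops(cores=8, flops_per_cycle=8, clock_ghz=2.9)

# Tesla M2090 (Fermi): 512 CUDA cores; a DP FMA counts as 2 flops but
# runs at half the SP rate, at a 1.3 GHz shader clock.
gpu = peak_dp_gflops(cores=512, flops_per_cycle=2 * 0.5, clock_ghz=1.3)

print(f"CPU peak: {cpu:.1f} DP GFLOP/s")  # 185.6
print(f"GPU peak: {gpu:.1f} DP GFLOP/s")  # 665.6
print(f"ratio:    {gpu / cpu:.1f}x")      # 3.6x
```

The point of spelling it out is that peak throughput is pure arithmetic over architectural parameters, which is exactly why it isolates the high-level design choices from implementation effects like memory bandwidth and caching.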

And the next step from there is of course that if CPUs with vector extensions can deliver peak performance within a certain fraction of that of GPUs, their major advantages in thread control and OoOE will outweigh the GPUs' advantage in peak performance. But for drawing that conclusion, I think it makes sense to first look at peak performance separately, and only then take the other factors affecting performance into account.