AMD Reveals Fusion CPU+GPU, To Challenge Intel in Laptops

“The ‘Llano’ processor that AMD described today in an ISSCC session is not a CPU, and it’s not a GPU – instead, it’s a hybrid design that the chipmaker is calling an ‘application processor unit’, or APU. Whatever you call it, it could well give Intel a run for its money in the laptop market, by combining a full DX11-compatible GPU with four out-of-order CPU cores on a single, 32nm processor die.”

Although, I thought the embedded ATI GPUs used completely different drivers…

Most of the embedded ones actually work just fine with the desktop drivers if you use modified .inf files. For example, on my laptop the original driver was years and years out of date, so I went ahead and downloaded the slightly modified Omega driver. Worked like a dream.

The manual mentions it is a Z430, AKA Yamato DX. However, finding any real info, like its lineage, seems to be nearly impossible (AMD wants to tell you about content, show people smiling with mobile phones in front of their faces, and get you to be, or work with, an ISV).

In a nutshell, AMD has taken the “STARS” core that’s used in their current 45nm offerings, shrunk it to a new 32nm SOI high-K process, and added new power gating and dynamic power optimization capabilities to it.

YES.

Seriously, just tacking a GPU on there won’t be a night-and-day change, and the full fruits of Fusion will take a few years, on the hardware side alone, to get where they want them.

But the new process and updated power management will be a godsend, as they will be competing against Intel’s newer Core and Atom chips (more so Core, initially, I imagine).

The Fusion approach has, in general, been marketed as a low-power mobile and desktop solution: basic gaming graphics on the cheap, competing with other integrated graphics chips. But I think the real advantage should, and will, come in high-performance applications.

Consider the push to use GPUs as general processing units that leverage their massive parallelism to get certain tasks done crazy fast. Now consider how much quicker and easier it will be to offload such computations to a GPU core when said core is *on die* with the CPU that wishes to offload its data-parallel task…

And, think of how much easier it will be to get the OK to program for it, given that you won’t have to buy extra hardware. It will just be right there in your Opteron.

…and, being right there, while it will eat into your RAM bandwidth, it will allow you to have the CPU (scalar) and GPU (vector) components working on small sets of data, like a few hundred KB here and there, which is something that is somewhat difficult now, as the performance benefit could easily get eaten up by moving data between devices (assume there are conditional branches, here). Worst case, you’ll get the same performance improvement as if you had paired the CPU with an off-chip Radeon IGP, but without having to worry about what motherboard model you have to work with.

Altivec, and similar dedicated vector setups, have been around for a while, but the GPGPU model offers much more flexibility, and there’s middleware being made and added all over the place right now.

How is the GPGPU programming model that much more flexible than the SIMD Altivec et al. approaches? The baroque memory hierarchies and the instruction bundling/scheduling and memory-access constraints found in the programmable GPUs lead me to believe that the complete opposite is true.

Also, what I haven’t seen anyone address here is the Windows ecosystem: as far as Microsoft has always been concerned, non-x86 coprocessors have never even been considered to be in the same address space. With the GPU so tightly coupled to the processor, I assume AMD must have worked with Microsoft to address such issues, because the programming models for GPUs thus far, be it CUDA, OpenCL or DirectCompute, operate under the assumption that co-processors are not only decoupled but in different address spaces, and they rely on workloads with high computational density and very coarse communication/data flows to make those assumptions work.

“How is the GPGPU programming model that much more flexible than the SIMD Altivec et al. approaches? The baroque memory hierarchies and the instruction bundling/scheduling and memory-access constraints found in the programmable GPUs lead me to believe that the complete opposite is true.”

Altivec adds vector math operations to a typical CPU only. How is that not less flexible and powerful than a processor capable of running its own entire programs, doing more than vector maths, with many threads?

SIMD add-ons to CPUs have yet to give anywhere near the kind of added performance that, say, CUDA on GeForces has been giving: not only raw performance, but performance per watt. The computational power has been shown off again and again, including for things like password cracking and general pattern matching.

For instance, check out this: http://www.golubev.com/rargpu.htm. Can a regular SIMD unit (or even a nicer one, like Altivec) added to your CPU make it do that 10+ times faster? Most of the time, it won’t be able to do anything for such workloads.

“Baroque” memory architectures and all that don’t matter. Actual computational power is being brought to the table, without undue amounts of dev hours being required to harness it. As such, any and all clunkiness in the designs (which, yes, is there in spades) is moot, as the software and programmers are clearly able to overcome it.

So, no, I haven’t. It takes hardware that I can’t afford, right now (it will be a major consideration, when I get around to upgrading video cards). However, that’s not at all what I was addressing.

They are flexible in that they can be applied to many different kinds of processing, as long as it can operate on a regular packet of some size, preferably nicely packed in-order in memory.

When the question is what they can do, the details of what makes them easy or hard to program for simply don’t matter. If those details meant that programs weren’t being made to harness the hardware’s power, or were being made but largely failing to perform, then the lack of performance would be evident. But it is the other way around: performance is proving excellent, and all kinds of oddball software is being written all over the place. Meanwhile, AMD has been talking up tight integration of the hardware, including dealing with memory addressing, for some time now, while staying intentionally vague on implementation details (SOP).

Please refrain from moving the goal posts or redefining terms to fit your narrative. If you don’t know about something, or have limited experience/education in the matter, take the time to at least understand the terminology properly. Or at least do not attempt to use marketing spiel to “educate” people with PhDs in the field.

You are confusing “computational power” with “flexibility”, which are two very different things. The SIMD programming model for Altivec and/or MMX/SSE is far more flexible, and comparatively simple, than the one used to program GPUs (even though at a basic level the latter is also a SIMD approach).

For starters, one has to deal with multiple address spaces in GPGPU versus the unified memory model of the CPU. There are also issues with the granularity of the computational density and dataset size (coarse for the GPU vs. fine-grained for the CPU SIMD extensions), and with the data-access ordering and thread scheduling required by GPUs to execute code efficiently vs. the simplicity the CPU presents to the programmer/compiler, since the memory hierarchies and out-of-order scheduling in most modern CPUs take care of a lot of those headaches, etc, etc. Then there are the issues of portability between GPU families, since the programmer sometimes has to trade off device abstraction (higher portability, lower performance) against device-specific scheduling and dataset ordering (low/nil portability, higher performance)… which are thus far some of the greatest limiters to actual GPU killer apps gaining traction in mass markets.

For specialized code and apps, GPUs are rather optimal devices compared against a traditional scalar CPU from a flops/watt perspective. However, the fact that those are specialized areas of application should have given you a hint that, at this point, GPUs and their programming model are anything but flexible when compared against a normal CPU.