Engineers boost AMD CPU performance by 20% without overclocking

This site may earn affiliate commissions from the links on this page. Terms of use.

Engineers at North Carolina State University have used a novel technique to boost the performance of an AMD Fusion APU by more than 20%. This speed-up was achieved purely through software and using commercial (probably Llano) silicon. No overclocking was used.

In an AMD APU there is both a CPU and GPU, both on the same piece of silicon. In conventional applications — in a Llano-powered laptop, for example — the CPU and GPU hardly talk to each other; the CPU does its thing, and the GPU pushes polygons. What the researchers have done is to marry the CPU and GPU together to take advantage of each core’s strengths.

To achieve the 20% boost, the researchers reduce the CPU to a fetch/decode unit, and the GPU becomes the primary computation unit. This works out well because CPUs are generally very strong at fetching data from memory, and GPUs are essentially just monstrous floating point units. In practice, this means the CPU focuses on working out what data the GPU needs (prefetching), the GPU's pipes stay full, and the result is a 20% performance boost.
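We don't have the researchers' actual code, but the idea — one unit running ahead to fetch data so the other never stalls waiting on memory — is a classic producer/consumer pipeline. Here's a minimal, purely illustrative Python sketch of that pattern; the `fetch`, `compute`, and `pipelined` names are our own inventions, with a thread standing in for the CPU's prefetch role and the main loop standing in for the GPU's compute role:

```python
import threading
import queue

def fetch(chunk_id):
    # Stand-in for the CPU's memory-fetch role: produce a chunk of data.
    return list(range(chunk_id * 4, chunk_id * 4 + 4))

def compute(chunk):
    # Stand-in for the GPU's floating-point role: crunch the chunk.
    return sum(x * x for x in chunk)

def pipelined(num_chunks):
    # The "CPU" thread runs ahead, prefetching chunks into a small
    # buffer so the "GPU" consumer never stalls waiting on a fetch.
    buf = queue.Queue(maxsize=2)  # shallow prefetch depth, like a double buffer

    def prefetcher():
        for i in range(num_chunks):
            buf.put(fetch(i))
        buf.put(None)  # sentinel: no more data

    threading.Thread(target=prefetcher, daemon=True).start()

    results = []
    while True:
        chunk = buf.get()
        if chunk is None:
            break
        results.append(compute(chunk))
    return results

if __name__ == "__main__":
    # Pipelined execution produces the same answers as doing it serially;
    # the win is that fetch and compute overlap in time.
    serial = [compute(fetch(i)) for i in range(8)]
    print(pipelined(8) == serial)
```

The speed-up comes entirely from overlap: while the compute side works on chunk *n*, the fetch side is already pulling in chunk *n+1*, which is exactly the division of labor the researchers describe.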

Now, unfortunately we don’t have the exact details of how the North Carolina researchers achieved this speed-up. We know it’s in software, but that’s about it. The team probably wrote a very specific piece of code (or a compiler) that uses the AMD APU in this way. The press release doesn’t say “Windows ran 20% faster” or “Crysis 2 ran 20% faster,” which suggests we’re probably looking at a synthetic, hand-coded benchmark. We will know more when the team presents its research on February 27 at the International Symposium on High Performance Computer Architecture.

For what it’s worth, this kind of CPU/GPU integration is exactly what AMD is angling for with its Heterogeneous System Architecture (formerly known as Fusion System Architecture). AMD has a huge advantage over Intel when it comes to GPUs, but that means nothing if the software chain (compilers, libraries, developers) isn’t in place. The good news is that Intel doesn’t have anything even remotely close to AMD’s APU coming down the pipeline, which means AMD has a few years to see where this HSA path leads.

Updated @ 17:54: The co-author of the paper, Huiyang Zhou, was kind enough to send us the research paper. It turns out production silicon wasn't actually used; instead, the software tweaks were carried out on a simulated future AMD APU with shared L3 cache (probably Trinity). It's also worth noting that AMD sponsored and co-authored this paper.

Updated @ 04:11: Some further clarification: basically, the research paper is a bit cryptic. It seems the engineers wrote some real code, but executed it on a simulated AMD APU with shared L3 cache (i.e. probably Trinity). Their workings do seem to be correct, though. In other words, this is still a good example of the speed-ups that heterogeneous systems will bring… in a year or two.

LOL. When these engineers at North Carolina State University (or anywhere else in the world, for that matter) actually git pull a current x264 ( http://git.videolan.org/?p=x264.git;a=summary ), write a few suitable AMD APU GPU algorithms and patches to make a generic 1080p x264 high profile level 5.1 encode run 20% faster than on a generic Intel i7, AND submit them to the x264 dev channel to pass review for inclusion into master, then I'll be impressed.

Nice. It's a step in the right direction, and at the least it gives us a preview of what they might be able to do.

I don’t get why people are bashing this. It’s a new strategy for the company, and these people make it sound like AMD has had APUs for the last 10 years. I’m under the assumption that good research takes time, and progress in this regard is slow… apparently some here have had success in pumping something perfect out overnight, or else they’re clueless about how development of anything works…

AMD has already been funding them (there is an article in Ars Technica about that).

Kyle Mooney

GPU acceleration has been going on for almost a decade, so I fail to see what is special about this. I’ve seen algorithms get over 100% speedup in CUDA CPU-GPU hybrid codes. It seems the only difference here is that they’re using an APU, and the speedup is substandard when put up against even poor general-purpose GPU ports.

The simulator is fine for seeing whether the silicon can work. AMD constantly releases optimistic simulator data about product refreshes, only to have the reality of Microsoft bugger it all up.
When is the computing world going to realise that WINTEL has strategically written Windows to run faster on Intel CPUs and slower on AMD CPUs?

In fact, Microsoft’s recent patches that improve performance on Bulldozer just confirm this.
Dump the simulator and start using real silicon. Then we’ll see if a 20% performance boost is possible.
