AMD FX-8150 With The Open64 5.0 Compiler

11-25-2011, 02:10 AM

Phoronix: AMD FX-8150 With The Open64 5.0 Compiler

The Open64 5.0 compiler was released earlier this month with many changes; among the prominently noted items were greater optimizations for AMD's Bulldozer CPUs. This article is a first look at Open64 5.0 compiler performance compared to its earlier release, as tested on an AMD FX-8150 eight-core "Bulldozer" processor.

Comment

A lot of the whining about Bulldozer from Windows users has been about its lack of floating point performance. Of course, with technologies like OpenCL, it's far better to be doing floating point math on GPUs than CPUs since they're over 1000x faster at it. A lot of game companies still do WAYYY too much floating point work on the CPU. Games such as Bad Company 2 and the like do all their physics calculations on the CPU when they really should be done on the GPU, since GPUs are designed for those types of massively parallel floating point calculations and CPUs really aren't.

Thankfully there are physics engines out such as Havok that run under OpenCL, which means game companies don't have any excuse to continue using the CPU for so much floating point work. I think that makes these Bulldozer chips a good choice long-term, since they can beat Intel's more expensive chips in integer performance. Of course, the open source Linux drivers are a long way away from supporting OpenCL, but not many serious gamers (Crysis 3, Battlefield 3, etc) run those drivers anyway. OpenCL is going to become much more important in the future as AMD is shifting the focus of floating point away from their CPU cores and towards their APU / GPU cores, which run OpenCL for floating point work.

Comment

Would be interesting to see if the new compiler improves for Intel CPUs in that benchmark too.

That's exactly the problem with almost all benchmarking sites, Phoronix notwithstanding.

Tom's Hardware concluded that the 6-core Sandy Bridge-E was 30% faster than the FX-8150. How much of that came down to compiler optimizations, especially since a rather suspicious number of benchmarks are compiled with Intel's own ICC compiler? There is no mainstream compiler that is AMD-biased to balance out the results. Of the 30 benchmarks, the average user will use between 0 and 3 of those applications in real life, yet they will be "recommended" to buy the Intel CPU based on a useless aggregate score that is distorted by synthetic benchmarks like Futuremark, which always favor Intel by an unrealistic amount compared to real life.

30% isn't that big of a difference in real life anyway (assuming it's even really 30%), especially since most CPUs are in idle/power-saving mode most of their lifetime. If AMD would market Bulldozer as a quad-core with superior hyperthreading, then it suddenly becomes the world's fastest consumer-grade CPU, since that 6-core SB-E CPU requires 30% more die size, 2 more cores, and costs 4x as much to achieve only 30% more performance.
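To put the price argument above into numbers, here is a rough sketch in Python. The only inputs are the two ratios claimed in the post (30% more performance, 4x the price); they are not measured values, and the function name `perf_per_cost` is just made up for illustration.

```python
def perf_per_cost(relative_perf, relative_cost):
    """Performance per unit of money, with both inputs relative to a baseline of 1.0."""
    return relative_perf / relative_cost

# Baseline: FX-8150 at performance 1.0 and price 1.0.
fx_value = perf_per_cost(1.0, 1.0)

# Ratios claimed in the post for the 6-core SB-E: 1.3x the performance at 4x the price.
sbe_value = perf_per_cost(1.3, 4.0)

# How much performance each dollar buys on SB-E compared to the FX-8150.
ratio = sbe_value / fx_value
print(f"SB-E delivers {ratio:.3f}x the performance per dollar of the FX-8150")
```

If those two ratios hold, each dollar spent on the SB-E buys roughly a third of the performance it buys on the FX-8150. Change either ratio and the conclusion moves with it.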

Comment

Offloading FP to GPU should mean lower prices than current CPUs since it means the CPU was designed to do less than current CPUs.

One thing traditional benchmarks don't make clear is what the capabilities of the CPUs actually are. It would be better to identify those capabilities and then test them. That way you get an idea of what you are paying for rather than what the quality of software X is. Software X may be written by a crappy programmer.

Has the BD 8150 been compared to previous Phenom IIs? That way we would have an idea of performance compared to past CPUs and whether the price is justified.

Comment

Yup, and unless you are doing something like cryptography you are better off with an X6 at this point.

The comparisons that have been done against the Phenom IIs have used applications compiled without Bulldozer optimizations and under an OS with a thread scheduler that doesn't understand the modular design (2 cores per module with some shared resources) of Bulldozer...
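A toy sketch of what "understanding the modular design" could mean for thread placement, in Python. It is not how any real OS scheduler works. It assumes cores are numbered so that cores 2k and 2k+1 share a module, and the function names are made up for illustration.

```python
CORES_PER_MODULE = 2  # Bulldozer: two integer cores share one module

def module_of(core_id):
    """Module index for a core, ASSUMING sibling cores are numbered 2k and 2k+1."""
    return core_id // CORES_PER_MODULE

def naive_placement(n_threads, n_cores=8):
    """Module-unaware placement: fill cores in numeric order."""
    return list(range(n_cores))[:n_threads]

def spread_placement(n_threads, n_cores=8):
    """Module-aware placement: use the first core of every module
    before putting a second thread on any module."""
    order = sorted(range(n_cores), key=lambda c: (c % CORES_PER_MODULE, c))
    return order[:n_threads]

def modules_used(cores):
    """Number of distinct modules touched by a list of core ids."""
    return len({module_of(c) for c in cores})

print(naive_placement(4), modules_used(naive_placement(4)))    # [0, 1, 2, 3] 2
print(spread_placement(4), modules_used(spread_placement(4)))  # [0, 2, 4, 6] 4
```

With four threads, the naive order packs them onto two modules where they compete for the shared resources, while the spread order gives each thread a module to itself. Whether spreading actually wins depends on the workload, since threads that share data can benefit from sitting in the same module.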

Comment

Would be interesting to see if the new compiler improves for Intel CPUs in that benchmark too.

If you compile those applications with the Bulldozer optimizations, the compiled binaries don't run on Intel CPUs, so I don't see how such a comparison could be made. I think there is a way to compile a binary so that it only enables the Bulldozer optimizations if you have a Bulldozer CPU, but I'm not sure if Open64 does this or what options need to be set to do it. Even in that situation, the Intel chips wouldn't get *ANY* of the Bulldozer optimizations anyway. So I'd say it's a pretty safe bet that all the Bulldozer optimizations are only applicable to Bulldozer CPUs and would not help Intel CPUs at all, since Intel CPUs currently don't even support FMA3 (coming 2013 for Intel), let alone FMA4.

Keep in mind, Bulldozer runs FMA4 while future Intel CPUs will run FMA3. The two are mutually exclusive, though I'm sure there are some tricks to get a binary to run FMA4 on Bulldozer CPUs and FMA3 on Intel CPUs (different compiled paths). Certainly a lot of the performance boosts in this new Open64 compiler revolve around using FMA4. It's the only compiler out there besides GCC that has FMA4 accelerations on the drawing board.
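The "different compiled paths" idea boils down to checking CPU feature flags at run time and picking a code path. Below is a minimal sketch of just the selection logic in Python. It says nothing about how Open64 or GCC actually do it. The one real-world detail it relies on is that Linux lists FMA4 as `fma4` and FMA3 as `fma` in the `flags` line of `/proc/cpuinfo`; the function names and the sample flag strings are made up.

```python
def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the first 'flags' line of /proc/cpuinfo as a string ('' if unavailable)."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass  # not Linux, or the file is not readable
    return ""

def pick_fma_path(cpu_flags):
    """Choose which compiled path to run, based on a space-separated flags string."""
    flags = set(cpu_flags.split())
    if "fma4" in flags:   # AMD Bulldozer
        return "fma4"
    if "fma" in flags:    # FMA3, as planned for future Intel CPUs
        return "fma3"
    return "generic"      # no fused multiply-add: plain multiply then add

# Illustrative flag strings, not dumps from real machines.
print(pick_fma_path("fpu sse2 avx xop fma4"))  # fma4
print(pick_fma_path("fpu sse2 avx fma"))       # fma3
print(pick_fma_path("fpu sse2 avx"))           # generic
print(pick_fma_path(read_cpu_flags()))         # whatever this machine supports
```

Checking `fma4` first is an arbitrary choice here. It only matters on a CPU that reports both flags.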

Comment

Yup, and unless you are doing something like cryptography you are better off with an X6 at this point.

Wow, what a sweeping generalization... and a very misleading one at that.

How about something like:

"Unless you're building a PC to run the Cinebench single threaded benchmark, you're better off with a Core 2 Duo."

Bulldozer did have some regressions, mostly in single threaded benchmarks. However, it's also faster than the Phenom II X6 in many single threaded benchmarks, and almost universally faster in well threaded benchmarks.

I'm posting from an FX-8120, and it feels faster than any Sandy Bridge, Nehalem or Phenom II I've ever used. I have the following windows open:

[screenshot not preserved]

And not even a hint of lag, despite running 2 craptastic Java-based IDEs at the same time. I can even do something CPU intensive like creating a Truecrypt volume or compiling the Linux kernel, and still no slowdown whatsoever. I hate to break it to you, but a quad core Sandy Bridge cannot do all of those things and still be perfectly responsive, especially if you're using its IGP.