
GCC vs. LLVM/Clang On The AMD Richland APU

07-06-2013, 12:50 PM

Phoronix: GCC vs. LLVM/Clang On The AMD Richland APU

Along with benchmarking the AMD A10-6800K "Richland" APU on Linux and its Radeon HD 8670D graphics, I provided some GCC compiler tuning benchmarks for this AMD APU with Piledriver cores. The latest Linux testing from the A10-6800K is a comparison of GCC 4.8.1 to LLVM/Clang 3.3 on this latest-generation AMD low-power system.

I'm actually fairly impressed by these results. LLVM/Clang is really kicking ass; it had numerous huge wins, and just about any time it was behind (aside from tests where OpenMP plays a huge factor) it wasn't by much.

Comment

> I'm actually fairly impressed by these results. LLVM/Clang is really kicking ass; it had numerous huge wins, and just about any time it was behind (aside from tests where OpenMP plays a huge factor) it wasn't by much.

* LLVM is great at Successive Jacobi Relaxation.
* GCC is great at C-Ray.
* LLVM has no OpenMP support, so don't even try to use it for scientific code, unless you want to go all the way and use explicit MPI (which makes the SciMark test somewhat less useful).

Comment

Well, there were some impressive results from Clang/LLVM here; that said, the Botan tests were absolutely pointless. Comparing two compilers against each other at -O2 (or lower) means nothing, as there is no 'standard' between compilers for which optimizations should be enabled at the -O2 level.

If Clang/LLVM or GCC adds more optimizations at -O2 than the other, it will win at that level, but that says nothing about their relative performance when both are set to generate the fastest code they can, which is at -O3.

As such the Botan benchmarks are pointless in this context.

This is why, if you are measuring the performance of the generated code, you default to -O3, the setting at which the compilers strive to generate the _fastest_ code, which is after all what is being benchmarked here. This has been stated over and over, so I can't help but wonder whether Michael is deliberately using these flawed settings in order to sway the results to his liking.

Comment

> Well, there were some impressive results from Clang/LLVM here; that said, the Botan tests were absolutely pointless. Comparing two compilers against each other at -O2 (or lower) means nothing, as there is no 'standard' between compilers for which optimizations should be enabled at the -O2 level.
>
> If Clang/LLVM or GCC adds more optimizations at -O2 than the other, it will win at that level, but that says nothing about their relative performance when both are set to generate the fastest code they can, which is at -O3.
>
> As such the Botan benchmarks are pointless in this context.
>
> This is why, if you are measuring the performance of the generated code, you default to -O3, the setting at which the compilers strive to generate the _fastest_ code, which is after all what is being benchmarked here. This has been stated over and over, so I can't help but wonder whether Michael is deliberately using these flawed settings in order to sway the results to his liking.

-O3 does not necessarily generate the fastest code. It enables the most optimizations but is intended for smaller segments of code and inner loops. If used for entire applications, it may cause a slowdown due to a larger memory footprint and more cache misses.

Comment

> -O3 does not necessarily generate the fastest code. It enables the most optimizations but is intended for smaller segments of code and inner loops. If used for entire applications, it may cause a slowdown due to a larger memory footprint and more cache misses.

Yes, sometimes -O2 actually beats -O3, but that is because the optimizer sometimes fails in its job of accurately weighing things like increased cache use against the improved performance of a larger code segment (through inlining, unrolling, etc.). Also, -O3 is not specifically intended for 'smaller segments of code': the compiler heuristics typically do a good job of deciding which code benefits from unrolling and inlining, and which codepaths are hot and cold. Just because an optimization is enabled doesn't mean it will end up being applied to every segment of code. So yes, you can use -O3 on entire applications just fine, and most CPU-intensive ones default to -O3 in their configurations.

Of course, if you want to give the compiler the best help, you can always use profile-guided optimization, where you let the compiler gather runtime data which it can then use to better optimize the code.

But despite the fact that -O2 sometimes beats -O3 due to failed compiler heuristics, if you only test ONE optimization level then of course it must be -O3; again, there is no 'standard' for which optimizations are enabled per 'level' across compilers. The ONLY standard is that -O3 is supposed to generate the _fastest_ code.

So unless you know beforehand that -O2 generates the fastest code for BOTH compilers on a particular benchmark, using -O2 means nothing in a benchmark meant to show which compiler generates the _fastest_ code; that is what -O3 is supposed to do, and what it does in the vast majority of cases.

Actually, that's what I'm talking about: the tests are useless because their results are useless. If you need OpenMP, you don't need to look at the results; the compiler is not for you. And if you don't need OpenMP, you don't need the results either: they have no meaning for you.