Optimized Binaries Provide Great Benefits For Intel Haswell

Phoronix: Optimized Binaries Provide Great Benefits For Intel Haswell

Utilizing the core-avx2 CPU optimizations offered by the GCC 4.8 compiler can provide real benefits for the Intel Core i7 4770K processor and other new "Haswell" CPUs. For some computational workloads, the new Haswell instruction set extensions can offer tremendous speed-ups compared to what's offered by the previous-generation Ivy Bridge CPUs.

I would have found a comparison between the settings commonly used in binary packages (typically nothing beyond SSE2 on 64-bit binaries) and a smaller set of optimized targets much more useful. Perhaps nocona, corei7-avx and core-avx2, plus -O2 vs. -O3. The current benchmarks don't tell us much about the real world beyond how well the compiler can use the new instructions, and you won't really find -march=nocona binaries out in the wild. Adding the default -march setting used in Fedora would have been nice.

SSSE3 and SSE4 might be helpful, although software that uses them will usually detect their presence. "-O3", "-flto" and profile-guided optimisation usually yield the biggest gains, but they all have stability issues (and PGO additionally needs user intervention). LTO is probably the most stable of these, in that you can compile an entire system and should only need to disable it for 10-20 packages out of hundreds. If you're going to compile something, I'd start with:

Code:

-march=native -O2 -pipe

Then add heavier options until something breaks, benchmarking performance after each change.
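To make the "benchmark after each change" step concrete, here is a minimal timing sketch in C. It is only an illustration: the kernel `saxpy` and the helper `time_saxpy` are hypothetical names, not taken from the article or from any real package, and a toy loop like this says little about a whole application. The idea is to build the same file once per flag set and compare the reported times.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical kernel: y[i] += a * x[i]. A simple loop that the compiler
   may or may not vectorize, depending on the -march and -O flags used. */
static void saxpy(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Runs the kernel `reps` times over arrays of length n.
   Returns elapsed CPU time in seconds, or -1.0 if allocation fails.
   The last element of y is stored in *checksum so the work cannot be
   optimized away and the result can be sanity-checked. */
static double time_saxpy(size_t n, int reps, float *checksum)
{
    assert(n > 0 && reps >= 0 && checksum != NULL);

    float *x = malloc(n * sizeof *x);
    float *y = malloc(n * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        x[i] = 1.0f;
        y[i] = 0.0f;
    }

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        saxpy(0.5f, x, y, n);
    clock_t end = clock();

    *checksum = y[n - 1];
    free(x);
    free(y);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_saxpy(1u << 20, 200, &checksum)` from a small `main` and printing the result gives one number per build, for example one build with the flags above and one with -O3 added. If the checksum changes between builds, that is a hint that a flag has broken something.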

If you're not using 64-bit, you should: on x86-64, GCC defaults to -mfpmath=sse, which should yield some gains for floating-point math on modern hardware (and 64-bit may bring further gains as well). Some modern CPUs don't even have hardware support for x87 math, so they'll be hampered even more without this option.
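If you want to check which floating-point path a given build actually uses, something like the following sketch can help. It relies on the `__SSE_MATH__` and `__SSE2_MATH__` macros that GCC predefines when SSE is used for floating-point math, and on the standard `FLT_EVAL_METHOD` from `<float.h>`. The function names `fp_math_unit` and `describe_fp_math` are made up for this example, and the "x87" branch assumes an x86 build with a GCC-compatible compiler.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Reports which unit the compiler was told to use for scalar
   floating-point math, based on macros predefined at compile time. */
static const char *fp_math_unit(void)
{
#if defined(__SSE2_MATH__)
    return "sse2";    /* float and double both use SSE registers */
#elif defined(__SSE_MATH__)
    return "sse";     /* only float uses SSE; double still uses x87 */
#elif defined(__i386__) || defined(__x86_64__)
    return "x87";     /* assumption: x86 build without SSE math */
#else
    return "non-x86";
#endif
}

/* Writes a one-line summary into buf and returns the number of
   characters snprintf would have written. */
static int describe_fp_math(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "fpmath=%s FLT_EVAL_METHOD=%d",
                    fp_math_unit(), (int)FLT_EVAL_METHOD);
}
```

On a 32-bit build, comparing the output with and without `-msse2 -mfpmath=sse` should show the difference. On x86-64 the SSE2 branch is expected by default.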

Interesting. Doesn't this really point toward source-based distros? I never thought they would make a big difference, but it feels like if you could recompile select bits of your Ubuntu machine, you'd get much better performance, particularly with Haswell. But there is a lot of value in using pre-compiled packages.

I never understood why Ubuntu, with its focus on simplicity, hasn't offered an option in the package manager to right-click on a package and recompile it for your processor.

Cheers!

It does, but most packages won't be that affected. Things like scientific benchmarks, image processing, matrix multiplication, etc. see huge speedups. Your average app probably won't see any at all.
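For the workloads that do benefit, the run-time detection mentioned earlier in the thread is how a generic binary package can still use newer instructions. A rough sketch of that pattern with GCC builtins follows. The function names (`scale_generic`, `scale_avx2`, `pick_scale`, `scale`) are invented for illustration, and real libraries usually do this with hand-tuned code paths rather than the same loop compiled twice.

```c
#include <assert.h>
#include <stddef.h>

/* Portable fallback, built with whatever baseline the package uses. */
static void scale_generic(float *v, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
/* Same loop, but the compiler is allowed to use AVX2 for this one
   function, even if the rest of the file targets plain SSE2. */
__attribute__((target("avx2")))
static void scale_avx2(float *v, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}
#endif

typedef void (*scale_fn)(float *, size_t, float);

/* Chooses an implementation based on what the running CPU reports. */
static scale_fn pick_scale(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return scale_avx2;
#endif
    return scale_generic;
}

/* Convenience wrapper: dispatch on every call. Real code would
   normally cache the chosen function pointer instead. */
static void scale(float *v, size_t n, float k)
{
    assert(v != NULL || n == 0);
    pick_scale()(v, n, k);
}
```

The binary stays runnable on older CPUs, and only the hot function gets the newer instruction set, which fits the point that most of an average app would not notice the difference anyway.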

I wonder if a kernel built with custom CFLAGS would also effectively change performance in other tests.

You can squeeze a little bit of extra performance in some games by compiling Mesa with more aggressive CFLAGS (at least with R600g). Only a few % but still significant. The kernel is a bit more risky, though, and may not be stable if you go too aggressive.