GCC 4.6 Compiler Performance With AVX On Sandy Bridge

While we are still battling issues in getting the Intel Linux graphics
driver running properly with Intel's new Sandy Bridge CPUs (at
least Intel's Jesse Barnes is now able to reproduce the most serious problem we've
been facing, but we'll save the new graphics information for another article),
the CPU performance continues to be very compelling. Two weeks ago we published
the Intel Core i5 2500K Linux
benchmarks that showed just how well this quad-core CPU that costs a little
more than $200 USD is able to truly outperform previous generations of Intel hardware.
That was just with running the standard open-source benchmarks and other Linux
software, which has not been optimized for Intel's latest micro-architecture.
Version 4.6 of the GNU Compiler Collection (GCC) though is gearing up for release
and it will bring support for the AVX extensions. In this article, we are benchmarking
GCC 4.6 on a Sandy Bridge system to see what benefits there are to enabling the
Core i7 AVX optimizations.

The Advanced Vector Extensions, AVX, is the newest instruction
set architecture that was jointly agreed upon by Intel and AMD as the succeeding
technology to SSE4. Key points of the AVX ISA are the expansion of the vector data width
to 256 bits, a new SIMD instruction format, and new data manipulation and arithmetic
compute primitives. Simply put, AVX is meant to be another step forward for increasing
the processor's performance and efficiency. AVX has been talked about for years
but with Intel's Sandy Bridge CPUs the Advanced Vector Extensions support is finally
in place. AMD will launch their first AVX-capable CPUs later in the calendar year.
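
To make the 256-bit width concrete, here is a minimal sketch in C of what an AVX code path can look like. It is an illustration only and not taken from any of the benchmarks in this article: the function name add_arrays is hypothetical, and the AVX branch is only compiled when the compiler defines __AVX__ (for example when GCC is invoked with -mavx), otherwise the plain scalar loop does all the work.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>   /* AVX intrinsics, only pulled in when AVX is enabled */
#endif

/* Element-wise sum: out[i] = a[i] + b[i] for i in [0, n).
 * add_arrays is a hypothetical helper name used only for this sketch. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#ifdef __AVX__
    /* One 256-bit YMM register holds eight 32-bit floats, so each
     * iteration of this loop handles eight elements at once. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* unaligned 256-bit load */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the entire loop when AVX is not enabled at build time. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

With SSE the same loop would step four floats at a time through 128-bit XMM registers, which is where the doubling of vector width comes from. In practice the compiler's auto-vectorizer may generate similar code from the scalar loop alone when the right -march option is given.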

Fortunately, going back to at least early 2009, Intel has been
working on AVX support for Linux. In April of 2009 the main bits of Intel AVX support
landed in the mainline Linux 2.6.30 kernel. This kernel-level AVX work was
for enabling YMM state management for the 256-bit vector processing. In order
to run an Intel Core i5/i7 Sandy Bridge CPU under Linux with one of the new Intel
chipsets you need to be using a H2'2010 Linux distribution (circa mainline Linux
2.6.35), so regardless there is AVX support in place at the kernel level for you
if running Linux. In this regard, the AVX support is actually in a better position
than on Windows, which requires using Microsoft Windows 7 with the brand new Service
Pack 1.
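
Because AVX needs the operating system to save and restore the YMM registers, a program has to check both the CPU and the OS before using it. The following is a rough sketch in C of the usual detection sequence, assuming GCC's cpuid.h header and inline assembly on x86; the function name avx_usable is made up for this example, and on non-x86 targets the sketch simply reports no AVX.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* Returns 1 when both the CPU and the operating system support AVX,
 * 0 otherwise. avx_usable is a hypothetical name for this sketch. */
int avx_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    unsigned int xcr0_lo, xcr0_hi;
    const unsigned int osxsave = 1u << 27;  /* CPUID.1:ECX bit 27 */
    const unsigned int avx     = 1u << 28;  /* CPUID.1:ECX bit 28 */

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* The CPU must advertise AVX and the OS must have enabled XSAVE. */
    if ((ecx & (osxsave | avx)) != (osxsave | avx))
        return 0;

    /* XGETBV with ECX=0 reads XCR0. Bits 1 and 2 tell us whether the
     * OS manages the XMM and YMM state across context switches. */
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    return (xcr0_lo & 0x6u) == 0x6u;
#else
    return 0;
#endif
}
```

On a Sandy Bridge CPU running a kernel without the YMM state management, the CPUID AVX bit would be set but the XCR0 check would fail, so the function would return 0.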

When it comes to the compiler support for AVX, the GNU Compiler
Collection developers have been working on that for some time as well. There are
traces of Advanced Vector Extensions support in this leading open-source compiler
going back to GCC 4.4. However, it was not until this past December in the run-up
to GCC 4.6 that there were mtune/march/with-cpu options available designed for
AVX and Intel's newest CPUs. In early December an Intel engineer submitted
the patches for Core i7 AVX CPUs. The option is named "corei7-avx" and
is designed for use with Core i7 CPUs that carry the AVX support. The GCC documentation
describes the corei7-avx option as for "Intel Core i7 CPU with 64-bit extensions,
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set
support." As mentioned
recently, this will all appear in GCC 4.6.0 when released in the coming months.
GCC 4.6 also can be built with the --with-fpmath=avx flag, which will allow the
GNU compiler to use AVX floating-point arithmetic.

On the LLVM side, the Low-Level Virtual Machine only appears to
have partial support for the Advanced Vector Extensions and lacks Core i7 "Sandy
Bridge" tuning, but the LLVM/Clang benchmarks under Sandy Bridge will be
saved for another article. In this article, we are looking at the AVX performance
under GCC on Intel's newest CPUs. To do this comparison we first built GCC 4.3.5,
GCC 4.4.5, GCC 4.5.2 and GCC 4.6.0 from source without specifying any tuning options
during the build process or when building out our library of benchmarks. This
was then followed by building out our test library (after self-hosting the same
version of GCC with the same test options) with the core2, corei7, and corei7-avx
options. Lastly, we tested GCC 4.6.0 when built with the AVX floating-point math
support and using the corei7-avx flags. The GCC 4.6.0 build we were using was
the 2011-01-29 snapshot. We did the GCC 4.3/4.4 testing to see how this open-source
compiler would react when running on this more-modern CPU. The other argument
used when building the GCC releases was --enable-lto for enabling link-time
optimization, but besides that it was a stock build.

The core2 GCC option is designed for CPUs with just MMX, SSE,
SSE2, SSE3, and SSSE3 instruction set support. The vanilla corei7 option adds
SSE4.1 and SSE4.2 support to the mix, while the corei7-avx option adds
the AVX instruction set support plus AES and PCLMUL ISA support.
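
One way to see what each option actually enables is to look at the macros GCC predefines for the target. The short C sketch below counts (and optionally prints) the extensions discussed above; the function names note_isa and report_isa are hypothetical and only serve this illustration, while the __SSSE3__, __SSE4_1__, __SSE4_2__, __AVX__, __AES__ and __PCLMUL__ macros are the ones GCC defines when the matching instruction sets are enabled.

```c
#include <stdio.h>

/* Prints one ISA name if an output stream was given; always returns 1
 * so the caller can count it. Hypothetical helper for this sketch. */
static int note_isa(FILE *out, const char *name)
{
    if (out)
        fprintf(out, "%s\n", name);
    return 1;
}

/* Returns how many of the six listed extensions were enabled at compile
 * time, optionally printing their names. Pass NULL to count silently. */
int report_isa(FILE *out)
{
    int count = 0;
#ifdef __SSSE3__
    count += note_isa(out, "SSSE3");
#endif
#ifdef __SSE4_1__
    count += note_isa(out, "SSE4.1");
#endif
#ifdef __SSE4_2__
    count += note_isa(out, "SSE4.2");
#endif
#ifdef __AVX__
    count += note_isa(out, "AVX");
#endif
#ifdef __AES__
    count += note_isa(out, "AES");
#endif
#ifdef __PCLMUL__
    count += note_isa(out, "PCLMUL");
#endif
    (void)note_isa;   /* avoids an unused-function warning when nothing is enabled */
    return count;
}
```

Calling report_isa(stdout) from a small program and rebuilding it with -march=core2, -march=corei7, and -march=corei7-avx should make the differences between the three options visible, though the exact list depends on the GCC version in use.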

It is not the Core i5 2500K setup we are using this time around
but rather a Sandy Bridge notebook we just received from System76. It's a very
nice, but expensive (circa $2500 USD) System76 Serval Professional Notebook that
has an Intel Core i7 2720QM CPU, 8GB of system memory, an 80GB Intel SSDSA2M080
solid-state drive, and NVIDIA GeForce GTX 485M 2GB graphics. It ships with the
Ubuntu 10.10 x86_64 release and besides testing out the different compilers, we
also upgraded its kernel to the mainline Linux 2.6.38-rc2 release. The Intel
Core i7 2720QM is a quad-core part with Hyper-Threading that is clocked at 2.2GHz
with a Turbo Boost frequency of 3.3GHz and 6MB of L3 cache.