This page presents data for the STARS Euler3d CFD benchmark. The benchmark is intended to provide information about the relative speed of different processor, operating system, and compiler combinations for a multi-threaded, floating-point, computationally intensive CFD code.

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, a taper ratio of 0.66, and a 45-degree quarter-chord sweep angle. This AGARD wing was tested in the 16-foot Transonic Dynamics Tunnel at the NASA Langley Research Center and is a standard aeroelastic test case used to validate unsteady, compressible CFD codes. Figure 1 shows the CFD-predicted Mach contours for a freestream Mach number of 0.960.

The Euler3d benchmark source code is not publicly available. However, the benchmark code is derived directly from the production CFD code, with hard-coded I/O and calculation routines. We use the Intel Fortran compiler (ifort 10.0). All floating-point variables are Fortran double precision (8 bytes). Parallelization is through OpenMP. Tim Cowan's dissertation provides an in-depth development and technical discussion of his Euler3d CFD code. For comparison, source code for a 1D finite-element CFD solver that preceded the Euler3d code is available.

The Euler3d benchmark is sensitive to chipset, memory, and CPU performance. For example, an E6700 Core 2 Duo on a certain nForce4 motherboard gives a benchmark score of about 1.08, while a reference Intel motherboard with the same chip gives a score of about 1.38. For our lab, that's a major difference because our simulations can run for a week, a month, or more!
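As a rough illustration of why that gap matters, the Python sketch below assumes the score behaves as a rate (higher is faster), so the wall-clock time of a fixed job scales inversely with it. Under that assumption the reference board is about 1.28 times faster, and a 7-day run on the nForce4 board would take roughly 5.5 days on the reference board. The function names are illustrative and not part of the benchmark.

```python
# Minimal sketch: turn two benchmark scores into a relative speedup and a
# projected run time. Assumption: the score is a rate (higher is faster), so
# the run time of a fixed workload scales inversely with the score.

def relative_speedup(score_fast, score_slow):
    """Return how many times faster the higher-scoring system is."""
    return score_fast / score_slow

def projected_runtime(runtime_slow, score_slow, score_fast):
    """Scale a run time measured on the slower system to the faster one."""
    return runtime_slow * score_slow / score_fast

nforce4 = 1.08    # E6700 on the nForce4 board (score quoted above)
intel_ref = 1.38  # same chip on the Intel reference board

speedup = relative_speedup(intel_ref, nforce4)
days = projected_runtime(7.0, nforce4, intel_ref)
print(f"speedup: {speedup:.2f}x")            # speedup: 1.28x
print(f"7-day run becomes {days:.1f} days")  # 7-day run becomes 5.5 days
```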

A fundamental characteristic of unstructured-grid computations is the need to scatter calculated variables across a non-contiguous set of memory locations. Each update therefore combines floating-point arithmetic with irregular memory accesses, so we should expect the benchmark to be rate-limited by both raw floating-point calculation speed and memory access speed. Because these two are (mostly) sequential operations, both influence the final benchmark speed.
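The scatter pattern can be sketched in a few lines of Python. This sketch assumes an edge-based data layout (each edge stores the indices of its two end nodes), which is a common choice for unstructured solvers but is not confirmed here as Euler3d's actual layout, and it uses a placeholder flux in place of a real Euler flux. The arithmetic per edge is the floating-point work; the indexed reads and writes into `q` and `res` are the irregular memory traffic.

```python
# Minimal sketch of the gather/compute/scatter pattern in an edge-based
# unstructured-grid loop. Hypothetical layout: each edge is a pair of node
# indices; node values live in a flat list.

def edge_residuals(edges, q):
    """Accumulate a toy per-edge 'flux' into the edge's two end nodes."""
    res = [0.0] * len(q)
    for a, b in edges:
        # Gather: node indices are arbitrary, so these reads jump around
        # memory instead of walking it sequentially.
        flux = 0.5 * (q[b] - q[a])  # placeholder for the real flux calculation
        # Scatter: equal and opposite contributions to both end nodes.
        res[a] += flux
        res[b] -= flux
    return res

# Four nodes connected by edges listed in no particular memory order.
edges = [(0, 3), (2, 1), (3, 1), (0, 2)]
q = [1.0, 2.0, 4.0, 8.0]
print(edge_residuals(edges, q))  # [5.0, 4.0, -2.5, -6.5]
```

Because every edge adds equal and opposite contributions, the residuals sum to zero, mirroring the conservation property of a finite-volume update.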

Intel's Fortran compiler generates different code paths for different processor capabilities, and non-Intel processors may not trigger the fast code path (see more information). Yes, we are aware of this issue. Yes, it is quite annoying that Intel's compilers behave this way. According to an anonymous contributor, the difference on a modern AMD processor is about 10%. We actually expected the difference to be significantly greater! We are also aware of several fixes but have not applied them yet. Raw performance is king in the CFD world, and that usually means using Intel's chips and Intel's compiler.
For what it is worth, we don't consider Intel's code-path dispatcher a bug; the behavior is intentional, well publicized, and likely a purely economic decision by Intel. We firmly believe in letting the market decide whether Intel's strategy is worth the lost goodwill. Expecting or desiring non-voluntary fiat to enforce an arbitrary standard is nonsense (e.g., the FTC's "decision" regarding this issue) and anti-liberty. This issue is just one of many criteria we use when periodically selecting our lab's compiler. There are strong competitors to Intel's compilers; some are tantalizingly fast on certain benchmarks.
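The following hypothetical Python sketch contrasts two ways a compiler runtime could choose a code path: gating on the CPU vendor string versus gating only on the instruction-set features the processor reports. It is not Intel's dispatcher; the function names and the use of SSE3 as the deciding feature are assumptions for illustration. "GenuineIntel" and "AuthenticAMD" are the standard CPUID vendor strings.

```python
# Hypothetical sketch of runtime code-path dispatch. NOT Intel's actual
# dispatcher; it only contrasts two possible selection policies. The choice
# of "sse3" as the gating feature is an illustrative assumption.

def pick_path_by_vendor(vendor, features):
    """Fast path only if the vendor string matches AND the feature exists."""
    if vendor == "GenuineIntel" and "sse3" in features:
        return "fast"
    return "generic"

def pick_path_by_features(vendor, features):
    """Fast path whenever the feature exists; vendor is deliberately ignored."""
    if "sse3" in features:
        return "fast"
    return "generic"

cpu = ("AuthenticAMD", {"sse2", "sse3"})
print(pick_path_by_vendor(*cpu))    # generic
print(pick_path_by_features(*cpu))  # fast
```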

Downloads

If you have an x86-compatible PC and want to see how your machine compares to those on this list, download the files provided below to run the benchmark testcase. The benchmark executable is a hard-wired version of STARS Euler3d that will solve one and only one problem. You'll also need the bm2.g3d file, which is a large binary data file defining the agard2 geometry. Since this is a hard-coded benchmark, visualization and aerodynamics output files are not generated.

Benchmark data is available in XML format if you want to view the raw numbers. Also, please send us an e-mail to report your benchmark result. We appreciate all submissions!

From a command prompt window (cmd.exe), the command line format is: e3dbm. For quick experiments, we typically use between 1 and 5 steps (#steps); the default is 20 steps. The program automatically detects the total number of processors in your system and uses it as the default number of threads (#threads). When possible on an N-processor machine, send us N results, one for each thread count from 1 through N.

The benchmark reports the run time (seconds) and score (Hz) in the command-line window.
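To summarize a 1 through N thread sweep before sending it in, a small Python sketch like this one turns the scores into speedup and parallel efficiency. It assumes the score is a rate, so the speedup with n threads is score(n)/score(1); the function names are illustrative.

```python
# Minimal sketch for summarizing a 1..N thread sweep. Assumption: the score
# is a rate (higher is faster), so speedup = score(n) / score(1).

def scaling_table(scores):
    """scores[i] is the benchmark score measured with i + 1 threads."""
    base = scores[0]
    rows = []
    for i, s in enumerate(scores):
        threads = i + 1
        speedup = s / base
        efficiency = speedup / threads  # 1.0 means perfect linear scaling
        rows.append((threads, s, speedup, efficiency))
    return rows

def print_table(rows):
    """Print one line per thread count: score, speedup, and efficiency."""
    print(f"{'threads':>7} {'score':>8} {'speedup':>8} {'eff.':>6}")
    for threads, s, speedup, eff in rows:
        print(f"{threads:>7} {s:>8.3f} {speedup:>8.2f} {eff:>6.0%}")
```

For example, call `print_table(scaling_table(scores))`, where `scores` holds your measured scores in thread order starting at 1 thread.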

Benchmark Links

If you want more benchmark data on the performance of various processors, we recommend that you check out the SPEC benchmark suite and related performance data at http://www.spec.org/. Also, TechReport provides reports and analysis for a wide range of up-to-date PC hardware.