On Thu, 10 May 2012 09:59:11 +0100, <einschlag at gmail.com> wrote:
> We have recently bought an iBuyPower gaming PC for our research group:
>
> AMD FX 8 core, 3.6 GHz, 16 GB RAM
>
> MathematicaMark8 Benchmark 0.86 is not bad, considering the price ~$800
> of this PC but I was expecting much more.
>
> Apparently Intel's MKL library used by Mathematica is not optimized for
> AMD processors.
>
> A test program calculating exponentials of large matrices takes 13 s on
> the AMD PC and only 8 s on my Mac Pro (Mathematica benchmark 0.7) that
> has 8 Intel Xeon cores at 2.4 GHz. And on my Lenovo laptop the program
> runs 9 s. I blame it on the MKL inadequacy for AMD.
>
> TestProgram := Module[{},
> NN = 1000;
> AMatr = Table[RandomReal[], {i, 1, NN}, {j, 1, NN}];
> NExec = 10;
> For[i = 1, i < NExec, i++,
> MatrixExp[AMatr];
> ];
> ]
>
> Execution by iBuyPower PC (AMD FX 8 core, Linux Ubuntu 64 bit)
>
> TestProgram // AbsoluteTiming
>
> {13.230105, Null}
>
> Execution by Mac Pro (Intel Xeon 2 x 4 core)
>
> TestProgram // AbsoluteTiming
>
> {8.126944, Null}
>
> Execution by Lenovo laptop (Intel i7-QM2060, Windows 7 64 bit)
>
> TestProgram // AbsoluteTiming
>
> {9.4275392, Null}
>
>
> On the other hand, a program compiling in C from Mathematica's help runs
> very fast on the AMD PC:
>
> TestProgram2 := Module[{},
> c = Compile[ {{x, _Real}, {n, _Integer}},
> Module[ {sum, inc}, sum = 1.0; inc = 1.0;
> Do[inc = inc*x/i; sum = sum + inc, {i, n}]; sum],
> CompilationTarget -> "C"];
> c[1.6, 10000000];
> ]
>
> Execution by iBuyPower PC (AMD FX 8 core, Linux Ubuntu 64 bit, GCC
> compiler)
>
> TestProgram2 // AbsoluteTiming
>
> {0.114427, Null}
>
> Execution by Mac Pro (Intel Xeon 2 x 4 core, GCC compiler)
>
> TestProgram2 // AbsoluteTiming
>
> {0.212875, Null}
>
> Execution by Lenovo laptop (Intel i7-QM2060, Windows 7 64 bit, Microsoft
> Visual C++)
>
> TestProgram2 // AbsoluteTiming
>
> {0.3540203, Null}
>
> It seems the second test program is not using MKL and thus AMD becomes
> very efficient.
>
> I will continue testing.
>
> Is there any way to improve Mathematica's performance on AMD machines?
>
> Dmitry
>
In the past, Intel was known to engage in anticompetitive practices with
respect to AMD, and quite rightly was subject to legal penalties for this.
(Specifically, they encouraged large computer manufacturers such as Dell to
take up exclusive supply contracts by means of large discounts and
availability guarantees.) Since that judgment there has been a lot of
general hysteria that Intel may still be discriminating against AMD
performance-wise in their library and compiler products. This culminated in
legal threats, which in turn led to the large disclaimers posted all over
Intel's products stating that they are not meant for anything other than
Intel processors.

Suspicion and disclaimers are one thing, but actual performance is
another. As you may be aware, AMD offers their own math library, ACML.
What most people who level this criticism at MKL are not aware of,
however, is that MKL actually performs better than ACML, *even on AMD
processors*. So, even if it is not optimized as thoroughly as it might be
for AMD processors, MKL is still better than the alternatives. (That it is
less thoroughly optimized is more than likely the case: Intel does not have
an infinite development budget, and there is no financial incentive for
them to go to great lengths optimizing for other manufacturers' processors,
whose performance characteristics are very different from their own.)

Now, how then to explain the poor performance you observe? Unfortunately,
the latest generation of AMD processors is simply not very good (the
Bulldozer processors are actually worse than the previous-generation
Phenom II processors in many applications), whereas Intel's products have
been making dramatic gains lately despite AMD's reduced competitiveness.
The end result is that a Bulldozer core is "worth" about half a Sandy
Bridge core, clock for clock, especially in floating-point workloads, since
a single FP unit is shared between two of what AMD calls cores. (Indeed,
many have said that AMD's "8 core" processors are more correctly described
as having 4 cores, because so much of the apparatus is shared, but for
marketing reasons AMD is obviously not buying that argument.)

In regard to your results from TestProgram2: sorry to say, these are
invalid. The time taken to compile to C completely overwhelms the actual
runtime, and you include both in the measurement. You also use
AbsoluteTiming, which is not appropriate for single-threaded code with
short runtimes executing inside the Mathematica kernel. A more valid test
is:

c = Compile[{{x, _Real}, {n, _Integer}},
  Module[{sum, inc}, sum = 1.0; inc = 1.0;
   Do[inc = inc*x/i; sum = sum + inc, {i, n}]; sum],
  CompilationTarget -> "C"
  ];
Do[c[1.6, 10000000], {10}] // Timing

which on my computer (Intel Core 2, 3.2GHz) takes about 0.65 seconds, i.e.
65 ms for a single evaluation of c[1.6, 10000000].
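
If you want to see both costs rather than just discard the compilation
step, something along these lines should work. This is only a sketch: the
names compileTime, runTime and perCall are my own, and it assumes that
Mathematica can find a working C compiler.

```mathematica
(* Step 1: time the compilation on its own. AbsoluteTiming (wall-clock
   time) is used here because the external C compiler runs outside the
   kernel, and Timing only counts CPU time spent in the kernel. *)
{compileTime, c} = AbsoluteTiming[
   Compile[{{x, _Real}, {n, _Integer}},
    Module[{sum, inc}, sum = 1.0; inc = 1.0;
     Do[inc = inc*x/i; sum = sum + inc, {i, n}]; sum],
    CompilationTarget -> "C"]];

(* Step 2: time execution only. The call is repeated 10 times to get a
   measurable total, which is then divided by 10 to give the cost of a
   single evaluation. *)
runTime = First[Timing[Do[c[1.6, 10000000], {10}]]];
perCall = runTime/10;

(* Compilation time and per-call runtime, both in seconds. *)
{compileTime, perCall}
```

Of the two numbers, perCall is the one that reflects how fast the compiled
loop actually runs, so that is the one to compare between machines.
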
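Your matrix exponential test would also be better posed as follows (note
that your For loop, which starts at i = 1 and tests i < NExec, performs only
9 evaluations rather than 10):

NN = 1000;
mat = RandomReal[{0, 1}, {NN, NN}];
Do[MatrixExp[mat], {10}] // AbsoluteTiming

(I get 9.5 seconds.)
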
However, I would be reluctant to draw any firm conclusions from these tests
if I were you. Far better to look at published benchmarks for real
applications, for instance:
http://techreport.com/articles.x/21813/15
or
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/7
which both show that Bulldozer performance is a very mixed bag in general.
While there are a few applications in which it can match or only just
outperform Intel's offerings, for the most part it falls behind them
considerably.
Best,
O. R.