With AMD's release of the Open64 compiler for Linux, I thought I would try my hand at recompiling the enigma application. I have always been interested in speeding up these applications because I have felt that a 64-bit application should perform the same function faster than a 32-bit application, especially one as mathematically driven as enigma. While I have compiled my own application for my AMD Phenom in the past, the 64-bit Linux build was only a few seconds faster than the 32-bit applications TJM posted on the site last year. This has changed dramatically with AMD's recent release of Open64 v4.2.2. I have seen a significant decrease in computation time for these new 64-bit applications compared to the 32-bit optimized ones released last year. Here is a list of times I have noticed with the benchmark application evaluation program.

Now for the good part. The 64-bit Linux optimized applications REALLY PUT THE HAMMER DOWN! I do not know what voodoo the engineers at AMD exercised when they optimized this compiler, but there is about a 20% performance increase compared to TJM's P3 optimized app. Here are the numbers....

So the AMD optimizations give roughly the same improvement across the board: about 24% compared to TJM's P3 app, and 35% compared to the default app. For more info, I also ran some Intel optimizations...

...if you would like to use one of them (32-bit or 64-bit, AMD or Intel optimized), be my guest. I would like someone to compile a Wolfdale app in 64-bit mode so I can see if an Open64 64-bit Wolfdale build run on an Intel Core 2 Duo or Quad would post even quicker times than the ones I've shown here.

Conclusions:

1.) The 32-bit Open64 apps are just as fast as, but not faster than, TJM's Intel P3 optimized app from last year.
2.) The 64-bit Open64 applications I compiled are significantly faster than any 32-bit application, whether gcc or Open64 compiled. (And they are available for anyone to use, just click on the link above to download. I have included both 32-bit and 64-bit apps just in case someone else would like to test them.)
3.) Windows users may not be left out here. The gcc 4.4 version supposedly includes more optimizations for AMD processors (as shown with my gcc 4.4 result above). I would think anyone running WinXP 64-bit, Vista 64-bit, or Win7 64-bit could download gcc 4.4 64-bit for their version of Windows and compile these apps to find out if they get any improvement compared to TJM's P3 app. Also, I would imagine Open64 may be ported to Windows some day (I can't see AMD letting this work only on Linux.)

Thanks to TJM for letting the code out so us unemployed mechanical engineers can have something meaningful and rewarding to do besides changing dirty diapers during the day....;-)

If you have any questions, best way to get a hold of me is by email at mdoerner1 (at) cox (dot) net

Just to eliminate confusion, you must use the app_test_522.tgz file from TJM's Optimized App Thread along with the executables shown above. You must still follow TJM's Optimized app procedure as shown in this thread....

Here is the performance boost I'm seeing (so far) on my Phenom. The only minor tweak I've made on my system is going from 3.05GHz to 3.10GHz, but this only shaves a few seconds off the small tasks, maybe a minute or 2 off the large tasks. I've gone from 1775 RAC to 2330 RAC, maybe a bit more. We'll see here shortly.
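To put those two changes side by side, here is a quick back-of-the-envelope sketch in Python. It only uses the figures quoted above (1775 to 2330 RAC, 3.05GHz to 3.10GHz); the helper name percent_change is just something made up for illustration. Keep in mind that RAC is a rolling average, so it lags behind the real throughput and the final number may still move.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

# Figures quoted in the post above.
rac_gain = percent_change(1775, 2330)    # RAC before and after the new app
clock_gain = percent_change(3.05, 3.10)  # CPU clock in GHz before and after

print(f"RAC gain:   {rac_gain:.1f}%")
print(f"Clock gain: {clock_gain:.1f}%")
```

The clock bump works out to well under 2%, so on these numbers it can only account for a small slice of the RAC increase.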

I've tried to build a Wolfdale app, but the compiler says that the target processor does not support SSE2. That's quite weird, because I've also tried on a machine with a Wolfdale processor and got the same result. Buninek from BOINC@Poland also tried with no luck, but here is his other app (xeon x64):

So far the fastest app runs at a speed similar to one of my older 64-bit executables built with the Intel compiler, which was slightly slower than the old PIII app. Tested on Q6600, E7200 and E5200.
Now I'm trying to build something faster for Athlon 64/64 X2. It seems unfair that these processors run enigma slower than a Pentium III at less than half their clock speed (a 996MHz PIII runs faster than an Athlon 64 at 2.4GHz).

I can let the guys at Open64 know that -march=wolfdale chokes on Intel and AMD processors, giving the same bogus "No SSE2" error.

So the 32-bit gcc compiled P3 app runs faster than either Open64 or Intel compiled 64-bit code? That's weird, because when I ran and tested the P3 gcc 32-bit app in the benchmark it was slower than the Open64 64-bit code. Maybe we're finally running into a processor architecture situation here?

Just so I understand, are you saying a PIII at 1.0GHz beats a 2.4GHz Athlon64 in raw time?!?!?! Or are you saying clock-for-clock the PIII is more efficient? (i.e. let's say the P3 completes a task in 1000 seconds, are you saying the Athlon64 completes it in 2400 seconds, 1200 secs, or 990 secs?)

I'm glad to hear the Intel Compiler code is as efficient as the Open64 code. If the Gentoo guys make a distro with Open64 instead of GCC, I just may have to switch distros from OpenSuSE to Gentoo..... :)

Mike Doerner

PS I don't want to open another can of worms here, but I've heard that C code isn't as efficient mathematically as Fortran (which is why Fortran won't die). Not that a code re-write is possible this late in the game, but architecture optimization will only get us so far.....

Just so I understand, are you saying a PIII at 1.0Ghz beats a 2.4GHz Athlon64 in raw time?!?!?!

Yep, that's exactly what I'm saying. A 1GHz PIII Coppermine beats a 2.4GHz Athlon64 by around 3-4 minutes on the shortest workunits.
But...
I think that's the only situation where an app built with the Intel C Compiler is much faster than anything else. I forgot the exact numbers and I don't have a PIII box here anymore, but the speedup gained by replacing the PIII-gcc app with the PIII-Intel one was just insane: more than twice the speed of the PIII-gcc app and around 3 times the speed of the base app.

If I remember correctly, 1GHz PIII needed around 40 minutes to complete hceyz72/0, and the fastest result I've seen from Athlon 64/2.4GHz is around 44 minutes.
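For the clock-for-clock question, a rough sketch in Python using only the remembered figures above (about 40 minutes at 1GHz versus about 44 minutes at 2.4GHz on hceyz72/0). The function name gigacycles is made up for illustration, and multiplying time by clock speed is only a crude proxy that ignores memory, cache and everything else.

```python
def gigacycles(minutes, clock_ghz):
    """Approximate clock cycles spent on a task, in billions of cycles."""
    return minutes * 60 * clock_ghz

# Approximate figures recalled in the post above, not fresh measurements.
p3_cycles = gigacycles(40, 1.0)   # 1GHz PIII
a64_cycles = gigacycles(44, 2.4)  # 2.4GHz Athlon 64

ratio = a64_cycles / p3_cycles
print(f"PIII:      {p3_cycles:.0f} gigacycles")
print(f"Athlon 64: {a64_cycles:.0f} gigacycles")
print(f"Athlon 64 spends about {ratio:.2f}x as many cycles on the same task")
```

So on these numbers the PIII wins both in raw time and, by a much wider margin, per clock.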

I tried running that app on my box and it choked.....

mdoerner@Linux-QuadZilla:~/Xfers/enigma_benchmark> ./start
./start: line 4: ./enigma: No such file or directory
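One possible cause, assuming the enigma file really is present in that directory: Linux reports "No such file or directory" for an existing binary when the loader it asks for is missing, for example a 32-bit executable on a 64-bit install without the 32-bit libraries. A minimal Python sketch to check what kind of binary a file is by reading the start of its ELF header (the function name describe_elf and the small lookup tables are made up for illustration and cover only the x86 cases discussed in this thread):

```python
import struct

# EI_CLASS byte and e_machine values from the ELF header layout.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
MACHINES = {3: "x86 (i386)", 62: "x86-64"}

def describe_elf(path):
    """Return a short description of an executable based on its ELF header."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not an ELF binary"
    elf_class = ELF_CLASSES.get(header[4], "unknown class")
    # EI_DATA: 1 means little-endian, anything else is treated as big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack(byte_order + "H", header[18:20])
    return f"{elf_class} ELF, machine: {MACHINES.get(machine, machine)}"
```

Calling print(describe_elf("./enigma")) from the benchmark directory would show whether the executable is a 32-bit or a 64-bit build. If the file is simply not there, or is named differently from what the start script expects, open() raises FileNotFoundError and that is the real answer.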

AMD has released Open64 4.2.2.1, so I was able to compile a 64-bit Wolfdale app for the Intel guys. The 32-bit Wolfdale app will have to wait a bit, as Open64 4.2.2.1 has introduced a bug with the -m32 flag...:-(. As soon as they get it fixed I'll add it to the archive.


Mike D

OK, the 32-bit Wolfdale app has been added. For some reason, when compiling with the -m32 flag, you gotta use gcc 4.3.3. AMD says the preferred version is gcc 4.1.2 or 4.2, but that only works with the -m64 flag. Oh well.

Also, I have added 2 apps (32-bit and 64-bit) to the file, built with -march=anyx86. I'm gonna try it on an old, old, old 233MHz Pentium MMX portable I've had sitting around doing nothing for a while. I put DSL Linux on it (a Debian derivative) with a 2.4.31 version of the kernel. Unfortunately, it picked up a hceyz72_1_ task and it's crunching on the default app right now, so it might not be until tomorrow before it's done crunching on it....:-( How computers have come along in a decade.....

Odd thing is the anyx86 and barcelona flags seem to compute in the same time on my Phenom box, so I'll give it a whirl and see how the times come out on regular tasks.

Tried running the anyx86 code on the Pentium mobile, and it choked. Not sure if it's because the Pentium is on the 2.4.31 kernel or if I need to add more flags to make the libraries static or what.....

Hmmm.... I might have to statically link the libraries for you (though you should consider installing the required libraries in the future). Maybe it's time to come up with a separate archive since those files will be so much bigger....

PS Here's the new link. Right now there's only a 32-bit AnyX86 statically linked file. Let me know if you have any further problems.