xmame-0.83.1

ROUND 64.... FIGHT!

linux 2.6.7

7-1-2004

GAME   32-bit performance   64-bit performance   %speedup with 64-bit   64-bit with gcc 3.4.0   %speedup with gcc 3.4.0

xmame-0.81.1

5-26-2004

The purpose of this experiment is to quantify the performance difference between 32-bit and 64-bit builds of xmame on a native 64-bit X86-64 (aka AMD64) Linux OS. This is in response to the frequently updated mame32 benchmarks and the X86-64 xmame mailing list thread.

Background: Provided one has a native amd64 OS, a set of 64-bit and 32-bit libraries, and a compiler that allows 32-bit and 64-bit compilation, it is possible to run 64-bit programs on the same machine as 32-bit programs. It is not possible to mix 32-bit assembly code within a 64-bit program, so assembly CPU emulators and the MIPS dynamic recompiler are not being tested here. For this experiment I chose Linux for the 64-bit OS due to its current level of maturity compared to Windows, and I can't stand Windows anyway.

Hardware

Athlon64 FX-53 (2.4 GHz)

ASUS SK8V

2 GB registered ECC RAM (4x512 MB, 3-3-3-8, PC3200) [memtested]

ATI All-in-Wonder Radeon [see below for driver notes]

desktop resolution 1024x768x24

Software

NOTE: The All-in-Wonder Radeon (R100) was used instead of a Radeon 9700 Pro (R300) because the open-source (non-ATI) radeon driver IS TERRIBLE. 2D is always slow on R300 regardless of mode. 2D is fast on R100 except in xmame DGA mode. And X loves to lock up regardless of card.

Test Methodology

I vehemently disagree with the mame32 assertion that -ftr 500 is sufficient. 500 isn't even enough frames to get out of the diagnostics in V-unit games. I'm not benchmarking as many games so I use -ftr 10000.
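To give a rough sense of scale for the two -ftr settings, here is the arithmetic as a small Python sketch. The 60 Hz refresh rate is an assumption (real rates vary from game to game) and the function names are made up for illustration. The kinst2 figure is the 32-bit average from the gcc 3.3.3 -O3 results on this page.

```python
def emulated_seconds(frames, refresh_hz=60.0):
    """Game time covered by `frames` frames at an assumed refresh rate."""
    return frames / refresh_hz


def wall_clock_seconds(frames, average_fps):
    """Real time needed to emulate `frames` frames at a measured average FPS."""
    return frames / average_fps


if __name__ == "__main__":
    for ftr in (500, 10000):
        # 60 Hz is an assumption, not a property of every game.
        print(f"-ftr {ftr}: about {emulated_seconds(ftr):.1f} s of game time at 60 Hz")
    minutes = wall_clock_seconds(10000, 5.839336) / 60
    print(f"kinst2 at 5.839336 FPS: about {minutes:.1f} minutes for 10000 frames")
```

Under that assumption 500 frames is only a few seconds of game time, while 10000 frames is a few minutes. The price is a long wall-clock wait on the slowest games.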

I used a very small perl script to manage the benchmarks. The overhead should be small and consistent across all tests.
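The original script was Perl and is not reproduced here. As a rough idea of what such a wrapper does, here is a minimal Python sketch, not the script actually used. It assumes xmame prints a line of the form "Average FPS: <number>" when it exits. The function names and the binary path in the usage note are hypothetical. Only the -ftr option comes from this article.

```python
import re
import subprocess

# Assumption: xmame reports a line like "Average FPS: 58.936718" on exit.
FPS_PATTERN = re.compile(r"Average FPS:\s*([0-9]+(?:\.[0-9]+)?)")


def build_command(binary, game, frames=10000):
    """Command line for one benchmark run; -ftr sets the number of frames to run."""
    return [binary, "-ftr", str(frames), game]


def parse_average_fps(output):
    """Return the reported average FPS, or None if none was printed (e.g. a crash)."""
    match = FPS_PATTERN.search(output)
    return float(match.group(1)) if match else None


def run_benchmark(binary, game, frames=10000):
    """Run one game to completion and return its average FPS, or None on failure."""
    result = subprocess.run(build_command(binary, game, frames),
                            capture_output=True, text=True)
    return parse_average_fps(result.stdout + result.stderr)
```

Calling something like run_benchmark("./xmame64/xmame.x11", "pacman") for each game and each build, then printing the results side by side, is all the managing that is needed. A None result marks a game that crashed or never reported a frame rate.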

A variety of games were chosen. Some to match the mame32 choices, some to match my own old benchmarks, and others for variety. The goal was to stress several CPU types, vector/tile/bitmap graphics, sound, programming styles, scaling, small and large games, old and new games, etc. Consequently there are a few Neo Geo games, which is somewhat redundant.

xmame-0.81.1 was chosen instead of 0.82.1 due to some known brokenness in 0.82.1.

A regular windowed X11 display mode was used instead of DGA for speed reasons. 140 FPS in pacman under DGA is a joke compared to 1500+ in a window, and again points to driver problems.

xscreensaver was disabled of course!

The ASM 68000 CPU and MIPS_DRC were not used in any test to make the comparison fair.

Performance Conclusions

The best aspect of X86-64 is that the extra registers offset the bloat of the 64-bit extensions, and these numbers demonstrate that only a few games are consistently worse under the 64-bit xmame. This makes X86-64 one of the few (only?) 64-bit ISAs where 32-bit code is generally slower. (On MIPS 32-bit is preferred for speed.) I consistently see mk2 and umk3 run slower on 64-bit xmame by a small margin, and ga2 is consistently slower by roughly 10%. It would be interesting to see what these drivers do that makes them so slow and then, of course, avoid writing code that way.

Generally older games benefit least from X86-64. The difference is under 10%. Newer games benefit more, generally 8-20%. It's almost like having a "free overclock" relative to a traditional 32-bit PC.

gcc-3.3.3 does not have good K8 pipeline knowledge, nor does it have new features like gcc-3.4.0's -funit-at-a-time (implied by -O2?) or -fweb (implied by -O3?). It's probable gcc 3.4.0 would boost the above scores by a noticeable percentage. Some have speculated 3.4.0 to produce executables 10-15% faster. This is worth testing!

MAME Problems Encountered

During 64-bit compiles there are many warnings about pointers (64-bit) being cast to integers (32-bit) of a different size. It would be good to clean those up.
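To show why those warnings are worth cleaning up, here is a small Python sketch of what happens to an address that passes through a 32-bit integer. It models the cast as keeping the low 32 bits and ignores sign extension. Both addresses are made up for illustration and the function names are hypothetical.

```python
MASK_32 = 0xFFFFFFFF


def truncate_to_32_bits(address):
    """Keep only the low 32 bits, as storing a pointer in a 32-bit integer would."""
    return address & MASK_32


def survives_round_trip(address):
    """True if the address is unchanged after passing through a 32-bit integer."""
    return truncate_to_32_bits(address) == address


if __name__ == "__main__":
    low = 0x0804A010            # made-up address below 4 GB
    high = 0x00007F3A0804A010   # made-up address above 4 GB
    for address in (low, high):
        status = "intact" if survives_round_trip(address) else "corrupted"
        print(hex(address), "->", hex(truncate_to_32_bits(address)), status)
```

A 32-bit build never sees the second case, which is why code like this can go unnoticed until it is compiled for 64-bit.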

soldivid: crashes in 64-bit.

stunrun: crashes in 64-bit. Still broken in 0.82.1, but please confirm if OK in 0.82u3.

biofreak: hangs on black screen in 64-bit. Seems OK in 0.82.1.

gcc 3.3.3 with -O3 -fomit-frame-pointer

GAME        32-bit performance (FPS)   64-bit performance (FPS)   %speedup with 64-bit
crusnusa    49.086755                  58.936718                  20.07 %
dkong       1140.764264                1415.447801                24.08 %
ga2         257.936726                 234.239977                 -9.19 %
kinst2      5.839336                   6.366226                   9.02 %
kof2000     366.505529                 406.741893                 10.98 %
mk2         146.685502                 145.081501                 -1.09 %
mk          242.281070                 248.512936                 2.57 %
mslugx      345.958596                 371.871160                 7.49 %
pacman      1451.273492                1629.660340                12.29 %
pitfight    370.377723                 396.107374                 6.95 %
punchout    606.254791                 643.604846                 6.16 %
rastan      664.283351                 733.623237                 10.44 %
samsho      390.650759                 420.713386                 7.70 %
soldivid    344.246640                 crash                      undefined %
souledgb    51.166532                  61.042728                  19.30 %
ssf2t       361.827581                 390.738864                 7.99 %
stunrun     181.875532                 crash                      undefined %
tempest     272.897737                 340.184756                 24.66 %
umk3        145.880191                 143.936515                 -1.33 %
wargods     52.638714                  60.428721                  14.80 %
xmen        420.078615                 472.652894                 12.52 %
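For anyone checking the percentages, they are consistent with the 64-bit frame rate expressed relative to the 32-bit one. Here is a minimal Python sketch using three rows from the table above. The function name is made up for illustration.

```python
def percent_speedup(fps_32bit, fps_64bit):
    """Percentage change of the 64-bit frame rate relative to the 32-bit one."""
    return (fps_64bit / fps_32bit - 1.0) * 100.0


if __name__ == "__main__":
    # (32-bit FPS, 64-bit FPS) pairs copied from the table above.
    rows = {
        "crusnusa": (49.086755, 58.936718),
        "dkong": (1140.764264, 1415.447801),
        "ga2": (257.936726, 234.239977),
    }
    for game, (fps32, fps64) in rows.items():
        print(f"{game}: {percent_speedup(fps32, fps64):.2f} %")
```

A negative result means the 64-bit build was slower, as with ga2.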

Conclusions

-O3 is not universally better than my -O2 + options settings in 32-bit xmame, but it is a universal win in 64-bit xmame. The result is a somewhat larger performance percentage in some games, like kof2000, which don't tend to vary as much between successive runs as pacman and dkong do.

gcc 3.4.0 compared with gcc 3.3.3

This is to test 3.4.0's alleged 10-15% speedup with -march=k8 and other enhancements. As you can see below, we never achieve this goal.

Conclusions

When using gcc 3.4.0 I never see more than 8.19% improvement on games, and generally less than 5%. This deflates the idea of getting 10-15% from -march=k8 and the new options (-funit-at-a-time should be included in -O2 by default). However, I do see a general improvement from 3.4.0, and for most games -O3 is a win. The exception is souledgb, which surprisingly runs fastest with gcc 3.3.3 -O3. dkong behaves similarly, but that game has a large margin of error between successive runs.
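Since some games vary a lot between successive runs, one way to judge whether a small difference means anything is to repeat each run and compare the gap against the spread. Here is a minimal Python sketch. It is a crude heuristic rather than a proper statistical test, the threshold of two standard deviations is an arbitrary assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def summarize(runs):
    """Mean and sample standard deviation of repeated FPS measurements."""
    return mean(runs), stdev(runs)


def difference_exceeds_noise(runs_a, runs_b, sigmas=2.0):
    """Crude check: is the gap between the means larger than `sigmas` times the larger spread?"""
    mean_a, sd_a = summarize(runs_a)
    mean_b, sd_b = summarize(runs_b)
    return abs(mean_a - mean_b) > sigmas * max(sd_a, sd_b)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measurements from this page.
    steady_old, steady_new = [100.0, 100.5, 99.5], [105.0, 105.5, 104.5]
    noisy_old, noisy_new = [100.0, 120.0, 80.0], [105.0, 125.0, 85.0]
    print("steady game:", difference_exceeds_noise(steady_old, steady_new))
    print("noisy game:", difference_exceeds_noise(noisy_old, noisy_new))
```

The same 5 FPS gap counts as real for the steady game and as noise for the noisy one.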