We’re in crazy pre-Holiday mode at Four Door Lemon right now, and we’ve got a couple of exciting things to mention before getting on to the post!

QuizQuizQuiz, our popular trivia app, is currently FREE for the rest of the weekend. Please download it now and tell your friends.

Also, Cricket Captain, which is the source of the research for this article, was approved tonight and is now available. If you like cricket, this is the management game to have, including 3D highlights and a great simulation engine.

Compiler flags

I’ve worked on pretty much every game platform compiler setup in my time in the industry. For some platforms that actually covers working with 3-4 different variations while the teams found the best code generation for their device.

There are always discussions between developers about what *the* best set of compiler flags is for optimum speed of your game, which (other than executable size on the more RAM-limited devices) is the main goal of this kind of work. Normally the gains from tweaking these are very small, but I remember features like LTCG (link-time code generation) on Xbox 1 giving an incredibly big speed-up.

We recently submitted Cricket Captain to the App Store, and other than trying -O3 and -Os I didn’t actually test any of the other compilation flags. The game isn’t quite as optimised as I’d like it to be yet, although it plays fine and the 3D highlights run at around 30fps. We will be improving this through actual coding work (and talking about the techniques used for this).

It seemed ideal to combine my investigation into the Xcode GCC compiler’s optimisation flags with writing a blog post, so I used Cricket Captain as the test sample.

So, a couple of notes on the tests:

The random number generator has been forced to generate the same sequence each run

The game mode / teams are identical so the same logic will occur (I also verified the results came out the same)

The phone was in the same state for all tests

There will be some variance due to background processes most likely

This isn’t intended to be super accurate but to see if we’d get a noticeable difference from changing these flags

Our render and matrix libraries are compiled with -O3 into static libs and are not part of this test. I wanted to test the flags on game code only, on the basis that people may not have access to their middleware source code.
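For anyone wanting to set up a similar test, here is a minimal sketch of the idea in the notes above: fix the random seed so every run does identical work, then time a batch of bowls. This is not the Cricket Captain code. `simulateBowl`, `timeBowls` and `RunResult` are hypothetical names I've made up for illustration, and the "bowl" here is just dummy work driven by the RNG.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical stand-in for one bowl calculation: it burns some
// deterministic work driven by the RNG so there is something to time.
std::uint64_t simulateBowl(std::mt19937& rng) {
    std::uint64_t checksum = 0;
    for (int step = 0; step < 100000; ++step) {
        checksum += rng() % 7;  // pretend "game logic" decision
    }
    return checksum;
}

struct RunResult {
    std::uint64_t checksum;  // lets us verify two runs did the same work
    double elapsedMs;        // wall-clock time for the whole batch
};

// Runs `bowls` simulated bowls from a fixed seed and times the batch.
RunResult timeBowls(std::uint32_t seed, int bowls) {
    assert(bowls > 0);
    std::mt19937 rng(seed);  // fixed seed => identical sequence every run
    std::uint64_t checksum = 0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < bowls; ++i) {
        checksum += simulateBowl(rng);
    }
    const auto end = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return RunResult{checksum, elapsed.count()};
}
```

Calling `timeBowls(1234u, 5)` under each set of flags and comparing the checksums is one way to confirm that every build really did the same work before you compare the timings.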

Let’s have a look at the results

| Compiler / Flags | 5x bowls time (ms) | 3x 3D highlights average FPS |
| --- | --- | --- |
| GCC -Os | 1704 | 24.69 |
| GCC -O1 | 1750 | 24.52 |
| GCC -O1, Auto Vectorisation enabled | 1677 | 24.96 |
| GCC -O2 | 1692 | 24.81 |
| GCC -O3 | 1718 | 24.85 |
| GCC -O3, Auto Vectorisation enabled | 1710 | 24.67 |
| GCC -O3, FastFP | 1713 | 24.74 |
| GCC -O3, Compile for Thumb | 1706 | 24.80 |
| GCC (Max) -O3, Unroll Loops, Auto Vectorisation enabled, FastFP | 1727 | 24.78 |

The table above shows the flags we tried, the total time in milliseconds for the first 5 bowl calculations, and the average FPS across three 3D highlights lasting 5.5 to 8 seconds each. The bowl calculation actually runs through the movement and animation of all the fielders and batsmen to work out the outcome. Note that the first bowl in a game is pretty slow, so it makes up a fair part of this time and possibly skews the results a little.

-O1 with auto vectorisation comes out on top in both tests. The values are very close, however, and the -O3 we shipped with was fairly good. The biggest surprise for me was that compiling for Thumb didn’t produce shocking results; I’m not really sure if that can be correct. As mentioned above, our middleware was still compiled as ARM, so we’re not paying a Thumb penalty on some of the heavy-lifting code.
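To give a feel for what auto vectorisation is looking for, here is a small sketch of the kind of loop a vectoriser can typically handle: a simple counted loop over contiguous floats where each iteration is independent of the others. `scaleAndAdd` is a made-up example, not something from our game. I'm also assuming that the Xcode build settings used above correspond to the standard GCC flags `-ftree-vectorize`, `-funroll-loops`, `-ffast-math` and `-mthumb`; check your own build log rather than taking my word for the mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out[i] = a[i] * scale + b[i]
// Each iteration only touches element i, so several iterations can in
// principle be done at once with SIMD instructions.
// Assumed GCC flag for asking the compiler to try this: -ftree-vectorize
void scaleAndAdd(const std::vector<float>& a,
                 const std::vector<float>& b,
                 float scale,
                 std::vector<float>& out) {
    assert(a.size() == b.size());
    out.resize(a.size());

    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * scale + b[i];
    }
}
```

Game logic full of branches and pointer chasing rarely looks like this, which may be part of why the flag made so little difference on our game code.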

In summary, though, there isn’t a huge difference between the flags and, as I originally thought, it’s probably not worth spending a huge amount of effort tuning them.
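To put a number on "not a huge difference", here is a quick sketch that works out the gap between the best and worst GCC results as a percentage of the smallest value. The figures are copied from the GCC results in this post (FPS rounded to two decimal places); `spreadPercent` is just a helper name I've picked for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Gap between the largest and smallest value, as a percentage of the smallest.
double spreadPercent(const std::vector<double>& values) {
    assert(!values.empty());
    const auto mm = std::minmax_element(values.begin(), values.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    assert(lo > 0.0);
    return (hi - lo) / lo * 100.0;
}

// GCC results copied from this post, in the order they were listed.
const std::vector<double> kGccBowlTimesMs = {
    1704.0, 1750.0, 1677.0, 1692.0, 1718.0, 1710.0, 1713.0, 1706.0, 1727.0};
const std::vector<double> kGccHighlightFps = {
    24.69, 24.52, 24.96, 24.81, 24.85, 24.67, 24.74, 24.80, 24.78};
```

`spreadPercent(kGccBowlTimesMs)` comes out a little over 4% and `spreadPercent(kGccHighlightFps)` a little under 2%, which is close enough to the run-to-run variance mentioned earlier that I wouldn't read much into the ordering.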

LLVM GCC

I have, however, been playing with the LLVM-GCC compiler in Xcode 4 and, despite a few issues getting 4.2.1 debugging to work correctly with it, got the same tests done (running on the same device). LLVM-GCC uses the GCC front end to parse the source and the LLVM back end to optimise and generate the actual executable code. From what I had read, performance increases of 33% were expected at runtime.

I did a couple of tests with various flags:

| Compiler / Flags | 5x bowls time (ms) | 3x 3D highlights average FPS |
| --- | --- | --- |
| LLVM -O3, AutoVec, Unroll Loops, LinkTimeOpt | 874 | 27.43 |
| LLVM -Os, AutoVec, Unroll Loops, LinkTimeOpt | 906 | 27.44 |
| LLVM -Os, AutoVec, Unroll Loops | 907 | 26.67 |

Wow. So that’s roughly twice as fast on the processing, plus a nice FPS / frame time boost. I imagine compiling our middleware with LLVM would result in even better speeds, as that’s where most of the highlight time will be going (which is partially hinted at by the fact that disabling link-time optimisation only changed the highlight speed).
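For the curious, here is the arithmetic behind "roughly twice as fast", as a small sketch. I'm comparing the GCC -O3 build we shipped (1718 ms, 24.85 FPS) against the LLVM -O3 row (874 ms, 27.43 FPS); that pairing is my choice for the example, and the helper names are made up.

```cpp
#include <cassert>

// How many times faster the new time is compared with the old one.
double speedup(double oldTimeMs, double newTimeMs) {
    assert(oldTimeMs > 0.0 && newTimeMs > 0.0);
    return oldTimeMs / newTimeMs;
}

// Converts an average frame rate into an average time per frame.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Figures taken from the results in this post.
const double kGccO3BowlsMs  = 1718.0;
const double kLlvmO3BowlsMs = 874.0;
const double kGccO3Fps      = 24.85;
const double kLlvmO3Fps     = 27.43;
```

`speedup(kGccO3BowlsMs, kLlvmO3BowlsMs)` is a little under 2x, and the frame time drops from just over 40 ms to about 36.5 ms, so the FPS gain is far smaller than the gain on the bowl calculation. That fits with the highlights spending most of their time in middleware that was not recompiled.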

After the GCC flag results I was worried about how interesting this post would be. The LLVM-GCC results are really exciting, though, and I think Xcode 4 is something everyone will be looking forward to!

That’s cool data on LLVM. I’m sort of a newb at compiler flags and compilers in general. You should consider doing an intro post to the different compiler flags and what they mean and how they work. For example, you mention the -Os compiler flag, but I have no idea really what that means. And I think I know where to type it in in XCode, but I’m not confident enough to ship with messing around with compiler flags.