Having issues with performance on your smartphone game? In this technical article, MoMinis programmer Itay Duvdevani explains how he and his colleagues solved what should have been a very simple sorting problem creatively, and upgraded their smartphone game engine in the process.

Working on the phone side of the MoMinis Platform is an interesting job. Since our content developers work mostly on the Simulator for game mechanics, bugs and performance tuning -- we always get a little nervous when a title becomes ready to start testing on actual phones.

Compilation of a game can reveal bugs (either in the simulator or the compiler) that will break the game's logic, or performance problems that aren't simulated in the Simulator correctly.

When a game works in the Simulator, but not on a phone, it usually finds its way to my team pretty quickly -- As compiler developers, we're the only ones that can pinpoint the problem.

The Problem with Jelly Jump

Our content developers were working on a new title -- Jelly Jump. After several weeks of development, and a week or so from release, the game became mature enough to start testing on actual phones. On higher-end phones, the game ran smoothly and was actually very fun. However, on mid-range and low-end devices, we experienced random frame rate jitters that made the game unplayable. My team got to handle the situation.

We observed the problem on a relatively capable device (800MHz Snapdragon with Adreno 200 GPU). This wasn't a low FPS problem, but FPS jitter that occurred every few seconds, in what looked like random occurrences. The game would run at 40-50 FPS, then dip to 10-5 FPS for a second or so, and then continue.

My team was tipped off by the content developers about some testing they'd done, which showed that removal of graphics assets made the game run better.

At compile time, the asset-generation stage is responsible for distributing the game's assets to texture atlases. Since this is an automatic process, it uses several heuristics to predict which objects will be drawn together, to minimize texture swapping. These heuristics are currently pretty basic, and rely on static analysis only -- we don't have performance-guided optimization facilities as of today. We suspected the assets were being grouped in an inefficient manner, causing texture thrashing due to memory exhaustion every time certain objects were being drawn.

This was a problem. We had no "magic solution" or even a workaround for this -- we didn't even have a quick 'n' dirty solution. The game simply had too many graphics for our asset-generation pipeline.

After spending a few days figuring out how to tackle the problem, yielding nothing practical for the time we had -- we started doubting our assumptions, and tested whether there even was a problem with the textures. We configured the asset-generation pipeline to downscale all the graphics to a quarter of their size, making all the game's assets fit in a single texture. To our surprise, the game ran just as bad (and was very ugly).

That meant that the game's logic was the bottleneck. We tried profiling the game with the Dalvik profiler. Unfortunately, profilers are very useful when you debug a low-FPS problem, but almost useless when you debug an FPS jitter problem, unless you can predict exactly when your game is going to jitter -- and of course, if we knew that, we probably would have known what the problem was.

Sitting down with the content-developers, hearing about the game's logic -- and using our intuition to plant some debug prints in the code, we came to a surprising conclusion -- our Z-ordering code was misbehaving when a large amount of object were being created in a single logical iteration!

Why Keep Things in Z-Order?

It should come as no surprise that some of the tasks our game-engine is responsible for, except driving the game's compiled logic, is to draw pretty things to the screen, and respond to touch input.

For a 2D engine like the MoMinis Platform, objects need to be arranged according to their Z-order and traversed (either in a back-to-front or front-to-back order), to decide which objects receive touch input, and which get drawn first.

The Touch Handler

Our touch handler is a simple adapter which expects screen-level touch events from the OS, and dispatches game-level events -- so when I click at a certain location on the screen, the object at that location receives a logical touch-down-event instead of just "The user clicked at: (x,y)".

We provide this layer of abstraction to save game developers the need to handle low level touch events and decide which object should respond. It is very important that our development platform be kept suitable for entry-level developers, and be easy to use with as little hassle as possible.

The case where game objects are stacked on one another and a touch event is received at a location that's within multiple objects can be difficult to handle at the game developer's level. That's why we decided that the touch event in this case is dispatched to the top-most object that's visible under the finger, and that's it.

The easiest way to do it is to scan all the objects in a front-to-back order and find the first that can handle the event.

The Render Queue

The MoMinis game-engine is powered by OpenGL ES on both Android and iPhone. For portability reasons, we limit ourselves to the OpenGL ES 1.1 API.

To maximize fill rate, we separate objects into two groups: opaque and translucent. The opaque group gets drawn first, front-to-back -- so fragments that are eventually invisible will be culled by the Z-buffer instead of being drawn-upon later, saving the entire rasterization pipeline.

For correct alpha-blending, the translucent group must be rendered back-to-front, after the opaque group has been rendered.

Challenges to Platform Developers

The demands from our cross-platform development environment and the games it produces are challenging:

Inexperienced, non-programmer developers should be able to create nice games easily

Games should have good performance

Games should be pretty

Final game size should be small

Games should be portable

Developer should be bothered with porting issues as little as possible (preferably none at all)

These demands conflict most of the time, and make our game-engine development a challenge.

When you're a content developer making a fully-fledged game, the game's logic can get quite complex pretty quickly. You have all these power ups and bonuses and special animations and sounds that change your basic game mechanics temporarily, and in some games you have to dynamically generate the level as the user advances (as in Jelly Jump).

Our platform developers aren't always seasoned programmers, or simply don't know the ins-and-outs of the game engine's implementation, so game logic is at times written in a suboptimal fashion. In addition, as the platform is still evolving, common practices used by content developers may worsen the situation, usually with no real alternative due to lack of features in the platform.

This means that sometimes a content developer will create and destroy many auxiliary objects in a short time during gameplay, making instance management a challenge.

An Always-Sorted Collection Problem

As the engine was originally written for J2ME, it allowed use of only the facilities offered by CLDC under a limited number of classes and code size, and some implementations are very far from optimal. For instance, the code relied on java.util.Vector -- a class that has all its methods synchronized.

The entity that manages the game held all existing object instances in an array sorted by Z-order. Every time an object was being created, destroyed, or its Z-order changed -- we would remove it from the array (moving all the elements after it one index back) and re-insert it at the correct location after a binary search (moving all elements after it one place forward).

Since scanning the array for the object is an O(n) operation, removing and inserting an element is again an O(n) operation, and finding the correct spot is a O(log n) operation -- all the basic actions on an object had runtime proportional to the number of objects existing at the moment.

Needless to say, as games got more complex and required more objects, this became a problem. Jelly Jump was creating 40-60 objects every few seconds, in a world containing 500-700 objects. On top of that, when an object is created, its Z-order is set to the maximal Z-order allowed -- but then immediately set to its layer's Z-order by the object's constructor, doubling the amount of Z-order operation the engine had to perform. This would explain the jitter -- creating objects caused a very long logical iteration, dipping FPS temporarily.

Initial Optimization Experiments

We needed to fix it, and we needed to fix it fast. This was the main problem blocking the release.

At first we took a conservative approach, trying to make as few changes as possible so close to a release. We concluded that there was no reason for the collection to be kept in order at all times -- just before rendering and touch-handling. Our first attempt was to defer sorting to these two occasions.

This should have addressed the object creation problem, reducing it from O(n) to O(1), as we intended to simply append the new object to the end of the list, and query it for its Z-order just before sorting. This would also cut down by half Z-order changes, as we were no longer adding new object to the collection, then moving it like we used to.

We took the standard approach, and tried good-ol' quicksort for the just-in-time array sorting. Unfortunately, the game engine's contract with the developer is such that objects at the same Z-order are sub-ordered by the order their Z-value changed -- meaning our sorting algorithm had to be stable, which quicksort isn't. (It took us a good while to figure that this was what's causing the weird phenomenon we were seeing with the game's graphics).

Somewhat humbled by our reckless arrogance in thinking this was such an easy problem to fix, we looked for a good sorting algorithm that is both stable and efficient. Tree sort was the next thing we tried. Though the sort was now stable, it was also very slow. We failed at identifying that we were using tree sort on a mostly-sorted collection, which is tree sort's Achilles Heel -- in that case you end up with a degenerate tree and efficiency is no more.

Reading a little bit on the web we encountered all sorts of sorting algorithms and variations on existing algorithms that might or might not work, but we didn't have the time to start learning and understanding each one and decide whether it was appropriate for our needs.

So we went back to quicksort. This time we applied a bias to the Z-order, giving a monotonic index for each object that got its Z-order changed. This wasn't perfect, but it was reasonable for the time we had.

Eventually, this optimization didn't make it to the release -- we didn't have the time to properly test the change and we were able to work around the problem enough to make the game releasable by avoiding creation of multiple objects in a single logical iteration, thus not triggering the Z-order overhead. Instead of creating a full screen of jellies at once, we instructed the content developers to create a screen line-by-line.