Fast Rendering in AIR: Cached SpriteSheets

In a previous post I showed how proper use of AIR 3.0’s GPU render mode can boost your frame rate by 500% on mobile devices. Here we’ll look at how you can do the same thing, and get even bigger gains, with your MovieClip animations. How would you like a 4000% boost in performance!?

The first time I saw this run I couldn’t believe what I was seeing… my iPad 2 was outperforming my 3.2 GHz quad-core desktop CPU by nearly 2x! Welcome to GPU mode on mobile…

If you haven’t read the previous post, it will help clarify the technique we’re using below. To summarize, the basic premise was to use a single bitmapData instance for each type of asset (Library or Embedded). We’d cache the bitmapData in a static property, and then all instances of that asset would share the same bitmapData.

The difference now is that instead of sharing a single bitmapData, we’re going to share an array of bitmapDatas. Let that sink in, read it again. Ok. And we’re also gonna cache frame labels so we can have some gotoAndPlay() action.
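To make that concrete, here is a minimal sketch of the shared cache. It is written in TypeScript as a stand-in for ActionScript 3, since the syntax is close, and the Flash classes are stubbed out. `FrameBitmap`, `CachedAnimation`, `animationCache` and `getAnimation` are hypothetical names and are not part of SpriteSheetClip. The sketch shows one thing: the first request for an asset builds its frame array and label map, and every later instance gets the very same objects back.

```typescript
// Stand-in for flash.display.BitmapData: just enough to tell frames apart.
interface FrameBitmap {
  readonly width: number;
  readonly height: number;
}

// Everything one animation type needs: its frames in order, plus its labels.
interface CachedAnimation {
  frames: FrameBitmap[];
  labels: Map<string, number>; // label name -> zero-based frame index
}

// One static cache for the whole app, keyed by a unique asset id
// (the post uses the class name from getQualifiedClassName()).
const animationCache = new Map<string, CachedAnimation>();

// Return the cached animation for `assetId`, building it only on first request.
// `build` is a hypothetical callback that rips the frames out of a sheet.
function getAnimation(assetId: string, build: () => CachedAnimation): CachedAnimation {
  let anim = animationCache.get(assetId);
  if (anim === undefined) {
    anim = build();
    animationCache.set(assetId, anim);
  }
  return anim;
}
```

Every instance holds a reference to the same `frames` array. Adding a hundred more clips therefore costs a hundred small objects rather than a hundred more copies of the pixels.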

Now, there are actually a couple of ways this can be done. You can use PNG SpriteSheets, or you can dynamically cache your MovieClips at runtime using the draw() and gotoAndStop() APIs. There are pros and cons to both approaches, but in the name of simplicity I’m going to focus on the PNG SpriteSheet approach.

Step 1: MovieClips to SpriteSheets

The first step is to determine which of your assets need to become SpriteSheets. Anything that is repeated many times, or is rendered constantly on screen, should be made into a SpriteSheet. For items that are only displayed briefly, and only ever appear as a single instance, you can just let the normal Flash rendering engine do its job. This is one of the beautiful things about GPU render mode: not everything needs to be cached. You can cheat a lot (i.e. use embedded library animations straight), as long as you optimize what’s important.

Note: Transforms are extremely cheap on the display list with this method. So if you’re just scaling, rotating, or moving something, don’t make a SpriteSheet for it; just tween it instead. It’s only a little slower, and you save a ton of memory on the GPU. (Remember when you used copyPixels and had to pre-cache rotations to make it run really fast? HA!)

Once you’ve decided which animations you want to accelerate, you’ll need to export them as a PNG sequence. We’ll use Zoe from gskinner.com to help. Zoe takes a SWF and captures each frame as a PNG, packed into a single sprite sheet image. It also inspects the timeline for any labels and saves all the data in a JSON file.

The steps to do so are as follows:

1. Take your animation and move it into its own FLA. Save the FLA somewhere in your assets directory, and export the SWF.

2. Within Zoe, open the SWF you just exported; Zoe should auto-detect the bounds. Click “Export”.

Note: Zoe measures the main timeline to determine how many frames are in your SpriteSheet. It’s ok if your animation is nested in a MovieClip, but make sure to extend the main timeline to match its duration.

If everything went smoothly, you now have a JSON and a PNG file within your assets directory. On to step 2!

Step 2: Play back the SpriteSheets in Flash, really, really fast

The next step is to load the JSON and PNG files into Flash and play them back. We also want to make sure that all instances of a specific animation share the same SpriteSheet in memory. This is what gives us full GPU acceleration.

Next you need a class to take these objects and figure out how to play them. This is essentially a matter of analyzing the JSON file from Zoe and cutting the big bitmapData up into small bitmapDatas, one per frame. You also need an API to play those frames, swapping the bitmapData each frame while respecting the basic MovieClip API.

SpriteSheetClip directly extends Bitmap and emulates the MovieClip API. Without going over the entire class, the core code here is the caching and ripping of the SpriteSheets that are passed in. I use the JSON data to get frameWidth and frameHeight, and getQualifiedClassName() for my unique identifier. After that, it’s a simple loop over the frames.
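The ripping loop itself isn’t shown here, so the following is only a sketch of the arithmetic it needs. It is again in TypeScript, standing in for ActionScript 3. It assumes the sheet is a regular grid read left to right, then top to bottom, and that the JSON supplies a frame width and height. The `SheetData` field names are illustrative and may not match Zoe’s actual output.

In AS3, each rectangle would then become its own `new BitmapData(rect.width, rect.height, true, 0)`, filled via `copyPixels(sheet, rect, new Point())` or via `draw()` with a translated matrix.

```typescript
// Illustrative shape for the exported data. The field names are assumptions,
// so check them against the JSON your export tool actually writes.
interface SheetData {
  frameWidth: number;
  frameHeight: number;
  frameCount?: number; // optional: stop before trailing empty cells
}

interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Cut a sheet of sheetWidth x sheetHeight pixels into frame rectangles,
// reading cells left to right, then top to bottom.
function frameRects(sheetWidth: number, sheetHeight: number, data: SheetData): Rect[] {
  const cols = Math.floor(sheetWidth / data.frameWidth);
  const rows = Math.floor(sheetHeight / data.frameHeight);
  const cells = cols * rows;
  // Never read past the last full cell, even if frameCount says otherwise.
  const total = Math.min(data.frameCount ?? cells, cells);
  const rects: Rect[] = [];
  for (let i = 0; i < total; i++) {
    rects.push({
      x: (i % cols) * data.frameWidth,
      y: Math.floor(i / cols) * data.frameHeight,
      width: data.frameWidth,
      height: data.frameHeight,
    });
  }
  return rects;
}
```

For a 300 x 200 sheet of 100 x 100 frames, this yields six rectangles, and frame 4 (zero-based) starts at (100, 100).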

Now, using this class, we can make multiple copies of the same animation and run them extremely cheaply. You can run hundreds of animations, even on the oldest Android devices. On newer devices like the iPad 2 or Galaxy Nexus you can push upwards of 500-800 animations at once. Plus, scaling, alpha and rotation are all very cheap.

For performance reasons, my class will not update itself, so if you just call play(), nothing will happen! Rather than have a bunch of enterFrame listeners, I make the parent class responsible for calling step() on all its running children. That means a single enterFrame handler instead of hundreds.
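Here is a minimal TypeScript sketch of that arrangement, standing in for ActionScript 3. `SheetClip` and `stepAll` are hypothetical names, not the classes from the source project. `play()` only flips a flag, `gotoAndPlay()` looks up a cached label, and the frame only advances when the owner calls `step()`.

```typescript
// Minimal MovieClip-like playback that never advances on its own:
// play() only sets a flag, and the owner must call step() once per frame.
class SheetClip {
  currentFrame = 0; // zero-based here; MovieClip's currentFrame is 1-based
  private playing = false;
  private readonly frameCount: number;
  private readonly labels: Map<string, number>;

  constructor(frameCount: number, labels: Map<string, number>) {
    this.frameCount = frameCount;
    this.labels = labels;
  }

  play(): void {
    this.playing = true;
  }

  stop(): void {
    this.playing = false;
  }

  gotoAndPlay(label: string): void {
    const frame = this.labels.get(label);
    if (frame === undefined) throw new Error(`unknown label: ${label}`);
    this.currentFrame = frame;
    this.playing = true;
  }

  // Advance one frame, looping at the end. In AS3 this is also where the
  // clip would assign bitmapData = frames[currentFrame].
  step(): void {
    if (!this.playing) return;
    this.currentFrame = (this.currentFrame + 1) % this.frameCount;
  }
}

// The single "enter frame" handler: one loop over every child
// instead of one listener per clip.
function stepAll(clips: SheetClip[]): void {
  for (const clip of clips) clip.step();
}
```

In Flash, `stepAll()` would be called from the one `Event.ENTER_FRAME` listener on the parent. A real emulation would also have to translate between these zero-based indices and MovieClip’s one-based frames.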

There’s a bit more to the class in terms of managing frames, so feel free to check it out in the attached source project. Be warned, though: it’s a little buggy… I consider this a sample implementation rather than production code, but do as you will.

Note: In terms of workflow, once set up, this is quite good. Zoe remembers all projects and settings, so it takes only about 10 seconds to update an animation FLA and re-export from Zoe.

Next up let’s run some benchmarks, and see how many of these we can push…

Benchmarks!

In this benchmark I will add as many animations as possible while maintaining 30 fps.
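The post doesn’t show the benchmark harness, so this is only a hypothetical sketch of the "add until the frame rate drops" idea, in TypeScript rather than ActionScript 3. `measureFps` stands in for however you time frames on the device. A real harness would average over many frames before deciding, since a single sample is noisy.

```typescript
// Hypothetical ramp-up: keep adding clips in batches while the measured
// frame rate stays at or above the target; return the last count that held.
// measureFps(n) stands in for "run with n clips and report the frame rate".
function maxClipsAtTarget(
  measureFps: (clipCount: number) => number,
  targetFps = 30,
  batch = 10,
  limit = 100_000,
): number {
  let count = 0;
  while (count + batch <= limit && measureFps(count + batch) >= targetFps) {
    count += batch;
  }
  return count;
}
```

With a synthetic cost model where the frame rate is `6000 / clipCount`, `maxClipsAtTarget(n => 6000 / n)` stops at 200 clips.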

I couldn’t get a good shot of it running on device, so here’s a boring video of what the benchmark looks like on PC:

I compared the SpriteSheetClip with a regular MovieClip and with a copyPixels implementation. The results are impressive: over a 40x increase in speed over the stock MovieClip, and up to a 10x improvement over copyPixels. That’s a full order of magnitude faster than the previous ‘fastest’ method of rendering in Flash.

Even on older Android devices like the Nexus One, the results are pretty great: 150 animated sprites @ 30 fps is enough to make almost any 2D game. On the newer devices it becomes really impressive: 735 animated sprites on an iPad 2 @ 30 fps!?

The full Flash Builder project can be downloaded here. Please try it on your own devices, and let me know in the comments how it runs.

A word on memory management

One last thing I’d really like to stress is the importance of memory management. Because you’re now pushing things to the GPU, you need to always be conscious of your memory usage. Once you fill up the GPU’s RAM, it is forced to swap textures, and this will absolutely kill your GPU performance if it happens continuously.

This is the one major change you need to make to your thinking, and it’s the same change you’ll have to make if you eventually switch to Stage3D. Everything that is rendered is a texture. Textures are expensive, and refreshing textures is expensive. You need a clear understanding of this concept, and you need to focus on managing your textures (read: bitmapDatas) in a smart manner.

So, ok, we have limited memory, but how limited? I’ve read that the recommended target for iOS is 24 MB of texture memory. A 32-bit image takes 4 bytes per pixel once decoded, so 24 MB holds roughly 6 million pixels, or about 2500 x 2500 (a full 4096 x 4096 texture would need 64 MB). So if you imagine all your bitmapDatas for the current scene smushed into one big PNG, they should fit in roughly 2500 x 2500 pixels. A 2048 x 2048 sheet, at 16 MB, leaves some headroom. Do that and you should run fairly well across iOS devices.
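The arithmetic above is easy to script. This TypeScript sketch is not tied to any Flash API. It assumes uncompressed 32-bit textures, which is how decoded bitmapData sits in memory regardless of PNG compression on disk. `fitsBudget` is a hypothetical helper, and the 24 MB figure passed to it is the rule of thumb quoted above, not an official limit.

```typescript
const MB = 1024 * 1024;

// Bytes of memory for an uncompressed texture (default: 32-bit, 4 bytes per pixel).
function textureBytes(width: number, height: number, bytesPerPixel = 4): number {
  return width * height * bytesPerPixel;
}

// Sum every texture the current scene keeps alive and compare against a budget.
function fitsBudget(textures: Array<[number, number]>, budgetMB: number): boolean {
  const total = textures.reduce((sum, [w, h]) => sum + textureBytes(w, h), 0);
  return total <= budgetMB * MB;
}
```

For example, a 30-frame animation at 128 x 128 costs 30 x 64 KB, roughly 1.9 MB, before anything else in the scene is counted.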

Now, you’ll also find that some older Android devices have even smaller memory allowances, so aim for the lowest memory footprint you can.

Always optimize your texture management as much as possible; this is probably the single biggest factor affecting the smoothness of your rendering. If you’re going to spend time optimizing your app, a great place to spend it is minimizing your texture footprint. You can do this by using small textures that are tiled or repeated, or by sharing textures across components/sprites.

One technique I used in SnowBomber was to scale my caches according to the device. In order for the game to run on the Nexus One, which has a very weak GPU, I scale my bitmapData down by 50% before caching it. This allowed me to pull off a playable 25 fps or so even on the Nexus One, something I firmly believed would be impossible. It was almost too easy: just a simple matrix passed to a draw call, and voila, I had dynamically sized textures at runtime…
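Here is a sketch of the planning side of that trick, in TypeScript standing in for ActionScript 3. Two things in it are assumptions for illustration: the device tiers, and the choice to scale the display object back up by `1 / scale` so it appears at full size. The post only says the bitmapData is scaled down 50% with a matrix. In AS3, the cache itself would be drawn with something like `matrix.scale(s, s)` followed by `bitmapData.draw(clip, matrix)`.

```typescript
// Hypothetical device tiers; how you classify devices is up to you.
type DeviceTier = "low" | "high";

// Cache at half resolution on weak GPUs, full resolution elsewhere.
function cacheScaleFor(tier: DeviceTier): number {
  return tier === "low" ? 0.5 : 1;
}

interface CachePlan {
  cacheWidth: number; // size of the BitmapData you actually allocate
  cacheHeight: number;
  displayScale: number; // assumed: set scaleX/scaleY to this to restore on-screen size
}

function planCache(width: number, height: number, scale: number): CachePlan {
  return {
    // Round up so no edge pixels are cut off by the downscale.
    cacheWidth: Math.ceil(width * scale),
    cacheHeight: Math.ceil(height * scale),
    displayScale: 1 / scale,
  };
}
```

Halving both dimensions cuts the texture memory to a quarter, which is why the saving is so large for so little code.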

32 Comments to “Fast Rendering in AIR: Cached SpriteSheets”

Great write-up and benchmark comparison, Shawn! I absolutely love the graphs and demonstration vid. This is exactly how to blit for optimal performance on mobile devices. And the best part is that all of those sprite clips can be controlled and transformed via the ol’ fashioned display list, which means you also don’t have to worry about clearing/redrawing the whole display list every frame. Another way to describe this technique is “Partial Blitting”. Assigning bitmapData = cachedBitmapData gives it the edge over calling copyPixels() into a bitmapData (copying from a spritesheet, the traditional blitting method).

Perhaps one thing to add about sharing BitmapData, though, is that you can’t make alterations to the bitmap data without affecting all of the instances (you can alter the Bitmap instance, however). This probably won’t be a problem for most users, but in cases where you need to alter specific parts of the image (such as transforming specific colors to indicate a player’s team), copyPixels gives you this flexibility at the cost of performance.

If your project requires detailed animations with a lot of frames, or requires team customization, you can save on memory and file space at the cost of performance (nothing is going to beat GPU-cached bitmapData) by using Bitmap Armatures, i.e. models that are animated by transforming individual body pieces via code or timeline animation. More info: http://www.indieflashblog.com/understanding-gpu-rendering-in-adobe-air-for-mobile.html

Shawn, it’s not mentioned in your tutorial, but I notice in your code that you used stage.quality = StageQuality.LOW for all of your benchmarks. This alone will add a huge boost to performance using bitmaps with GPU render mode (particularly on first-generation iPads), without losing any image quality! The only downside here is that Bitmaps appear to tween on whole pixels rather than fractional pixels, meaning that very small objects moved short distances may appear to animate more jaggedly.

I’m curious whether you would consider adding another benchmark/comparison chart to your post that shows the effect of removing StageQuality.LOW. There is currently a major bug in AIR related to StageQuality.LOW, where TextFields that are anti-aliased for readability become incorrectly scaled/displaced when added to or removed from the display list. It would be effective to show these benchmarks to Adobe to reinforce the importance of fixing this issue.

[...] this step out yet, but it’s an extension of the class I offered up in Step 3: automatically convert each frame from a MovieClip to cached bitmap data (and store that stuff in the GPU). If I had animations in my most recent games, I would be all [...]

You are creating a BitmapData for each animation frame, which makes a new texture to upload to the GPU for every frame… so what are the SpriteSheets for? In “real” GPU texturing, you would only show the “masked” section of a SpriteSheet as a single animation frame which is very fast and only uses one big texture. But you are creating a huge amount of BitmapDatas. Why should this be any faster than using a MovieClip with a timeline consisting of PNGs?

It’s not uploading a new texture each frame; it uploads 30 textures for a 30-frame animation, and never uploads again.

I can’t say what the flash player does behind the scenes, but it works and it works well. This method still outperforms Starling running off a single TextureAtlas.

It’s an interesting idea, a MovieClip full of PNGs; it should run fast, I would assume… but it’s a weird workflow: export PNGs from one FLA, import them into another. Seems like it’s faster to just do it the normal way with a spritesheet and let some code take care of the grunt work…

I did mention at the top of the article, another way to do this would just be to run the draw() API on the movieclip.

And an alternative I’ve seen used, which might be a little faster, would be to simply do a graphics.beginBitmapFill(), painting frames of your spritesheet just like you would on the gpu…

I’m just not sure the cost of sampling the frame each time would outweigh the benefits of having it cached in an array, ready to be assigned. In theory, this is a little less work for the GPU, and a little more for the CPU.

In practice, gpu doesn’t seem to mind lots and lots of small textures…

If I don’t scale the frames of the SpriteSheet, would it be faster to use copyPixels than the draw method? I know draw() is much slower than copyPixels when you draw a vector, but is it the same if I’m drawing a rasterized bitmap?

Well, it would be slightly faster to use copyPixels within the main ripping loop, but that only runs once per asset, so speed is not really a concern there; that code never runs while your characters are animating.

In terms of rendering each frame, copyPixels is a bad solution because it essentially creates a new texture each frame that must be uploaded to the GPU.

It is a concern for me, because in my game a character has about 120 frames of animation, and there are 20 characters. Using the draw method to draw the vectors of only 3 characters takes 15 seconds on iPad, so yes, speed is a concern there.

But as I said, that’s drawing a vector. I’d like to know how much slower the draw method is than copyPixels when drawing a bitmap.

[...] The first question to ask is whether to use GPU or CPU mode. There is a lot of info about this already. Basically, to me, if you have a lot of animations, consider using GPU, shared bitmapdata and spritesheets. My game has 14 types of customers, up to 8 working family members, 5 different room types, 12 amenities plus a bunch of menus, texts and icons. It was difficult trying to get it to work for mobile but I found a method which worked quite nicely: Spritesheetclip [...]

This and the previous part are the most helpful articles I’ve read, as I am facing frame rate issues with my new mobile app for kids. I want to thank you for sharing all this excellent info and these results. I would like to ask your opinion on something specific to my project:

I have a rather large scene, 3 iPhone screens wide and 2 screens high. The scene is draggable and can be panned around to view different parts of it. There are animals walking all over. I am using GPU mode, as it gives the best performance with the many animations running in the scene.

Q: How can I reduce the performance impact by not rendering the part of the scene that isn’t visible (only one screen is visible at a time while scrolling)? Will setting visible = false on the sprites that are off-screen, and dynamically setting it back to true as they come into view, help in GPU mode?

Hi, I have a game already made using Flash Pro CS5.5 and FlashDevelop. I have tried over and over to get this to work, even using your files, but am getting all sorts of errors; once I get past one, I hit another. I have the spritesheet, I have imported it into Flash Pro and given it a linkage name, but I just can’t get it to animate through the sprites that I need. As a general rule I make myself a template and then just chop and change (i.e. a different sheet, etc.), but this has got me stumped (though I admit to being new to Flash). Any advice would be welcome, thanks.

” you could dynamically cache your MovieClip’s at runtime, by using the draw() and gotoAndStop() API’s”

Hey all,
that is what I am aiming for. My concern is looping through the MovieClip’s frames. If I have these lines of code:
mc.gotoAndStop(i);
bitmapData.draw(mc);

I can’t be sure the frame got “constructed” before being drawn, and my tests on an Android tablet bear this out: sometimes frames aren’t drawn.

This mc is off the display list, obviously (we don’t need to render it to the screen). So is there a way to make sure the frame has been built before drawing it to a BitmapData? (Waiting for FRAME_CONSTRUCTED, EXIT_FRAME, etc. is obviously slow and unneeded.)

This is awesome! Thanks so much! I was beginning to think there wasn’t a good way to use traditional texture atlases on mobile.

One issue I’m running into:

I’m using the cached bitmap frames as background tiles for a scrolling background game. Whereas before I’d have drawn each tile with copyPixels to one large background canvas (and then scrolled the canvas), now my canvas is composed of a grid of sprites which get their respective bitmapData values on-the-fly as cached frames.

This seems to work fine except for some visible seams that appear between the tiles. The seams appear every 4 tiles or so and get slightly better (hairline) if I set the resolution to ‘high’ in the export settings, but they’re still pretty noticeable. I’ve tried forcing the Bitmap smoothing property to true after assigning the tiles, and still no dice.

Have you run into anything like this? How would you go about stitching a few frames together to form a ‘seamless’ texture?

I haven’t tried this myself, but some things you could try:
1. Check the pixelSnapping property, and see if it helps one way or another.
2. Try only scrolling your bg on whole pixels: bg.x = xPosition|0;
3. Try overlapping all edges by 1px (if the art allows it)
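Tips 2 and 3 can be combined into a small layout rule. This TypeScript sketch is a stand-in for ActionScript 3, and `tilePositions` is a hypothetical helper. It snaps the scroll position once, then places every tile at an exact multiple of the tile width from that origin, so neighboring tiles can never end up a fractional pixel apart. It uses `Math.floor` instead of `| 0`, because `| 0` truncates toward zero: `-1.5 | 0` is `-1`, while `Math.floor(-1.5)` is `-2`. Flooring keeps scrolling consistent when the offset goes negative.

```typescript
// Snap once, then lay tiles out from the snapped origin in whole-pixel
// steps, so every tile edge lands on an integer and neighbors stay flush.
function tilePositions(scrollX: number, tileWidth: number, count: number): number[] {
  // Math.floor rounds toward negative infinity; `x | 0` would truncate toward zero.
  const originX = Math.floor(scrollX);
  const xs: number[] = [];
  for (let i = 0; i < count; i++) {
    xs.push(originX + i * tileWidth);
  }
  return xs;
}
```

With a scroll offset of -10.7 and 64-pixel tiles, the first three tiles land at -11, 53 and 117.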