April 2nd, 2012

as3corelib is the de facto library for all common utility functions in ActionScript 3. While it’s a reasonably good codebase, it’s also unforgivably slow sometimes, which led to the creation of actionjson. Its SHA-1 implementation leaves a lot to be desired, and this article takes us through the steps I took to write a new one, to help understand what makes AS3 slow.

Understanding SHA-1

First we need to understand how SHA-1 works. SHA-1 takes a series of bytes, and processes them in chunks of 64 bytes at a time. There’s some extra padding and data added to the input as well. Lots of operations take place during each chunk that ensure even the most insignificant changes have a radical butterfly effect on some variables that are returned in the form of a SHA-1 hash.

Here’s some pseudocode.

add some extra data to the end of the input
set the initial sha-1 values

for each 64-byte chunk do
    extend the chunk to 320 bytes of data

    perform first set of operations on chunk (x20)
    perform second set of operations on chunk (x20)
    perform third set of operations on chunk (x20)
    perform fourth set of operations on chunk (x20)
end

return sha-1 values as a hash

You don’t need to fully understand it, but just get the general idea. Get chunk, process chunk, move onto next chunk.
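For anyone who wants to see the pseudocode as something runnable, here is a minimal, unoptimized sketch in Python (not the AS3 code discussed in this article). The function name sha1_hex is just a label chosen for this example; the constants and round functions are the standard SHA-1 ones.

```python
import struct

def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_hex(data: bytes) -> str:
    # Add some extra data to the end of the input: a 0x80 byte, zero padding,
    # then the original length in bits as a 64-bit big-endian integer.
    bit_length = len(data) * 8
    data += b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += struct.pack(">Q", bit_length)

    # Set the initial SHA-1 values.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    for offset in range(0, len(data), 64):
        # Extend the 64-byte chunk (16 words) to 320 bytes (80 words).
        w = list(struct.unpack(">16I", data[offset:offset + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(80):
            # Four sets of 20 operations, each with its own function and constant.
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, rotl(b, 30), c, d

        # Fold this chunk's result into the running SHA-1 values.
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    # Return the SHA-1 values as a hash.
    return "".join("%08x" % x for x in h)
```

Calling sha1_hex(b"abc") should match what any other SHA-1 implementation returns for the same bytes, which makes it easy to check each optimization step against a reference.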

Round One

This code isn’t half bad. Everything is reasonably easy to understand and modify. This kind of code is usually how I like to do things at first, before I start optimizing. It’s pretty slow though, roughly a third of the speed of SHA1.hash. If you’re familiar with AS3 optimization you can probably spot a lot of straightforward optimizations.

Reuse the w array, inline the bitshifts

Much better. That made a huge difference and now it’s roughly on par with SHA1.hash. The bitshift function has been inlined, and the w array is being reused instead of recreated during each iteration. Reusing the w array barely has an impact on speed though; it’s the inlining of the bitshift function that accounts for the 3x speed boost. It’s still good to be careful with memory regardless, since passing off work onto the GC isn’t good for performance and is more difficult to profile.

Unroll those loops

There’s not much left to inline that’ll make a big impact, so how about unrolling the loops? I wrote a small Python script to generate the code in the loops, so I don’t have to write it out by hand. Any changes to those lines of code will come from that script from now on. Overall though, this doesn’t do much.
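A generator along these lines might look like the sketch below. This is a hypothetical reconstruction, not the script I actually used: the function names and the exact shape of the emitted AS3-style lines are assumptions made for illustration.

```python
# Each entry: (first round, one past last round, round function, constant).
ROUND_GROUPS = [
    (0, 20, "(b & c) | (~b & d)", "0x5A827999"),
    (20, 40, "b ^ c ^ d", "0x6ED9EBA1"),
    (40, 60, "(b & c) | (b & d) | (c & d)", "0x8F1BBCDC"),
    (60, 80, "b ^ c ^ d", "0xCA62C1D6"),
]

def unrolled_round_lines(start, stop, f_expr, k_const):
    """Return AS3-style source lines for rounds start..stop-1, two lines per round."""
    lines = []
    for t in range(start, stop):
        lines.append(
            f"temp = ((a << 5) | (a >>> 27)) + ({f_expr}) + e + {k_const} + w[{t}];"
        )
        lines.append("e = d; d = c; c = (b << 30) | (b >>> 2); b = a; a = temp;")
    return lines

def generate():
    """Return the unrolled body of all 80 rounds as one string."""
    out = []
    for start, stop, f_expr, k_const in ROUND_GROUPS:
        out.extend(unrolled_round_lines(start, stop, f_expr, k_const))
    return "\n".join(out)

if __name__ == "__main__":
    print(generate())
```

The point of generating the lines is that later steps (renaming variables, changing how w is accessed) become a one-line change to the template instead of 80 hand edits.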

Turn the w array into local variables

That w array has a constant size, so why not just convert each entry to a local variable? It gives us an impressive speed boost. In case you’re wondering why I didn’t convert w to a Vector, I would prefer to stay compatible with Flash 9. Vectors are also not as fast as local variables: they only made it 2-3 times faster, while local variables made it 3-6 times faster.

Lesson Learned: Array accesses are much more expensive than local variables. Even Vectors (which are typed and pretty fast when fixed) can’t compete with local variables.

Process a ByteArray instead of a String

This trick I learned while making actionjson. Strings are generally slow, so putting data in a ByteArray and processing from there can often be orders of magnitude faster. sha1 gets a nice boost out of this.

Lesson Learned: ByteArrays are faster than Strings for data processing.

Reuse w variables

Looking carefully at the SHA-1 algorithm, it only needs the last 16 w variables. For example if it’s setting the value of w76, the value of w60 is needed, but w59 is not used and will never be used again. Since this pattern repeats itself, the w variables can be reused, and we can reduce the number of w variables from 80 to 16. This doesn’t help much though.
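To check that claim, here is a small Python sketch comparing the full 80-word schedule with a version that only keeps 16 slots. It uses a list with masked indices as a stand-in for the 16 reused local variables; the function names are made up for this example.

```python
def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def schedule_full(block_words):
    """Build all 80 schedule words, keeping every one of them."""
    w = list(block_words)
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w

def schedule_rolling(block_words):
    """Produce the same 80 words while storing only the most recent 16."""
    w = list(block_words)
    out = []
    for t in range(80):
        if t >= 16:
            s = t & 15  # slot of w[t - 16], which is never needed again after this
            w[s] = rotl(w[(t - 3) & 15] ^ w[(t - 8) & 15] ^ w[(t - 14) & 15] ^ w[s], 1)
        out.append(w[t & 15])
    return out
```

Both functions return the same list for any 16-word block, so the slot that held w60 can safely become w76.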

Lesson Learned: Flash is pretty good at optimizing local variables already.

Reuse the result of an operation, rather than getting it from the local variable later

Stop shifting around the values stored in a, b, c, d, and e

This one is hard to explain, but basically each time the a-e variables are changed, it’s mostly data being moved around, while only two variables are actually changing. By changing which variables get modified and used after each iteration, we can only modify the ones that need to be changed. This just removes a lot of code that shifts around values (e.g. e = d, b = a). This has surprisingly little impact.
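Since this is hard to explain in words, here is a Python sketch of the idea. It runs the 80 rounds twice: once shifting values between a-e every round, and once leaving the values where they are and rotating which slot plays which role. The names are invented for this example, and the index arithmetic only exists because this is a loop; in unrolled code each round simply names different variables.

```python
MASK = 0xFFFFFFFF

def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def f_and_k(t, b, c, d):
    """Return the round function value and constant for round t."""
    if t < 20:
        return ((b & c) | (~b & d)) & MASK, 0x5A827999
    if t < 40:
        return b ^ c ^ d, 0x6ED9EBA1
    if t < 60:
        return (b & c) | (b & d) | (c & d), 0x8F1BBCDC
    return b ^ c ^ d, 0xCA62C1D6

def rounds_shifting(state, w):
    """Textbook version: every round moves all five values."""
    a, b, c, d, e = state
    for t in range(80):
        f, k = f_and_k(t, b, c, d)
        temp = (rotl(a, 5) + f + e + k + w[t]) & MASK
        e, d, c, b, a = d, c, rotl(b, 30), a, temp
    return (a, b, c, d, e)

def rounds_rotating(state, w):
    """Only two slots are written per round; the roles rotate instead."""
    v = list(state)
    for t in range(80):
        ia, ib, ic, id_, ie = [(i - t) % 5 for i in range(5)]
        f, k = f_and_k(t, v[ib], v[ic], v[id_])
        v[ie] = (rotl(v[ia], 5) + f + v[ie] + k + w[t]) & MASK  # becomes next round's a
        v[ib] = rotl(v[ib], 30)                                 # becomes next round's c
    return tuple(v)  # after 80 rounds the roles are back in their original slots
```

Because 80 is a multiple of 5, the slots line up with a-e again at the end, so no fix-up is needed after the last round.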

Lesson Learned: Flash is still pretty good at optimizing local variables.

Some misc improvements

I’ve been focusing on the inner SHA-1 loop, since that’s where most of the work is done, but let’s try improving some of the code outside of it. Removing an extra variable doesn’t help much. There is a small boost from inlining and optimizing intToHex. There’s not much of an impact on the “long string” test, since this reduces overhead, which the “long string” test has less of.

Reduce unnecessary conversions

Now let’s get even deeper. Use Apparat’s dump tool to examine the raw ABC, and you may notice these everywhere…

PushDouble(4.023233417E9)
ConvertUInt()

PushInt(271733878)
ConvertUInt()

What? There are no floats and ints in this code. 4.023233417E9 is 0xEFCDAB89, one of the constants. The other number is one of the uints used. Shouldn’t both these variables be a uint already and not need conversion? It appears that Flash is encoding uints as ints and floats, then converting them to uints. Weird. But since ints/uints in Flash are stored with two’s complement encoding all the uints can be converted to their int equivalent and the operations needed for SHA-1 will behave exactly the same. This dramatically reduces the ConvertUInt’s and ConvertInt’s in our code, and appears to have a pretty big impact. There’s still plenty of ConvertInt’s left, but I’m not sure how to get rid of them all.
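The two's complement claim can be checked with a short Python sketch. Python integers are unbounded, so the helpers below emulate 32-bit values by masking; the helper names are assumptions for this example and are not part of any Flash API.

```python
MASK = 0xFFFFFFFF

def to_int32(x):
    """Reinterpret the low 32 bits of x as a signed two's complement int."""
    x &= MASK
    return x - 0x100000000 if x & 0x80000000 else x

def rotl_signed(x, n):
    """Rotate left on a signed 32-bit value, like (x << n) | (x >>> (32 - n))."""
    bits = x & MASK  # an unsigned shift works on the raw bit pattern
    return to_int32((bits << n) | (bits >> (32 - n)))

def same_bits(signed_value, unsigned_value):
    """True when both values have the same 32-bit pattern."""
    return (signed_value & MASK) == (unsigned_value & MASK)

# The constant from the dump above, written as a uint and as its int equivalent.
H1_UINT = 0xEFCDAB89          # 4023233417, too large for a signed 32-bit int
H1_INT = to_int32(H1_UINT)    # -271733879, same bits

# Addition, XOR, and rotation give the same bits either way.
other = 0x98BADCFE
assert same_bits(to_int32(H1_INT + to_int32(other)), H1_UINT + other)
assert same_bits(H1_INT ^ to_int32(other), H1_UINT ^ other)
assert same_bits(rotl_signed(H1_INT, 5), (H1_UINT << 5) | (H1_UINT >> 27))
```

The only place the sign matters is when the values are finally printed, so the conversion back to unsigned can wait until the hash is formatted as hex.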

Lesson Learned: SWFs have a uint number pool, but may not use it. This can impact uint performance.

Reuse the ByteArray instead of creating a new one

The only thing preventing this code from using a constant amount of memory is that ByteArray. Let’s reuse it instead, so a ByteArray isn’t created each time. This helps GC performance, and also lowers the overhead of our hash function.

Lesson Learned: Not creating new objects is almost always a good idea.

Grasping at straws.

Here’s where I start running out of ideas. These changes provide basically no improvement.

Final results

I did all my testing on Linux with Flash 11.1. I spent enough time writing this article and the SHA-1 implementation that Adobe released a new version of the Flash player (11.2). Here are the final results.

TL;DR

February 6th, 2012

Adobe added native support for JSON in Flash 11, which was released a few months ago. I’ve added a new argument to the blocking JSON functions (decodeJson and encodeJson) that will use native JSON if it is available.

Basically, this gives anyone who wants it a free speed boost among Flash 11 users, while still staying compatible (and fast) amongst users still using Flash 9 and 10. Read the documentation in encodeJson.as and decodeJson.as for more information on compatibility differences.

As much as I hate to say it, projects targeting Flash 11+ should not use blocking JSON functions. While they are still very fast, AS3 can’t compete with native code, and libraries like actionjson should only be used when necessary. There’s still no equivalent to the asynchronous JSON encoder and decoder, so they’re still useful, although this will also likely change with the release of Actionscript Workers.

January 13th, 2012

As I recently found out, Adobe is dropping support for fast memory opcodes in Flash 11.2, making projects that use tools like Apparat, haXe and Alchemy 0.3 potentially break in upcoming versions of Flash. This is the first time I’ve ever seen Adobe break compatibility intentionally for non-security reasons. It’s a pretty messed up thing to do, considering that many projects use those opcodes.

For the pre-compiled version of actionjson (actionjson.swc), I used Apparat to provide a small boost in performance. So, if you downloaded it before now, you need to download it again or risk projects breaking in Flash 11.2. I’m very sorry that this happened. This is an unprecedented move, seeing how important compatibility has always been to Flash.

People who used the uncompiled actionjson files are unaffected by this problem, since the master branch of actionjson does not use Apparat. I would recommend making sure they are up to date, since there was a minor bugfix a few months back.

November 23rd, 2011

I’ve been working on a game engine recently, and here are some of my experiences and lessons learned. Despite the title, there are many ways to approach this problem, and this is just the one I took.

So, what’s massively cross-platform? It’s a rejection of the ideology of picking a single toolkit or environment (Flash, Unity, XNA, iOS, Android, etc) to base code in. It’s about making the game itself the model in MVC programming with the controller and view being handled by whatever environment I’m porting it to. Many of these toolkits are cross-platform, but sometimes they have poor performance, limited functionality or don’t support many of the targets. I wanted to support everything and have it perform well across the board, which involves four major areas…

The Desktop. Linux, OSX, and Windows. The easiest to target, due to the ubiquity of free and open-source tools for these platforms.

Mobile. Android and iOS (maybe Windows Phone 7). More limited in options, and wildly different in some ways, but the basic set of tools is readily available.

The Web. IE, Chrome, Opera, Firefox, Safari. The most unusual of the four targets, because of the limited choice of languages.

Consoles. The Xbox 360 (and/or XNA), PS3, and Wii. Excluding XNA, expensive to target. Still, there’s a lot of similarities between them and the desktop target. I haven’t gotten around to this part yet because it’s expensive, so it’s not covered here.

So, basically I want to write a game engine that can support 9+ wildly different platforms, and have it be pretty easy as well. Turns out it can be done.

Choosing a language (or the core environment)

So, at the core of this game, I want to write game code once that can be shared amongst the different ports. I also wanted the nice warm embrace of a quality scripting language, with minimal impact on speed. Here’s some of the options I went through until I found the right one.

Javascript

The web target is basically the hardest to target, since there’s really only Javascript, or Flash, which is also basically Javascript. I could go the Unity route as well, but a good web developer should avoid requiring plugins whenever possible. There’s also Java applets, but I’ve had lots of problems with applets in the past and they’re not particularly user friendly.

So, why not use Javascript itself and clear up the web target problems easily? I tried finding a portable Javascript runtime but had trouble. Rhino, the Javascript interpreter for Java, seemed plausible for Android. I could probably manage with V8 on the desktop. Initial research suggested I couldn’t use iOS’s Javascript interpreter easily, and V8 wouldn’t meet iOS’s code execution guidelines. This seemed like a minefield of potential problems, plus I had a huge bias: I prefer some of the other possible choices over Javascript. I decided to look elsewhere first, but ended up never looking back.

PyPy / RPython

At this point I felt if I could get something to compile to C or LLVM bitcode I could make it work. I found a project called emscripten that converts LLVM bitcode to Javascript. Additionally, if this didn’t work there was always Alchemy, which does basically the same thing for Flash.

I started checking out PyPy, or more specifically, RPython. Python being my favorite language to code in, it might be perfect for the job. I could even get PyPy to generate C that seemed vaguely usable. PyPy however seemed to be made solely for creating binaries, not C code or llvm bitcode. Additionally, many cool Python features were not available in RPython, so there was just no way I was going to get the full Python experience. I moved on.

Ruby

Haskell

I tried getting ghc to generate LLVM bitcode, but this was consistently troublesome. It could also generate vanilla C, but this was also difficult. I tried getting ghc to use Alchemy’s tools directly, but they just never worked.

Then… Lua

To me, Lua was a toy language, something that non-programmers used to program. This isn’t true. It ended up being my final choice and proved itself to be a top tier programming language. I was impressed by it quickly, and was confident I could get it onto my desktop, mobile, and console targets with ease. Still, there was the web target, but I found ways around this problem, which I detail below.

Choices I didn’t investigate fully

Lisp. A solid lisp implementation could be easily ported everywhere. I think this would’ve been my choice had I not found Lua.

Javascript. I abandoned this choice pretty early. While I think Lua is a better language to work with for this kind of thing, Javascript still remains a valid possibility.

haXe. Created by Flash demigod Nicolas Cannasse, it could potentially be compiled to every target mentioned. It didn’t fit in well with the manner in which I wanted to develop this game though, and the C++ targeting didn’t seem mature enough, so I checked out other options first.

EDIT: playn. This was suggested in the comments, I never tried it out during this project. It does not currently support iOS and Console environments, and relies on Java, but it’s open-source and so it’s possible I could do that myself. Worth investigating.

Porting Lua to everything

Each platform usually had its own quirks and needs, so I had to figure out the best way to make Lua work on each of them.

Lua on the Desktop

There were no real problems here. I used Lua 5.1, and it just worked. Eventually I switched to luajit 2. Not because I needed the performance boost, which luajit did give me, but to familiarize myself with luajit’s much more complicated build process so I could use it in other targets. Both are fantastic pieces of software, but I would say only use luajit if speed is very important.

Lua on the Web

I first tried compiling Lua using Alchemy. Lua compiled easily, but some hastily made speed tests placed it at a few hundred operations per second, which is extremely low. I decided to try working with emscripten instead. It was also pretty easy, but my first live test of lua code running via the lua runtime via emscripten via a Javascript interpreter was also extremely slow (EDIT: This may have changed, emscripten now has emcc, a tool which may offer significantly better speeds than what I experienced). It seems obvious in retrospect, but I was hoping for the best. In the end it could barely manage 10 fps, even with rendering turned off.

I still stuck with Lua however, and wrote a Lua->Javascript source code translator called lua.js. This would avoid any speed problems due to Alchemy and emscripten. Javascript turned out to be a good host for translated Lua applications, approaching near-Javascript speeds.

Lua on Android

Originally I used standard Lua which compiled easily for Android. When performance was a problem, and improvements to the rendering had already been made I switched to luajit. Luajit 2 is in beta right now, and for unknown reasons crashed on Android with JIT turned on, but it can be turned off. There was a slight speed boost, but overall the rendering was still the problem so it may not have been necessary. I talk more about that below.

Lua on iOS

I didn’t waste any time here and went straight to luajit. Not much needs to be said about it, although the JIT compiler cannot be used on iOS because of Apple’s code execution guidelines. I have seen some suggestions that this is not true in certain cases, but it didn’t seem necessary anyway.

Graphics

The easiest path here is to keep the art simple, at least at first, so I decided to make a 2D game. Generally speaking 3D games are more time-consuming and expensive as well. Knowing what I know now, it’s very possible that each target could handle a simple 3D game. For my own sanity though, I kept it 2D. Take a source image, draw it to the screen at a location. That’s it.

Drawing on the Desktop

I first went with SDL 1.2. It’s stable, wildly popular and portable, and also surprisingly slow. It turns out 1.2 is pretty much exclusively a software-rendering system with no vsync. The result was choppy animation that tears, and has a lower framerate than I’d like. I tried SFML, but found the API lacking, and for a while settled on Allegro 5.0.4. Allegro 5.0.4 has a lot of potential, but is rough around the edges: little niceties like the transition to fullscreen on OS X were missing.

I then decided on SDL 1.3, which is still being developed, but I haven’t had any problems. The core set of features I wanted all have worked flawlessly. It basically combined all the nice things about SDL and Allegro, with none of the bad things. Performance improved and the game looked smooth on all platforms.

Drawing on the Web

Originally, I figured Flash was the best option for this, since traditionally it’s been much faster to render in Flash. As I discovered, this changed with the advent of Canvas and HTML5, but I still wanted to support Flash for any users that might not have Canvas available. I tried several different drawing methods (copyPixels, using Bitmaps) but performance was worse than Canvas in every browser I tested, regardless of the method used. Compared to Canvas on Chrome, it was around 4x slower. With some extreme effort, I’m sure Flash could improve, but even so I didn’t think it could ever reach the dizzying highs of 60fps in Chrome. I eventually dropped the Flash target entirely, since it couldn’t meet my standards. I figured letting users play a poorly performing game would give them a bad impression, and soliciting them to upgrade their browsers was actually a better choice.

Drawing on Android

I first used Android’s Canvas, but it was way too slow. Apparently there’s hardware acceleration for Canvas in Android 3+ but I couldn’t see a performance difference when I tried to enable it, and I still wanted to support 2.x if possible. I then wrote my own OpenGL renderer, that mostly relied on glDrawTexfOES to draw images. It was much faster but still too slow.

I managed to find libgdx, and was immediately impressed. The fps doubled immediately compared to my more naive solution. libgdx is so good, I’d use it on the desktop targets if it didn’t require the user to have a Java VM installed.

Drawing on iOS

I was expecting this to be easy since iOS is popular and libgdx left me feeling positive about rendering libraries for mobile platforms, but all the choices on iOS either didn’t fit into my display model or weren’t free. Mostly both. I reluctantly wrote my own OpenGL renderer for iOS, but this time I learned a little bit more about what keeps performance high on mobile devices and relied on a method that used glBufferData and glDrawElements instead. The performance ended up being what I wanted, even on an iPhone 3G.

Audio

Like the art, I needed to keep audio simple. There are event sounds, which play once, and background sounds, which loop forever but can be stopped at any time.

Audio on the Desktop

Originally I planned to use whatever audio system was available with my display library, but after switching around I disabled sound in whatever library I was currently using and looked elsewhere. The first was libao, but it was prohibitively licensed. I investigated a couple of alternatives, including PortAudio, until I eventually found OpenAL. Despite a high learning curve it met all my needs, including some I didn’t think I had. It also favored pushing data over polling data (callback-based audio playback being pretty common), which was great since I wanted event sounds to be as responsive as possible.

OpenAL just plays sounds, it doesn’t decode them, so I embedded libogg and libvorbis, so I could play Ogg Vorbis files. Unlike other formats, using Vorbis doesn’t require me to pay a license. I eventually switched to stb_vorbis though, which is an entire Ogg Vorbis decoder in a single file, because it simplified my build process and appeared to be faster as well.

Audio on the Web

There’s only one real choice here, the HTML5 audio tag. This was also the most worrying, since delays in sound playback can’t really be controlled and I don’t have the option to seek an alternative. Overall though, it seemed to work great across all browsers.

Audio on Android

Audio on iOS

I had some performance issues here when I used AVAudioPlayer, so I wrote an OpenAL version instead. It was better overall, but the game still runs significantly slower during sound playback. This is actually an ongoing problem, so I’d say my next option would be to try a good sound playback library for iOS, since the selection seems a lot better than the rendering libraries for iOS.

November 16th, 2011

I’ve been toying with Lua a lot lately. Lua is in some ways, the ultimate scripting language. It’s simple, effective, and supports a wide range of environments. The only missing environment, in my opinion, is the web itself, so I wrote a tool to convert Lua to Javascript.

Time passed and I kept updating it and fixing bugs, eventually adding support for ActionScript, and finally rewriting the entire thing in Javascript itself. It’s still experimental at this point, but I’ve open-sourced the project and released it onto github.

May 19th, 2010

As a developer, I generally like very fast builds. I only managed to complete my recent port by using fcshctl to keep me from going insane waiting for the results of my work to show up. At my job however, fcshctl alone doesn’t seem good enough (although, not for lack of trying).

A clever coworker reproduced a cool feature of some recent web frameworks to make an autobuilder, a system that will automatically build projects when files related to the project change. But today I wondered if automatic Flash builds weren’t nearly fast enough! Turns out they were not.

Using some command line tools that make use of inotify (a Linux-only, file change notification system) I can have automatic builds that spend literally no time waiting to build when files are updated. Install inotify-tools and try this command:

Replace “<watch folders>” and “<build>” of course with the folders than need watching and the build command, respectively. inotifywait pipes changes to the files or folders you specify into the while loop, which then runs the “<build>” command upon each change. Combined with fcshctl, it creates blindingly fast Flash builds.

On a side note, I’ve been using vim lately. I’m growing pretty fond of it, but it (and gedit too) creates temporary files. You’ll either need to modify the inotify command to ignore these files, make sure they’re not placed in the same location, or disable them entirely, as they will send unwarranted signals to inotify and trigger premature builds. This could apply to some version control software as well.
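For anyone who would rather script the same idea, here is a rough Python sketch of a watch-and-build loop that also skips editor temp files. It is not the command from this post: the function names and the ignore rules (vim swap files and backup files ending in ~) are assumptions, and it still relies on inotifywait from inotify-tools being installed.

```python
import subprocess

def should_ignore(path):
    """Skip editor temp files: vim swap files and backup files ending in ~."""
    name = path.rsplit("/", 1)[-1]
    return name.endswith((".swp", ".swx", "~"))

def watch_and_build(event_lines, build):
    """Call build() once per relevant changed path; return how many builds ran."""
    builds = 0
    for line in event_lines:
        path = line.strip()
        if not path or should_ignore(path):
            continue
        build()
        builds += 1
    return builds

def main(watch_folders, build_command):
    # -m keeps inotifywait running, -r watches folders recursively,
    # -e close_write reports finished writes, --format %w%f prints the full path.
    proc = subprocess.Popen(
        ["inotifywait", "-m", "-r", "-e", "close_write",
         "--format", "%w%f", *watch_folders],
        stdout=subprocess.PIPE,
        text=True,
    )
    watch_and_build(proc.stdout, lambda: subprocess.call(build_command, shell=True))
```

Calling main(["src"], "your build command") would block and rebuild on every saved file under src, where both arguments are placeholders for your own folders and build command.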

February 7th, 2010

I’ve always been annoyed by the hatred of Flash by the development world. I’d prefer to see it hated for real reasons (there are plenty) and replaced by genuinely better technologies, but the hatred comes from people who often don’t care to understand Flash and support poor solutions as the answer. Lately this hatred has been getting louder, and from a PR perspective Flash couldn’t be worse off.

I’ve tried to fully describe why this anti-Flash movement is, in so many ways, wasted energy, but I found another post instead. Here’s the best description of what role Flash plays on the Internet that I’ve read since the first Flash ad pissed off a JavaScript developer.

January 2nd, 2010

I have been using shared objects to share data between swf files, so that changes in one can affect the other. On a website it works great as long as they share the same localPath and name, but this model doesn’t quite apply to AIR applications. Despite the name “shared objects” when you’re using them with AIR they’re private to each application, making sharing data using shared objects impossible as far as I know.

I’m a little surprised each application gets its own shared objects folder. I’m guessing it’s because there’s no domain to keep things properly separated, but I’m disappointed with the alternative. The localPath of a local shared object could default to a path containing the id of the application instead. This would keep local shared objects separate by default while still allowing more freedom. Treating each application as its own domain instead of the computer itself is frustrating. The best solution I’ve found is to create a shared application storage directory, then save any files to that.

These are the usual application storage directories (I’m not sure about the Mac OS X one, since I can’t test it, but it appears to be correct):

Which is probably the best place to put the shared application storage directory. When creating this folder I would recommend using the same naming convention as your program id (which generally is similar to package names, e.g. “com.somewebsite.project”). It should not be the same id as another application.

I create a duplicate of File.applicationStorageDirectory because it doesn’t allow you to resolve paths beyond the root of the directory.

It’s important to realize that this is ultimately a hack. If the application directory changes it could save the file somewhere unexpected. Still, it’s unlikely to change in the current version of AIR and it’s the best way I’ve found to share data.

December 29th, 2009

The ports of my first two games are complete. I’ve put both games here. If you play them let me know in the comments if you notice any bugs, or if something doesn’t unlock, or even if you just enjoyed them so I can know they’re working for people.

I was paranoid I would go all George Lucas on the game and so I swore to not change things, but I did end up doing that anyway, so I tried to only change stuff that few would notice. This doesn’t cover changes to the engine (of which there are many), except as it relates to gameplay.

NES style enemy regeneration for some enemies. When some enemies go offscreen, they’ll come back in their original position, rather than their last seen position. It was too hard to do this in the original version so I didn’t bother. Most enemies still keep the original behavior because they don’t work well when you change it.

New preloader. I believe I originally had Wily always running from Mega Man (I thought it was funny), then I started adding more robots, then I had everybody running from everybody else. I got a kick out of it, but whenever I see it now it looks tacky. I reasoned that a preloader isn’t really the game, so I could do as I like. Even though people’s Internet connections are a lot faster these days, the preloader is really there so the game doesn’t start without the user’s express input. Also, since it’s unlikely Flash itself will have focus, it’s also a trick to get the user to click on the game and give it focus. The new preloader also serves yet another purpose…

Controller instructions. I’ve always considered the audience for my games to be the type of people who, at the slightest indication of boredom, will vanish. Looking for the controls is soooo booooring. Now they’re in the preloader and you can even fool around a bit while the game is loading.

New weapon. I’ve replaced the unconscionably boring S.Missle weapon you get when you finish Mega Man vs Metroid with a new one. For the sake of surprise I won’t tell you here, but I love it and I hope it gives my games more replay value. I’ve also made the other unlockable weapon more powerful. In other words, neither unlockable weapon sucks anymore.

Better collision detection. The old system was a bit sensitive, due to some sacrifices I made for image quality. In the old days a rendering bug gave images this annoying 1 pixel border thing. My solution to the problem was to give each image a transparent 1 pixel border so even if the rendering bug was there, you could not see it because it would only affect the transparent border. Since I used hitTest for collision detection, that meant it was a little easier to get hit than I intended, by 2 pixels. This always bugged me but not many people seemed to notice and I didn’t have any better ideas at the time.

Samus is a little smarter. I noticed some mistakes in Samus’s attack routine where she walks along the ground and shoots you. It’s hard to notice because she usually falls back on a bombing run instead.

Arthur is a little smarter. The way Arthur detected incoming projectiles was a little clunky, so I improved it a bit. I actually went too far and he became the one, so I had to introduce a little chaos to make him less superhuman.

AIR version (WIP). I always wanted to let people download the game, but I couldn’t find an easy/free way to do it. I’m actually stuck on a significant problem, which I’m trying to resolve, so this is still incomplete. When I do get it working you’ll also get…

Fullscreen. Not just bigger too, but the game actually formats to the size of the screen! Widescreen NES. I think it’s pretty cool. This only comes with the future AIR edition, since Flash doesn’t yet allow keyboard input in fullscreen to the degree that the game needs.