GUIMark 3 – Mobile Showdown

Introduction

It’s been exactly one year since GUIMark 2 was created and it seems the natives are growing restless. Over the past few years I have spoken with and worked with a few vendors about performance issues in web technologies. Most of this stuff usually stays pretty internal, but this time I’ve gotten a new request straight from Adobe’s QA team: build a new version of GUIMark that’s more comprehensive, focused on mobile, and open to the community. With 200 test results this is definitely the biggest GUIMark yet.

The philosophy for this benchmark is the same as before. Each test has been designed to find the breaking point of your phone or tablet. By forcing the devices to render at less than 60 frames per second we can ensure that stated performance matches actual throughput on the device. In fact most of these tests were designed to average around 30fps so there would be plenty of room for future growth. Similar to previous tests, I’ve only had the time to build these in Flash and HTML5; however, it may be interesting to test native apps at a later date.

For the sake of disclosure, I should state that Adobe funded time through my employer EffectiveUI to enable me to write these tests. While the ideas, code, results, and analysis are entirely my own, I understand that people may read in some bias as a result. Please keep in mind that this was designed to be a 3rd party analysis, not a pat on the back, and I think the results reflect this. Also, if you look at the source code for the two platforms, you’ll see that in most cases the code is line for line identical, only diverging when it comes to platform specific APIs.

Setup

Nine devices across a range of hardware and firmware were used to run the tests. While the devices provided a good sampling of what’s available on the market, it is by no means a definitive list. Each device was running the latest software available to it, and in the case of Android the set represents a moderate amount of fragmentation. While it would be ideal to only test devices with the latest version of Android, the reality of mobile right now demands that we all work against a wide variety of firmwares. Tests on the Honeycomb platform were originally done with the 3.0 version, but with the release of 3.1 at Google I/O I wanted to see how this affected performance. I’ve only listed 3.1 results in the main article below, but the 3.0 results are also preserved in the Google spreadsheet linked at the end.

Lastly, the tests have been designed to force the device into a 480 pixel wide viewport which I feel is a good median resolution for interactive content like a game or chart. They have also been designed to run in portrait mode. The source code for all the tests is contained here, and the directory on my webserver containing all the runnable tests is also available to browse through. Keep in mind that these tests were designed to run on a mobile device, and if you view them in your desktop browser you will likely see them all running at the maximum framerate of 60 fps.

Bitmap Test

First up is a bitmap drawing test that was designed to simulate a scrolling shooter similar to the old Raiden arcade game. The game logic is minimal so this is all about pushing pixels around. Unlike the GUIMark 2 bitmap test, this new version doesn’t include scaling, anti-aliasing, rotation or half pixel compositing. This is just a straight up blitting test, which is more in line with how game developers would optimize their drawing code for handheld games. Like previous tests I’ve done, this one runs on absolute timing for the position of elements in the game. This means slower devices don’t run the test slower; instead, the rendering just looks more choppy.
Also, one of the comments on GUIMark 2 was that HTML5 should draw faster when the source image was cached to a separate canvas first, so I’ve included 2 versions of the HTML5 test to investigate this theory.
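To make the absolute timing idea concrete, here is a minimal sketch. It is not taken from the GUIMark source; the function names, the 120 pixels per second speed, and the frame intervals are all made up for illustration. The point is that position is computed from elapsed time, so a slow device ends up at the same place as a fast one and simply draws fewer intermediate steps.

```javascript
// Hypothetical helper: position depends on elapsed time, not on frames drawn.
function positionAt(startY, speedPxPerSec, startTimeMs, nowMs) {
  const elapsedSec = (nowMs - startTimeMs) / 1000;
  return startY + speedPxPerSec * elapsedSec;
}

// Simulate a device that manages one frame every `frameIntervalMs`.
// Returns the sprite position sampled at each frame it managed to draw.
function sampleMotion(frameIntervalMs, durationMs) {
  const positions = [];
  for (let t = 0; t <= durationMs; t += frameIntervalMs) {
    positions.push(positionAt(0, 120, 0, t));
  }
  return positions;
}

const smooth = sampleMotion(20, 1000);  // 50 fps: many small steps
const choppy = sampleMotion(250, 1000); // 4 fps: a few big jumps
```

Both arrays end at the same position after one second; the slow device just gets there in fewer, larger jumps.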

I actually expected HTML5 to do much better on this test. This is blitting 101 stuff here, no fancy transforms or anti-aliasing, just straight up compositing. Flash on the other hand chews through it without a problem. For the most part HTML does seem to benefit from caching image data to a canvas first and copying pixel data from there to output to the final canvas, although the benefits weren’t universal. The asterisks in the results for the 3 tablets are explained further below.
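For anyone unfamiliar with the caching approach, the sketch below shows the general shape of it. It is an illustration rather than the code used in the test: the function names are hypothetical, and the canvas factory is passed in as a parameter purely so the sketch is not tied to a browser page. Only the standard Canvas 2D calls getContext("2d") and drawImage are relied on.

```javascript
// Copy a loaded image into an offscreen canvas once, up front.
// `createCanvas` is an injected factory (an assumption of this sketch);
// in a page it falls back to document.createElement("canvas").
function cacheToCanvas(image, createCanvas = defaultCreateCanvas) {
  const cache = createCanvas(image.width, image.height);
  cache.getContext("2d").drawImage(image, 0, 0);
  return cache;
}

function defaultCreateCanvas(width, height) {
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  return canvas;
}

// Per frame: blit from the cached canvas instead of the original image.
function drawSprite(ctx, cache, x, y) {
  // Whole-pixel coordinates avoid half pixel compositing.
  ctx.drawImage(cache, Math.round(x), Math.round(y));
}
```

The copy happens once per asset, and every frame after that draws canvas to canvas.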

Vector Test

I think that in terms of ‘real world’ tests the original GUIMark 2 vector test better represents the type of things people will use the vector APIs for, so this time I wanted to do something more fun. This new test is more akin to a Processing demo, something I imagine accompanied by a cool audio tone generator and posted on a site like chromeexperiments.com. It also gives us the chance to compare complex vector fills and gradients that were left off the GUIMark 2 vector test. This test runs off absolute timing just like the bitmap test.

Even with only a handful of shapes on screen at a time, this test is pretty devastating to the drawing APIs on both platforms. You can barely even detect the complexity of the gradient in the HTML5 version on mobile. Without having a desktop browser to validate it with, I would have thought the gradients were completely missing when testing on the phones. Flash manages to keep a pretty sizable lead over HTML on this test. Honestly I’m not surprised by this fact since vector drawing has been the keystone of the Flash runtime since its inception.

Compute Test

Since I’m a graphics geek I’ve resisted doing straight up compute tests like the popular SunSpider before. I tend to be more interested in visual complexity than algorithmic complexity in my day to day work. Despite this I wanted to find a test that would really stress number crunching while still providing a good visual metaphor. This left me with two obvious choices, physics and AI simulations, and ultimately I went with a flocking simulation that proved to be easy to strip down and port to Javascript. This test is very heavy on Euclidean vector math and array iterations, and is an absolute killer on the processor with relatively few boids being simulated. Plus, it allows for a lightweight visual component to validate whether the output is behaving correctly. This is the only test that provides an option to disable rendering to see the performance difference between code execution and rendering.
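To give a feel for why flocking is so heavy on vector math and array iteration, here is a stripped down sketch of a single simulation step. This is not the code from the test; it is a generic textbook style boids update with cohesion, alignment, and separation, and every name and tuning constant in it is an assumption chosen for illustration.

```javascript
// Minimal 2D vector helpers.
const vAdd = (a, b) => ({ x: a.x + b.x, y: a.y + b.y });
const vSub = (a, b) => ({ x: a.x - b.x, y: a.y - b.y });
const vScale = (v, s) => ({ x: v.x * s, y: v.y * s });
const vLen = (v) => Math.hypot(v.x, v.y);

// One simulation step. Every boid looks at every other boid, so the work
// grows with the square of the flock size.
function stepFlock(boids, dt, opts = {}) {
  const {
    radius = 50,       // how far a boid can see
    minDist = 10,      // closer than this triggers separation
    cohesion = 0.01,
    alignment = 0.05,
    separation = 0.1,
    maxSpeed = 100,
  } = opts;

  return boids.map((boid) => {
    let center = { x: 0, y: 0 };
    let heading = { x: 0, y: 0 };
    let push = { x: 0, y: 0 };
    let neighbors = 0;

    for (const other of boids) {
      if (other === boid) continue;
      const offset = vSub(other.pos, boid.pos);
      const dist = vLen(offset);
      if (dist > radius) continue;
      neighbors++;
      center = vAdd(center, other.pos);
      heading = vAdd(heading, other.vel);
      if (dist < minDist) push = vSub(push, offset); // steer away
    }

    let vel = boid.vel;
    if (neighbors > 0) {
      const avgCenter = vScale(center, 1 / neighbors);
      const avgHeading = vScale(heading, 1 / neighbors);
      vel = vAdd(vel, vScale(vSub(avgCenter, boid.pos), cohesion));
      vel = vAdd(vel, vScale(vSub(avgHeading, boid.vel), alignment));
      vel = vAdd(vel, vScale(push, separation));
    }

    // Clamp speed so the flock stays stable.
    const speed = vLen(vel);
    if (speed > maxSpeed) vel = vScale(vel, maxSpeed / speed);

    return { pos: vAdd(boid.pos, vScale(vel, dt)), vel };
  });
}
```

Even in this reduced form, each boid allocates several small vector objects per neighbor per step, which is the kind of work that leans hard on the language runtime rather than on the renderer.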

This test was especially punishing on the Galaxy Tab for some reason, and the general deltas between Flash and HTML5 are larger here than in any other test. While AS3 and Javascript are nearly identical in language, I’m guessing what we see here is the real difference between static and dynamic languages. Browser vendors have been putting a lot of effort into making Javascript as fast as possible, but at the end of the day the biggest limitation for speed gains may be the lack of explicit typing. I was also surprised to see the performance gain just from disabling rendering for this test. Those 100 or so small lines nearly halved the performance in some cases. It really illustrates just how much visuals affect general software performance.

Video Test

Last time I gave up on my attempt to compare video performance because there was no way to retrieve frame rate information from the system. This time around I decided the only way to make this work was with a high speed camera. By encoding the frame data directly into the video and putting it under a high speed camera, we can objectively record how often frame data is being dropped from the render queue. The better the decoding engine, the fewer dropped frames we should see.

This test is a bit different from the others for a couple of reasons. Video tends to follow standards and decoding chips are designed around those standards. Performance doesn’t scale linearly like standard CPU bound benchmarks, and you’ll reach a point where the decoder hits a brick wall. It’s more important to test those standards than to compare everything against a single heavy stream (which would be more akin to the tests above). With that in mind I’ve created four tests that stick close to YouTube encoding standards, using the following video profiles.

Please note: the Gingerbread release for the Galaxy Tab enabled hardware acceleration for Flash video. While numbers are now near 100% since the update on 5/16, I didn’t have time to rerecord the test and parse the results.

Before you ask, yes I actually sat through all of these high speed videos and counted individual frame skips, and it was thoroughly painful. Maybe next time I’ll wise up and write an image analysis program to do it for me. Subjectively, I would argue that video that stayed above 70% looked good during playback. Anything below that mark will have too much stutter and really starts looking like crap.
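The counting step itself is easy to automate once the frame numbers have been read off the high speed footage. The sketch below assumes, purely for illustration, that each encoded frame carries a sequential number starting at 1 and that the score is the share of encoded frames that appeared on screen at least once. The function names and the sample data are hypothetical, and reading the numbers out of the footage (the actual image analysis) is not covered.

```javascript
// `observed` holds the frame number read from each high speed capture, in
// order. A number repeats for as long as that frame stays on screen.
function framesShownPercent(observed, totalFrames) {
  const shown = new Set(observed);
  return (shown.size * 100) / totalFrames;
}

// List the encoded frames that never made it to the screen.
function skippedFrames(observed, totalFrames) {
  const shown = new Set(observed);
  const skipped = [];
  for (let frame = 1; frame <= totalFrames; frame++) {
    if (!shown.has(frame)) skipped.push(frame);
  }
  return skipped;
}

// Made-up capture of a 10 frame clip in which frames 4 and 7 were dropped.
const capture = [1, 1, 2, 2, 3, 5, 5, 6, 8, 9, 10, 10];
const score = framesShownPercent(capture, 10); // 80
const dropped = skippedFrames(capture, 10);    // [4, 7]
```

Under that definition the made-up clip scores 80%, which would land on the watchable side of the 70% line.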

Flash really takes a beating in this category as many of these devices only allow software decoding for Flash video. You can clearly see which devices are enabled for hardware decoding like the Playbook, Xoom, and Atrix. Adobe has informed me that exposing hardware requires Google and the manufacturer to deliver the appropriate drivers, which becomes evident when viewing the performance differences between Xoom 3.0 and 3.1. HTML5 video on the other hand seems to be fully hardware accelerated on all of the phones, although interestingly HTML5 won’t fall back to a software renderer for certain files, and simply refuses to play the video.

What’s wrong with the tablet results?

Every time I build these tests there’s always some hidden problem that I stumble across that I didn’t expect, and this time is no different. You will notice in the HTML bitmap tests I had to place an asterisk next to the frame rate numbers for three of the tablets. The reason is that the frame rate reported by the device is extremely inaccurate. With the high speed camera we can see just how far off the numbers really are.

*Note that the Xoom in the video is running 3.0, and while this affects 3.1 as well, I didn’t have time to recapture it on video.

While the tablets showed the most dramatic problems here, I’m pretty sure I saw it manifest on the Atrix as well, just to a lesser degree. The behavior doesn’t seem to exhibit itself on either the vector or compute tests, and none of the Flash tests show this problem either. The Playbook also doesn’t seem to have this problem. My best guess is we’re seeing a problem with WebKit image rendering, with the browser run loop falling out of sync with the GPU somehow. Hopefully someone out there can shed some light on this problem.
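It may help to see why a reported frame rate and the on-screen frame rate can disagree at all. The sketch below is a generic script-side meter, not the one from the GUIMark source, and its name and one second averaging window are assumptions. It counts how often the drawing code is called, which is all a script can observe; it has no knowledge of whether each call resulted in a screen update.

```javascript
// Hypothetical script-side meter. Call tick() once per pass of the draw loop
// with the current time in milliseconds; it returns the latest fps estimate.
function createFpsMeter() {
  let windowStart = null;
  let frames = 0;
  let fps = 0;
  return {
    tick(nowMs) {
      if (windowStart === null) {
        // First call only starts the clock.
        windowStart = nowMs;
        return fps;
      }
      frames++;
      const elapsed = nowMs - windowStart;
      if (elapsed >= 1000) {
        fps = (frames * 1000) / elapsed;
        frames = 0;
        windowStart = nowMs;
      }
      return fps;
    },
  };
}
```

If the browser accepts draw calls faster than it presents them, a meter like this would keep reporting the higher number, which is consistent with what the camera footage shows, though that explanation remains a guess.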

Stats Roundup

This test was much bigger than anything I’ve done before, and we’re not done yet. I went back to my old GUIMark 2 tests and ran them as well to provide even more numbers to slice and dice. I think those old tests are still perfectly valid and even show how a couple of the devices have improved since they were first tested.

The results of all the tests are broken down on this Google spreadsheet. GM3 refers to the current tests and GM2 refers to the original GUIMark 2 mobile tests.

The Motorola Atrix clearly stands out for overall performance among the phones. On the tablet side the PlayBook took the lead for Flash performance, and while the Xoom posted the highest numbers for HTML5, the truth is that a few of those tests should have their numbers halved since the device isn’t rendering to the screen at the same rate as the listed fps. In terms of interactive content overall, it’s safe to say that Flash maintains a 2x performance lead over HTML5 on average.

The video side tells a different story. All of the devices are able to chew through the full suite of HTML5 videos with only a few exceptions. Flash however is riding out a transition period in which some devices offer hardware acceleration while others fall back to software decoding.

Final Thoughts

There’s a lot of information to absorb here, and hopefully some of the finer points will be fleshed out in the comments, but here’s a quick summary of my thoughts after working on this test.

1. The Flash VM performs really well on mobile chipsets and I don’t see any evidence here to support the idea that Flash is slow on smartphones and tablets. High end videos are below par at the moment, but the 3.1 release of Honeycomb illustrates that firmware updates are the key to solving this issue.

2. I have a sinking feeling that browser vendors are happy enough with current Canvas 2D performance. The performance deltas between Flash and Canvas are nearly the same as they were a year ago when I released GUIMark 2. Maybe I’m wrong but all I hear about in tech circles is improvements in CSS and SunSpider performance.

3. If you’re going to make a Javascript game, create a Canvas-based sprite sheet of all your assets. The performance boost may only be marginal, but it seems to be worth it. Also be aware of this issue that is causing WebKit to get out of sync with the rendering engine.

4. I wanted to include a Windows Phone 7 device in this review but the browser couldn’t handle any of these tests. If anyone has access to Blackberry or Palm phones I’d be happy to include them in the spreadsheet as well, just add them to the comments below.


57 thoughts on “GUIMark 3 – Mobile Showdown”

I would be grateful if you would kindly compile the Flash benchmarks to iOS apps using the latest version of the AIR packager so we can see them running on iPhone and iPad. Perhaps for fairness you would need to use phonegap or titanium appcelerator to compile the HTML5 games into an app as well, using an embedded WebView control. If the SWF performance of the latest version of flex on iOS is as good as anecdotal reports have been recently, the reaction from the game dev community would be explosive.

Running native comparisons gets tricky. Should the html5 stuff be running in phonegap, or should I write a native iOS version? If native, should I use CoreGraphics or OpenGL for the drawing? I’m not sure anyone would be satisfied unless every angle was covered there, and it could get quite complex.

I also ran into some problems with flash not embedding properly, but running the SWFs by themselves didn’t appear to cause any relevant performance disparities.

@HTML5: Clearly you don’t have sufficient technical background to understand what I’m about to tell you, but I will anyways.

As Sean mentioned in his report, there is one feature that flash has that makes it much more efficient than javascript: static variable types. This feature (new to AS3) makes it so that the programmer can tell the compiler what kind of variable it’s dealing with. To maintain position information on an object you would type:

var posX:Number;
var posY:Number;

Every time you modify this data, flash can optimize the code because it knows these variables are numbers. In javascript, the same code is:

var posX;
var posY;

Notice you can’t specify that the variables are numbers, so every time you change these values javascript has to check that they are numbers and not some other data type. This adds significant overhead, and makes it significantly slower.

There’s also the shitty garbage collector and the lack of fast lists (vectors), but I’m not going there.

Javascript can be as fast, or faster than Flash, but it’s not going to happen anytime soon. Adobe has the advantage that they only have to update their VM to speed up the code. In order for HTML5 to become faster, the HTML5 schema has to be modified, and the browser developers have to revamp their Javascript interpreters.

Video and bitmap rendering speed is irrelevant, both Flash and HTML5 are hardware accelerated, there is no major difference there. The difference is in the CPU usage, and in this case, Flash is clearly the winner. Apple’s just being lame, upsetting their customers and betting on a technology that is clearly inferior and will remain so for the foreseeable future.

Paul,
battery consumption can be optimized. Flash is like DirectX, a company maintained system with high value to the company. HTML5 is like OpenGL, with many extensions, diverse implementations and many interests.
One company is very agile; consortiums are usually not agile.
Flash will always beat HTML5 unless Adobe loses interest in Flash.

@Christian: Could you delete all these spam posts? Is there any way to make the comment box require a captcha?

@paul: A real-time application is achieved by processing all data required to display the current state of the game, aka 1 frame, and doing this repeatedly, as fast as possible, thus giving the illusion that the player is watching a movie they can control. This is true for most games (except some turn based games), which means the CPU of your mobile or your PC is CONSTANTLY executing code. When you’re playing a game on your mobile, whether it’s on Flash, HTML5, or an app, your CPU is likely running at 95 – 100% of its potential output.

The fact that Javascript is slower to execute doesn’t mean it has a lesser impact on your battery. It’s slower because the interpreter creates more CPU instructions to alter the data in the game (read my previous post). HTML5 and Flash have hardware accelerated rendering, so the battery impact for that is the same, and the CPU usage is ALWAYS near 100% if you’re running a real-time application.

Conclusion: The battery usage is pretty much THE SAME! (except Flash runs 2x as fast)

We’ve been using the GUIMark tests for our performance benchmark testing on our Storyboard UI Suite. They were very easy to port since it was just moving JS to Lua, and they are really nice examples that have helped us tune our performance. We just posted a video showing them running on an Android platform and thought I’d share.

On the computer, Java was miles ahead in the Java adaptation of the second version (http://www.jesperjuul.net/ludologist/guimark2-some-html-flash-java-benchmarks). I wonder if a similar result would occur here? On Windows I had to run in classic to enable Java acceleration. If similar results were shown, HTML5 and Flash would both be very slow. But last I checked HTML5 wasn’t GPU accelerated, and Flash has some kind of vsync enabled, so 60 fps is the max you get (for testing purposes, I mean). Also, HTML5 is very new and everybody wants to adopt it, so I’m sure they’ll tweak it to be on par with past solutions over time. ECMAScript isn’t going anywhere anytime soon though; a lot of people use it and will use it in the future.