Uh yeah, don't let it sleep, I haven't figured out the lifecycle management part yet... for one thing I can't figure out what to do about the fact that Android seems to destroy the EGL context whenever your app is pushed into the background. That can't be right, surely. Smithers! More work at the labs.

I think I've removed virtually all the GC in there - I think the only garbage being created is actual Crabs (not very many!) and a couple of String.valueOf() calls to show the crab count and fps, every frame. Not much can be done about that. I was hoping GC wouldn't be noticeable at all on 2.3 (which has an incremental GC), and especially not on 2.3 with dual cores.

New version uploaded - this time with transparency, scaling and rotation. I just looked at it on a vanilla HTC Desire and strangely the framerate was only very slightly affected by all those transparent sprites. This leads me to believe that there is something else amiss with the circle rendering that's punting it into some sort of software emulation.

There are two suspicions: firstly, I'm using GL_SRC_ALPHA, GL_ONE as the blend function for the circles (your typical "glowy particle effect"), whereas the sprites use GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, a relatively normal sprite blending mode.

Secondly, I'm using slightly odd texture coordinates for the circles, emulating GL_TEXTURE_1D, which isn't available in OpenGL ES for some reason.

Waaaaait a minute... n00b error alert! I put the fps counter in the logic thread not the screen update thread - doh. Now my Galaxy is only managing a more realistic 600 crabs at 60fps before it starts to permanently waver.
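For anyone wanting to avoid the same mistake, here's a rough sketch of a counter that only ticks when it's called from render(), so it measures drawn frames rather than logic updates. FpsCounter and frame() are made-up names for illustration, not anything from the actual game code, and the timestamp is passed in (System.nanoTime() in practice) so nothing gets allocated per frame.

```java
/** Hypothetical frame counter: call frame() exactly once per render() call. */
final class FpsCounter {
    private static final long WINDOW_NANOS = 1_000_000_000L; // one-second window

    private long windowStart;
    private boolean started;
    private int frames;
    private int fps;

    /**
     * Records one drawn frame at the given timestamp (e.g. System.nanoTime())
     * and returns the frame count of the most recently completed window.
     */
    int frame(long nowNanos) {
        if (!started) {
            // The first call only starts the clock; it is not counted.
            started = true;
            windowStart = nowNanos;
            return fps;
        }
        frames++;
        if (nowNanos - windowStart >= WINDOW_NANOS) {
            fps = frames;
            frames = 0;
            windowStart = nowNanos;
        }
        return fps;
    }
}
```

Calling this from the logic thread would just report the logic rate, which is what was being shown before.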

Well, that's the strange thing... the logic is running at 60fps but the drawing is occurring at the rate shown - that is, render() is only being called that many times per second. Yet on phones it still looks incredibly smooth, even down to about 20fps. Which I suppose is good news - I think I'm going to cap the game logic and update rate at 30fps in order to cope with Desires and the like which are still very common phones, and it'll still cope and feel smooth enough down to about 15fps by the looks of things.

With 1 pixel opaque sprites - the bare minimum you'll agree, and not enough to be troubled by fillrate - I eventually drop down to 15fps with 2300 or so sprites. That's about 35 sprites per millisecond, which just strikes me as really poor. And this is purely the "rendering" part as well - this is just GL commands being issued, no sorting etc. going on here, apart from the tiny overhead to draw the FPS counter stuff. Targeting a phone that's not quite as powerful as the Galaxy S II - say, something around 800MHz versus its 1.2GHz, perhaps - we'd probably be looking at two-thirds of that - maybe 22 sprites per millisecond.

And that's not even thinking about the fact that we're not actually counting the game logic or sprite engine overhead here. On a single-core phone maybe we'd only get half the performance, like 11 sprites per millisecond! To get a half decent framerate (30fps being precisely half-decent) on a single core 800MHz phone means we'd be looking at using a paltry 350-odd sprites per frame! You might not think that's such a big deal but again, that's with no game logic at all to speak of. Even our simple little space invaders game Titan Attacks uses between 500-1000 sprites per frame and that doesn't include non-sprite rendering like special effects and score displays. Droid Assault uses between 1500-5000 sprites per frame!
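The back-of-envelope numbers above can be redone in a few lines. This is only a sketch of the arithmetic: the two-thirds clock-speed scaling and the halving for a single core are the guesses from the post, not measurements, and SpriteBudget is a made-up name.

```java
/** Redoes the sprite budget arithmetic from the measured 2300 sprites @ 15fps. */
final class SpriteBudget {
    /** Sprites processed per millisecond at a given sprite count and framerate. */
    static double spritesPerMs(int spritesPerFrame, int fps) {
        return spritesPerFrame * fps / 1000.0;
    }

    /** Sprites that fit in one frame at a given throughput and framerate. */
    static double spritesPerFrame(double spritesPerMs, int fps) {
        return spritesPerMs * 1000.0 / fps;
    }

    public static void main(String[] args) {
        double galaxy = spritesPerMs(2300, 15);   // measured on the Galaxy S II
        double slowerClock = galaxy * 2.0 / 3.0;  // guess: ~800MHz vs 1.2GHz
        double singleCore = slowerClock / 2.0;    // guess: half again on one core
        System.out.printf("Galaxy S II:  %.1f sprites/ms%n", galaxy);
        System.out.printf("800MHz dual:  %.1f sprites/ms%n", slowerClock);
        System.out.printf("800MHz single: %.1f sprites/ms%n", singleCore);
        System.out.printf("Budget @30fps: %.0f sprites/frame%n",
                spritesPerFrame(singleCore, 30));
    }
}
```

The unrounded figures come out a touch higher than the rounded ones quoted above, but they land in the same few-hundred-sprites ballpark.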

I tried upping the size of the sprites to see what difference fill rate made to the Galaxy and remarkably even increasing the sprites to 32x32 pixels had absolutely no effect on framerate, leading me to think that the hardware rasteriser is proper fast.

So: I've uploaded a new .apk* with no crabs, just tiny pixels. At what point do you reach a pretty consistent 15fps?

Cas

* sorry about the size - in the middle of working around an Android asset problem

Well, I am currently very suspicious anyway. According to the ARM website, the GPU inside the Galaxy should manage 30m triangles / second (in admittedly what is probably the most contrived test case possible of a single triangle fan with 30m triangles in it). If every single triangle was discrete we might reasonably assume that to end up being 10m triangles/second, or 30m vertices/second. The sprite engine draws each sprite as a pair of triangles with two shared vertices so we should be using up approximately 4 vertices per sprite, and so the theoretical simple throughput should be about 7.5m sprites per second, or 7500 sprites per millisecond. And yet I calculate I'm getting just 35 sprites per millisecond.

Clearly something is completely amiss, for the specs to tell me I should be managing 200x more sprites.

To clarify further: this is literally a single call to glDrawElements with one single batch of GL_TRIANGLES, where each triangle is a single pixel, drawn opaquely. Nothing else. And yet I'm 200x slower than I was expecting. Or have I got something seriously wrong with my maths?
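As a sanity check on the maths, here's the same chain of reasoning written out. The starting figure and the divide-by-three for discrete triangles are the assumptions stated above, not numbers I can vouch for beyond that, and TheoreticalThroughput is just an illustrative name.

```java
/** Re-runs the theoretical throughput estimate from the quoted GPU figure. */
final class TheoreticalThroughput {
    // Quoted peak: 30m triangles/second in a single triangle fan.
    static final double FAN_TRIANGLES_PER_SEC = 30_000_000.0;

    /** Theoretical sprites per millisecond under the assumptions in the post. */
    static double spritesPerMs() {
        // Assumption: discrete triangles cost 3 vertices each instead of ~1.
        double discreteTrianglesPerSec = FAN_TRIANGLES_PER_SEC / 3.0;
        double verticesPerSec = discreteTrianglesPerSec * 3.0;
        // Each sprite is two triangles sharing two vertices: 4 vertices.
        double spritesPerSec = verticesPerSec / 4.0;
        return spritesPerSec / 1000.0;
    }

    public static void main(String[] args) {
        double theoretical = spritesPerMs();
        double measured = 35.0; // sprites/ms observed at 2300 sprites @ 15fps
        System.out.printf("theoretical: %.0f sprites/ms%n", theoretical);
        System.out.printf("shortfall:   %.0fx%n", theoretical / measured);
    }
}
```

So the arithmetic itself holds up: roughly 7500 sprites per millisecond on paper against 35 measured, a little over 200x.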

Aha! Bad news! For me, anyway. Just after I went to bed last night (bah, typical) I realised of course that the answer was staring me in the face. It couldn't possibly be the single call to glDrawElements that was slow. And indeed it wasn't, when I commented it out. In fact, the call to glDrawElements is so fast it doesn't even register. Unfortunately it's my sprite engine that's slowing things down. Now I'm going to get to grips with Android profiling and find out what bit's slowest. This is going to be no mean feat - I think I need a factor of 10 speedup :S

My suspicions are confirmed. Turns out sorting takes almost no time at all. Remove the buffer write though, and I get 11,000 sprites before it drops to 15fps. If I remove the sprite transform/scale/rotate part I get 12,500 sprites, but there's not a lot I can really do about that code so it's not going to be optimisable any further. So: merely writing the vertex data is giving me a 5x slowdown, which is very troublesome and suspicious. That is, after all, only 300kb of data in a frame for 2,300 sprites.

(You've seen it before, and I know it's maybe not as efficient as it could be - just surprised at how inefficient it is.) First plan: write everything to an int[] using Float.floatToIntBits() then blat the entire int[] to the bytebuffer. Suspect that might be the best way for Android. Have to do int[] because of a colossal Android performance snafu when using FloatBuffer bulk puts.

1. See how my FastMath.sin/cos speeds up that rotation part, although you're probably not rotating right now. Remove the double. You can reduce the memory footprint of the lookuptable with the static SIN_BITS variable.

2. Only use indexed put/get on buffers. Treat a buffer like an array: do your own managing of indices.
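For readers who haven't seen the lookup-table trick in point 1, here's a minimal sketch of the idea. It is not the actual FastMath source: LookupTrig and everything in it apart from the SIN_BITS name are made up for illustration. SIN_BITS sets the table size (2^12 floats is 16KB here), so lowering it trades accuracy for memory.

```java
/** Illustrative sine lookup table; a sketch of the technique, not FastMath itself. */
final class LookupTrig {
    private static final int SIN_BITS = 12;             // table has 2^SIN_BITS entries
    private static final int SIN_COUNT = 1 << SIN_BITS;
    private static final int SIN_MASK = SIN_COUNT - 1;  // wraps any index into the table
    private static final float RAD_TO_INDEX = (float) (SIN_COUNT / (2.0 * Math.PI));
    private static final float DEG_TO_INDEX = SIN_COUNT / 360.0f;
    private static final float[] SIN = new float[SIN_COUNT];

    static {
        // Fill one full turn of the sine wave, float precision only (no doubles at runtime).
        for (int i = 0; i < SIN_COUNT; i++) {
            SIN[i] = (float) Math.sin(i * 2.0 * Math.PI / SIN_COUNT);
        }
    }

    static float sin(float rad) {
        return SIN[(int) (rad * RAD_TO_INDEX) & SIN_MASK];
    }

    static float cos(float rad) {
        // cos(x) = sin(x + quarter turn)
        return SIN[((int) (rad * RAD_TO_INDEX) + SIN_COUNT / 4) & SIN_MASK];
    }

    static float sinDeg(float deg) {
        return SIN[(int) (deg * DEG_TO_INDEX) & SIN_MASK];
    }

    static float cosDeg(float deg) {
        return SIN[((int) (deg * DEG_TO_INDEX) + SIN_COUNT / 4) & SIN_MASK];
    }
}
```

With 12 bits the result is only good to roughly three decimal places, which is plenty for rotating sprites but not for anything that accumulates error.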

Ok, by using an int[] array to build vertex data, putting all the floats into it using Float.floatToRawIntBits(), then copying all the vertex data in one go to the direct buffer, I managed to double performance: got it up to 4600 sprites @ 15fps. With your FastMath.sinDeg/cosDeg methods, I managed to eke a further 10% out and got it up to 5200 sprites @ 15fps.
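In case it helps anyone else, the staging approach looks roughly like this. It's a stripped-down sketch, not the real sprite engine: VertexStaging, vertex() and flush() are invented names, and the x/y/u/v layout is just an assumed vertex format. In the real thing the array writes would be pasted straight into the big sprite loop rather than hidden behind a method call.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

/** Sketch: build vertex data in an int[] and bulk-copy it to a direct buffer once per frame. */
final class VertexStaging {
    private final int[] staging;     // plain Java array: cheap to write
    private final IntBuffer direct;  // native-order direct buffer handed to GL
    private int pos;

    VertexStaging(int maxInts) {
        staging = new int[maxInts];
        direct = ByteBuffer.allocateDirect(maxInts * 4)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
    }

    /** Appends one vertex (assumed layout: x, y, u, v) as raw float bits. */
    void vertex(float x, float y, float u, float v) {
        final int[] a = staging;
        int p = pos;
        a[p++] = Float.floatToRawIntBits(x);
        a[p++] = Float.floatToRawIntBits(y);
        a[p++] = Float.floatToRawIntBits(u);
        a[p++] = Float.floatToRawIntBits(v);
        pos = p;
    }

    /** Copies everything staged so far in a single bulk put and resets for the next frame. */
    IntBuffer flush() {
        direct.clear();
        direct.put(staging, 0, pos);
        direct.flip();
        pos = 0;
        return direct;
    }
}
```

Because the buffer is in native byte order, the raw int bits land in memory exactly as the floats would, so GL can read them as float vertex data.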

So that's about 2x faster basically. It would help now to get it about 5x faster. I'm just going to try absolute puts into a direct IntBuffer and avoid the int[] array copy and see how that improves things...

It is indeed a big method but unfortunately Dalvik doesn't do any useful inlining, so method calls are to be avoided.

Just tried directly writing to the IntBuffer using absolute put - much slower than writing to int[] array and copying it all at the end. So there we have it: got about 5000 sprites @ 15fps, or, 1250 or so at a glassy smooth 60fps on the Galaxy 2. I suppose that's livable with if I curb my expectations a little and make sure Chaz doesn't go overboard with the particle effects. I think in reality I really need another 2x speedup and as I say, probably not much chance of that happening without proper inlining, more buffer access "intrinsification", bounds check hoisting, peephole optimisation, etc. in Dalvik, for which I won't be holding my breath.

Anyway: latest version, once again with actual crabs, is here. One tap makes 100 crabs. Lifecycle still buggered, turned the music off though.
