2D engine done. Advice on optimisation, please?

After just 2 weeks of work, I now have a very basic OpenGL ES engine running with 100 sprites rotating, optionally billboarding, alpha-blending and the like. So far I've optimised by:

GPU:
- Creating and using texture atlases.
- Using only one draw call per texture atlas.
- Using glDrawElements to reduce the number of vertices created.

CPU:
- Using matrices to minimise multiplications.

Using sine and cosine tables was a waste of time: no discernible difference.

So, my question is - 100 sprites on screen seems pretty good to me.
Is it worth doing things like the following:

- Reducing data copying by changing all UV and RGB data to unsigned ints (somehow)?
- Aligning data? (No idea what that would entail, to be honest; it sounds like it would need a major re-shuffle of my code.)

If you are doing 2D then 90% of the time you are going to be fill-rate limited, which means changing your vertex data to use shorts won't give you much (and using unsigned ints won't buy you anything; the idea is to minimise the size of your vertex buffer).

I am curious: what did you do with drawElements vs drawTriangles, and how much did it help? I also have a big waste with my glColors array: it has to be 2x the size of the vertex array, and 90% of the time it's all 1s, but you need it for that other 10%. I am also curious what my render rate is compared to you guys; at some point I will have to do a test. I have about 50 things on screen at hectic times, if not more, each thing being 2 triangles, and I can hold 60 fps a lot of the time on a 2nd gen iPod. This is with game logic though, which uses a good chunk of CPU. I can do a render test at some point.

kendric Wrote:I am curious: what did you do with drawElements vs drawTriangles, and how much did it help? I also have a big waste with my glColors array [...]

I am using unsigned int for my color values within the array.
There is really no way around it if you want to have multiple colors/effects for each quad while keeping them all rendered in a single batch.

warmi Wrote:I am using unsigned int for my color values within the array.
There is really no way around it if you want to have multiple colors/effects for each quad while keeping them all rendered in a single batch.

Having just checked, a float is 4 bytes, as is an unsigned int. Is there really any benefit?

Frank C. Wrote:It's fairly painless to change your RGBA data to unsigned byte but non-floating-point UVs require that you scale the texture matrix, which may or may not be worth the effort.

Yeah ...

On the other hand, for my 2D module I let users specify texture coordinates as integers, because most people are more comfortable using pixel coordinates when coding with 2D-oriented APIs.

I do the same, but convert to floating point before OpenGL ES gets a hold of them. I hack at the texture matrix for effects and it's just easier to not have to deal with an extra transform at draw time.

As warmi said, from the sounds of things (sprites a quarter of the screen size) you will most likely be fill-rate limited, so tweaking the vertex data isn't going to buy you much performance for a mere 100 sprites. But using unsigned bytes for colors and interleaving the arrays (is that what you meant by aligned?) should be a straightforward change.

I'm surprised that you can get good performance with sprites that size, so I'd say you're doing good.