OpenGL ES: Using a Short-Like Datatype for UV Coords

Well, I am now using short as the datatype for the coordinates and normals of my vertices.
Since "glTexCoords" wants values between 0 and 1 and shorts are integers, I was wondering whether one could also use a smaller datatype for the texture coordinates.
I mean, with a 2-byte datatype you could exactly address every texel of a 65536x65536 texture atlas, and I only use 512x512!

Well, I was just thinking: why do we waste 4 bytes of memory per vertex by passing texture coordinates as floats, while at the same time we go through all the trouble of passing our positions to OpenGL as shorts?
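To put numbers on that, here is a minimal sketch of two hypothetical interleaved vertex layouts (the struct and function names are made up for illustration): positions and normals are shorts in both, while texture coordinates are floats in one and shorts in the other. On common ABIs neither struct needs padding, so the float version is 20 bytes and the short version 16, which is the 4 bytes per vertex mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: shorts for position/normal, floats for UVs. */
typedef struct {
    int16_t position[3]; /* 6 bytes */
    int16_t normal[3];   /* 6 bytes */
    float   uv[2];       /* 8 bytes, starts at offset 12 (4-byte aligned) */
} VertexFloatUV;

/* Hypothetical layout: everything stored as shorts. */
typedef struct {
    int16_t position[3]; /* 6 bytes */
    int16_t normal[3];   /* 6 bytes */
    int16_t uv[2];       /* 4 bytes */
} VertexShortUV;

/* Bytes saved for a mesh with vertex_count vertices by switching UVs to shorts. */
size_t bytes_saved(size_t vertex_count)
{
    return vertex_count * (sizeof(VertexFloatUV) - sizeof(VertexShortUV));
}
```

With GL_SHORT for all three attributes, sizeof(VertexShortUV) would be the stride passed to glVertexPointer, glNormalPointer and glTexCoordPointer.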

Realizing you could potentially cut down the amount of data you are sending, to optimize things a bit, is Good Thinking™. It'd be nice to know if there is a smaller data type you could use. This is why they wrote a spec.

You can use shorts (or even bytes), but you have to scale the texture matrix to bring the values into range. You likely won't see a speedup from sending a few hundred fewer bytes, or even a few KB, but you will add complexity by requiring the extra texture scaling.
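A minimal sketch of that approach, assuming the 512x512 atlas from the question: texture coordinates are stored as integer texel positions, and the texture matrix divides them back into the 0..1 range. The GL calls appear only in a comment so the block compiles without GL headers; ATLAS_SIZE, uv_to_texel and texel_to_uv are hypothetical names.

```c
#include <stdint.h>

/* Assumed atlas size, taken from the question: 512x512 texels. */
#define ATLAS_SIZE 512

/* Store a texture coordinate (assumed to lie in [0, 1]) as a texel position,
   rounded to the nearest texel. */
int16_t uv_to_texel(float uv)
{
    return (int16_t)(uv * ATLAS_SIZE + 0.5f);
}

/* What the texture matrix has to do to bring texel units back to 0..1.
   On the GL side this would be set up roughly like this (not compiled here):
     glMatrixMode(GL_TEXTURE);
     glLoadIdentity();
     glScalef(1.0f / ATLAS_SIZE, 1.0f / ATLAS_SIZE, 1.0f);
     glMatrixMode(GL_MODELVIEW);
     glTexCoordPointer(2, GL_SHORT, stride, uv_pointer);   */
float texel_to_uv(int16_t texel)
{
    return (float)texel / ATLAS_SIZE;
}
```

GL does not normalize GL_SHORT texture coordinates, so the texture matrix is what maps 256 back to 0.5; that extra matrix multiply per vertex is the added complexity (and cost) mentioned above.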

I can tell you from experience that you'll almost certainly hit pixel fill bottlenecks way before vertex bandwidth becomes a problem (on GLES devices). My advice is to just use a vertex format that's easiest to deal with.

Their experiments agree: because of the rescaling of the UV matrix, you just move the bottleneck around.

But what I don't get now is how to store the normals as shorts.
My largest absolute value in any direction is 16, and a short has about 32768 values on each side of zero, so I multiply my vertex-coordinate floats by 2048 and store them as shorts.
Then I have to rescale everything with "glScalef(1.0/2048,1.0/2048,1.0/2048);" so it fits back onto the screen.
The problem now is that unless I multiply my normals by 1.0/2048 too, the lighting goes crazy. But since they are already normalized, I can't store them as shorts when they are that small.

Still, apparently it has been done. Does anybody see what I am missing here?

IIRC lighting calculations are done after applying the modelview matrix and before applying the projection matrix. So there is potential that you could pass in ridiculously large normals that are scaled down by the modelview matrix like your vertex coords are.

I know there are (expensive) options for rescaling and renormalizing normals automatically (GL_RESCALE_NORMAL and GL_NORMALIZE). With those turned off, it seems like you should be able to just rescale your normals and have it work, but I've certainly never tried it. Again, though, this probably isn't going to help your speed much unless you are passing in a lot of vertex data or need to save system memory elsewhere by doing this.
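One detail explains the "lighting goes crazy" symptom: fixed-function GL transforms normals by the inverse transpose of the upper-left 3x3 of the modelview matrix, not by the modelview itself. For a uniform glScalef(s, s, s) that inverse transpose is (1/s) * I, so glScalef(1.0/2048, ...) makes every normal 2048 times longer. That is why pre-multiplying the normals by 1/2048 fixes the lighting. The sketch below assumes, per my reading of the ES 1.1 spec, that GL_SHORT normals passed to glNormalPointer are mapped back to roughly [-1, 1]. Under that assumption, unit normals can be stored at full precision with 32767 standing for 1.0, and the length inflation is undone with glEnable(GL_RESCALE_NORMAL) or glEnable(GL_NORMALIZE) instead of by shrinking the stored values. The function names are hypothetical.

```c
#include <stdint.h>

/* Quantize a unit normal to shorts. Assumes GL maps GL_SHORT normals back to
   roughly [-1, 1], so 32767 stands for 1.0. */
void quantize_normal(const float n[3], int16_t out[3])
{
    for (int i = 0; i < 3; ++i) {
        float c = n[i];
        if (c > 1.0f)  c = 1.0f;   /* guard against slightly-too-long input */
        if (c < -1.0f) c = -1.0f;
        /* scale and round half away from zero */
        out[i] = (int16_t)(c * 32767.0f + (c >= 0.0f ? 0.5f : -0.5f));
    }
}

/* Fixed-function GL multiplies normals by the inverse transpose of the
   modelview's upper 3x3. For a uniform scale s that matrix is (1/s) * I,
   so every normal comes out 1/s times longer before lighting. */
float normal_length_after_uniform_scale(float s)
{
    return 1.0f / s;
}
```

With the modelview scaled by 1/2048, the factor is 2048. GL_RESCALE_NORMAL is meant for exactly this uniform-scale case. GL_NORMALIZE also handles non-uniform scales, at a higher per-vertex cost.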

In our last game, we used shorts for the coordinates simply because setting the texture matrix to fit the sprite sheet's grid and using integers was easier than dealing with floats. I figured I might as well use a smaller datatype while I'm at it.

But if you look at the video, you'll see that frame rate is not my main issue: most of the computation goes into updating my scene's interleaved array, not into drawing it.

As you'll notice, I only update the interleaved array when the scene changes, that is, when a unit moves, the player clicks something, etc.
That usually takes about 500 ms for the biggest map (the world map at the end of the video in my signature), and scaling the coordinates of every vertex as I put it into the interleaved array seems to add to those 500 ms.
I wonder what would happen if I stored my geometries as shorts internally instead of as floats, as I do now. Then I wouldn't have to do any rescaling, but all the rotating (i.e. multiplying by the sin/cos of some angle) and translating (adding the center where a given geometry is placed) would probably take more time on short integers (I'd have to floor a lot).

Anyway, I just wanted to tell you guys, since I thought it wasn't quite clear what I do with my engine. I guess most games update their interleaved array every frame, but then again they usually don't have the >50000 triangles per scene that I use.

You store the models as floats, but then convert them to shorts every time you want to update what is drawn? For sure I'd stop doing that. Convert them to shorts at load time, then you only have to pay the scaling cost once.
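A minimal sketch of that load-time conversion, using the scale of 2048 from the question (POSITION_SCALE, quantize_coord and quantize_positions are hypothetical names). One catch: a short only reaches +32767, so a coordinate of exactly +16 becomes 32768 and would wrap around. The sketch clamps it. A scale of 2047 would avoid the edge case at the cost of a less tidy glScalef factor.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale from the question: coordinates up to |16|, times 2048.
   16 * 2048 = 32768 is one past INT16_MAX, hence the clamp below. */
#define POSITION_SCALE 2048.0f

static int16_t quantize_coord(float v)
{
    float scaled = v * POSITION_SCALE;
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;  /* round half away from zero */
    if (scaled > 32767.0f)  scaled = 32767.0f;  /* clamp into short range */
    if (scaled < -32768.0f) scaled = -32768.0f;
    return (int16_t)scaled;                     /* truncation completes rounding */
}

/* Convert a float model to shorts once, at load time. */
void quantize_positions(const float *src, int16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = quantize_coord(src[i]);
}
```

The short copy is built once per model and drawn with the same glScalef(1.0/2048, ...) as before, so the per-update work no longer includes the float-to-short scaling.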

Are you really rotating and moving things around by adding and multiplying by trig functions? I assume that also means that you have a translated/rotated copy of each model's vertexes each time it appears on the screen. If that is true, then before anything else, you really really need to be using matrix transformations to move things around. Then you don't have to update anything when you want to move something, and you only need one copy of each model and can draw it multiple times.

Quote:You store the models as floats, but then convert them to shorts every time you want to update what is drawn? For sure I'd stop doing that. Convert them to shorts at load time, then you only have to pay the scaling cost once.

Well, I think I could do that, but floating-point math on integer data seems to take longer than on floats. So that's connected to your second point:

Quote:Are you really rotating and moving things around by adding and multiplying by trig functions? I assume that also means that you have a translated/rotated copy of each model's vertexes each time it appears on the screen. If that is true, then before anything else, you really really need to be using matrix transformations to move things around. Then you don't have to update anything when you want to move something, and you only need one copy of each model and can draw it multiple times.

Well, I have a basic array for each geometry.
When I build my interleaved array, I copy those geometries into it, each translated and rotated individually to match how it stands on the map.
I guess I could treat these vertex arrays as vectors and multiply them by matrices, given a proper matrix library.
But then again, going through each geometry vertex by vertex, as I do now, takes the same number of floating-point operations.
Maybe you should have a look at the video, so we are not talking about different things.
But maybe I am wrong and there is an OpenGL-based, hardware-accelerated approach to doing those translations/rotations. I only know of the transformation matrices that apply to the whole vertex array in OpenGL, not to individual parts of it.
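The point about operation counts holds as long as sin/cos are computed once per geometry instead of once per vertex. A minimal sketch with hypothetical names follows. It builds a 3x3 rotation (here about Z) plus a translation once per instance, then applies it to every vertex for 9 multiplies and 9 adds each. The caller passes cos/sin in (e.g. from cosf/sinf), which keeps trig out of the per-vertex loop.

```c
#include <stddef.h>

/* Hypothetical per-instance transform: 3x3 rotation plus translation. */
typedef struct { float m[3][3]; float t[3]; } Transform;

/* Rotation about Z by an angle whose cos/sin (c, s) the caller computed once
   per instance, followed by a translation to (tx, ty, tz). */
Transform make_transform(float c, float s, float tx, float ty, float tz)
{
    Transform x = {
        { { c, -s, 0.0f }, { s, c, 0.0f }, { 0.0f, 0.0f, 1.0f } },
        { tx, ty, tz }
    };
    return x;
}

/* Apply the transform to count xyz vertexes: 9 multiplies and 9 adds each. */
void transform_vertices(const Transform *x, const float *src, float *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        const float *p = src + 3 * i;
        float *q = dst + 3 * i;
        for (int r = 0; r < 3; ++r)
            q[r] = x->m[r][0] * p[0] + x->m[r][1] * p[1] + x->m[r][2] * p[2] + x->t[r];
    }
}
```

For example, make_transform(0.0f, 1.0f, 10.0f, 20.0f, 30.0f) is a 90 degree turn followed by a move to (10, 20, 30), and it maps the vertex (1, 0, 0) to (10, 21, 30).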

Bersaelor Wrote:But maybe i am wrong and there is an OpenGL-Based and graphics-accelerated approach of doing those Translations/Rotations. I only know of the transformation matrices that apply to the whole vertex-array in OpenGL, not to individual parts of it.

Ah, no wonder you are having performance problems, then. Ouch.

It seems as though you've misunderstood how the view matrices work in OpenGL. Every time you draw some geometry with OpenGL it uses the current modelview and projection matrix to transform it using specialized hardware on the GPU.

This means that to draw several copies of the same model, you simply set up the matrix transformation with translations/rotations etc. and then draw the model. To draw it again, just set up a new matrix and draw the model again. This way, instead of translating/rotating hundreds of individual vertexes, you only need to build a few matrices and let the GPU do the work, probably 100x faster than you could on the CPU.

Code:

// save the current matrix transformation
glPushMatrix();

// set up the translation/rotation
glTranslatef(...);
glRotatef(...);

// draw
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);

// restore the original matrix transformation
glPopMatrix();

No need to have more than one copy of your vertex data for a model, and no need to transform all of the vertexes using the relatively slow CPU.

Your proposed way of drawing a model several times is exactly how I used OpenGL before I started working on the iPhone.

The whole point of my code being so fast is that I issue only ONE glDrawElements call per frame.
This was a suggestion I took from Jeff LaMarche's tutorials, or from Tim Omernick of ngmoco in the CS193P lectures.

I really was doing the translations that way at first, issuing several draw calls per frame. Doing that, I could only draw a fraction of the triangles I draw now, and at a lower speed too.
Each draw call hurt performance a lot, so now I issue only one and get maximum FPS with 50000 triangles in the scene. For normal-sized maps a complete update of the interleaved array takes about 200 ms, and since I do this only a few times per game the player doesn't notice (it happens when he lifts his finger after a touch).
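A minimal sketch of how a single-draw-call batch can be assembled (append_indices is a hypothetical helper). Each geometry's triangle indices are copied into one shared index array, offset by where that geometry's vertexes start in the interleaved array. One constraint is worth checking with 50000+ triangles: core OpenGL ES 1.1 glDrawElements only accepts GL_UNSIGNED_BYTE or GL_UNSIGNED_SHORT indices. So one call can address at most 65536 vertexes unless the OES_element_index_uint extension is available.

```c
#include <stddef.h>
#include <stdint.h>

/* Append one instance's triangle indices to a shared index buffer, offset by
   the instance's first vertex in the interleaved array. Returns the new
   index count. The caller must keep base_vertex + largest model index below
   65536 and make sure batch has room for model_count more entries. */
size_t append_indices(uint16_t *batch, size_t batch_count,
                      const uint16_t *model, size_t model_count,
                      uint16_t base_vertex)
{
    for (size_t i = 0; i < model_count; ++i)
        batch[batch_count + i] = (uint16_t)(model[i] + base_vertex);
    return batch_count + model_count;
}
```

The whole batch would then be drawn with glDrawElements(GL_TRIANGLES, (GLsizei)count, GL_UNSIGNED_SHORT, batch).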

The point is that I still wanted to optimize it, since the project is nearing completion and I thought I could save some floating-point operations and some memory here and there.

In our experience on the iPhone, draw calls are a little expensive, but not that bad unless you are drawing hundreds of objects. Using VBOs instead of vertex arrays would probably reduce the copying time as well.

If you really want to continue down your current route, you could always do the rotations/translations of the short data using fixed-point arithmetic. That would probably help, but you are still going to have a difficult time making the vertex transform fast on the CPU.
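A minimal sketch of a fixed-point rotation/translation on short data follows. It assumes a Q14 format (16384 represents 1.0, so cos and sin fit in a short) and cos/sin values taken from a precomputed table. Q14_ONE, q14_round and rotate_translate_q14 are hypothetical names. Products are formed in 32 bits, which cannot overflow here because each is at most 2^29 in magnitude. The division by 2^14 rounds instead of flooring.

```c
#include <stdint.h>

/* Q14 fixed point: 16384 represents 1.0, so cos/sin fit in an int16_t. */
#define Q14_ONE 16384

/* Divide by 2^14, rounding half away from zero (portable for negatives,
   unlike a right shift of a negative signed value). */
static int32_t q14_round(int32_t v)
{
    return (v >= 0) ? (v + Q14_ONE / 2) / Q14_ONE
                    : -((-v + Q14_ONE / 2) / Q14_ONE);
}

/* Rotate (x, y) by an angle given as Q14 cos/sin, then translate by (tx, ty).
   cos_q14/sin_q14 are assumed to come from a precomputed table. */
void rotate_translate_q14(int16_t *x, int16_t *y,
                          int16_t cos_q14, int16_t sin_q14,
                          int16_t tx, int16_t ty)
{
    int32_t rx = q14_round((int32_t)cos_q14 * *x - (int32_t)sin_q14 * *y);
    int32_t ry = q14_round((int32_t)sin_q14 * *x + (int32_t)cos_q14 * *y);
    *x = (int16_t)(rx + tx);  /* assumes the result still fits in a short */
    *y = (int16_t)(ry + ty);
}
```

For a 90 degree turn (cos 0, sin 16384) followed by a move of (10, 20), the point (100, 0) ends up at (10, 120).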

VBOs work with GLES 1.1 as far as I can tell, but they don't actually do much, if anything, on the iPhone (vanilla vertex arrays perform just as well). Not sure how much they help on GLES 2.

I haven't done a lot of stress testing on the iPhone but sorting by texture and batch drawing vertex arrays (that are transformed on the CPU) has been working fine for me. I always run into fill or texture upload bottlenecks before my vertex pipeline chokes.

I know people are crazy about pushing more triangles but the best optimization on these devices is to simply draw less. The screens are so small you can usually get away with fewer vertexes in general and/or use aggressive LOD techniques.

Backing up Frank on this: My VBO code works on ES 1.1 but has never appeared to actually do anything. Obviously, if you are doing something that requires the performance of VBO, yeah, that will still narrow down your target market significantly so it's probably not worth it anyway on iPhone -- yet.

Also, at least for my 2D stuff, transforming on the CPU doesn't seem to be any different in performance than via the GL, except that batching seems to help, which of course requires CPU transforms. Fill rate and blending are always the real killers for me, not geometry.

The performance of VBOs was bad on 1.1, I don't think there was any performance benefit over vertex arrays.

Regardless, the real core issue here is doing transforms with sin/cos on the CPU. My original post did not come off like that, so I'm removing it. I'm very surprised that any sort of CPU vertex transforms have any speed advantage at all. But, I've been wrong before.

longjumper Wrote:The performance of VBOs was bad on 1.1, I don't think there was any performance benefit over vertex arrays.

Yeah, I never saw any difference in performance either way.

longjumper Wrote:Regardless, the real core issue here is doing transforms with sin/cos on the CPU. My original post did not come off like that, so I'm removing it. I'm very surprised that any sort of CPU vertex transforms have any speed advantage at all. But, I've been wrong before.

The PowerVR docs recommend batching where you can, so that you avoid all the extra overhead of the GL calls if you're drawing a lot of sprites. One advantage of doing your own transforms is that there are ways to optimize things a bit. For example: depending on what you're doing, you could use lookup tables for sin/cos. Example 2: if you're only doing 2D transforms, you can cut out a significant amount of overhead simply because the GL convenience functions for transformations (glTranslate/Rotate/Scale) are generic and always 3D. For instance, a 3D rotation takes maybe 50 multiplies and 20 adds/subtracts (purely off the top of my head, so don't take those as hard numbers), whereas a 2D rotation can be done in maybe a dozen multiplies and half a dozen adds.
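A minimal sketch of both ideas uses a hypothetical 256-entry table, so an unsigned char angle wraps around a full turn for free. To stay free of libm, the table is filled once by repeatedly rotating a unit vector by 2π/256 using the angle-addition formulas. Filling it with sinf/cosf would work just as well. The per-point 2D rotation plus translation then costs 4 multiplies and 4 adds, compared with 16 multiplies and 12 adds for a general 4x4 matrix applied to a 4-component vertex. All names are made up for illustration.

```c
/* Hypothetical resolution: 256 steps per full turn. */
#define TABLE_SIZE 256

static float sin_table[TABLE_SIZE];
static float cos_table[TABLE_SIZE];

/* Fill the tables once by stepping a unit vector around the circle with the
   angle-addition formulas. The constants are cos and sin of 2*pi/256. */
void init_trig_tables(void)
{
    const double step_c = 0.99969881869620425;
    const double step_s = 0.024541228522912288;
    double c = 1.0, s = 0.0;
    for (int i = 0; i < TABLE_SIZE; ++i) {
        cos_table[i] = (float)c;
        sin_table[i] = (float)s;
        double nc = c * step_c - s * step_s;
        s = s * step_c + c * step_s;
        c = nc;
    }
}

/* Rotate (x, y) about the origin by table entry `angle` (0..255 maps to one
   full turn), then translate: 4 multiplies and 4 adds per point. */
void rotate2d(float *x, float *y, unsigned char angle, float tx, float ty)
{
    float c = cos_table[angle], s = sin_table[angle];
    float nx = c * *x - s * *y + tx;
    float ny = s * *x + c * *y + ty;
    *x = nx;
    *y = ny;
}
```

Entry 64 is a quarter turn, so rotate2d with angle 64 moves the point (1, 0) to approximately (0, 1).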