Thing is, we can't get above about 15 fps no matter what we do. I've heard the iPad has fill-rate issues, but we've pared our scene down to what's in the list below.

We're all experienced engine tech people familiar with GL ES generally, so don't spare the horses with suggestions. Many gallons of beer to anyone who can suggest a magic bullet.

The temp test scene is as follows:

*Model has 8 batches, most of which cover the screen but don't overlap each other, so overdraw is negligible (grass, road, cliff edge, etc.)
*No alpha blending at all. Simple Lambert directional light; the grass itself isn't even dynamically lit, it's precalculated into the verts
*Four shaders, sorted, none of which are excessive
*About 30,000 vertices in total
*No shadows

The stuff we have tried in the engine is as follows:

*Every shader uses the lowest precision possible on its declarations
*All state setting is cached by us
*All textures are fairly small now, square, power-of-two, PVRTC-compressed and mipmapped
*Write directly to the back buffer
*Blending is disabled with glDisable(GL_BLEND) for now; alpha stuff is just thrown out
*Using a compressed vertex format of 32 bytes apiece (got it down to 24 once, but it made almost no difference)
*Rendering indexed triangle lists from index buffers, not strips (8-bit indices were slightly slower than 16-bit shorts)
*Doing nothing "creative" with the depth buffer or anything
*No post-processing. Hell, the engine is in pieces and doing almost nothing!
*We're not even normalising the normal in our per-pixel lighting
*Specular maths disabled for now
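To put the vertex and index figures above in concrete terms, here is one possible 32-byte layout, a 24-byte variant, and the per-frame byte counts they imply. The field breakdown is a hypothetical assumption (the post only gives the totals, and the three UV sets are a guess based on the shaders reading three textures); it relies on OpenGL ES 2.0 accepting normalized GL_BYTE, GL_UNSIGNED_BYTE and GL_SHORT attributes through glVertexAttribPointer.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte interleaved vertex. The integer attributes would be
   set up with glVertexAttribPointer(..., normalized = GL_TRUE, ...). */
typedef struct {
    float   position[3]; /* 12 bytes, GL_FLOAT */
    int8_t  normal[4];   /*  4 bytes, GL_BYTE; 4th component is padding */
    uint8_t color[4];    /*  4 bytes, GL_UNSIGNED_BYTE; baked lighting */
    int16_t uv0[2];      /*  4 bytes, GL_SHORT */
    int16_t uv1[2];      /*  4 bytes, GL_SHORT */
    int16_t uv2[2];      /*  4 bytes, GL_SHORT */
} Vertex32;

/* Hypothetical 24-byte variant with a single UV set. */
typedef struct {
    float   position[3];
    int8_t  normal[4];
    uint8_t color[4];
    int16_t uv0[2];
} Vertex24;

static_assert(sizeof(Vertex32) == 32, "Vertex32 should have a 32-byte stride");
static_assert(sizeof(Vertex24) == 24, "Vertex24 should have a 24-byte stride");

/* Vertex bytes fetched per frame if each vertex is read exactly once. */
size_t vertex_bytes(size_t vertex_count, size_t stride) {
    return vertex_count * stride;
}

/* Index bytes for an indexed triangle list. */
size_t index_bytes(size_t triangle_count, size_t index_size) {
    return triangle_count * 3 * index_size;
}
```

At 30,000 vertices that is 960,000 bytes of vertex data per frame at 32 bytes versus 720,000 at 24, and a 20,000-triangle list with 16-bit indices adds 120,000 bytes. Keep in mind that 8-bit indices can only address 256 vertices per draw call.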

The CPU is pretty much idle according to Instruments and Shark, which is backed up by a release build not noticeably improving on the shocking 15 fps.
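It may help to read the 15 fps figure in milliseconds. The sketch below assumes presentation is synchronised to a 60 Hz display refresh (the usual setup on iOS), so a frame that misses one refresh waits for the next and the displayed rate snaps to 60/n: 60, 30, 20, 15 and so on. Under that assumption, a steady 15 fps means each frame takes somewhere over 50 ms (more than three refresh intervals), against a 16.7 ms budget for 60 fps. `REFRESH_HZ` and the function names are illustrative.

```c
#include <assert.h>

#define REFRESH_HZ 60.0  /* assumed display refresh rate */

/* Milliseconds available per frame at a target frame rate. */
double frame_budget_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Whole refresh intervals a frame occupies when it takes render_ms and
   presentation waits for the next vsync (rounded up, minimum 1). */
int vsync_intervals(double render_ms) {
    const double interval_ms = 1000.0 / REFRESH_HZ;
    int n = (int)(render_ms / interval_ms);
    if ((double)n * interval_ms < render_ms) n += 1;  /* round up */
    return n < 1 ? 1 : n;
}

/* Frame rate the user actually sees under that model. */
double displayed_fps(double render_ms) {
    return REFRESH_HZ / vsync_intervals(render_ms);
}
```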

No "magic bullet", but if I were in your shoes, I'd do some simple detective work to see where the problem is coming from...

1) If your render function is reduced to only glClear(), are you still getting only 15 FPS? If so, you almost certainly have a timing bug or some logic error in the way you've organized your render cycle.

2) If you don't draw anything at all but run all of the state management/shader setup/vertex buffer code, is the FPS still just 15? If so, it might be that you are doing some extraordinarily expensive non-drawing operations.

3) If you disable the shaders but draw all the geometry, does the FPS increase? If so, the problem could be a shader executing too slowly. I don't know about the iPad, but on some graphics cards exceeding some hardware limits makes your shader run in software, which absolutely kills performance.

4) If you keep all of your shaders and state changes, but draw just one object per category/shader/state, does the FPS increase? If so, you're probably fillrate or geometry bound, and further tests can tell you where the problem is.
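The four tests above form a simple bisection. As a toy illustration, here they are encoded as a decision function; the struct, the enum and the 10% "did it improve?" margin are hypothetical choices, not anything from the thread. In OpenGL ES 2.0 you can't actually turn shaders off, so test 3 is modelled as swapping in trivial shaders.

```c
#include <stdbool.h>

typedef struct {
    double baseline_fps;       /* full scene */
    double clear_only_fps;     /* test 1: render function is just glClear() */
    double no_draw_fps;        /* test 2: all setup code, no draw calls */
    double simple_shader_fps;  /* test 3: trivial shaders, all geometry */
    double one_per_state_fps;  /* test 4: one object per shader/state group */
} Experiments;

typedef enum {
    SUSPECT_FRAME_LOOP,        /* timing bug or render-loop logic error */
    SUSPECT_NON_DRAW_WORK,     /* expensive state, shader or buffer setup */
    SUSPECT_SHADERS,           /* shaders executing too slowly */
    SUSPECT_FILL_OR_GEOMETRY,  /* fill rate or vertex throughput */
    SUSPECT_UNKNOWN
} Suspect;

/* Hypothetical threshold: only count gains above 10% as an improvement. */
static bool improved(double fps, double baseline_fps) {
    return fps > baseline_fps * 1.10;
}

/* Walk the tests in order, stopping at the first one that points somewhere. */
Suspect diagnose(const Experiments *e) {
    if (!improved(e->clear_only_fps, e->baseline_fps))   return SUSPECT_FRAME_LOOP;
    if (!improved(e->no_draw_fps, e->baseline_fps))      return SUSPECT_NON_DRAW_WORK;
    if (improved(e->simple_shader_fps, e->baseline_fps)) return SUSPECT_SHADERS;
    if (improved(e->one_per_state_fps, e->baseline_fps)) return SUSPECT_FILL_OR_GEOMETRY;
    return SUSPECT_UNKNOWN;
}
```

The order matters: each later test only means something once the earlier ones have ruled out the cheaper explanations.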

Looks like you've hit a lot of points already. mattz has some really great suggestions. I assume you have Compile for Thumb turned off. Is there an OpenGL query hidden in there somewhere, causing a round-trip and stalling the pipeline?

(Nov 11, 2010 07:29 PM) mattz wrote: No "magic bullet", but if I were in your shoes, I'd do some simple detective work to see where the problem is coming from...

Best of luck!

Thanks for the suggestions. To answer your points:

1) It goes up to 60.
2) I think it went up; will double-check.
3) We've not actually tried that. Will do so in a bit, thanks.
4) We went down to just the first triangle. Little change.

I really don't think it's fill rate. We took a lot of time to ensure that the ground mesh looks pretty much like a single extrusion. The tanks and stuff over the top aren't massive, but let's face it, it's not like I can take them out.
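Even with negligible overdraw, every pixel on screen is still shaded every frame, so fragment cost scales with screen area. A back-of-the-envelope sketch, assuming the iPad's 1024x768 display and an average overdraw figure you supply; the function names are illustrative.

```c
#include <stdint.h>

/* Fragments the GPU must shade per second. overdraw is the average number
   of times each screen pixel is written (1.0 = every pixel exactly once). */
uint64_t fragments_per_second(uint32_t width, uint32_t height,
                              double overdraw, double fps) {
    return (uint64_t)((double)width * (double)height * overdraw * fps);
}

/* Shader cycles per second for a given per-fragment cycle count. */
uint64_t shader_cycles_per_second(uint64_t fragments_per_s,
                                  uint32_t cycles_per_fragment) {
    return fragments_per_s * cycles_per_fragment;
}
```

At 60 fps with no overdraw that is 47,185,920 fragments per second; with a 9-cycle fragment shader (the figure quoted later in the thread) it comes to roughly 425 million shader cycles per second. Whether the GPU can sustain that depends on hardware details not covered here.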

@Jake: Hmmm, I think so but I'll double-check that also. We claimed to be decent at GL but our mac and iPad SDK experience is still fairly light - we could be making any number of mistakes like this tbh. Any other more global things you can think of?

That only matters on old ARMv6 devices. Apple recommends turning Thumb on for ARMv7 binaries (they use the fancy-pants Thumb-2, which handles floating point).

Ah, I missed that. Good thing we have forums to discuss these things!

Quote: Commenting out only glDrawElements raises the FPS to 60, but I don't think that proves anything, as surely all the setup is done lazily anyway?

Well, it proves that it's not a logic bottleneck, but rather it is indeed related to the GL usage.

I don't know how much it'd help, but you could also try ordering your verts in triangle strip order. The docs say it improves vertex cache hits, which improves performance. I've tried it myself but didn't see any improvement. There is a good triangle stripper in the Oolong Engine if you haven't already tried it.

That was 30k verts, meaning ~10k triangles, correct?

I think I'd focus in on the shaders next myself and see what happens if I simplify everything to bare minimums.

It's nearer 20,000 triangles due to the way the level is tessellated. The file is created in a PC tool which calls the D3D optimiser thingy to sort the indices for better post-transform cache performance.
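One way to check what the optimiser actually bought you on this hardware is to measure the average cache miss ratio (ACMR): vertices transformed per triangle, simulated against a FIFO post-transform cache. The sketch below assumes a 16-entry FIFO cache; the real cache size and replacement policy of the iPad's GPU aren't stated in this thread, so treat the numbers as relative rather than absolute.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 16  /* assumed cache size, not a documented iPad value */

/* Vertices transformed per triangle for an indexed triangle list. */
double acmr(const uint16_t *indices, size_t index_count) {
    uint16_t cache[CACHE_SIZE];
    size_t filled = 0;  /* number of valid cache entries */
    size_t oldest = 0;  /* slot to overwrite next once the cache is full */
    size_t misses = 0;

    for (size_t i = 0; i < index_count; ++i) {
        int hit = 0;
        for (size_t c = 0; c < filled; ++c) {
            if (cache[c] == indices[i]) { hit = 1; break; }
        }
        if (hit) continue;  /* FIFO: hits do not reorder entries */

        ++misses;           /* this vertex has to be transformed again */
        if (filled < CACHE_SIZE) {
            cache[filled++] = indices[i];
        } else {
            cache[oldest] = indices[i];  /* evict the oldest entry */
            oldest = (oldest + 1) % CACHE_SIZE;
        }
    }

    size_t triangles = index_count / 3;
    return triangles ? (double)misses / (double)triangles : 0.0;
}
```

Lower is better: 3.0 means no vertex reuse at all, while well-ordered large meshes typically get down towards 0.5 to 0.7.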

We've simplified everything to bare minimums today. With only the level geometry and a single small texture, it shot up to 40 fps. A decent number, but only because it's doing nothing useful.

After another whole day with two of us trying stuff out, I'm starting to feel that the iPad just isn't fit for purpose as a gaming platform. If all I can do is render a textured quad, I'd rather just pass.

Update: this is worse than billed. I got obsessed with the numbers: without all the units and level fluff (i.e. just the level mesh), it's a tenth that number of verts.

We were rendering more than this, faster than this, on the LeapFrog Didj, a kids' toy costing 40 bucks.

Hmm... If you're suggesting that you're having trouble with only 2k verts, then something is definitely screwing up royally. I know for a fact I can render ~10k triangles lit, depth-tested and textured at 20 FPS, even on first-gen devices, and honestly, I'm not even knowledgeable about the finer points of OpenGL performance. The iPad should have no problem doing what you're describing.

How about glFlush? You aren't calling that are you? Also, you're not drawing a Cocoa view (including anything Cocoa like buttons, etc) over your OpenGL view either are you? Are you doing any GL reads?

With any luck, arekkusu or frogblast will stumble along here and have some expert advice for you.

You mention blending is disabled, but what about alpha testing? I haven't experienced this myself, but from what I hear enabling alpha testing kills performance on the GPUs used in the iPhone and iPad.

There isn't any alpha testing. As I understand it, you have to do it yourself in the shader, with discard (the equivalent of HLSL's clip instruction). We never got that far, and have made sure there's no alpha-testing stuff in the level. The call that would usually set it up is a stub function.

You said you have no CPU usage, and that eliminating all rendering gets you 60Hz, so you're probably looking at some kind of GPU bottleneck. You said you've only got 8 draw calls? In that case I'd try isolating each to see if any in particular is the culprit. If one is, post the shader and details of the state (texture formats & sizes, etc).
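A sketch of that isolation step: render with exactly one batch enabled at a time and compare frame times. The draw and timing hooks are hypothetical placeholders for whatever the engine already has (for instance the existing FPS counter), and `fake_measure` only exists so the harness can be exercised off-device.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_COUNT 8  /* the scene described above has 8 draw calls */

/* Hypothetical hooks the engine would supply. */
typedef void (*DrawBatchFn)(size_t batch);          /* binds state, issues the draw */
typedef double (*MeasureFrameFn)(uint32_t enabled); /* renders a frame, returns ms */

/* Issue only the batches whose bit is set in mask. */
void draw_masked(uint32_t mask, DrawBatchFn draw_batch) {
    for (size_t i = 0; i < BATCH_COUNT; ++i) {
        if (mask & (1u << i)) draw_batch(i);
    }
}

/* Time a frame with each batch enabled on its own; returns the index of
   the slowest batch and writes every measurement to times_ms. */
size_t find_slowest_batch(MeasureFrameFn measure, double times_ms[BATCH_COUNT]) {
    size_t slowest = 0;
    for (size_t i = 0; i < BATCH_COUNT; ++i) {
        times_ms[i] = measure(1u << i);
        if (times_ms[i] > times_ms[slowest]) slowest = i;
    }
    return slowest;
}

/* Off-device stand-in: pretends batch 5 is the expensive one. */
double fake_measure(uint32_t enabled) {
    return (enabled & (1u << 5)) ? 40.0 : 3.0;
}
```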

I have that Citadel demo installed. Every now and again when I'm getting pissed off, I fire that up and it pushes me over the edge and I get to stop for a bit. Before this week I considered myself good at this sort of thing. :s However, looking at other stuff doesn't help. Maybe it's using the fixed-function pipeline (FFP), which is faster, for example. Maybe not, but we just don't know, and that's kind of the point.

We're using that tool from PowerVR to cycle-count our shaders. The most complex one is now 9 cycles (iirc), reading 3 textures (down to 64x64 for now): it lerps one to the other based on an alpha, then multiplies the result by the third and the vertex colour. I'm away from the source now, else I'd post it. (I can't get at the iPad stuff remotely as someone forgot to set the external access rights. Not a good day, lol.)
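For clarity, the shader described above amounts to the following per-pixel arithmetic, written as a CPU reference in C. Which alpha drives the blend isn't stated, so it is passed in as a separate parameter (an assumption); in GLSL ES the same thing is the single expression `mix(t0, t1, blend) * t2 * vcol`.

```c
typedef struct { float r, g, b, a; } Color;

/* Per-channel lerp, matching GLSL's mix(x, y, t) = x * (1 - t) + y * t. */
static Color mix_color(Color x, Color y, float t) {
    Color o = { x.r * (1.0f - t) + y.r * t, x.g * (1.0f - t) + y.g * t,
                x.b * (1.0f - t) + y.b * t, x.a * (1.0f - t) + y.a * t };
    return o;
}

/* Component-wise multiply, like vec4 * vec4 in GLSL. */
static Color mul_color(Color x, Color y) {
    Color o = { x.r * y.r, x.g * y.g, x.b * y.b, x.a * y.a };
    return o;
}

/* result = mix(tex0, tex1, blend) * tex2 * vertex_color */
Color combine(Color tex0, Color tex1, Color tex2, Color vertex_color, float blend) {
    return mul_color(mul_color(mix_color(tex0, tex1, blend), tex2), vertex_color);
}
```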

All the textures are 4-bit PVRTC with mips and no translucency.
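For scale, here is a sketch of how much data those textures amount to, using the usual PVRTC 4bpp size rule: 4x4-pixel blocks of 8 bytes each, with every mip level rounded up to at least 2x2 blocks (8x8 pixels). The function names are illustrative.

```c
#include <stddef.h>

/* Bytes for one PVRTC 4bpp mip level. */
size_t pvrtc4_level_bytes(size_t width, size_t height) {
    size_t blocks_w = width / 4, blocks_h = height / 4;
    if (blocks_w < 2) blocks_w = 2;  /* minimum of 2x2 blocks per level */
    if (blocks_h < 2) blocks_h = 2;
    return blocks_w * blocks_h * 8;  /* 16 pixels * 4 bits = 8 bytes per block */
}

/* Total bytes for a square power-of-two texture with mips down to 1x1. */
size_t pvrtc4_mipchain_bytes(size_t size) {
    size_t total = 0;
    for (;;) {
        total += pvrtc4_level_bytes(size, size);
        if (size == 1) break;
        size /= 2;
    }
    return total;
}
```

So a 64x64 texture with a full mip chain is 2,816 bytes, and three of them come to well under 10 KB, which makes texture size an unlikely bottleneck at this point.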

When we did what you suggest with uber-simple shaders, we maxed out at 40 fps, which is the fastest we've ever seen anything get to (while actually outputting something).