I'm far far from the most knowledgeable about GL on these boards, but you mention reducing your textures down to a small size. How many times are you swapping textures in one pass? Using texture atlases and reducing the number of swaps vastly improved my performance.
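
For anyone who hasn't built an atlas before, the core of it is just remapping each mesh's UVs into the sub-rectangle its old texture now occupies, so several meshes can share one bind. A minimal sketch in C, assuming pixel-space regions; the `AtlasRegion` struct and `atlas_remap_uv` name are made up for illustration and don't come from any particular engine:

```c
/* Hypothetical description of one sub-image packed into an atlas, in pixels. */
typedef struct {
    int x, y;          /* corner of the sub-image inside the atlas */
    int width, height; /* size of the sub-image */
} AtlasRegion;

/* Remap a UV in [0,1] that addressed the original standalone texture
 * to the UV that addresses the same texel inside the atlas. */
void atlas_remap_uv(const AtlasRegion *r, int atlas_w, int atlas_h,
                    float u, float v, float *out_u, float *out_v)
{
    *out_u = ((float)r->x + u * (float)r->width)  / (float)atlas_w;
    *out_v = ((float)r->y + v * (float)r->height) / (float)atlas_h;
}
```

This only works for UVs that stay inside [0,1]. Textures that rely on GL_REPEAT tiling can't be packed this way without extra work, and mipmapped atlases usually want some padding between regions to avoid bleeding.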

There are only 11 binds per frame atm as the level mesh is already as optimal as it's going to get. Sure enough the tanks and level decoration stuff would benefit from better atlasing (as would the PC version for that matter) and it's something we'll get around to when that's our biggest problem. It's moot atm though.

You're right about the D3D optimisation not being an iPad optimisation, but I'm assuming that it's better than nothing given we don't have details about an iPad optimisation. (Meaning the D3D version is probably better than the other option - random).

Is there somewhere I can get proper hardware details about what is actually in the iPad? PT cache, texture cache, bus latency, etc?

Might give strips a go, but I thought they were no-brainers for many a year now.

I wouldn't use actual triangle strips, if that's what you mean, because of the explosion in GL calls; just stick with glDrawElements, using single batches of verts as usual, but order the verts in strip fashion. The PVR docs illustrate why this might help (sorry I don't have a link handy). Like I mentioned earlier, there is a good stripification class in the oolong engine which can be used for this if you want to try it out.
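
To make that concrete, here is a rough sketch in C of what "strip order, but drawn as a list" could look like: expand a strip's index sequence into a plain GL_TRIANGLES index list, then count vertex cache misses with a toy FIFO model. Both function names are hypothetical, and the FIFO cache is purely an assumption for illustration, not a description of what the iPad's GPU actually does.

```c
#include <stddef.h>

#define MAX_CACHE 64

/* Expand a triangle strip into a GL_TRIANGLES index list that keeps the
 * strip's vertex order. Returns the number of indices written.
 * `out` must have room for 3 * (strip_len - 2) indices. */
size_t strip_to_list(const unsigned short *strip, size_t strip_len,
                     unsigned short *out)
{
    size_t n = 0;
    for (size_t i = 0; i + 2 < strip_len; ++i) {
        unsigned short a = strip[i], b = strip[i + 1], c = strip[i + 2];
        if (a == b || b == c || a == c)
            continue;                 /* drop degenerate triangles */
        if (i & 1) {                  /* odd triangles flip winding in a strip */
            out[n++] = b; out[n++] = a; out[n++] = c;
        } else {
            out[n++] = a; out[n++] = b; out[n++] = c;
        }
    }
    return n;
}

/* Count misses for an index list against a toy FIFO vertex cache.
 * Each miss stands for one vertex that has to be transformed again. */
size_t fifo_cache_misses(const unsigned short *idx, size_t count,
                         size_t cache_size)
{
    unsigned short cache[MAX_CACHE];
    size_t used = 0, next = 0, misses = 0;

    if (cache_size == 0)
        return count;                 /* no cache: every index misses */
    if (cache_size > MAX_CACHE)
        cache_size = MAX_CACHE;

    for (size_t i = 0; i < count; ++i) {
        int hit = 0;
        for (size_t j = 0; j < used; ++j) {
            if (cache[j] == idx[i]) { hit = 1; break; }
        }
        if (!hit) {
            ++misses;
            cache[next] = idx[i];     /* overwrite the oldest entry */
            next = (next + 1) % cache_size;
            if (used < cache_size)
                ++used;
        }
    }
    return misses;
}
```

The resulting list goes out in one call, e.g. glDrawElements(GL_TRIANGLES, (GLsizei)n, GL_UNSIGNED_SHORT, indices), so the batch count stays the same as before. Running the miss counter over the old and new index orders gives a quick, if crude, way to see whether the reordering is buying anything.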

That said, I think you need to find the big culprit first, and it seems we're all circling around that most likely being one of your shaders.

Right, I see what you mean. I'll check out the oolong thing some time soon. I totally agree this won't be the killer and in all likelihood it's something to do with shading. Pretty sure it's not the shader length itself or texture crapness, so what else might it be?

I seem to recall that "dependent texture read" is a pretty broad gotcha. So broad though that I don't really know what the gotchas are yet. It implies reading from a texture using UV generated by a previous read, but I know it's worse than that as I saw it once somewhere. But still, outputting just white didn't go >40.

Another quickie. Am I missing anything in relation to glHint? We're not using it atm as ime it never does owt, but maybe some options are available here?

Funnily enough, we're doing that right now. Coming to the conclusion that shaders don't give us what we were looking for, we're going to switch to FFP and at least encompass all devices. (Plus get code working for stuff like android into the bargain, if that's not a swear word around here).

Outside of our engine completely would be hard but not beyond the realms of possibility. I don't think the problem lies in other code areas in our engine though as it works on a few supposedly lowlier platforms at a fair clip. (We have a 90% implemented ES1 engine already, just not targeted at iPad yet)

Are you rendering at the native iPad resolution, or 320x480? In the back of my mind is the thought that what we think of as being possible on the device -- and Epic Citadel is the benchmark I guess -- might only be possible at 320x480.
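
For a sense of scale, a back-of-envelope sketch in C of how many fragments get shaded per frame at each resolution. It assumes the native iPad framebuffer is 1024x768 and treats overdraw as a flat multiplier, which is a simplification; the function names are just for illustration.

```c
/* Fragments shaded per frame, assuming every pixel is covered
 * exactly `overdraw` times (a deliberate simplification). */
unsigned long fragments_per_frame(unsigned width, unsigned height,
                                  unsigned overdraw)
{
    return (unsigned long)width * height * overdraw;
}

/* How many times more fragment work resolution A is than resolution B,
 * at equal overdraw. */
double fill_ratio(unsigned wa, unsigned ha, unsigned wb, unsigned hb)
{
    return (double)fragments_per_frame(wa, ha, 1) /
           (double)fragments_per_frame(wb, hb, 1);
}
```

With those assumptions, 1024x768 is 786,432 pixels against 153,600 at 320x480, so roughly five times the fragment shader work per frame, and 640x960 is exactly four times 320x480. If the shader is the bottleneck, that ratio goes more or less straight into the frame time.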

I just downloaded Citadel for my iPhone 4 and... it's running at the Retina resolution. I assume the iPad version probably does as well. :|

I don't understand how this Citadel demo does what it does. I double-checked the install size, just to convince myself that they hadn't simply pre-rendered every conceivable frame and let you play it like an interactive video. Sadly, it appears to be real.

I couldn't even get a simple ambient+diffuse+specular shader to run at 30fps at the Retina resolution, even with all the optimizations I could think of. So I set contentScaleFactor to 1.0f and moved on. But Citadel shows it can easily do that, plus more.

Maybe they're using low bits-per-pixel or something. I guess I need to go back and have another look at it. :|

I was just thinking of doing the ES1 route as a sanity check, since it sounds like you guys could use a little of that at this point, but if you can incorporate it into your engine then all the better, since like you said, then you can reach back to older devices anyway. Unfortunately, the shader route should always be faster than FFP on iOS devices, but at this point it doesn't sound like you have anything to lose. Personally, if it were me, I'd start a test project completely on its own with test geometry and shader route to try to nail down the fast path, then re-incorporate that back into the engine once I had found it.

BTW, no, Android is not a swear word here. We focus on Apple technology discussions in these forums, but we're not zealots. Indeed, I myself am making great effort to develop cross-platform these days (including Win/Linux/etc). I even have the Android SDK installed on my dev machine. I poked around with it and did some research and decided there are just too many devices to bother supporting in a rather chaotic market where there appears to be only 1/10th the revenue of iPhone for developers, and shelved the idea for now. If somebody could convince me that it's worth it, I'd be happy to pull it out of storage.

Good advice. We do have a really simple test app we can frig about with, so once we have the engine code working we can pull out the bits and test it in isolation there. Not that I'm expecting any great revelations at this point. Me=defeated, lol