I have been trying to perform some OpenGL ES performance optimizations in an attempt to boost the number of triangles per second that I'm able to render in my iPhone application, but I've hit a brick wall. I've tried converting my OpenGL ES data types from fixed to floating point (per Apple's recommendation), interleaving my vertex buffer objects, and minimizing changes in drawing state, but none of these changes have made a difference in rendering speed. No matter what, I can't seem to push my application above 320,000 triangles/s on an iPhone 3G running the 3.0 OS. According to this benchmark, I should be able to hit 687,000 triangles/s on this hardware with the smooth shading I'm using.

In my testing, when I run the OpenGL ES performance tool in Instruments against the running device, I see the statistic "Tiler Utilization" reaching nearly 100% when rendering my benchmark, yet the "Renderer Utilization" only gets to about 30%. This may be providing a clue as to where the bottleneck is in the display process, but I don't know what these values mean, and I've not found any documentation on them. Can anyone provide a good description of what these and the other statistics in the iPhone OpenGL ES instrument stand for? I know that the PowerVR MBX Lite in the iPhone 3G is a tile-based deferred renderer, but I'm not sure what the difference would be between the Renderer and the Tiler in that architecture.

If it helps in any way, the (BSD-licensed) source code to this application is available if you want to download and test it yourself. In the current configuration, it starts a little benchmark every time you load a new molecular structure and outputs the triangles/s to the console.

How big are your triangles? I think one of these stats is more about the number of pixels, and the other about the number of triangles. Do the relative utilization numbers change if you zoom way out so the screen is less filled?
– David Maymudes, Aug 17 '09 at 12:59

It varies, depending on the model I load, but they tend to be pretty small. No matter the zoom level on the model, the numbers seem to remain the same. Also, I've tried reducing the OpenGL view size to half of what it is now, to no effect, which seemed to rule out a fill-rate limitation.
– Brad Larson♦, Aug 17 '09 at 14:24

1 Answer

The Tiler Utilization and Renderer Utilization percentages measure the duty cycle of the vertex and fragment processing hardware, respectively. On the MBX, Tiler Utilization typically scales with the amount of vertex data being sent to the GPU (in terms of both the number of vertices and the size of the attributes sent per-vertex), and Renderer Utilization generally increases with overdraw and texture sampling.

In your case, the best thing would be to reduce the size of each vertex you’re sending. For starters, I’d try binning your atoms and bonds by color, and sending each of these bins using a constant color instead of an array. I’d also suggest investigating if shorts are suitable for your positions and normals, given appropriate scaling. You might also have to bin by position in this case, if shorts scaled to provide sufficient precision aren’t covering the range you need. These sorts of techniques might require additional draw calls, but I suspect the improvement in vertex throughput will outweigh the extra per-draw call CPU overhead.
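
To illustrate the binning idea, here is a minimal C sketch (all names — `Atom`, `kNumColors`, `binAtomsByColor` — are hypothetical, not taken from the actual app) that groups atom indices by color so each bin can then be drawn under a single constant color rather than a per-vertex color array:

```c
#include <assert.h>
#include <stddef.h>

enum { kNumColors = 3 };  /* e.g. carbon, oxygen, nitrogen */

typedef struct {
    int colorIndex;       /* index into a small color palette */
} Atom;

/* Counting sort of atom indices into per-color bins.
 * binStart[c]..binStart[c+1] delimits the indices for color c. */
static void binAtomsByColor(const Atom *atoms, int count,
                            int binStart[kNumColors + 1],
                            int binnedIndices[])
{
    int counts[kNumColors] = {0};
    for (int i = 0; i < count; i++)
        counts[atoms[i].colorIndex]++;

    binStart[0] = 0;
    for (int c = 0; c < kNumColors; c++)
        binStart[c + 1] = binStart[c] + counts[c];

    int cursor[kNumColors];
    for (int c = 0; c < kNumColors; c++)
        cursor[c] = binStart[c];
    for (int i = 0; i < count; i++)
        binnedIndices[cursor[atoms[i].colorIndex]++] = i;

    /* Drawing then becomes one pass per bin, with the color array off:
     *   glDisableClientState(GL_COLOR_ARRAY);
     *   for (int c = 0; c < kNumColors; c++) {
     *       glColor4f(palette[c].r, palette[c].g, palette[c].b, 1.0f);
     *       glDrawElements(GL_TRIANGLES, ...indices for bin c...);
     *   }
     */
}
```

This trades one draw call per color for dropping four bytes of color per vertex, which is usually a net win on the MBX when the palette is small.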

Note that it’s generally beneficial (on MBX and elsewhere) to ensure that each vertex attribute begins on a 32-bit boundary, which implies that you should pad your positions and normals out to 4 components if you switch them to shorts. The peculiarities of the MBX platform also make it such that you want to actually include the W component of the position in the call to glVertexPointer in this case.
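
A sketch of what such a padded, interleaved layout might look like (the struct name is illustrative, and `GLshort` is re-declared here only to keep the snippet self-contained):

```c
#include <assert.h>
#include <stddef.h>

typedef short GLshort;  /* stand-in for the GL typedef */

/* Interleaved vertex with short positions and normals, each padded to
 * four components so every attribute starts on a 32-bit boundary. */
typedef struct {
    GLshort position[4];  /* x, y, z, w -- w is the pad, and is passed to GL */
    GLshort normal[4];    /* nx, ny, nz + one dead pad short */
} PackedVertex;

/* With this layout, the pointer setup would be:
 *   glVertexPointer(4, GL_SHORT, sizeof(PackedVertex),
 *                   (void *)offsetof(PackedVertex, position));
 *   glNormalPointer(GL_SHORT, sizeof(PackedVertex),
 *                   (void *)offsetof(PackedVertex, normal));
 * Note that glNormalPointer always reads exactly 3 components; the
 * fourth short exists only to keep the stride a multiple of 4 bytes. */
```

Passing 4 as the size to glVertexPointer is what "include the W component" means in practice: the pad short does double duty as the homogeneous W coordinate.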

You might also consider pursuing alternate lighting methods like DOT3 for your polygon data, particularly the spheres, but this requires special care to make sure that you aren’t making your rendering fragment-bound, or inadvertently sending more vertex data than before.

This is consistent with my experience since I first asked the question. I reduced my vertex and normal data to shorts, padding out with the additional component to get to the 32-bit boundary, and that boosted rendering performance by 30%. Culling backfaces and tweaking the depth test added another 10%. I'll take a look at color-binning to see what effect that has. Thanks for the detailed response.
– Brad Larson♦, Aug 22 '09 at 18:11

I have a dumb question... what does the verb "bin" mean as used throughout Pivot's answer here?
– Andrew Garrison, Mar 9 '10 at 3:29

@Andrew Garrison: In this case, it means to group items of a certain type. When he says "binning your atoms and bonds by color", what he means is that I should group the vertices for items that all share the same color, then submit only those items once for that color. This avoids having to specify a color for each vertex, reducing the size of the geometry being sent.
– Brad Larson♦, Mar 30 '10 at 1:23

As another follow-on to this, I finally got around to implementing color binning for my molecular rendering. It led to a ~20% reduction in the size of geometry being passed to the GPU, and I observed a 17% speedup in the rendering of each OpenGL ES frame. I'd call that a win.
– Brad Larson♦, Apr 15 '11 at 4:44

Well answered, but I want to note that binning by colour should be used carefully: it increases the number of draw calls, and the iPhone becomes CPU-bound once roughly 200 draw calls are made per frame.
– Bram, May 27 '12 at 6:16