glDrawElements vs. glDrawArrays - The numbers are in!

As promised in a thread probably 2 people read, I wrote a little tester app to evaluate the comparative speed of glDrawElements (wrapped in array locks) and glDrawArrays. It tests each with three array setups: packed discrete arrays, aligned (to double-word boundaries) discrete arrays, and interleaved arrays. It also does a depth buffer clear and swap about 30 times a second (a bit less in reality). Some results:

You get much higher poly counts with lighting off. My Geforce2MX crossed the 5Mpoly/sec mark! The Rage Pro iMac I mentioned got almost 200kpolys/sec, and the blue&white got 620 kpolys/sec. The peak poly rates on all of them definitely came from double-word aligned discrete arrays drawn with glDrawElements, although all glDrawElements paths were relatively fast. In the case of the GF2MX, glDrawElements is over 4 times faster than glDrawArrays.

translation: if you can store static lighting (or quickly generate dynamic lighting) for some part of your scene, do it.

this is all with glColorPointer(4, GL_UNSIGNED_BYTE, 0, *) or glInterleavedArrays(GL_T2F_N3F_V3F, 0, *) BTW.
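For anyone unsure what the interleaved case looks like in memory, here is a minimal sketch. The struct and function names are made up for illustration; the field order follows the GL_T2F_N3F_V3F format name (2 texture floats, 3 normal floats, 3 position floats), and the colour struct matches 4 unsigned bytes per vertex. It assumes the compiler adds no padding, which the static_assert checks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct name. Field order mirrors GL_T2F_N3F_V3F:
// texture coordinates first, then the normal, then the position.
struct VertexT2FN3FV3F {
    float s, t;        // texture coordinates (T2F)
    float nx, ny, nz;  // normal (N3F)
    float x, y, z;     // position (V3F)
};

// One colour as it would be read with 4 unsigned bytes per vertex.
struct ColorRGBA8 {
    unsigned char r, g, b, a;
};

// Fails to compile if the compiler inserted padding between the floats.
static_assert(sizeof(VertexT2FN3FV3F) == 8 * sizeof(float),
              "interleaved vertex must be exactly 8 floats");

// Byte distance from one vertex to the next in the interleaved array.
inline std::size_t interleavedStride() { return sizeof(VertexT2FN3FV3F); }

// Byte offsets of the normal and the position inside one vertex.
inline std::size_t normalOffset() { return offsetof(VertexT2FN3FV3F, nx); }
inline std::size_t positionOffset() { return offsetof(VertexT2FN3FV3F, x); }
```

With 4-byte floats that is a 32-byte vertex. Passing a stride of 0, as in the calls above, tells GL the elements are tightly packed.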

Calculating the normal vectors on the fly is beyond the scope of this post.

Your indices should be arranged so that you're drawing triangle strips (generalized triangle strips may be good enough, not sure) to help older cards which can't cache your entire vertex set. I found a great thesis on generating optimal tristrips (1MB, 85pg.) that went completely over my head, so I used the actc library I found instead (sortaBSD license - included in the archive).
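To make the strip idea concrete, here is a small sketch that expands a strip's indices back into plain triangles. It is not how actc works (that goes the other way, from triangles to strips) and the function name is made up. The vertex order for every second triangle is swapped so the winding stays consistent, which is how GL_TRIANGLE_STRIP is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands triangle-strip indices into a flat triangle list (3 indices each).
// Hypothetical helper for illustration only.
std::vector<unsigned> stripToTriangles(const std::vector<unsigned>& strip) {
    std::vector<unsigned> tris;
    if (strip.size() < 3) {
        return tris;  // fewer than 3 indices: no triangle at all
    }
    tris.reserve((strip.size() - 2) * 3);
    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        if (i % 2 == 0) {
            // even triangle: indices in strip order
            tris.push_back(strip[i]);
            tris.push_back(strip[i + 1]);
            tris.push_back(strip[i + 2]);
        } else {
            // odd triangle: first two swapped to keep the winding the same
            tris.push_back(strip[i + 1]);
            tris.push_back(strip[i]);
            tris.push_back(strip[i + 2]);
        }
    }
    return tris;
}
```

A strip of n indices gives n - 2 triangles, so a long strip needs roughly one index per triangle where a plain list needs three.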

That make sense?

Quote: And have you tried using the VAR extension? (have Apple even written the drivers for it yet?)

Do you think using Compiled Vertex Arrays (on cards that support it) would make a notable difference? Many devs on mac-opengl have talked about them. The main consideration is that the arrays are limited to 2048 vertices apiece. Not a subject I've had much luck finding info on.

I'm very happy this thread exists, 'cos I'd been meaning to ask about this. I've just been playing about with rendering a mesh with 1682 polies, 901 vertices, normals for every vertex (which I calculated when I exported the 3ds file to my own format IIRC), and a 256*256 texture, and OpenGL lighting with one light (this is on a non TnL card).

This is all done on a G4 350MHz with Rage128, OS X 10.1.5.

Up till now this was done in immediate mode, where I was getting on average 70 FPS. Now I'm using glDrawElements() and getting almost 120 FPS. I think this comes out at about 175 - 180 thousand polies / sec. This is faster than on Ian's G4 400 for some reason (OSX vs OS9?). If the thread at macscene.org (linked to above) is to be believed, perhaps it's because I'm using Carbon and not GLUT.
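As a rough check on that arithmetic, assuming the whole 1682-poly mesh is drawn exactly once per frame (the function name below is made up):

```cpp
#include <cassert>

// Polygons per second if every frame draws the same number of polygons.
inline long polysPerSecond(long polysPerFrame, long framesPerSecond) {
    assert(polysPerFrame >= 0 && framesPerSecond >= 0);
    return polysPerFrame * framesPerSecond;
}
```

polysPerSecond(1682, 70) is 117,740 and polysPerSecond(1682, 120) is 201,840, so a figure of 175 - 180 thousand corresponds to roughly 105 FPS rather than a full 120.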

Cool speed boost though!

This is without triangle strips, which I intend to look into next. Part of the speedup is probably also down to me previously using wrappers to get access to the vertices, indices etc., which were non-inlined; I don't know how much of a slowdown they would be.

Ian, do you know of the STRIPE algorithm for creating tri-strips? It works best on quads by triangulating them, but it decides on the optimal way to put the diagonal. Thanks for the link to ACTC.

EDIT:

I also meant to say that I had been put off trying glDrawElements, because I use STL vectors for storing all my data. I assume it works because the internal format of the data in a vector is no different from a plain array. But can I guarantee that this will always be the case?

Quote: Originally posted by Nimrod: Ian, do you know of the STRIPE algorithm for creating tri-strips? It works best on quads by triangulating them, but it decides on the optimal way to put the diagonal. Thanks for the link to ACTC.

I found them, but ACTC works well enough, and is free. As I just stated, it appears that on the RagePro, which is the minimum config we plan to support, RANDOMIZING the polygons may produce the best results.

edit: BTW, I would be interested in what numbers these apps get on a Radeon rig, I have yet to find one I can test on.

OK, I re-ran the tests after randomly shuffling the triangles in the input. As expected, the Rage128 and GeForce2 got lower poly rates for the unoptimized geometry. Surprisingly, the Rage Pro got higher poly rates with the unoptimized geometry. At the peak it hit 207kpolys/sec - with tri strip optimized geometry its max was 189kpolys/sec.
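For reference, shuffling the triangles of an indexed list could look something like the sketch below. This is not the code from the tester app; the names are made up, and it assumes a plain triangle list (3 indices per triangle). Whole triangles are moved, so each triangle keeps its own three indices in their original order.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Triangle = std::array<unsigned, 3>;

// Returns a copy of `indices` with the triangles in random order.
// `seed` makes the shuffle repeatable between runs.
std::vector<unsigned> shuffleTriangles(const std::vector<unsigned>& indices,
                                       unsigned seed) {
    assert(indices.size() % 3 == 0);  // must be a whole number of triangles

    // Group the flat index list into triangles.
    std::vector<Triangle> tris(indices.size() / 3);
    for (std::size_t t = 0; t < tris.size(); ++t) {
        tris[t] = Triangle{{indices[3 * t], indices[3 * t + 1], indices[3 * t + 2]}};
    }

    // Reorder whole triangles, never individual indices.
    std::mt19937 rng(seed);
    std::shuffle(tris.begin(), tris.end(), rng);

    // Flatten back into an index list.
    std::vector<unsigned> out;
    out.reserve(indices.size());
    for (const Triangle& tri : tris) {
        out.insert(out.end(), tri.begin(), tri.end());
    }
    return out;
}
```

The vertex arrays stay untouched; only the order in which the index list visits them changes.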

tip: Check if you're running on a Rage Pro (glGetString(GL_RENDERER) returns "Rage Pro OpenGL Engine"). If you are, shuffle the order of the polygons in your models around a bit.
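One thing to watch in C or C++: glGetString hands back a pointer, so comparing it to a string literal with == compares addresses, not text. A minimal sketch of the check, using a made-up helper that takes the renderer string as a parameter so it can be tried without a GL context:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper. In a real program the argument would be
// reinterpret_cast<const char*>(glGetString(GL_RENDERER)),
// fetched while a GL context is current.
inline bool isRagePro(const char* renderer) {
    // Guard against a null pointer before comparing the text.
    return renderer != nullptr &&
           std::strcmp(renderer, "Rage Pro OpenGL Engine") == 0;
}
```

The renderer string is the one quoted above; I haven't checked whether other driver versions report it differently.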

That's really strange that randomizing the triangle order would be consistently better... maybe you've got something else going on (if it were Radeon/GF3, I'd say Z-buffer, but that doesn't seem to make sense for Rage Pro).

-----

CVA is supported on all hardware under OSX.

It doesn't provide significant benefits unless you can interleave your CPU work with your glDrawElements calls sufficiently. Basically, it seems like VAR/Fence is at least as good in the worst case, and significantly better in the best case.

Quote: I also meant to say that I had been put off trying glDrawElements, because I use STL vectors for storing all my data. I assume it works because the internal format of the data in a vector is no different from a plain array. But can I guarantee that this will always be the case?

STL vectors guarantee that they can be passed to functions that expect C-style arrays, so you will be fine. On the other hand you should take care using them, because they can allocate and de-allocate at inopportune times, and when they do it's not cheap. You can avoid this by calling reserve() up front so the vector never has to grow while you are filling it.
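A minimal sketch of both points. The function names are made up, and sumFloats is only a stand-in for any C-style function that takes a pointer and a count (the GL array-pointer calls take the pointer the same way).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a C-style API that reads `count` floats from a raw pointer.
inline float sumFloats(const float* data, std::size_t count) {
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}

// Appends `count` floats after reserving room for them first.
// Returns true if the vector's storage did not move while appending.
inline bool appendWithoutReallocating(std::vector<float>& v, std::size_t count) {
    v.reserve(v.size() + count);     // one allocation at most, done up front
    const float* before = v.data();  // address of the storage after reserving
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<float>(i));
    }
    return v.data() == before;       // same address: no reallocation happened
}
```

Usage is just sumFloats(v.data(), v.size()). On older compilers without data(), &v[0] gives the same pointer as long as the vector isn't empty. Any pointer you have handed to GL goes stale if the vector reallocates afterwards, which is another reason to reserve first.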

On OSX you are limited to 2048 indices. Any more than this and OpenGL reverts to the non-compiled path.
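If you take the 2048 figure at face value, one way to stay under it is to draw a big triangle list in several batches. The sketch below only works out the batch boundaries; the names are made up, it assumes a plain triangle list (3 indices per triangle), and whether each batch then really stays on the compiled path is something I haven't verified.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One draw call's worth of indices: starting offset and how many to draw.
struct Batch {
    std::size_t first;
    std::size_t count;
};

// Splits `indexCount` triangle-list indices into batches of at most
// `maxIndices`, rounded down to a multiple of 3 so no triangle is cut in half.
std::vector<Batch> splitIntoBatches(std::size_t indexCount,
                                    std::size_t maxIndices = 2048) {
    assert(indexCount % 3 == 0);  // whole triangles only
    assert(maxIndices >= 3);      // room for at least one triangle
    const std::size_t perBatch = maxIndices - (maxIndices % 3);  // 2046 for 2048

    std::vector<Batch> batches;
    for (std::size_t first = 0; first < indexCount; first += perBatch) {
        batches.push_back(Batch{first, std::min(perBatch, indexCount - first)});
    }
    return batches;
}
```

For the 1682-poly mesh mentioned earlier (5046 indices) this gives three batches of 2046, 2046 and 954 indices. Each one would be its own glDrawElements call, starting at indices + first.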

CVA is only going to benefit you if you are touching the same geometry multiple times, e.g. doing multiple passes for lightmapping, or if you have some objects that somehow share exactly the same vertices. You should be calling glDrawElements lots between your lock calls. This...