Trouble with VBOs

I'm transitioning from display lists to VBOs, since that seems to be the thing to do for good performance.

I've got client-side vertex arrays working fine -- and performance is actually pretty good: a scene with 5000 instances of 5 different models renders reasonably quickly. So the obvious next step is VBOs, so I don't have to stream the geometry every frame.

That being said, my code that takes the VBO path simply doesn't render anything -- none of the models get drawn. Since code speaks louder than words, I'll post my code below and hope that somebody can help.

A quick note: I'm not using interleaved arrays, since I'm supporting a variable number of texture coordinate sets per vertex. Also, since I want crisp per-triangle normals (so vertices aren't shared between faces), I'm using glDrawArrays rather than glDrawElements.

First, here's the drawing function. If I set USE_VBO to false it draws with client-side vertex arrays, and the rendering's correct.

The code that actually generates the data in _vboPoints, _vboNormals and _vboTextureCoordinates isn't important here, since I know that it's generating valid data thanks to the correct rendering in immediate mode.

I think your problem is that you're enabling the client state only for the regular vertex arrays, but not for the VBOs. You still need to enable the client state so GL knows which arrays are in use. (Even though the data is no longer hosted by the client, VBOs share the same interface.)

akb825 Wrote:I think your problem is that you're enabling the client state only for the regular vertex arrays, but not for the VBOs. You still need to enable the client state so GL knows which arrays are in use. (Even though the data is no longer hosted by the client, VBOs share the same interface.)

Actually, futzing around last night, that turned out to be it. I'd never seen calls to glEnableClientState in VBO sample code, so I'd assumed it was something specific to client-side vertex arrays.
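For anyone else who hits this: the working draw path ends up looking roughly like the sketch below. It obviously won't do anything without a current GL context, and the buffer names are just illustrative stand-ins for my _vboPoints etc.

```c
/* Sketch of the corrected fixed-function VBO draw path. */
glBindBuffer(GL_ARRAY_BUFFER, vboPoints);
glEnableClientState(GL_VERTEX_ARRAY);        /* still required with VBOs! */
glVertexPointer(3, GL_FLOAT, 0, (void *)0);  /* offset into bound buffer  */

glBindBuffer(GL_ARRAY_BUFFER, vboNormals);
glEnableClientState(GL_NORMAL_ARRAY);
glNormalPointer(GL_FLOAT, 0, (void *)0);

glBindBuffer(GL_ARRAY_BUFFER, vboTextureCoordinates);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glTexCoordPointer(2, GL_FLOAT, 0, (void *)0);

glDrawArrays(GL_TRIANGLES, 0, vertexCount);

/* Disable again so client-side array code elsewhere isn't affected. */
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_NORMAL_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);
```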

To OSC:
I'd considered using interleaved arrays here so I could store just one contiguous array, and I may in the future. The thing is -- at least right now -- I'm supporting an arbitrary number of texture coordinate arrays. Models loaded from .obj files may have, for example, a set of texture coordinates, but my code will automatically generate tex coords suitable for detail textures as well. So I'm supporting adding sets of texture coords as desired, which means a model may have one, two, three, or however many sets it needs.

I know I could make a specialized vertex struct which supports, say, a max of four texture coordinate sets and use the stride parameter to pad it out, but I don't know if the interleaving constants will support the formats I want. For example, I've used GL_T2F_V3F, but I don't know if GL has GL_T2F_T2F_N3F_V3F, or GL_T2F_T2F_T2F_N3F_V3F... I could search the headers and find out, of course. And then, what happens when I want to add an attribute array for vertex shaders?

Are interleaved arrays that much faster? Right now (using VBOs) I'm getting 60 fps on a scene with 5000 models (of which the scene graph culls a little more than half, owing to view-frustum and far-plane culling).

The suggestion isn't to interleave the arrays, but to stack them on top of each other. You make the buffer large enough to hold all your arrays with glBufferData, passing NULL so no data is loaded on that first call, then load them in whatever order you want with glBufferSubData. For example, you could first load the vertices, then at an offset of the size of the vertex array load the normals, then at an offset of the size of the vertex array plus the normals load the texture coordinates, etc. Then, when you call glVertexPointer, you pass (void *)0 for the pointer, (void *)sizeOfVertexArray for the normal pointer, (void *)(sizeOfVertexArray + sizeOfNormalArray) for the texture coordinate pointer, etc.

That's one possibility, akb, but interleaving the arrays *was* what I had in mind

TomorrowPlusX: there's no need for a specific interleaved array format, or a call to glInterleavedArrays, in order to interleave the data. The "stride" parameter to gl*Pointer allows for interleaved arrays quite happily. I believe that the Apple implementation of glInterleavedArrays simply calls gl*Pointer and glEnableClientState a few times with appropriate strides.

Is there any advantage to interleaving arrays? One disadvantage I can see is that it makes it harder to update chunks of data for, say, vertices and normals without also touching colors and texture coordinates.

I understand it's better for the GPU's caches for them to be interleaved, but I've never personally observed any benefit. I certainly wouldn't be worried about not interleaving them if that's what made sense for my application.

Well, I'm glad to hear I can just use stride and ignore the interleaving constants, since that approach seemed brittle. I'll likely give it a shot at some point, but right now performance seems pretty tight, and I've got bigger fish to fry -- I've finally got proper additive lighting / stencil shadowing for multiple light sources working, and am working on a system for shadow visibility determination so I can minimize the number of shadow volumes to extrude.

Are there any good sources out there for implementing stencil shadow volume extrusion on the GPU? I've written a decent enough CPU implementation and am streaming the geometry through normal vertex arrays, but I assume doing it on the GPU would haul ass... haven't you done this, akb825?

TomorrowPlusX Wrote:Are there any good sources out there for implementing stencil shadow volume extrusion on the GPU? I've written a decent enough CPU implementation and am streaming the geometry through normal vertex arrays, but I assume doing it on the GPU would haul ass... haven't you done this, akb825?

The trick is to have 2 extra meshes: one with normals specified per face rather than per vertex, and one with a 0-width quad for each edge in the model (with each side of the 0-width quad having the normal of the adjoining face, so that if it gets stretched out the winding will be correct). Then draw both extra meshes with a vertex shader that extrudes the vertex if its normal faces away from the light (i.e. the dot product with the direction to the light is negative). If you apply that to the z-fail algorithm, you'll get very good results. There's a little bit about it at the bottom of this page.
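The shader itself is tiny -- something along these lines. This is just a sketch: the uniform names are made up, and it assumes the light position has been supplied in object space so it can be compared against gl_Vertex directly.

```glsl
// Sketch of shadow-volume extrusion in a GLSL 1.x vertex shader.
// Assumes flat (per-face) normals and the degenerate edge quads
// described above. Uniform names are illustrative.
uniform vec3  lightPosObj;      // light position in object space
uniform float extrudeDistance;  // how far to push back-facing geometry

void main()
{
    // Direction from the light to this vertex.
    vec3 lightDir = normalize(gl_Vertex.xyz - lightPosObj);

    vec4 pos = gl_Vertex;
    if (dot(gl_Normal, lightDir) > 0.0) {
        // Normal faces away from the light: extrude along lightDir.
        pos.xyz += lightDir * extrudeDistance;
    }
    gl_Position = gl_ModelViewProjectionMatrix * pos;
}
```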

That said, I'm moving away from stencil shadows. Though a CPU solution makes it possible to run on older GPUs, shadow mapping gives a much cleaner, more exact result, plus it's much easier to implement soft shadows. If you implement a post-perspective projection for drawing from the light's point of view, you can get rid of the precision problems, too.

Yeah, you know I'm considering going to shadow mapping as well. I'm hitting some serious fill rate issues with my shadow volume rendering. I've written a stress test where I have about 6000 models, all of which are shadow casters. While I'm obviously not drawing all of them, I'm drawing a few hundred per frame, and while I get 60 fps without shadows, I get about 20 with. I know from profiling that actually drawing my scene only takes about 3% of my CPU time, so it seems that shadow textures would be ideal.

I'm not really certain how to best approach shadow textures to maximise precision, though, since all the demos I've seen take very naive approaches to how the light's viewpoint is set.

Well, for the time being I'm sticking with stencil shadows -- mainly because I'm doing technical style rendering here anyway.

But, that being said, I read up on the GPU extrusion method. The method you describe is different from the method described in the gamedev.net article ( which actually was my reference a couple years back when I wrote my first CPU implementation ).

What I'm not 100% certain about is how the facing information is stored in the collapsed quads. Also, you describe using 2 extra meshes, where the gamedev article describes using just one.

Any pointers? I googled for the last 20 minutes and found surprisingly little. I might have to delve into the OGRE sources, since that's the only "real" lead I found.

The reason you need 2 meshes is that z-fail requires caps on the ends of the volume. If you use the original mesh, the normals are smoothed and the caps won't match up with the extruded sides correctly.

To create the 0-width polygon mesh, what I did was go through all the polygons and, for each edge, add the 2 vertices with the flat normals. If the 2 vertices were already there, you add 2 more vertices alongside the existing ones to form your quad.

After I wrote my message I had a smoke and thought about the problem and realized I'd need two meshes. I don't understand how the gamedev article gets away with one...

I'm going to probably pester you to help me in a few days with this. Right now I'm writing simple per-pixel shaders for my rendering to "take a break" from the shadows for a bit.

I'm looking forward to GPU extrusion, since it ought to be a hell of a lot faster. Storing a VBO for each extrusion (for situations where the local light position is static) is too VRAM-intensive, and streaming vertices is too slow.