Cas' sprites thread has me in an optimization frenzy with my own code now.

Currently, for my VBOs, I take each triangle from a mesh and systematically dump each vertex along with each color, normal, and texture coordinate into 4 respective arrays (vertex, color, normal, and texture). When updating for animated meshes, I loop through the whole list of vertices (including the duplicates) and update them to their new position (same thing with the other arrays if something changes). Then, for drawing, I bind each one and call glDrawArrays.

Now, I know for a fact that my updates would benefit from not having to loop through so many duplicates for the vertices. However, the texture coordinates are making this difficult. Each vertex can (and often does) have multiple texture coordinates depending on which face is calling it. How do you handle this in an indexed VBO?

AFAIK each indexed vertex will have a complete set of attributes for that vertex... if you have two vertices which happen to have the same position / normal / whatever, but some different texture coordinates... then they are not the same vertex.
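To make that concrete, here is a minimal sketch of how a mesh could be flattened for an indexed VBO. `IndexedMeshBuilder` and its methods are hypothetical names invented for illustration, not anything from LWJGL or from the engines discussed here, and it assumes only position and one texture coordinate per corner. Two triangle corners share an index only when every attribute matches; a shared position with a different texture coordinate becomes a second vertex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
final class IndexedMeshBuilder {
    // Maps a full attribute set (x, y, z, u, v) to the index it was assigned.
    private final Map<List<Float>, Integer> lookup = new HashMap<>();
    private final List<float[]> vertices = new ArrayList<>();
    private final List<Integer> indices = new ArrayList<>();

    /** Adds one triangle corner and returns the index it ends up with. */
    int add(float x, float y, float z, float u, float v) {
        List<Float> key = Arrays.asList(x, y, z, u, v);
        Integer index = lookup.get(key);
        if (index == null) {
            // First time this exact combination is seen: create a new vertex.
            index = vertices.size();
            lookup.put(key, index);
            vertices.add(new float[] {x, y, z, u, v});
        }
        indices.add(index);
        return index;
    }

    int uniqueVertexCount() { return vertices.size(); }

    int indexCount() { return indices.size(); }
}
```

With a builder like this, the per-frame update only has to walk the unique vertices, and the duplicates that remain are exactly the ones where the texture coordinates really differ.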

It's recommended you interleave your vertex data in a single buffer aligned to 32 bytes with 32 or 64 byte strides. That way you're not pulling data from 5 different locations for each vertex; it all streams from one spot. My sprite engine uses just a single VBO (I don't even have an index buffer, as I use glDrawArrays calls).
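As a rough sketch of what interleaving looks like on the Java side, assume a layout of 3 position floats, 3 normal floats and 2 texture coordinate floats per vertex, which happens to come to exactly 32 bytes. The class name, the layout and the offset constants are assumptions for illustration; a real engine would pick whatever attributes it needs and pass the stride and offsets to its pointer calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical layout for illustration: position(3) + normal(3) + texcoord(2).
final class InterleavedLayout {
    static final int FLOATS_PER_VERTEX = 8;
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * 4; // 32 bytes per vertex
    static final int NORMAL_OFFSET_BYTES = 3 * 4;          // normal starts after position
    static final int TEXCOORD_OFFSET_BYTES = 6 * 4;        // texcoord starts after normal

    /** Packs three separate attribute arrays into one interleaved direct buffer. */
    static FloatBuffer interleave(float[] positions, float[] normals, float[] texCoords) {
        int vertexCount = positions.length / 3;
        FloatBuffer out = ByteBuffer.allocateDirect(vertexCount * STRIDE_BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int i = 0; i < vertexCount; i++) {
            out.put(positions, i * 3, 3);
            out.put(normals, i * 3, 3);
            out.put(texCoords, i * 2, 2);
        }
        out.flip(); // ready to be read from position 0
        return out;
    }
}
```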

Also - your code in the Sprites! thread uploads data to the VBOs via Java native byte buffers. This is probably the worst way to do it in Java, as it involves manipulating data in system RAM. You don't need to ever touch system RAM - write your data directly to a mapped VBO byte buffer instead (wrapped in FloatBuffers and IntBuffers, naturally). This means data goes straight from your class members and calculations to the card; it doesn't go to system RAM first and then get copied up to the VBO.

And yes, you're stuck with duplicating vertices, but that's OK because a vertex is only 32 bytes, nothing to be scared of.

Hmmm.... haven't done that before (writing directly to a mapped VBO byte buffer). Not sure how to pull that off with animated meshes where I have to interpolate each vertex location per frame. That's why I keep a local copy Floatbuffer, so I can update the vertices and then dump it back into the VBO (I have to have two local copies actually, so I know the last frame position vs the next frame position, and can then interpolate over time). Is there a way around that? Do you have an example?

Well, in which case you are doubly certain not to be using your own system RAM float buffers. Store your data normally in the heap - position A and position B - in float arrays. Each update, interpolate each coordinate between A and B and write the interpolated one to the mapped VBO.
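A minimal sketch of that update loop, assuming plain linear interpolation between the two keyframes. `KeyframeLerp` is a hypothetical name, and the direct buffer created by `standInForMappedBuffer` only stands in for the buffer you would get back from mapping the VBO, so the example runs without an OpenGL context.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper for illustration only.
final class KeyframeLerp {
    /**
     * Writes a + (b - a) * t for every coordinate straight into the target,
     * starting at the beginning of the buffer. No intermediate copy is kept.
     */
    static void writeInterpolated(float[] frameA, float[] frameB, float t, FloatBuffer target) {
        target.clear();
        for (int i = 0; i < frameA.length; i++) {
            target.put(frameA[i] + (frameB[i] - frameA[i]) * t);
        }
        target.flip();
    }

    /** Stand-in for a mapped VBO: a native-order direct buffer of the same size. */
    static FloatBuffer standInForMappedBuffer(int floatCount) {
        return ByteBuffer.allocateDirect(floatCount * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }
}
```

The two keyframes stay in ordinary heap arrays and are only ever read; the target buffer is only ever written.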

There is also, however, an extension that can do this for you. And you can do something even more cleverer with shaders.

I'm actually trying to keep my engine out of shaders these days (I'm already using shaders for optional shadow mapping, bump mapping, and per pixel lighting). I want to keep animation shader-free so I can support older graphic hardware.

Also, I did just try switching my engine over to using glMapBufferARB, but I'm having a rough time trying to write the data correctly to the mapped bytebuffer (I'm using bytebuffername.putFloat(), but keep getting garbage back). Must be doing something wrong still.

Anyway, what extension are you talking about??? I love learning new stuff and optimizing.

As for "putFloat()", be sure that you are using direct buffers with native byte order. Also, don't ever read back data from a mapped buffer, only use it to send data to the GPU. If you need the data, then store it somewhere else first.

Make sure that buffer has order(ByteOrder.nativeOrder()) called on it first. I cannot fathom what feckwit designer decided that native bytebuffers would be created by default in big endian order instead of native order. I spent 5 hours staring at a black screen last night until I suddenly twigged.
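For anyone hitting the same thing, this small sketch shows the default and the fix using only java.nio. `ByteOrderCheck` and its methods are made-up names for illustration. A newly allocated ByteBuffer always starts out big endian, so on a little endian machine putFloat() writes the bytes in the reverse order to what the driver expects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
final class ByteOrderCheck {
    /** Allocates a direct buffer and switches it to the platform's byte order. */
    static ByteBuffer newNativeBuffer(int bytes) {
        return ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
    }

    /** Returns the four bytes putFloat() produces for a value in the given order. */
    static byte[] rawBytesOf(float value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order);
        buf.putFloat(value);
        return buf.array();
    }
}
```

For example, 1.0f is 0x3F800000, so `rawBytesOf(1.0f, ByteOrder.BIG_ENDIAN)` gives 3F 80 00 00 while the little endian version gives 00 00 80 3F.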

I forget the name of the extension that did the animation interpolation... ARB_vertex_blend or something. I think it never really caught on because vertex shaders came along.

Ok, this is strange. I finally got it set up (yes, the byte order was what was messing it up, thanks). I'm using direct buffers and all, but it's actually running a bit slower than it was for me to go through system memory. I've timed the different command calls, and the one that's hurting the most is actually: ARBVertexBufferObject.glUnmapBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB). If I take that out, it runs a little faster than what I had before... of course, that would be sloppy of me now, wouldn't it?

That is actually in the update routine, not the drawing. I haven't changed the draw routine at all.

Quote

You cannot actually draw anything using a mapped buffer. You must unmap it before calling glDrawArrays etc. or it's not supposed to work at all!

So I tried it... If I get rid of the unmap call, it matches speed with my previous method (and still draws even though I haven't unmapped).

It also doesn't seem to matter how I allocate the bytebuffer. Once I call glMapBufferARB the first time, the function apparently builds its own bytebuffer with whatever the LWJGL "default" is.

I'm still looking into the ARB_VERTEX_BLEND method, but it sounds like you were right about it moving to shaders.

EDIT:

Running my old method (loop through each vertex, interpolate, put() it into a FloatBuffer, then glBufferDataARB the buffer into the VBO): 1170 fps

New method (open direct buffer, loop through and interpolate each vertex and putFloat() it into the VBO, then unmap buffer when done): 950 fps

These were both done with the commonly available Dr. Freak MD2 model, with texturing, shadow mapping, and per-pixel lighting enabled (and bump mapping enabled, but no bump map available). A small scene of cubes (for shadows to drop on), and some font objects to see the current FPS were included as well.

You can read this thread for an explanation of what LWJGL does when you call glMapBuffer. I would also highly recommend grabbing a nightly build and using the new glMapBuffer API (with an explicit length argument), it should be faster.

Elias4444, I think you should do testing on more complex scenes. At 900fps you're basically doing the equivalent of a microbenchmark, you cannot reliably measure differences between rendering techniques.
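One way to see why, sketched below, is to convert frame rates into time per frame. `FrameTime` is a hypothetical helper, and the only figures used are the 1170 fps and 950 fps quoted earlier in this thread.

```java
// Hypothetical helper for illustration only.
final class FrameTime {
    /** Milliseconds spent on one frame at the given frame rate. */
    static double millisPerFrame(double fps) {
        return 1000.0 / fps;
    }

    /** Extra milliseconds per frame when going from fpsBefore to fpsAfter. */
    static double costDeltaMillis(double fpsBefore, double fpsAfter) {
        return millisPerFrame(fpsAfter) - millisPerFrame(fpsBefore);
    }
}
```

`FrameTime.costDeltaMillis(1170, 950)` works out to roughly 0.2 ms per frame. A gap that small is easy to lose in timer resolution and driver noise, which is why a heavier scene gives a more trustworthy comparison.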

Cas, it's very interesting that you got such a huge performance boost simply from switching to mapped VBOs. Could you please confirm that it's reliable? For example, have you tested on different GPUs (from a different vendor mainly)?

Only got nvidias everywhere here. Won't really be able to get around to testing various configurations until beta in a few weeks' time. I'll wait to get a proper release of LWJGL rather than experiment with nightlies.

I wouldn't really call it a huge performance boost, just about 2-3x more sprites.

Wow, both the great Spasi and Cas have responded to my thread... I'm honored!

So, as I understand this:

Quote

b) Don't use MapBuffer. Tbh, when I was working on Marathon I don't think I had used VBO mapping more than once. I think I dropped it after a while too. Especially for uniform buffer objects I'd use BufferSubData instead, I think it will be much faster. Mapped buffers are too sensitive to implementation details, it's quite hard to get the driver to behave and provide the expected performance. Also, doing it in Java is kinda awkward (any API that returns void pointers is bound to be).

I was already doing it the best way by using glBufferData calls over the mapbuffer method?

BTW, I discovered a few months back that I get a large performance boost by using glBufferData over glBufferSubData. Of course, this may be because I'm using smaller VBOs for the most part (I try to keep my scenes very simple).

Quote

Elias4444, I think you should do testing on more complex scenes. At 900fps you're basically doing the equivalent of a microbenchmark, you cannot reliably measure differences between rendering techniques.

True. I just haven't had time to put something more complex together. With all this optimization work I'm doing though, maybe it's time I built a little LWJGL benchmarker for different methods.

Ok, I've been running my new little Tommy Engine benchmarker on all sorts of machines for a few days now. I also updated it to use LWJGL 2.2.2.

New results on my personal machine:

Running my old method (loop through each vertex, interpolate, put() it into a FloatBuffer, then glBufferDataARB the buffer into the VBO): 575 fps

New method (open direct buffer, loop through and interpolate each vertex and putFloat() it into the VBO): 512 fps

If I close the direct buffers each frame, I lose another 50 to 75 fps (but for some reason, I can just keep it open, and they all seem to behave themselves).
