The four situations I was testing are:
- A new array and buffer are made every draw pass. All the values are copied into the array one by one (simulating what you would be doing when making draw calls from all over your code), then bulk copied to the buffer.
- An existing array and buffer have already been made, so they are in memory. The buffer is cleared, then the above operation happens (basically the same as above, except it clears the existing buffer and array instead of making new ones).
- No array is used; a new buffer is made every draw pass. A bunch of individual puts happen, each one using the index value.
- No array is used; a new buffer is made every draw pass. A bunch of individual puts happen, with no index value.
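For anyone who wants to picture the four cases, here is a rough sketch of what each one might look like. This is not the original test code: the class and method names are made up for illustration, and it assumes direct, native-order float buffers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical illustration of the four fill strategies; names are not from the original test.
final class FillStrategies {

    // Assumption: direct, native-order buffers, as OpenGL bindings usually expect.
    static FloatBuffer newDirect(int floats) {
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Case 1: new array and new buffer every pass, fill the array, then one bulk put.
    static FloatBuffer newArrayBulkPut(int n) {
        float[] values = new float[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        FloatBuffer buf = newDirect(n);
        buf.put(values);
        buf.flip(); // ready for reading: position 0, limit n
        return buf;
    }

    // Case 2: array and buffer already exist; clear the buffer and refill it.
    static void reusedArrayBulkPut(float[] values, FloatBuffer buf) {
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        buf.clear();
        buf.put(values);
        buf.flip();
    }

    // Case 3: no array, new buffer every pass, absolute (indexed) puts.
    static FloatBuffer newBufferIndexedPut(int n) {
        FloatBuffer buf = newDirect(n);
        for (int i = 0; i < n; i++) {
            buf.put(i, (float) i); // absolute put: does not move the position
        }
        return buf; // position is still 0 and limit is still n
    }

    // Case 4: no array, new buffer every pass, relative (non-indexed) puts.
    static FloatBuffer newBufferRelativePut(int n) {
        FloatBuffer buf = newDirect(n);
        for (int i = 0; i < n; i++) {
            buf.put((float) i); // relative put: advances the position by one
        }
        buf.flip();
        return buf;
    }
}
```

All four leave the buffer with position 0 and limit n, so whichever one you time, the result can be handed to the same consumer.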

This seems to vary wildly based on the number of items I'm putting in. Wha?

Here's an updated test case that averages across multiple runs, and also adds a case where you do index puts into an existing buffer. Additionally, it uses direct float buffers since that's what OpenGL libraries require.

1) Try to use Caliper, or at least extract each test into its own method and add some warm-up runs.
2) Use a much smaller ARRAY_SIZE, something that's representative of fp data in a game. Increase the number of test runs accordingly.
3) For the buffer put loops, use buffer.limit() (or .remaining()) instead of valueArray.length. In the array loops, you're making it easy for the VM to remove array bounds checks, but you aren't doing the same in the buffer loops.
4) Add tests that use random indices (instead of going from 0 to array/buffer length). It would be interesting to see the differences then.
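If you don't want to pull in Caliper, a hand-rolled harness along these lines would cover the four points above. Treat it as a sketch only: the class name, the sizes and the run counts are placeholders I picked, not values from the original test, and a real harness such as Caliper will still give more trustworthy numbers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Random;

// Hypothetical hand-rolled harness; sizes and run counts are placeholders.
final class PutBenchmarkSketch {

    static FloatBuffer newDirect(int floats) {
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Each test lives in its own method so the JIT can compile it separately.
    static void relativePuts(FloatBuffer buf) {
        buf.clear();
        for (int i = 0; i < buf.limit(); i++) { // loop bound taken from the buffer itself
            buf.put((float) i);
        }
    }

    static void indexedPuts(FloatBuffer buf) {
        buf.clear();
        for (int i = 0; i < buf.limit(); i++) {
            buf.put(i, (float) i);
        }
    }

    // Random-access variant: the indices are generated up front so that
    // random number generation is not part of the timed loop.
    static void randomIndexedPuts(FloatBuffer buf, int[] indices) {
        buf.clear();
        for (int i = 0; i < indices.length; i++) {
            buf.put(indices[i], (float) i);
        }
    }

    static int[] randomIndices(int count, int bound, long seed) {
        Random random = new Random(seed);
        int[] indices = new int[count];
        for (int i = 0; i < count; i++) {
            indices[i] = random.nextInt(bound);
        }
        return indices;
    }

    // Runs the test untimed first (warm-up), then returns the average time per timed run.
    static long averageNanos(Runnable test, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            test.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            test.run();
        }
        return (System.nanoTime() - start) / timedRuns;
    }

    public static void main(String[] args) {
        int size = 4096; // placeholder for "representative of fp data in a game"
        FloatBuffer buf = newDirect(size);
        int[] indices = randomIndices(size, size, 42L);

        System.out.println("relative puts: " + averageNanos(() -> relativePuts(buf), 2000, 20000) + " ns");
        System.out.println("indexed puts:  " + averageNanos(() -> indexedPuts(buf), 2000, 20000) + " ns");
        System.out.println("random puts:   " + averageNanos(() -> randomIndexedPuts(buf, indices), 2000, 20000) + " ns");
    }
}
```

The warm-up loop is there so the timed runs mostly measure compiled code rather than the interpreter.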

Btw, I have developed a clone of the NIO buffer API that uses sun.misc.Unsafe, supports direct buffers only and allows disabling all bounds checks (ala org.lwjgl.util.NoChecks=true). For random access it's several times faster than normal NIO and it's just as fast for the easy cases. I'm using it with a private LWJGL build that has been modified to support it.

Oooh, I would love to use that.

Good point on the smaller array size, I didn't get to the point of averaging runs together so I was just increasing the array size to try to get more "accurate" data. Like I said, quick and dirty (especially crap like copy pasting System.nanoTime() a bunch instead of using a for loop).

But in hindsight it's definitely best to have an accurate benchmark, or you're just wasting your time. Looks like lhkbob's results are more in line with what we would expect.

Interestingly enough, this means that just using a plain old put() is the fastest method (with this number of vertices). I'm not sure why index-based puts would be slower, but they appear to be. The difference between the bulk put and the single non-indexed puts is pretty minor, but it looks like you should definitely avoid index-based puts. Especially notable: absolutely keep your FloatBuffer in memory and just clear() it every time you want to put new stuff in it. That is orders of magnitude faster than making a new buffer every time.

Here's a run with only 5,000 vertices. Note that the array method becomes faster in this case.

Another takeaway is that high-level graphics engines aren't really required to use FloatBuffers in their interfaces. They can have geometry, etc. represented as plain old arrays and then keep a cached buffer around behind the scenes for when they need to talk to OpenGL. This is nice because you don't have to worry about the user screwing up the direct-ness or endian-ness of the buffer anymore.
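To make that concrete, here is a minimal sketch of what "array in the interface, cached buffer behind the scenes" could look like. The class and method names are hypothetical, and the doubling growth policy is just one simple choice I assumed for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper: callers pass plain float[] data, the engine owns the direct buffer.
final class CachedVertexStager {
    private FloatBuffer cache;

    CachedVertexStager(int initialFloats) {
        cache = allocate(initialFloats);
    }

    // The engine controls direct-ness and byte order, so callers cannot get them wrong.
    private static FloatBuffer allocate(int floats) {
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Copies count floats starting at offset into the cached buffer and returns it
    // flipped and ready to hand to the GL binding. The returned buffer is reused
    // by the next call, so callers should not hold on to it.
    FloatBuffer stage(float[] data, int offset, int count) {
        if (count > cache.capacity()) {
            // Reallocate only when the data no longer fits (assumed policy: at least double).
            cache = allocate(Math.max(count, cache.capacity() * 2));
        }
        cache.clear();
        cache.put(data, offset, count); // one bulk put from the array
        cache.flip();
        return cache;
    }
}
```

In the common case a draw pass costs one clear(), one bulk put and one flip(), with no allocation.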

Both of these used arrays/buffers of 40,000 elements and only 100,000 test runs because I got bored :/ I don't know why there was such a huge performance hit for individual and index puts with nonexistent buffers on Mac. Either way, using existing arrays + bulk puts seems to be a viable option on both OSes.

I have commented out the tests that recreate the buffer, since I found that the re-allocations take the majority of the time. Feel free to un-comment and test it if you want. In my tests, index-based puts were faster, followed by bulk puts, followed by individual puts. Results:

So basically you can easily get rid of the bounds checks, but it's still slower than index-based puts because you're updating the current buffer position on every put (see the package-private nextPutIndex() in java.nio.Buffer).
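A tiny demo of the position bookkeeping, in case it helps: relative puts advance the buffer position on every call, while absolute (index-based) puts leave it alone. The class name is made up, and it uses a heap buffer purely to keep the example short; the position behaviour is the same for direct buffers.

```java
import java.nio.FloatBuffer;

// Hypothetical demo class showing how the two put styles treat the buffer position.
final class PutPositionDemo {

    static int positionAfterRelativePuts(int n) {
        FloatBuffer buf = FloatBuffer.allocate(n);
        for (int i = 0; i < n; i++) {
            buf.put((float) i); // writes at the current position, then increments it
        }
        return buf.position();
    }

    static int positionAfterAbsolutePuts(int n) {
        FloatBuffer buf = FloatBuffer.allocate(n);
        for (int i = 0; i < n; i++) {
            buf.put(i, (float) i); // writes at index i, position is untouched
        }
        return buf.position();
    }

    public static void main(String[] args) {
        System.out.println(positionAfterRelativePuts(8)); // prints 8
        System.out.println(positionAfterAbsolutePuts(8)); // prints 0
    }
}
```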
