Hello. I plan to create a world for my game, so I started testing how vertex arrays deal with many objects, and I must say not very well, but only because I'm still learning LWJGL. Now with 10k objects (cubes with 2 textures) I get a poor 10 fps. Of course I know about VBOs and display lists, but I'm sure the result should be better. Right now my app does something like this (it's only part of my code):

I know the biggest problem is the loop with 'entities.render();' over 10k objects. So how can I optimize this code? I'm guessing the solution is to render only what I can see, not everything (including what is behind and under me). Can you give me some advice?

Making 10,000 glDrawElements or glDrawArrays calls will be slow. In my experience, 10fps is not unreasonable.

To achieve 10,000 objects rendered, you need to be smart about state changes, and instancing. If you can use shaders and OpenGL 3.0 (or extensions), instancing can help a lot. Otherwise, you might have to pack your geometry into large vertex arrays/VBOs. A GPU is much better at handling a few large VBOs/VAs compared to many small VBOs/VAs.

The feasibility of packing data depends on how often you have to rebuild the buffers. If you rebuild them too often, you might just bottleneck the CPU instead of the GPU and not see a higher fps.

If that becomes a problem, you can do what the Total War games do and use dynamic level of detail. Instances farther away use lower detail (which is then faster to dynamically pack into a single VBO), and at the farthest distance they use billboards, where each unit is a single quad with a texture holding a rendering of its current animation.
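To make the level-of-detail idea a bit more concrete, here is a minimal sketch of picking a detail level from the camera distance. The enum names and the two distance thresholds are made up for illustration; they are not from any engine and would need tuning for a real scene.

```java
/** Hypothetical detail levels; names and thresholds are assumptions for this sketch. */
enum DetailLevel { FULL_MESH, LOW_POLY_MESH, BILLBOARD }

final class LodSelector {
    // Assumed cut-off distances in world units; tune them for your own scene.
    static final float LOW_POLY_DISTANCE = 50f;
    static final float BILLBOARD_DISTANCE = 200f;

    /** Picks a detail level from the distance between the camera and an object. */
    static DetailLevel select(float camX, float camY, float camZ,
                              float objX, float objY, float objZ) {
        float dx = objX - camX;
        float dy = objY - camY;
        float dz = objZ - camZ;
        // Compare squared distances to avoid one square root per object.
        float distSq = dx * dx + dy * dy + dz * dz;
        if (distSq >= BILLBOARD_DISTANCE * BILLBOARD_DISTANCE) {
            return DetailLevel.BILLBOARD;
        }
        if (distSq >= LOW_POLY_DISTANCE * LOW_POLY_DISTANCE) {
            return DetailLevel.LOW_POLY_MESH;
        }
        return DetailLevel.FULL_MESH;
    }
}
```

Each frame (or every few frames) you would sort your objects into one list per detail level and then draw each list with its own batch.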

Indeed this is a huge CPU bottleneck due to the insane amount of draw calls here. Another big problem is the number of texture binds which impacts performance a lot. To achieve good performance you need to get rid of all the draw calls you make per cube and instead only do them once for all the cubes at the same time. There are 2 ways of doing this:

One is to simply batch your cube parts together. In your case you would create a big enough ByteBuffer / FloatBuffer to hold all the data of all your cubes, in your case two buffers since you have 2 cube parts. For each cube you would "draw" its vertex data to the two buffers and then at the end just draw them all with a single call to glDrawArrays(...) for each buffer. That way you'll have a constant number of draw calls regardless of the number of cubes. This will reduce the CPU load a lot, but you'll still probably be CPU bottlenecked before you're vertex bottlenecked (assuming you don't get fragment limited).
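A rough sketch of the batching step, assuming position-only vertices and a single buffer (the original setup would need one such buffer per cube part or texture). The class name and the data layout are my own choices for illustration. It only fills a FloatBuffer, so it runs without LWJGL.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class CubeBatcher {
    static final int VERTS_PER_CUBE = 36;    // 6 faces * 2 triangles * 3 vertices
    static final int FLOATS_PER_VERTEX = 3;  // position only, to keep the sketch short

    // The 8 corners of a unit cube centred on the origin (x, y, z per corner).
    private static final float[] CORNERS = {
        -0.5f, -0.5f, -0.5f,   0.5f, -0.5f, -0.5f,   0.5f, 0.5f, -0.5f,  -0.5f, 0.5f, -0.5f,
        -0.5f, -0.5f,  0.5f,   0.5f, -0.5f,  0.5f,   0.5f, 0.5f,  0.5f,  -0.5f, 0.5f,  0.5f
    };

    // Corner indices, two triangles per face.
    private static final int[] INDICES = {
        4, 5, 6,  6, 7, 4,   // front  (z = +0.5)
        1, 0, 3,  3, 2, 1,   // back   (z = -0.5)
        0, 4, 7,  7, 3, 0,   // left   (x = -0.5)
        5, 1, 2,  2, 6, 5,   // right  (x = +0.5)
        7, 6, 2,  2, 3, 7,   // top    (y = +0.5)
        0, 1, 5,  5, 4, 0    // bottom (y = -0.5)
    };

    /** Writes every cube, already moved to its world position, into one buffer. */
    static FloatBuffer build(float[][] cubePositions) {
        int floatCount = cubePositions.length * VERTS_PER_CUBE * FLOATS_PER_VERTEX;
        FloatBuffer buffer = ByteBuffer.allocateDirect(floatCount * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (float[] p : cubePositions) {
            for (int corner : INDICES) {
                buffer.put(CORNERS[corner * 3]     + p[0]);
                buffer.put(CORNERS[corner * 3 + 1] + p[1]);
                buffer.put(CORNERS[corner * 3 + 2] + p[2]);
            }
        }
        buffer.flip(); // make the buffer readable from the start
        return buffer;
    }
}
```

The resulting buffer holds cubeCount * 36 vertices, which you would hand to OpenGL once and draw with a single glDrawArrays(GL_TRIANGLES, 0, cubeCount * 36) instead of one call per cube.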

Like Lhkbob said the ultimate way of doing this is with instancing. Instancing allows you to make a single draw call and have the same geometry replicated a number of times at an extremely low CPU cost. You can keep the cube vertex data in a VBO and then draw each cube instance at different locations using a per-instance attribute. Like I said this is ridiculously fast since the GPU is pretty much guaranteed to be the bottleneck, but it has two drawbacks. The first is that this requires OpenGL 3 support, meaning that it won't work on really old computers, computers with Intel graphics cards or OSes that don't support it. Secondly you need a very simple shader to move each cube instance to its own location or they'll all end up in the same place.
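For illustration, the "very simple shader" plus the per-instance data could look roughly like this. It is only a sketch: I'm assuming a GL 3.3 context (for the layout qualifiers), and the attribute names, the uniform name and the class name are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class InstanceOffsets {
    /** Vertex shader sketch: adds a per-instance offset to every cube vertex. */
    static final String VERTEX_SHADER =
          "#version 330 core\n"
        + "layout(location = 0) in vec3 position;        // per vertex\n"
        + "layout(location = 1) in vec3 instanceOffset;  // per instance\n"
        + "uniform mat4 viewProjection;\n"
        + "void main() {\n"
        + "    gl_Position = viewProjection * vec4(position + instanceOffset, 1.0);\n"
        + "}\n";

    /** Packs one xyz offset per cube instance into a direct buffer for upload. */
    static FloatBuffer pack(float[][] positions) {
        FloatBuffer buffer = ByteBuffer.allocateDirect(positions.length * 3 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (float[] p : positions) {
            buffer.put(p[0]).put(p[1]).put(p[2]);
        }
        buffer.flip(); // make the buffer readable from the start
        return buffer;
    }
}
```

With the offsets uploaded to their own VBO and bound to attribute 1, you would call glVertexAttribDivisor(1, 1) so the attribute advances once per instance, then draw everything with glDrawArraysInstanced(GL_TRIANGLES, 0, 36, instanceCount).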

The best solution is to implement both. Use instancing when it's available and fall back to simple batching when it isn't. You should definitely start with batching if you don't have any shader experience though. I'm sure the performance of simple batching in your case will be enough. Besides, do you even need 10 000 cubes?

To "prove" that you're CPU limited: I can draw 3 500 Bobs. Bob is a dwarf. He's made of 1027 triangles, meaning that I'm drawing 3 594 500 triangles per frame. At 60 FPS. On a laptop. You have a cube made of 6 quads which is equal to 12 triangles. In theory you can draw around 3 594 500 / 12 = 299 541 cubes per frame at 60 FPS on my computer. Also note that a high-end desktop GPU is around 3-4x as fast as my laptop's GPU.

EDIT: Dammit! Turns out I was fragment limited after all! Now I'm pushing around 5 000 000 triangles per second...

But if you need to keep changing the transform matrix between each draw call, you're not allowing the GPU to operate at full efficiency. So it will be faster, but it might be 16fps instead of 10fps, which isn't much of an improvement.

Which is why you use instancing with a custom per-instance matrix attribute for the model matrix and upload a matrix per cube.

Or you can transform the boxes into world space on the CPU and send batches of triangles to the GPU. That way you can keep the CPU and GPU busy at the same time. So you would transform, upload and render, for example, 100 boxes at a time, where the transforming and the uploading/rendering would run in parallel.

lhkbob: dynamic level of detail, a single quad with a texture holding a rendering of their current animation. Ehhh, I have seen this so many times in games, so it must be a good solution.

Still so much to learn. I hope you understand that the cubes are only for testing; I'm not trying to create a new Minecraft. When I'm talking about a cube, I'm thinking about trees, bushes, buildings and so on, which are more complicated than one cube.

Or just use a VBO to hold all the model data, another VBO for the per instance transformations and instancing. Both simpler and faster.
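As a sketch of what the second VBO could contain, here is one way to pack a 4x4 model matrix per instance. I'm assuming column-major order (what OpenGL expects by default) and only translation plus uniform scale to keep it short; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class InstanceMatrices {
    static final int FLOATS_PER_MATRIX = 16;

    /** Packs one column-major model matrix (uniform scale + translation) per instance. */
    static FloatBuffer pack(float[][] translations, float scale) {
        int floatCount = translations.length * FLOATS_PER_MATRIX;
        FloatBuffer buffer = ByteBuffer.allocateDirect(floatCount * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (float[] t : translations) {
            buffer.put(scale).put(0f).put(0f).put(0f);     // column 0
            buffer.put(0f).put(scale).put(0f).put(0f);     // column 1
            buffer.put(0f).put(0f).put(scale).put(0f);     // column 2
            buffer.put(t[0]).put(t[1]).put(t[2]).put(1f);  // column 3: translation
        }
        buffer.flip(); // make the buffer readable from the start
        return buffer;
    }
}
```

Keep in mind that a mat4 vertex attribute takes up four consecutive attribute locations (one vec4 per column), so each of the four needs its own attribute pointer and its own glVertexAttribDivisor(location, 1) call.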

It's obviously important to know which one is the bottleneck before starting to optimize stuff. Some "solutions":
- CPU: Better algorithms, instancing, MappedObjects where applicable and multithreading, or offload it to the GPU using shaders or OpenCL.
- GPU vertices: Dynamic LoD, instancing, less data per vertex, less expensive vertex shaders.
- GPU fragments: Offload to vertex shader, reduce overdraw, frustum and occlusion culling.
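Since frustum culling is also what the original question was getting at (not drawing what is behind you), here is a minimal sketch of a bounding-sphere test. I'm assuming you already have the frustum planes as {a, b, c, d} rows with unit-length normals pointing into the frustum; extracting them from your camera matrices is not shown, and the class name is hypothetical.

```java
final class FrustumCuller {
    /**
     * Tests a bounding sphere against a set of planes.
     * Each plane is {a, b, c, d} with the unit normal (a, b, c) pointing into the
     * frustum, so a point inside satisfies a*x + b*y + c*z + d >= 0.
     */
    static boolean isVisible(float[][] planes, float x, float y, float z, float radius) {
        for (float[] p : planes) {
            float signedDistance = p[0] * x + p[1] * y + p[2] * z + p[3];
            if (signedDistance < -radius) {
                return false; // the whole sphere is on the outside of this plane
            }
        }
        return true; // inside or touching every plane
    }
}
```

The test is conservative: a sphere near a frustum corner can pass even though it is just outside, which only costs you a little extra drawing. Objects that fail it can be skipped before they are ever batched.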

You're right, but I wanted to keep it simple. State changes (especially texture binds) turn out to be really expensive. I'm not entirely sure what state changes cost or actually do, but I assume there is both a big driver (CPU) cost and a pretty big GPU cost, since I guess all stream processors need to be reconfigured, etc. Long story short, avoid texture binds and state changes whenever possible. A rule of thumb: state changes and draw calls should not depend on the number of instances you're drawing, only on the number of different types of objects you're drawing. If I have squares, circles and triangles, I only want to draw each type once. I set up the needed settings for squares and draw ALL squares with one draw call, either using instancing or using batching. I then do the same for circles and triangles. I only need to do state changes 3 times, and I only need 3 glDrawXxxx calls regardless of the actual number of objects I have.
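The rule of thumb can be sketched as a grouping step that runs before any drawing. The type keys ("square", "circle", ...) and the class name are just placeholders for whatever identifies a mesh/texture combination in your game.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TypeBatches {
    /** Groups instance positions by object type, keeping first-seen order. */
    static Map<String, List<float[]>> group(String[] types, float[][] positions) {
        Map<String, List<float[]>> batches = new LinkedHashMap<>();
        for (int i = 0; i < types.length; i++) {
            batches.computeIfAbsent(types[i], key -> new ArrayList<>()).add(positions[i]);
        }
        return batches;
    }

    /** One state setup and one draw call per type, however many instances there are. */
    static int drawCallsNeeded(Map<String, List<float[]>> batches) {
        return batches.size();
    }
}
```

When rendering, you would loop over the map once: bind the texture and buffers for that type, then issue one batched or instanced draw for its whole list.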
