Hello guys! I'm new around here, and this is my first question. I started working with LWJGL to make a Minecraft-like voxel engine. After I implemented the movement and the mouse controls I wanted to implement the blocks. I decided to use the Slick Texture class because it's the easiest way. So I wrote code to bind a texture for the top, the sides and the bottom of a block. When I created a test block it worked fine, and it also worked with a square of 10x10 blocks. But when I wanted to render a whole chunk of blocks (16x16x100) it nearly froze. At first I thought it was because of looping through the array, but after trying some other ways of storing the blocks I realized it wasn't. I then removed the calls that bind the textures, and it ran perfectly fluently.

My question now is, why does this happen, and how could I add textures to the different sides of the block without running into such performance issues?

If you were to call glBindTexture 3 times per block, 10x10 blocks would lead to 300 texture binds per frame. Instead you should use texture atlases (like Minecraft does), that way you are only binding one or two textures per frame.

Using immediate mode and glBegin/glEnd will give you plenty of problems when you're rendering that many blocks. Firstly, you shouldn't use a separate glBegin/glEnd pair for each face; put all the faces inside the same one. Secondly, you should be using something like vertex arrays or VBOs to "batch" your vertex data.

There are a lot of obvious improvements that should be done for any Minecraft clone. One important optimization is culling any faces that touch each other, and thus don't need to be rendered.

You should learn OpenGL before trying to make a 3D Minecraft clone. I'd recommend something simple, like a 2D game, and write your own texture loaders and font renderers instead of relying on Slick. If you really want to make a 3D game, then maybe Ardor3D, jMonkeyEngine, or LibGDX would be better for you.

You're calling glTexParameteri() BEFORE binding the texture, affecting the currently bound texture and potentially changing the filtering mode of some other texture. With that out of the way...

How many blocks is 16x16x100? 25 600 blocks. With 6 faces per block and 4 vertices per face, that's 614 400 vertices and 153 600 quads (307 200 triangles). That's not really a problem (for now). The problem is that you're drawing them with glBegin()-glEnd(): you're making several million OpenGL calls for this single chunk, which is frankly ridiculous. Add a texture bind between every 4 vertices on top of that, and catastrophe is inevitable.

You need to do 5 things to get this running even close to Minecraft's performance.

1. You need to get rid of the texture binds. Since you're using GL_NEAREST filtering, this is easy: just put all the block textures into a single texture (a texture atlas) and pick out parts of it using texture coordinates for each block. That way you only bind the texture once per frame when rendering blocks. Texture size isn't a realistic limit, since textures can be 4096x4096 even on low-end hardware.
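As a concrete sketch of the atlas lookup (the class/method names and the square tile layout are my own illustration, not code from this thread), the texture coordinates of one tile are just a scaled offset into the atlas:

```java
// Sketch: computing sub-tile texture coordinates inside a texture atlas.
// Assumes a square atlas of tilesPerRow x tilesPerRow equal-sized tiles.
public class AtlasUV {
    // Returns {u0, v0, u1, v1} for the tile at (col, row).
    public static float[] tileUV(int col, int row, int tilesPerRow) {
        float size = 1.0f / tilesPerRow;
        float u0 = col * size;
        float v0 = row * size;
        return new float[] { u0, v0, u0 + size, v0 + size };
    }

    public static void main(String[] args) {
        // Tile (1, 0) in a 16x16 atlas spans u = [0.0625, 0.125], v = [0, 0.0625].
        float[] uv = tileUV(1, 0, 16);
        System.out.printf("u0=%f v0=%f u1=%f v1=%f%n", uv[0], uv[1], uv[2], uv[3]);
    }
}
```

You'd emit these u/v values with each quad's vertices instead of binding a different texture per block face.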

2. You need to batch your draw commands together. glBegin()-glEnd() is so old and painfully slow that your CPU can't possibly feed data as fast as your GPU can render the blocks. The simplest way is to use display lists. Simply wrap what you do now (minus the texture binds, and with fitting texture coordinates per point 1) in a display list that you generate (and regenerate when a block in the chunk changes), then just call glCallList() to render the whole chunk in a single draw call. This moves the bottleneck to the GPU, since the display list stores its data on the GPU and you're no longer issuing millions of OpenGL commands, just a few.

3. You're drawing 614 400 vertices per 16x16x100 chunk. You won't be able to draw more than one or two of those before hitting both a vertex bottleneck and a hideous overdraw bottleneck when looking down. Just think about it: looking down through 50 layers of dirt will force your GPU to fill pixels over your whole screen 50 times. A 1080p screen is around 2 million pixels, so that's 100 million pixels filled for just a single visible layer of dirt. You need to get rid of quads that can't possibly be seen without digging (digging is what causes the display list to be regenerated). Have you ever noticed how in Minecraft you can sometimes see caves and lava far away when the chunk updating bugs out and leaves holes in the ground? That's because it only draws surfaces bordering air and transparent blocks. A dirt block surrounded by 6 solid blocks cannot possibly be seen by the player unless they hack/noclip into the terrain, so why draw it? Basically, when generating your display list you want to check each block against its neighbors and only emit the quads that do not border a solid block. By doing this, you'll be able to get rid of around 95-99% of all your geometry.
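The neighbor check can be sketched like this (the chunk layout and block IDs are illustrative assumptions; 0 means air, and chunk borders are treated as air for simplicity):

```java
// Sketch of hidden-face culling: count only quads that border air.
public class FaceCuller {
    static final int SIZE = 16;
    static final int[][] DIRS = {
        {1,0,0}, {-1,0,0}, {0,1,0}, {0,-1,0}, {0,0,1}, {0,0,-1}
    };

    static boolean solid(int[][][] blocks, int x, int y, int z) {
        if (x < 0 || y < 0 || z < 0 || x >= SIZE || y >= SIZE || z >= SIZE)
            return false; // treat chunk borders as air in this sketch
        return blocks[x][y][z] != 0;
    }

    // Count the faces that actually need drawing.
    public static int visibleFaces(int[][][] blocks) {
        int count = 0;
        for (int x = 0; x < SIZE; x++)
            for (int y = 0; y < SIZE; y++)
                for (int z = 0; z < SIZE; z++) {
                    if (blocks[x][y][z] == 0) continue;
                    for (int[] d : DIRS)
                        if (!solid(blocks, x + d[0], y + d[1], z + d[2]))
                            count++; // face borders air: draw it
                }
        return count;
    }

    public static void main(String[] args) {
        int[][][] blocks = new int[SIZE][SIZE][SIZE];
        for (int[][] plane : blocks)
            for (int[] row : plane)
                java.util.Arrays.fill(row, 1); // completely solid chunk
        // Only the 6 outer 16x16 surfaces survive: 6 * 256 = 1536 faces
        // instead of 16^3 * 6 = 24576.
        System.out.println(visibleFaces(blocks));
    }
}
```

In the real engine you'd emit quads into the display list at the same spots where this sketch increments the counter.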

4. Another way of reducing overdraw is to enable face culling. The idea is that since all your geometry is solid, it's impossible to see it from the inside (again, unless you hack). A cube has 6 faces, but the back faces (the ones facing away from the camera) never need to be drawn, since they're always covered by the front faces facing you. It's not practical to do this culling on the CPU, since your viewpoint is constantly changing, but your GPU can do it extremely efficiently per triangle, which roughly halves the number of triangles the GPU has to rasterize. Just use glEnable(GL_CULL_FACE) to enable it. Note that this feature uses the winding of the vertices to determine which way they are "facing". You might have to change the winding setting with glFrontFace(GL_CW or GL_CCW) to fit your code, or adapt your geometry-generating code. And finally, remember to disable it after you're done! Surfaces randomly disappearing isn't fun.
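The winding test the GPU performs boils down to a signed-area computation on the projected polygon. This small sketch (my own illustration, not OpenGL code) mimics it for a quad already projected to 2D:

```java
// Sketch: determining front/back facing from vertex winding, like the GPU
// does after projection. Positive signed area = counter-clockwise, which
// is front-facing under the default glFrontFace(GL_CCW).
public class Winding {
    public static float signedArea(float[] xs, float[] ys) {
        float area = 0;
        int n = xs.length;
        for (int i = 0; i < n; i++) {
            int j = (i + 1) % n;
            area += xs[i] * ys[j] - xs[j] * ys[i]; // shoelace formula
        }
        return area / 2;
    }

    public static void main(String[] args) {
        float[] xs = {0, 1, 1, 0}, ys = {0, 0, 1, 1}; // CCW unit quad
        System.out.println(signedArea(xs, ys) > 0);   // front-facing
    }
}
```

If your back faces disappear when they shouldn't, your generated quads likely mix both windings, which is exactly what glFrontFace chooses between.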

5. An infinite world requires an infinite number of chunks, so you can't draw all of them. Even if your world has a finite size, drawing all chunks is extremely wasteful, since only a small fraction of them will end up on the screen. What you need is frustum culling: determine (on the CPU) whether a chunk is visible to the camera and only draw the chunks that are. The player usually only has a 90-120 degree field of view, so you can expect around two thirds of all chunks to be out of view, potentially a lot more if you have a short view range.
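A minimal sketch of the chunk visibility test, assuming each chunk is approximated by a bounding sphere and the frustum is given as planes (extracting the planes from the view-projection matrix is omitted here; the names are illustrative):

```java
// Sketch: sphere-vs-frustum test for chunk culling. Each plane is
// {a, b, c, d} with ax + by + cz + d >= 0 on the inside of the frustum.
public class FrustumCull {
    public static boolean sphereVisible(float[][] planes,
                                        float cx, float cy, float cz, float radius) {
        for (float[] p : planes) {
            float dist = p[0]*cx + p[1]*cy + p[2]*cz + p[3];
            if (dist < -radius) return false; // fully outside this plane
        }
        return true; // inside or intersecting all planes
    }

    public static void main(String[] args) {
        // A single plane x >= 0 (normal pointing inward along +x).
        float[][] planes = { {1, 0, 0, 0} };
        System.out.println(sphereVisible(planes,  5, 0, 0, 1)); // visible
        System.out.println(sphereVisible(planes, -5, 0, 0, 1)); // culled
    }
}
```

You'd run this once per chunk per frame and skip the glCallList() for any chunk that fails.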

Well, thank you guys for your quick and long answers! Let's answer this step by step:

I didn't know that texture atlases are that much more efficient, because I didn't know how they work. I just thought all the textures are one image so that it's easier to load, and that the image is cut into pieces afterwards. But the way this actually works really makes sense. Thank you for this one!

I wanted to use only one pair of glBegin and glEnd, but binding textures doesn't work between them, so I had to split them up. I've just started learning OpenGL, so I don't know every technique yet, but I'm willing to learn!

Now to theagentd. Thank you for your VERY long answer!

To 2: I read about display lists just yesterday, and when I wanted to use them I thought they would give me a problem, because I render every block at a different position. I don't know if I can use variables in display lists. Otherwise I'd have to create a new display list for every block, or translate before every call, and I'm not sure that would be more efficient.

To 3: I had already thought about this, but because I'm not sure how to efficiently store and access the blocks, I wasn't sure how to implement it quickly. The idea with the display list seems pretty good. How often do I have to create a new display list, then? If I want the world to be uneditable, once would be sufficient, but otherwise it isn't. But if I only create a new list when I add or remove a block, this would happen irregularly and might make the game run slower for a short time.

Also: would it be a good idea to remember all visible blocks in an extra array when creating the display list, and use this reduced set for collision detection? I have no idea how to efficiently detect which blocks might be obstacles without something like an octree, which seems a little too much for the beginning. (I might add this later on, but for now I want to concentrate on the basics.)

To 4: I read about this, but when I wrote the code I didn't remember it. Good idea!

To 5: I'm not sure if I want the world to be infinite, but I know that I have to detect which chunks I should render.

Thank you very much again for both answers. They already helped me a lot!

2) Yes, you have to recreate the geometry of a chunk every time a block changes in it (or in a neighboring chunk). The point is that this doesn't happen for all chunks at the same time. Having to recreate a single chunk each frame isn't the same as recreating them all, and the player probably won't notice the spike. If you think it's a problem anyway, the best solution is to reduce the size of the chunks. I'd say it's better to go with cube chunks. If a single block changes, there's less data to invalidate. They're also easier to frustum cull (they can be approximated with a fast sphere test), and they make it easier to set the height of the world (not locked to 100). Regenerating a 16x16x16 chunk is really fast, so you can regenerate a few of them without a dent in the FPS. Try to limit the number of updates per frame, though. You don't want a sudden 100ms delay; better to split the work over a few frames.
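The per-frame update budget can be sketched as a simple queue (the class and the chunk-id scheme are my own illustration; "rebuild" stands in for regenerating a chunk's display list):

```java
// Sketch: spreading chunk rebuilds over several frames to avoid one
// large stall when many blocks change at once.
import java.util.ArrayDeque;
import java.util.Queue;

public class ChunkRebuildQueue {
    private final Queue<Integer> dirty = new ArrayDeque<>(); // chunk ids
    private final int budgetPerFrame;

    public ChunkRebuildQueue(int budgetPerFrame) {
        this.budgetPerFrame = budgetPerFrame;
    }

    public void markDirty(int chunkId) { dirty.add(chunkId); }

    // Called once per frame; rebuilds at most budgetPerFrame chunks.
    public int processFrame() {
        int done = 0;
        while (done < budgetPerFrame && !dirty.isEmpty()) {
            dirty.poll(); // regenerate this chunk's geometry here
            done++;
        }
        return done;
    }

    public int pending() { return dirty.size(); }
}
```

With a budget of, say, 2-3 chunks per frame, even a large edit spreads its cost across a handful of frames instead of causing one visible hitch.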

3) It doesn't have to be too quick since it shouldn't be done very often. Besides, a slight bump in generation time vs 5 FPS and out of memory errors? This isn't a difficult choice to make. =S

5) It's a good idea to use a hierarchical culler. That'll allow you to reject blocks very quickly.

No, it's not a wannabe ^^ And I know there are incredibly many people out there trying to copy Minecraft. I'm just interested in the engine and I like the way the worlds look. Also, this isn't meant to be a real project; it's just to learn some OpenGL, and Minecraft offers a good starting point because you don't need many complex geometries. But your comment is fair.

Only OpenGL calls will be STORED. All the commands you submit will be stored, no matter how you make them. If you loop between begin and end, it'll just store each OpenGL call as you make it (it doesn't care about the loop, only the commands).


No problem, I have been trying the same thing; it teaches a lot about OpenGL.

Some simple optimizations: the important part is to render the blocks in chunks of, say, 16x16x16 or so. When the player is looking up, you can ignore all chunks beneath the player, and the same goes for the other directions.

When drawing blocks in a loop (each chunk gets its own list, because blocks can be removed and re-rendering the whole world would lag terribly), first check whether there is a block adjacent to the face you are trying to draw; if there is a block in the way, there is no need to draw that face. This way you are only drawing the visible faces.

Example: a 10x10x10 grid has 1000 blocks with 6000 faces in total. Let's assume it's only dirt (no transparent blocks in between, so it's a solid grid). If you only draw the outer faces, you only need to draw 600 faces (10x10 per side); the other faces are not visible. At most 3 or so sides are visible to the player no matter where they stand, so you can cut this roughly in half again. Now, instead of 6000 faces, only about 300 are being drawn.
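The arithmetic above generalizes to any NxNxN solid grid; a quick check (my own helper names):

```java
// Worked check of the face counts for an NxNxN fully solid grid:
// every face vs. only the outer faces left after interior culling.
public class FaceCount {
    public static int totalFaces(int n) { return n * n * n * 6; } // 6 per block
    public static int outerFaces(int n) { return 6 * n * n; }     // 6 sides of n*n

    public static void main(String[] args) {
        System.out.println(totalFaces(10)); // 6000
        System.out.println(outerFaces(10)); // 600; back-face culling then
                                            // halves what the GPU rasterizes
    }
}
```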

With this you are able to render a huge world without problems; I got it working with around a million blocks at 200+ FPS.


Just do some decent* frustum culling and everything outside the view frustum will be culled. Cube chunks can be approximated with a sphere which is much faster to test than a cube (6 points + some special rules).

* A hierarchical approach is the best.

I noticed that there were several problems with 16^3 chunks. The chunks are a bit too big, so they don't compress well. I compress chunks that only contain one kind of block into just an int instead of an int[] which contains each block's ID. This had a pretty big impact on memory usage. Another problem is that 16^3 chunks are sometimes too small or too large when rendering them. You want to draw as many triangles with as few draw calls as possible, but you also want to have as fine culling as possible to reduce the number of triangles. You have to find a balance between these two.
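The uniform-chunk compression described above can be sketched like this (the class and field names are my own illustration of the idea, not the actual code):

```java
// Sketch: collapsing a chunk that contains only one kind of block into
// a single int instead of a full int[] of block IDs.
public class CompressedChunk {
    private int[] blocks;   // null when the chunk is uniform
    private int uniformId;  // the block id when blocks == null

    public CompressedChunk(int[] data) {
        int first = data[0];
        for (int b : data)
            if (b != first) { blocks = data.clone(); return; }
        uniformId = first; // all identical: keep one int, drop the array
    }

    public int get(int index) {
        return blocks == null ? uniformId : blocks[index];
    }

    public boolean isCompressed() { return blocks == null; }
}
```

Since most chunks deep underground are all stone and most chunks in the sky are all air, this one check removes the array for a large fraction of the world.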

- 8^3: Good for compression (memory usage), good for culling chunks with lots of geometry, but horrible since we get way too many draw calls and expensive culling.
- 16^3: Okay compression, a better balance for rendering geometry, but you may get more than 65536 vertices, which forces you to split it into two draw calls (a VBO issue, not relevant to display lists); far fewer draw calls, but still a bottleneck.
- 32^3: Really bad compression, good for rendering 90% of all geometry, but will almost certainly overflow even in common cases; very few draw calls and fast culling.

We want the compression, the cull precision (where we need it) and the overflow-proof VBOs of 8^3 chunks, and the number of draw calls and culling efficiency of 32^3 chunks. The solution for me was to store the voxel data in 8x8x8 chunks, but merge the geometry data of chunks with very little data to achieve a good vertex-per-draw-call ratio. With that we get the compression of 8x8x8, while only culling data until it's small enough to just render. Since most chunks will only contain a very small amount of actual geometry, I merge 8 of them up to a 64x64x64 render chunk as long as they have less than a "maximum" number of vertices in total. I found that a good ratio was to try to keep the number of vertices per draw call to around 1500-2000, but this ratio depends on your CPU and your GPU.

This is a 2048x512x2048 world (it's 8GBs uncompressed, fits in 1000MB RAM after compression) with a view distance of 400 blocks. Dammit, print screen causes a lag spike! I was getting 510 FPS! =S

All this sounds very good, but also very complicated. I think I will try the 16^3 solution for the beginning, because I still have to deal with topics like physics, selecting a block in space, etc. After that I will probably care more about efficient chunks and further optimizations.

At the moment I'm stuck on choosing a good way to store the chunk data. I think I'm going to use a 3-dimensional array, because that way I can easily determine the neighboring blocks. On the other hand, I'd have to loop through it with 3 nested loops, which seems pretty inefficient. Do you have any tips on how I can handle this without losing the ability to look up neighbors quickly?


Don't do that. Just use a one-dimensional array and calculate the index from the 3D coordinate. The CPU overhead of doing so is unnoticeable, but the memory gain is significant. A single array instance takes something like 12 + 4*length bytes, and for a 16^3 chunk you'd have one 3D array, 16 2D arrays and 256 1D arrays, all of length 16: 273 array objects totalling 20 748 bytes, versus 16 396 bytes for a single flat int[4096]. That's roughly 25% more memory, on top of the extra pointer-chasing on every access!

Plus, looking up neighbors is easy:

array[x][y][z] is the same as array2[x*16*16 + y*16 + z];

so if you know the index of one block, you can calculate its neighbors' indices with just an addition or subtraction.
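A small sketch of the flat-index scheme and the neighbor offsets it gives you (helper names are illustrative):

```java
// Sketch: flat 1D indexing for a 16^3 chunk, index = x*16*16 + y*16 + z.
// Each axis neighbor is then a fixed offset away in the array.
public class ChunkIndex {
    public static final int SIZE = 16;

    public static int index(int x, int y, int z) {
        return x * SIZE * SIZE + y * SIZE + z;
    }

    public static void main(String[] args) {
        int i = index(5, 5, 5);
        // Neighbors by simple addition/subtraction:
        System.out.println(index(6, 5, 5) == i + 256); // x + 1
        System.out.println(index(5, 6, 5) == i + 16);  // y + 1
        System.out.println(index(5, 5, 6) == i + 1);   // z + 1
    }
}
```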

I could have thought of this by myself; I already did it with a 2D array for a past project. But I didn't know that a multidimensional array takes that much more memory. It's even better then! Thank you once again =)

And thank you, matheus, too! You've got a nice utility collection there; I'll probably check out more. Nice blog, too!


Yes. The cool thing about the library is that it does not depend on other libraries. You only need to download it, link it, and there you go. No LWJGL, no JogAmp, no Slick, no libGDX, no math libraries, nothing. Just this lib; that was the idea. Oh, and yes, thank you too!

Sounds very cool, especially because I'm not that much into this vector and matrix math stuff. (I know, shame on me, because it's necessary, but I think the stuff I learned in university was a bit too theoretical, so I started to dislike it =() I know that I won't be able to avoid it, though.

EDIT: So I just tried to enable face culling, but the result was weird. If I render a single chunk and look at its front, everything seems normal. But if I "fly" to the back, the squares that were culled when looking from the other side aren't rendered. So I guess the viewpoint hasn't been updated for the culling.

No, you most likely have different winding for the faces. Check your code again. GL_CW and GL_CCW stand for clockwise and counter clockwise. OpenGL has to determine what the "front" and "back" of a triangle is, so you need to use the same winding for all faces.

Oh yes. I just tried playing around with the normals, but they are not responsible for this (as I figured out). I just thought wrong when writing my code: I copied the code for the front and back faces and only changed the z coordinates.

@theagentd, that's a really neat scheme you have going there. What machine did you bench it on?

Not a mainstream PC. An i7 860 @ 3.52GHz and a GTX 295 (both GPUs in AFR). The performance is almost the same no matter the size of the world since it's mostly GPU limited (around 850 000 vertices for my terrain). If I make the terrain hillier there's a lot more geometry, so performance takes a hit even though the render block sizes are reduced automatically to improve culling precision.

He's generating indices in a way that improves cache coherence when reading neighbors. A 16x16x16 chunk is too large to fit into the CPU cache, so the same data is read multiple times from RAM into the cache and then immediately discarded when the next distant (for the CPU) neighbor is needed, sometimes for each block. In my code, getting x-1 fetches a neighbor that is 256 slots to the left in the array. That isn't very good, since there's a big chance that part of the array has been evicted from the cache, and the CPU has to wait for it to be loaded from RAM, which is a lot slower than the CPU cache. As Riven showed in the benchmark, you get around 3x the performance by using a more cache-friendly way of generating indices into the array. If I got his code right, he splits each chunk into 8x8x8 "sub-chunks" and stores those in the array instead. That's much better, since the only time we might get a cache miss is when sampling across sub-chunk borders, instead of for every single block.
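One possible reading of that layout is the following sketch (my own interpretation of the idea, not Riven's actual code): a 16^3 chunk stored as a 2x2x2 grid of 8^3 tiles, so most neighbor lookups stay inside one 2KB tile.

```java
// Sketch: cache-friendlier tiled layout for a 16^3 chunk, stored as
// eight 8^3 sub-chunks packed contiguously (512 ints each).
public class TiledIndex {
    public static int index(int x, int y, int z) {
        int tile  = (x >> 3) * 4 + (y >> 3) * 2 + (z >> 3); // which sub-chunk (0-7)
        int local = (x & 7) * 64 + (y & 7) * 8 + (z & 7);   // offset inside it
        return tile * 512 + local;
    }

    public static void main(String[] args) {
        // x-neighbors inside one tile are 64 ints apart, instead of 256
        // with the plain x*256 + y*16 + z layout.
        System.out.println(index(1, 0, 0) - index(0, 0, 0)); // 64
        System.out.println(index(9, 0, 0) - index(8, 0, 0)); // 64
    }
}
```

The mapping is still a bijection onto 0..4095, so it can replace the plain layout transparently behind a get/set method.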

But Riven, I think Marcus Persson was focusing on other things at the time. Those chunks were loaded from disk or downloaded from a server, so improving the CPU performance of generating the geometry probably wasn't his first priority since the gain would be so small. It also reduces the readability of the code. I mean, no one seems to get what you're doing there. I don't know exactly what it's doing there either, I just know what you're trying to do. Marcus probably just saw it as a premature optimization.

(~3x speedup purely from choosing an intelligent, cache-aware memory layout within the array. Needless to say, I was disappointed by the lack of any feedback.)

That's pretty neat. I ran into a similar problem at work recently, concerning sparse vectors. One implementation used Colt's map, the other a sorted array of index/float pairs. The latter outperformed the former by a factor of 30, without any big optimizations on the inserts/gets. This made a real difference in a real-world scenario, where we brought the computation of some clustering solutions down from 6 hours to 12 minutes.
