I'm writing my own clone of Minecraft (also in Java). It works great right now: with a viewing distance of 40 meters I can easily hit 60 FPS on my MacBook Pro 8,1 (Intel i5 + Intel HD Graphics 3000). But if I set the viewing distance to 70 meters, I only reach 15-25 FPS. In the real Minecraft, I can set the viewing distance to far (= 256 m) without a problem. So my question is: what should I do to make my game faster?

The optimisations I implemented:

Keeping only nearby chunks in memory (depending on the player's viewing distance)

Frustum culling (first on the chunks, then on the blocks)

Only drawing the faces of blocks that are actually visible

Using lists per chunk that contain the visible blocks. Blocks that become visible add themselves to this list, and blocks that become invisible are automatically removed from it. A block becomes (in)visible when a neighbouring block is built or destroyed.

Using lists per chunk that contain the blocks that need updating, maintained with the same mechanism as the visible-block lists.

Using almost no new statements inside the game loop (my game runs for about 20 seconds before the garbage collector is invoked).

I'm currently using OpenGL display lists (glNewList(), glEndList(), glCallList()), one for each side of each block type.
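The "only draw visible faces" rule and the per-chunk visible-block lists both come down to one neighbour check: a face is exposed only if the block next to it in that direction is air. A minimal sketch, assuming a hypothetical 16x16x16 chunk stored as a flat int array where 0 means air, and treating everything outside the chunk as air (a real version would look into the neighbouring chunk instead). FaceCulling and its methods are illustrative names, not code from the question.

```java
// Hypothetical face-culling helper for one cubical chunk of block IDs (0 = air).
final class FaceCulling {
    static final int SIZE = 16; // assumed chunk edge length

    // Neighbour offsets for the six face directions: -X, +X, -Y, +Y, -Z, +Z.
    static final int[][] DIRS = {
        {-1, 0, 0}, {1, 0, 0}, {0, -1, 0}, {0, 1, 0}, {0, 0, -1}, {0, 0, 1}
    };

    static int index(int x, int y, int z) {
        return (y * SIZE + z) * SIZE + x;
    }

    // Positions outside this chunk count as air, so faces on the chunk border are exposed.
    static boolean isSolid(int[] blocks, int x, int y, int z) {
        if (x < 0 || y < 0 || z < 0 || x >= SIZE || y >= SIZE || z >= SIZE) {
            return false;
        }
        return blocks[index(x, y, z)] != 0;
    }

    // Counts faces worth emitting: a solid block's face whose neighbour is air.
    static int countVisibleFaces(int[] blocks) {
        int faces = 0;
        for (int y = 0; y < SIZE; y++) {
            for (int z = 0; z < SIZE; z++) {
                for (int x = 0; x < SIZE; x++) {
                    if (!isSolid(blocks, x, y, z)) {
                        continue;
                    }
                    for (int[] d : DIRS) {
                        if (!isSolid(blocks, x + d[0], y + d[1], z + d[2])) {
                            faces++;
                        }
                    }
                }
            }
        }
        return faces;
    }
}
```

A block with at least one exposed face is what would go into the visible-block list; re-running the check for the six neighbours of a block that was just built or destroyed is enough to keep that list current.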

I'm not using any lighting system yet. I've already heard about VBOs, but I don't know exactly what they are; I'll do some research on them. Will they improve performance? Before implementing VBOs, I want to try calling glCallLists() once with a list of display lists, instead of calling glCallList() thousands of times. (I want to try this because I think the real Minecraft doesn't use VBOs. Is that correct?)

Are there other tricks to improve performance?

VisualVM profiling showed me this (profiling for only 33 frames, with a viewing distance of 70 meters):

Profiling with 40 meters (246 frames):

Note: I'm synchronising a lot of methods and code blocks, because I generate chunks in another thread. I think that acquiring an object's lock this often in the game loop is a performance issue (I mean the steady state, when only the game loop is running and no new chunks are being generated). Is this right?
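One common way to keep locks out of the game loop is to have the generator thread hand finished chunks over through a thread-safe queue, so that everything the game loop touches afterwards is owned by that one thread. The sketch below assumes that design; GeneratedChunk and ChunkHandoff are hypothetical names, java.util.concurrent.ConcurrentLinkedQueue is used because it is non-blocking, and the per-frame cap is also an assumption (it keeps a burst of generated chunks from landing in a single frame).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a fully generated chunk.
final class GeneratedChunk {
    final int cx, cz;
    GeneratedChunk(int cx, int cz) { this.cx = cx; this.cz = cz; }
}

final class ChunkHandoff {
    // Written by the generator thread, drained by the game-loop thread.
    private final ConcurrentLinkedQueue<GeneratedChunk> ready = new ConcurrentLinkedQueue<>();

    // Only ever touched by the game-loop thread, so it needs no synchronisation.
    final List<GeneratedChunk> loaded = new ArrayList<>();

    // Called from the generator thread once a chunk is completely built.
    void publish(GeneratedChunk chunk) {
        ready.add(chunk);
    }

    // Called once per frame from the game loop; moves at most maxPerFrame
    // chunks into the loop-owned list and returns how many were moved.
    int drain(int maxPerFrame) {
        int moved = 0;
        GeneratedChunk c;
        while (moved < maxPerFrame && (c = ready.poll()) != null) {
            loaded.add(c);
            moved++;
        }
        return moved;
    }
}
```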

Edit: After removing some synchronised blocks and making some other small improvements, the performance is already much better. Here are my new profiling results with 70 meters:

I think it is pretty clear that selectVisibleBlocks is the issue here.

Thanks in advance!
Martijn

Update: After some extra improvements (like using indexed for loops instead of for-each loops, caching variables outside loops, etc.), I can now run a viewing distance of 60 meters quite well.

Depending on who you ask, some people would say that Minecraft really isn't that fast, either.
–
thedaian Jan 19 '12 at 20:51

I think something like "Which techniques can speed up a 3D game" would be a much better title. Think of something along those lines, but try not to use the word "best" or to compare with some other game; we can't tell exactly what techniques other games use.
–
Gustavo Maciel Jan 19 '12 at 21:04

6 Answers

You mention doing frustum culling on individual blocks — try throwing that out. Most rendering chunks should be either entirely visible or entirely invisible.

Minecraft only rebuilds a display list/vertex buffer (I don't know which it uses) when a block is modified in a given chunk, and so do I. If you're modifying the display list whenever the view changes, you're not getting the benefit of display lists.
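Rebuilding only when a block in the chunk changes is essentially a dirty flag. A minimal sketch, assuming a hypothetical RenderChunk class; the actual display-list or VBO calls are left as comments because they depend on the GL binding in use.

```java
// Hypothetical render-side chunk whose mesh is rebuilt lazily, only after an edit.
final class RenderChunk {
    private final int[] blocks = new int[16 * 16 * 16];
    private boolean dirty = true;   // true until the first mesh has been built
    private int meshBuilds = 0;     // counts rebuilds, for illustration only

    void setBlock(int x, int y, int z, int id) {
        int i = (y * 16 + z) * 16 + x;
        if (blocks[i] != id) {
            blocks[i] = id;
            dirty = true;           // the cached display list / VBO is now stale
        }
    }

    // Called every frame. Rebuilds only if something changed since the last build;
    // otherwise the previously compiled mesh is simply drawn again.
    void render() {
        if (dirty) {
            rebuildMesh();
            dirty = false;
        }
        // Hypothetical: draw the cached mesh here (glCallList or a VBO draw call).
    }

    private void rebuildMesh() {
        meshBuilds++;
        // Hypothetical: walk 'blocks', emit the visible faces, upload them to GL.
    }

    int meshBuilds() { return meshBuilds; }
}
```

One detail a real version has to handle: an edit on a chunk border can expose or hide faces in the neighbouring chunk, so that chunk needs to be marked dirty as well.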

Also, you appear to be using world-height chunks. Note that Minecraft uses cubical 16×16×16 chunks for its display lists, unlike for load and save. If you do that, there's even less reason to frustum cull individual blocks.

(Note: I have not examined the code of Minecraft. All of this information is either hearsay or my own conclusions from observing Minecraft's rendering as I play.)

More general advice:

Remember that your rendering executes on two processors: CPU and GPU. When your frame rate is insufficient, then one or the other is the limiting resource — your program is either CPU-bound or GPU-bound (assuming it isn't swapping or having scheduling problems).

If your program is running at 100% CPU (and doesn't have an unbounded other task to complete), then your CPU is doing too much work. You should try to simplify its task (e.g. do less culling) in exchange for having the GPU do more. I strongly suspect this is your problem, given your description.

On the other hand, if the GPU is the limit (sadly, there aren't usually convenient 0%-100% load monitors) then you should think about how to send it less data, or require it to fill fewer pixels.

Great reference, your research on it mentioned on your wiki was very helpful to me! +1
–
Gustavo Maciel Apr 17 '12 at 5:47

@OP: only render visible faces (not blocks). A pathological, completely solid 16x16x16 chunk has only 1,536 exterior faces, at most 768 of which can face the camera at once, while the 4,096 blocks it contains have 24,576 faces in total. Once you've done that, Kevin's answer contains the next most important improvements.
–
AndrewS Jul 29 '14 at 17:54

@KevinReid There are programs that help with performance debugging. AMD GPU PerfStudio, for example, tells you whether you're CPU- or GPU-bound and, if GPU-bound, which stage is the bottleneck (texture vs. fragment vs. vertex, etc.). I'm sure Nvidia has something similar too.
–
akaltar Jul 7 at 10:02

What's calling Vec3f.set so much? If you're building what you want to render from scratch each frame, then that's definitely where you'd want to start speeding things up. I'm not much of an OpenGL user and I don't know much about how Minecraft renders, but it seems that the math functions you're using are killing you right now (just look at how much time you spend in them and how many times they get called: death by a thousand cuts).

Ideally your world would be segmented so that you can group things to render together, building Vertex Buffer Objects (VBOs) and reusing them across multiple frames. You'd only need to modify a VBO if the part of the world it represents changes somehow (e.g. the user edits it). You can then create and destroy VBOs as the regions they represent come into and go out of visible range, to keep memory consumption down; you'd only take the hit when a VBO is created, rather than every frame.
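Creating buffers as regions come into range and freeing them when they leave can be sketched as a map keyed by chunk coordinates. This is a minimal sketch assuming (x, z) chunk columns and a square view radius; MeshCache and Mesh are hypothetical names, and Mesh only stands in for a real VBO handle.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical cache of per-chunk GPU meshes, keyed by packed chunk coordinates.
final class MeshCache {
    // Stand-in for a VBO handle; a real version would wrap the GL buffer ID.
    static final class Mesh {
        final long key;
        Mesh(long key) { this.key = key; }
    }

    private final Map<Long, Mesh> meshes = new HashMap<>();
    int created = 0, destroyed = 0;

    // Packs (cx, cz) into one long: cx in the high 32 bits, cz in the low 32 bits.
    static long key(int cx, int cz) {
        return ((long) cx << 32) | (cz & 0xFFFFFFFFL);
    }

    // Called when the player enters a new chunk: frees meshes that left the range,
    // builds meshes that entered it, and leaves meshes still in range untouched.
    void update(int playerCx, int playerCz, int radius) {
        Iterator<Map.Entry<Long, Mesh>> it = meshes.entrySet().iterator();
        while (it.hasNext()) {
            long k = it.next().getKey();
            int cx = (int) (k >> 32);
            int cz = (int) k;
            if (Math.abs(cx - playerCx) > radius || Math.abs(cz - playerCz) > radius) {
                it.remove();        // a real version would also delete the GL buffer
                destroyed++;
            }
        }
        for (int cx = playerCx - radius; cx <= playerCx + radius; cx++) {
            for (int cz = playerCz - radius; cz <= playerCz + radius; cz++) {
                long k = key(cx, cz);
                if (!meshes.containsKey(k)) {
                    meshes.put(k, new Mesh(k)); // a real version would build the VBO here
                    created++;
                }
            }
        }
    }

    int size() { return meshes.size(); }
}
```

Because meshes that stay in range are reused as-is, the cost of building a buffer is paid once when its chunk comes into range, not every frame.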

If the "invocation" count is correct in your profile, you're calling an awful lot of things an awful lot of times. (10 million calls to Vec3f.set... ouch!)

Minecraft and your code likely use the fixed-function pipeline; my own efforts have been with GLSL, but the gist is generally applicable, I feel:

(From memory) I made a frustum that was half a block bigger than the screen frustum. I then tested the center point of each chunk (a Minecraft chunk is 16x16x128 blocks).
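Testing chunk center points against a slightly enlarged frustum amounts to a point-versus-plane test with a tolerance. A sketch, assuming the six frustum planes have already been extracted (for example from the projection-view matrix) with unit-length, inward-facing normals; Frustum and containsCenter are hypothetical names.

```java
// Hypothetical frustum stored as six planes {a, b, c, d}: a point (x, y, z) is on
// the inner side of a plane when a*x + b*y + c*z + d >= 0.
final class Frustum {
    private final float[][] planes;

    Frustum(float[][] planes) { this.planes = planes; }

    // Tests a chunk's center point against the frustum grown by 'margin' world
    // units on every side (e.g. half a block). Requires unit-length plane normals.
    boolean containsCenter(float x, float y, float z, float margin) {
        for (float[] p : planes) {
            if (p[0] * x + p[1] * y + p[2] * z + p[3] < -margin) {
                return false;   // outside this plane by more than the margin
            }
        }
        return true;
    }
}
```

Passing a chunk's bounding-sphere radius as the margin, instead of half a block, turns the same loop into the usual conservative sphere test, which never rejects a chunk that is partly on screen.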

The faces in each chunk occupy spans in an element-array VBO (many faces from different chunks share the same VBO until it is 'full', think of it like malloc, and faces with the same texture go in the same VBO where possible), and the vertex indices for the north faces, south faces and so on are adjacent rather than mixed. When I draw, I do a glDrawRangeElements for the north faces, with the normal (already projected and normalized) in a uniform. Then I do the south faces and so on, so the normals are not stored in any VBO. For each chunk, I only have to emit the faces that can be visible: for example, only chunks near the center of the screen need to draw both their left and right sides. This is simply GL_CULL_FACE at the application level.
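This application-level culling can be decided per chunk and per direction before any draw call is issued. A sketch, assuming each chunk keeps its faces grouped by direction (as in the element-array layout described above) and that the chunk is an axis-aligned cube; FaceDir and DirectionCulling are hypothetical names.

```java
import java.util.EnumSet;

// Hypothetical face directions, one face group (index span) per direction per chunk.
enum FaceDir { NEG_X, POS_X, NEG_Y, POS_Y, NEG_Z, POS_Z }

final class DirectionCulling {
    // Returns the face groups of the chunk with minimum corner (minX, minY, minZ)
    // and edge length 'size' that can possibly face the camera. Conservative test:
    // a +X face can only be front-facing if the camera is on its +X side, so a
    // camera entirely on the -X side of the chunk can never see any +X face.
    static EnumSet<FaceDir> visibleGroups(float camX, float camY, float camZ,
                                          int minX, int minY, int minZ, int size) {
        EnumSet<FaceDir> groups = EnumSet.noneOf(FaceDir.class);
        if (camX < minX + size) groups.add(FaceDir.NEG_X);
        if (camX > minX)        groups.add(FaceDir.POS_X);
        if (camY < minY + size) groups.add(FaceDir.NEG_Y);
        if (camY > minY)        groups.add(FaceDir.POS_Y);
        if (camZ < minZ + size) groups.add(FaceDir.NEG_Z);
        if (camZ > minZ)        groups.add(FaceDir.POS_Z);
        return groups;
    }
}
```

When the camera is outside a chunk's extent on all three axes, at most three of the six groups survive, so only those spans need a glDrawRangeElements call.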

The biggest speedup, iirc, was culling interior faces when polygonizing each chunk.

Also important are texture-atlas management, sorting faces by texture, and putting faces with the same texture in the same VBO as those from other chunks. You want to avoid too many texture changes, and sorting the faces by texture and so on minimizes the number of spans in the glDrawRangeElements calls. Merging adjacent same-tile faces into bigger rectangles was also a big deal. I talk about merging in the other answer cited above.
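Merging adjacent same-tile faces into bigger rectangles is often called greedy meshing. A minimal sketch of the merge for one 2D slice, assuming the faces of one direction at one depth have been collected into a hypothetical mask grid of tile IDs (0 meaning no face); Quad and GreedyMerge are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// One merged rectangle of identical tiles, in slice coordinates.
final class Quad {
    final int u, v, width, height, tile;
    Quad(int u, int v, int width, int height, int tile) {
        this.u = u; this.v = v; this.width = width; this.height = height; this.tile = tile;
    }
}

final class GreedyMerge {
    // mask[v][u] is the tile ID of the exposed face at (u, v), or 0 for no face.
    static List<Quad> merge(int[][] mask) {
        int rows = mask.length, cols = mask[0].length;
        boolean[][] used = new boolean[rows][cols];
        List<Quad> quads = new ArrayList<>();
        for (int v = 0; v < rows; v++) {
            for (int u = 0; u < cols; u++) {
                int tile = mask[v][u];
                if (tile == 0 || used[v][u]) continue;
                // Grow to the right while the same unused tile continues.
                int w = 1;
                while (u + w < cols && mask[v][u + w] == tile && !used[v][u + w]) w++;
                // Grow downwards while the whole row segment still matches.
                int h = 1;
                grow:
                while (v + h < rows) {
                    for (int k = 0; k < w; k++) {
                        if (mask[v + h][u + k] != tile || used[v + h][u + k]) break grow;
                    }
                    h++;
                }
                // Mark the covered cells so they are not emitted again.
                for (int dv = 0; dv < h; dv++) {
                    for (int du = 0; du < w; du++) used[v + dv][u + du] = true;
                }
                quads.add(new Quad(u, v, w, h, tile));
            }
        }
        return quads;
    }
}
```

One caveat: when the tile lives in a texture atlas, a merged quad cannot simply repeat it with GL_REPEAT, because the wrap covers the whole atlas texture; the UVs have to be wrapped some other way, for example in a shader.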

Obviously you only polygonize chunks that have ever been visible; you may discard chunks that haven't been visible for a long time, and you re-polygonize chunks that are edited (as editing is rare compared to rendering).

Where are all your comparisons (BlockDistanceComparator) coming from? If it's from a sort function, could that be replaced with a radix sort (which is asymptotically faster, and not comparison based)?

Looking at your timings, even if the sorting itself isn't that bad, your relativeToOrigin function is being called twice per comparison; that data should be computed only once per block. It should be faster to sort an auxiliary structure, e.g. one that pairs each block with its precomputed distance.
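Computing each block's distance once and then sorting can even be done without a Comparator. A sketch, assuming a hypothetical BlockPos type and squared distances (which sort in the same order as distances): for non-negative floats the raw IEEE 754 bits compare in the same order as the values, so the distance bits and the block's index can be packed into one long and sorted with Arrays.sort on a primitive array.

```java
import java.util.Arrays;

// Hypothetical block position; the question's real block class is not shown.
final class BlockPos {
    final int x, y, z;
    BlockPos(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
}

final class DistanceSort {
    // Returns the blocks ordered by distance to (ox, oy, oz), nearest first.
    // Each squared distance is computed exactly once, not twice per comparison.
    static BlockPos[] sortByDistance(BlockPos[] blocks, float ox, float oy, float oz) {
        int n = blocks.length;
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            float dx = blocks[i].x - ox;
            float dy = blocks[i].y - oy;
            float dz = blocks[i].z - oz;
            float d2 = dx * dx + dy * dy + dz * dz;    // never negative
            // High 32 bits: distance bits (order-preserving for d2 >= 0).
            // Low 32 bits: the block's original index.
            keys[i] = ((long) Float.floatToIntBits(d2) << 32) | i;
        }
        Arrays.sort(keys);                             // primitive sort, no Comparator
        BlockPos[] sorted = new BlockPos[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = blocks[(int) keys[i]];         // low 32 bits hold the index
        }
        return sorted;
    }
}
```

In a game loop that avoids new, the two arrays would normally be allocated once and reused between frames.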

It would seem that your code is drowning in objects and function calls. Judging by the numbers, it doesn't seem like any inlining is happening.

You could try a different Java environment, or simply tweak the settings of the one you have, but a plain and simple way of making your code, if not fast, then at least a good deal less slow, is to stop coding OOO*, at least inside Vec3f. Make every method self-contained; don't call other methods just to perform some menial task.
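The difference between the two styles shows up clearly in a distance computation. A sketch, assuming a hypothetical Vec3f with float fields (the question's actual class is not shown): the chained version allocates a temporary vector and makes several calls per distance, while the self-contained version does the same arithmetic in a single method with no allocation.

```java
// Hypothetical Vec3f; only the parts needed for the comparison are included.
final class Vec3f {
    float x, y, z;

    Vec3f set(float x, float y, float z) {
        this.x = x; this.y = y; this.z = z;
        return this;
    }

    // Chained style: allocates a temporary and calls set, sub and lengthSquared.
    Vec3f sub(Vec3f o) { return new Vec3f().set(x - o.x, y - o.y, z - o.z); }
    float lengthSquared() { return x * x + y * y + z * z; }
    float distanceSquaredChained(Vec3f o) { return sub(o).lengthSquared(); }

    // Self-contained style: same result, no temporary object, no nested calls.
    float distanceSquared(Vec3f o) {
        float dx = x - o.x, dy = y - o.y, dz = z - o.z;
        return dx * dx + dy * dy + dz * dz;
    }
}
```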

Edit: While there is overhead all over the place, it seems that sorting the blocks before rendering is the worst performance eater. Is that really necessary? If so, you should probably start by looping over the blocks once to calculate each block's distance to the origin, and then sort by that.

Yep, you'll save memory, but lose CPU! So OOO is not too good in realtime games.
–
Gustavo Maciel Jan 20 '12 at 0:00

As soon as you start profiling (and not just sampling), any inlining that the JVM normally does vanishes. It's kind of like quantum theory: you can't measure something without changing the outcome :p
–
Michael Jan 20 '12 at 0:28

@Gtoknu That is not universally true; at some level of OOO the function calls start to take up more memory than inline code would. I'd say a good part of the code in question is around the break-even point for memory.
–
eBusiness Jan 20 '12 at 0:36

You could also try to break math operations down into bitwise operators.
If you have 128 / 16, use a right shift instead: 128 >> 4.
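The shift works because 16 is 2^4, so 128 >> 4 == 128 / 16 == 8. In voxel code the more useful property is how shifts and masks behave for negative coordinates: x >> 4 rounds toward negative infinity and x & 15 always lies in 0..15, whereas Java's x / 16 and x % 16 round toward zero (so -1 / 16 is 0 and -1 % 16 is -1). A sketch, assuming 16-wide chunks; ChunkMath and its methods are hypothetical names.

```java
// Hypothetical helpers for converting world block coordinates to chunk coordinates.
final class ChunkMath {
    static final int CHUNK_SHIFT = 4;                     // 16 == 1 << 4
    static final int CHUNK_MASK = (1 << CHUNK_SHIFT) - 1; // 15

    // Chunk that a world coordinate belongs to; floor division, so -1 -> -1.
    static int chunkOf(int worldCoord) {
        return worldCoord >> CHUNK_SHIFT;
    }

    // Position inside that chunk, always in 0..15, also for negative coordinates.
    static int localOf(int worldCoord) {
        return worldCoord & CHUNK_MASK;
    }
}
```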
This will help a lot with your problems. Also, don't try to make everything run at full speed: make your game update at a fixed rate of about 60 times per second, and use even lower rates for other things. Destroying and placing voxels would then either have to happen on those updates or go into a to-do list, which could bring down your FPS.
You could use an update rate of about 20 for entities,
and something like 10 for world updates and/or generation.
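Running subsystems at different fixed rates is usually done with a time accumulator per rate. A minimal sketch, assuming the loop measures each frame's elapsed time in seconds; FixedRateTicker is a hypothetical name, and the 60/20/10 rates would be three separate instances.

```java
// Hypothetical fixed-rate scheduler: converts variable frame times into a
// whole number of fixed-length ticks, carrying the remainder to the next frame.
final class FixedRateTicker {
    private final double stepSeconds;
    private double accumulator = 0.0;

    FixedRateTicker(double ticksPerSecond) {
        this.stepSeconds = 1.0 / ticksPerSecond;
    }

    // Adds the elapsed frame time and returns how many ticks are due now.
    int advance(double frameSeconds) {
        accumulator += frameSeconds;
        int ticks = 0;
        while (accumulator >= stepSeconds) {
            accumulator -= stepSeconds;
            ticks++;
        }
        return ticks;
    }
}
```

Each frame, the loop would call advance() on every ticker and run that subsystem's update the returned number of times; queued block edits (the to-do list mentioned above) could be applied inside the world update.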