I’m still working on the project. The stuff I’m working on now is a little boring and would mostly be a re-hash of stuff I’ve discussed before. So rather than do that, let’s talk about data structures.

Remember the little red cube. We’ll see him again later.

When a programmer sits down to write some software, the question comes, “How will I represent this problem in code?” This is actually one of my favorite parts of the job. (Or hobby, in this case. Unless you’re hiring? Do you want to fund development of this project? It has no gameplay, no plan, no support, and it doesn’t even work yet. Let me know if you’re interested!) If I need to store a historically important date like August 24, 1971, then there’s a lot of ways I could do it. I could store it as a text string: “August 24, 1971”. But then it would be hard to do math on it. (Say, to calculate how long it’s been between then and now.) I could store it as a group of three integers: 8, 24, and 1971. I could store it as the number of seconds elapsed since January 1, 1970: 51,840,000. The latter is great for doing math (and is actually how most systems store time internally) but it means you have to do a messy conversion when you want to display the date to the end user. Because if you display a date as 51,840,000 then the end user will find you and burn your house down.

What method will be fast? What will take up the least memory? What will make for clear, readable code? These are questions that programmers love to ponder before coming up with the wrong answer and making a mess of things.

Which brings us back to the problem of storing large tables of data.

We look at this and see a table. But internally, computers don’t “do” tables. They’re not really into 2-dimensional kind of stuff. Computer memory is a long, long line of values. If you’ve got four gigabytes of RAM, then you’ve got four billion little memory addresses in a single row, and it’s up to the programmer to make sense out of them.

This is how the table looks to the computer. Red red yellow blue red red blue yellow green blue green green yellow yellow green green. If I want something in the third column, second row, then I have to do a little math to figure out I’m really looking for box #7.

But then some cheeky programmer looks at the data and says, “I can’t afford the luxury of squandering sixteen whole boxes like this. What am I, Donald Trump? This isn’t a supercomputer with endless memory! You know what? I’ll bet there’s a better way to store this.” And then the programmer invents the quad tree.

I’ve already explained how these work way back in part 2, so let’s not go over that again. The point is that I can no longer look things up the way I did before. If I want the third column, second row, then I have to look inside a box, inside a box, inside a box. There is no shortcut to getting there. It’s a tradeoff. We’re trading speed, code clarity, and convenience in exchange for not using up so much dang memory. That’s a lot to give up, and we wouldn’t even contemplate this if not for the fact that 3-dimensional data (like our cube world) gets really, really big, really fast. Width times height times depth is a simple calculation with terrifying implications.

But you don’t want to have to store the entire world in memory at once, not even in a tree. It would be impractical. In the case of an open-world game, the data wouldn’t even fit in memory, not even when using a quad/oct tree. Also, if the world was 2 kilometers wide (not very big) then every single lookup would take 11 hops. You’d need to look at the box-within-a-box-within-a-box, 11 levels deep.

So what we want is a hybrid system. We want the convenient lookups of using a grid mixed with the memory savings of using a tree. We want a grid… of trees.

Ideally, your trees should have a maximum size of n, where n is the largest power of 2 that’s likely to be homogeneous. Look at your giant data set. What’s the largest area of same-squares? If you never see an area larger than 16×16 same-color squares, then there’s no reason to make your trees larger than that.

Which brings me to the structure of project Octant:

So when we want a particular cube, we do a little math to figure out what column it would be in. We look up that column (if it’s available) and ask for the related node. From the node we grab the octree, and from there we drill down to the cell in question. So our worst-case scenario is:

That’s a lot of hops. Things get really fun when one cell needs to look up the cell right next to it, and it takes 9 hops to reach its next-door neighbor.

I was rather worried about this. I mean, each empty cell needs to look up all six of its neighbors to see what faces it needs to draw. (Since a cube has six faces.) Six queries times nine hops sounded like a LOT of wasted time. I added a bit of code to allow “backtracking”. I made octrees aware of their parents so that the 2x2x2 octree would be able to reach up and ask the 4x4x4 octree for a particular cube. If it didn’t have it, it would continue to pass the request up the chain. I figured that since the vast majority of lookups were for cubes that were “next door”, I’d see some big savings. Hopping up one level and down one level ought to be faster than going down through all nine levels.

Turns out I was wrong. The time needed to construct a single node went from 180ms to 170ms. That is a very small gain. I expected some massive jump in performance, and instead I got what? A 6% boost?

Still, this is exactly the sort of thing I wanted to play around with when I started the project. It’s sort of interesting to experiment with things and see how they behave.

I’m not totally sold on the structure I outlined above. It’s not terribly complex (by the standards of game engines) and I’m still fiddling with it, looking for where the performance bottlenecks might be. I might discover that this design is flawed in some way. Or maybe I’ll come to the same conclusion Goodfellow did, and end up storing everything in a pure grid. We’ll see what we find.

It’s fine to suggest Kickstarter, but where would you expect Shamus to put it on there? He’s not sure where he wants to go with it, so putting it under Video Games might be misleading and end up with Shamus being hassled by people who just saw cubes and expected a game in a few months. Do we put it under Educational or Documentary as he’s showing us this process as it happens or is that something of an expected side-effect when dealing with a long-running Kickstarter project? Finally, Kickstarter only pays out if you reach your funding goal. What’s a reasonable asking price if you’re not sure what you will actually deliver? What kind of prizes do you offer for the different support levels?

Kickstarter is interesting, but it opens a whole can of worms that Shamus is currently not wriggling in.

(Also, this looks really negative and I didn’t mean to come off this harshly.)

I would actually fund the s##t out of a Shamus-powered documentary. Part of the reason I funded Double Fine Adventure was that I really was interested in the documentary. So far I’ve been pleased with it, but I’d be happier with a camcorder sitting in a corner while a programmer talks through a problem.

So Shamus, please consider recording yourself explaining a problem to a friend, and then explaining how you dealt with said problem. A documentary about your hobby projects would be far more interesting than you might expect. I will hand you all the dollars.

(And by all the dollars I mean maybe $100, but I’m probably not alone on that, so…)

Also, haven’t you noticed the kind of people who use Kickstarter? If something so much as smells not right, you get an internet-wide outrage… Do you think Shamus needs that sort of bad PR for not having “a concrete distributable goal for the project”?

You do realize you are talking to a man who barely speaks to his family when in the middle of coding? You need a very specific type of personality for that sort of thing, and someone who is good at recording, and someone who is good at editing and, and, and. Around here “Shamus is coding” = Shamus wandering around the house with his tea cup staring and everyone moving out of the way and being very quiet lest we disturb a thought in process or beginning to talk and realizing he “has that look” and apologizing immediately and backing out of the room. Shamus sitting at the desk staring at the screen and occasionally shouting at it would not translate into a very interesting documentary.

Wow, I would never have guessed that. From this side of the screen, we just see how good he is at explaining programming stuff in text form and how well he does videos about games, and we just assume that programming videos would work too.

Other possibilities include “Quite a bit of code”, “some code”, “a fair amount of code”, “a ton of code”, “way too much code”, and “makes you think why-did-I-do-this-again? amount of code”.

That doesn’t sound that simple to do programming-wise, I guess is my point. Well, I suppose if you have a nested loop of some kind that lets you go back one step and search for the new parameters instead of starting over… *shrug*
(Note: I have very basic programming knowledge, so I have no idea if you could do that or how one would go about it…)

Unless I’m misreading something it really is just a little bit of code. He says he made each node aware of its parents. That probably amounts to each node having one extra pointer (probably called “parent”). Then he probably changed the logic for how each node looks up its neighbors. It seems like a minor change.

Of course, that extra pointer means that each node takes up a little more space in memory (4 or 8 bytes probably, depending on the system). Since the whole point of the octree is to save on memory, that’s not an insignificant detail. That’s why it’s disappointing that the optimization only provides a 6% boost. Given that it’s costing him a little extra memory for each node it might not be worth it.

Disclaimer: I’m not a C++ programmer and it’s been awhile since I coded in C (the language I’m most familiar with is Java). If I’m missing something, or I’m making some bad assumptions, hopefully someone with more programming knowledge will come along and correct me.

Could you use a different set of coordinates for the cells, then? For example, that “scene » column » node » octree16 » octree8 » octree4 » octree2 » octree1 » cell” chain would become (a,b,c,d,e,f,g,h), where a would indicate the column, b would be the node, etc. Granted, it would require more coordinates for each cell, but since you only have 8 blocks per increment, you could use char instead of int. And it would be simpler to check up on everything around a single cell.

Alternatively he could pass the parent on the stack.
If an operation being run on a heap of cubes was known to generally use its neighbours, he could easily have a vector (that is, an array class) containing pointers to the world, column, and each octree, passed in as a function argument. That way no extra data needs to be stored while still giving the same (albeit small) benefit.

My OOP Fu is weak, but what if each node had pointers to its 6 ortho neighboring nodes? You would walk the data out to a distance (“go 6 nodes north”) then render the nodes on the way back to the origin. A node would know who its neighbors are, accessing them quicker than dealing with columns.

Nodes are related vertically because of “sunlight”. It’s not actually being used for much yet, but my thinking is that I might want sunlight to reach down into the caves. For that to happen, I need to know all of the nodes in a column. The world might be many kilometers wide, but it’s not kilometers deep, so keeping an entire column around is possible.

The relationships between the columns are mediated by the scene module. If it was all pointers, then removing one node would mean I’d have to run around and tell all other nodes that their neighbor was gone and could they please forget him. Otherwise, the next time someone looked next door I’d crash. If a column requests some area that isn’t available, the scene module replies with “empty air”. In this way, columns on the edge of your view can actually build themselves.

That does bring up one of the regrettable limitations of Minecraft and, it sounds like, Project Octant: you can’t do much in the way of directional sunlight. Directional sunlight requires calculations across multiple chunks/columns. Regrettably, I’m not sure there is a good solution, especially since when the sun is low on the horizon and the shadows long, very distant objects can be casting shadows onto the local terrain.

Well, you could have a second system that renders out terrain beyond your normal draw distance in super low resolution. Like, just the first couple of octaves of your noise. That would get the rough shape of e.g. a mountain right. That’s how Oblivion and Skyrim do it (only without the procedural, obviously), they have a really low res version of the entire world (several miles IIRC), and the full detail cells only go out to a hundred meters or so. This would obviously add extra cost, but if you compare the effective draw distance in Morrowind to Oblivion, I’d say it’s worth it. Trying to render that kind of distance in the same level of detail you use up close is quite impossible.

For your code, do you write small unit tests to trial things out? I’m a bit spoiled being a Java programmer and having access to JUnit so easily, which I miss heaps when I do a bit of PHP programming.

I’ve also found that writing tests before writing code reduces the amount of programming creep that usually happens, and the loose ends. Think of what you want at the end of a little module with said starting point. Make the test fail, write the code in the middle, run the test: did it pass? Great. Oh, and a byproduct is seeing how long it took to run said piece of code. :)

On modern CPUs performance is normally dominated by cache effects. The best ways to make structure traversal faster are to reduce the total number of bytes accessed, and to access them in address order (either increasing or decreasing) – this is why vector can be faster than any other structure up to thousands of elements, even for operations like insert or find that it’s theoretically slower at: there’s only a tiny, constant size overhead, and items are in memory order so the cache prefetcher can do its thing.
If your oct-tree is a naive pointer-chasing tree (“struct OctTree { OctTree* children[8]; };”) then you have huge memory overhead, and your nodes are likely to be completely out of order in memory. This implies you should actually see the biggest speedups by doing the exact opposite of what you’ve done here, and have an octtree of arrays of cells. If you go down this path, you may want to be able to tune how much you pack at runtime, because the optimal size depends on the users’ CPU.

I want to add to this: use std::vector instead of arrays. It’s at least as fast, and offers safety and usability on top of that with no drawback whatsoever. And it beats std::list in pretty much every case. There’s a talk at Going Native 2011 where Alexandrescu has a few comparisons, and vectors beat lists even where theory suggests lists should be many orders of magnitude faster, because caching behaviour absolutely dominates the measurements. It is way faster to run linearly through a few thousand bytes of memory than it is to do just a handful of random RAM accesses that don’t fit the same cache block. If you can get your look-ups into the L3 or L2 cache, it really doesn’t matter whether you do 10 or just 1. On the other hand, if you have a single cache miss, you will end up with less performance than with 10 cache hits.

I suspect this is also the reason for your lousy gain: Your oct-tree probably does not behave well for caching. I cringe every time I see someone do such “optimisation”, because I could have told you that it won’t work.

Dammit, I can’t remember how it worked, but I’ve seen a cool addressing/memory-organizing technique once where you put the whole octree in one coherent space and can derive the pointer to a certain cell from its coordinates in space, with just boolean expressions. That saves you lots and lots of lookups and tree hopping, especially when trying to find neighbours for lowest-level cells on the borders of highest-level nodes.
This isn’t really my area of expertise, so I didn’t pay enough attention to remember it properly. But it can be done.

Oh, this brings me to a theory: if you don’t start at the root of the octree for every lookup, that means in some cases you’ll have to move from one cell to the root and then down again in order to find a neighbour. Actually… even with regular addressing/memory management, a modulus operation should be able to tell you which tree level you should start at. (Cell interface normal to x has an x coordinate divisible by 16? => go four levels up, then down again.)

It’s been 3 years since this post originally came up, and I’m still wondering why he made a grid of trees instead of a tree of grids. (I’m watching a SpaceX presentation on how they optimize their rocket simulations. Hybrid trees of the “tree of grids” variety were mentioned.) I can’t help but wonder if the later entries in this series might have gone differently if he had. (Specifically the part where it comes time to delete the hybrid grid system. I wonder if that would have gone differently.)

I think the problem there would be scalability. If you have your top level being a tree, that’s either a hard limit on world size, you have to keep adding levels (with all the attendant overhead), or you end up having to have a grid of trees anyway (which then hold grids).

…Thinking about it further, would you even get that far? The whole point of the tree is collapsing homogeneous regions; in what way can two grids of cells be homogeneous like that?

No, not gonna fund the “I want to noodle around with smack and have fun” project. But I would FUND THE HELL OUT OF “Shamus Teaches You How to Write Kick Ass Code for the Beginner who wants to Kick Ass like Shamus” tutorial.

Dude, seriously, I ain’t playing. If you gave lessons like that…with drawings, real world experience, taught from the ground up like you plan on doing real stuff…yeah…I’d pay for that. By the lesson, by the month, one time use for life…what ever made sense.

After I did my Octree implementation (32 by 32 by 32 chunks), I discovered that I wasn’t really saving much memory, and it was costing more time.

Currently, I just keep all the cells in straight arrays — 64K per chunk. There are usually a few hundred chunks active at a time for about 32 meg of memory — very manageable on a modern machine.

If I wanted to reduce that memory use, I would keep the chunk with the player in it as an array, and do some simple compression on the chunks farther away. It looked like I could do a run-length compress or decompress in a few milliseconds, which would be very tolerable as you move around.

One other tip – at the edge of your chunks, you have a problem that in order to decide which cell faces to draw, you have to go into the next chunk, which is a nuisance. You could just draw all the outside cell faces. You would be double-drawing a face if the next chunk has an adjacent cell, but for opaque data, it doesn’t matter.

Transparent data is another story. If you draw a cell of water and also draw the adjacent cell, you will see the boundary between the two, which you don’t want. To avoid this case, you have to check all the adjacent cells, even when they are in different chunks.

To make that fast, I actually keep 34 by 34 by 34 cells, and repeat the data from the adjacent chunks. The only price of this (other than a bit more memory) is that when the user *changes* a cell, you may have to set it in 3 chunks. Since that happens at user speed, you’ll never notice the extra work.

Have you considered storing the neighbours as pointers for each node? It would mean 24 more bytes per node and an extra step after world building, but relative accesses would be a single pointer away.

I remember implementing this method (on a quadtree) by successively applying a function that checked the children of the neighbors of the parent of the current node and assigned them (if they existed) as the neighbours, otherwise leaving their parents as its neighbours. The function was applied level by level using a helper function that recursed until it reached the right level.

Changes in geometry would have some overhead, but a minecraft style game only has about a block every few seconds changed. At worst the post process would need to be applied on four tiles.

Since I'm learning about octrees from this series, I may be totally off the rails here, but: in the colored-in quad tree “grid”, shouldn't the two yellow boxes in the lower left corner be a single box? Or is the dividing line meant to be there?

Well, you can try to store the dimensions of any arbitrary-shaped contiguous block of a single color, but it gets tricky, and beyond a certain point you spend nearly as much space describing your region as you would have describing its component squares (or cubes). In the basic version you store only squares (or cubes) that are homogeneous. If you have to break it up at all, you break it up all the way down to the next level of detail. Even going to rectangular regions seems like it would introduce a lot more complexity for not much savings (he says, without of course having tried it to find out…)

You could do that if you made just a Binary tree and had every other level be vertical or horizontal. Then you’d have alternating layers of 1×2 rectangles and 1×1 squares that make up those rectangles. The complexity increase would be pretty much negligible.

For a Quadtree, though, you’re immediately doubling your search time and adding a small amount of overhead to each independent layer (probably a net 5% increase in memory consumption per 4 leaf objects) with the benefit being an occasional 1-leaf reduction in memory usage, so it’s not really worth it.

Shamus, maybe try some run-length encoding in addition? Also, about the GUI thing: why not try to roughly do your own to compare with Qt? If you do your own you should have no issue decoupling GUI and scenery and input, and you could in addition do threaded rendering, thus having vsync but without the mouse input lag that so many games suffer from when using vsync.

By that I mean:
Check player state/movement.
Calculate world changes, and check if any changes happened in the camera view.
Check if visual changes fall within the vsync “timeslice” or not; if they do, then render, if not, then save the CPU by not rendering. After all, there is no point rendering more than 60fps if vsync/the monitor is at 60Hz…

The difficulty though is multiple threads and juggling at least 3 “framerates” the input “framerate”, the calculation “framerate” and the rendering/vsync/monitor framerate.

The vsync/rendering one is obvious. Input is keyboard and mouse; the data is very light and thus can be checked rather fast without using much CPU (i.e. you get a responsive mouse).

Once all 3 parts (input/calculation/rendering & vsync) are decoupled/asynchronous, the game should still feel smooth: even if the rendering starts skipping frames, the input and calculation are still spot on, so the player will get some visual jerkiness but not any mouse sluggishness, for example.

The time saving seems about right, given that for all cubes on the surface of the largest octree, you are almost doubling the functions queried for at least one of the neighbouring cubes. It might be interesting to query upwards until (say) octree 1 and then jump straight up to the column level to introduce one extra function (rather than −7 to +16) for half the cubes, but save 8 for the other 3 cubes per cell.

Oct trees are great at telling you what is in a given region quickly when you want to do collision or hit-testing. But only when the elements you are using don’t have a direct relation between their location and the space they are occupying, e.g. you have many irregular shapes of different sizes. Oct trees are great only by comparison, because the alternative is to iterate over all elements and do a collision test on each, which royally sucks.

However, for blocks that are of uniform shape and size there is no need for this. The position in the world translates directly to the space they are occupying. If you want to know if something is at some location you can do Math.floor(position / blocksize) and look that up in the array, which is infinitely faster (but only works when you can put stuff in an array).

Using them to do collision detection in Project Frontier might show a better result. The trees there don’t have uniform placement or sizes so you need a data structure that allows you to tell which trees occupy the same general region as the player (or something else).