Revisiting the block worlds stuff

In revising the ANL, I've revisited some of my old projects. Today I did up another example Lua script, included in the Examples folder, called blockworld.lua. It's a remake of my old cube-world generation stuff, demonstrating how a complex function to assign block types for a cube terrain can be built up using ANL's modular pieces. The example follows pretty closely along with my original articles, although it is updated for the new library version.

Since ANL is just a Lua framework, with no in-built graphical capabilities, visualizing such a beast as a cube world isn't straightforward. The example script constructs the noise function, maps it to a voxel array, then iteratively constructs meshes for each of the block types using the Volume functionality, built atop PolyVox. It then dumps the meshes to .OBJ format files. The script generates chunks of the function, specified by chunkx and chunkz coordinates, so the world is essentially infinite (at least to the limits of double precision).

The function isn't sophisticated. It shows strata layering using a distorted vertical gradient, ore deposits, caves with the lower levels filled with magma, a solid bedrock layer, etc... There is no large-scale variation in the distortion function, such as you might want to use if you are generating different kinds of biomes and geography; it just uses a simple fBm fractal with some slight vertical axis dampening to reduce the frothy distortion that can result in floating islands of terrain. Here are a couple shots of it in action:

I don't do much with this sort of generation. I don't really have any interest in building a Minecraft clone, for example. My interest is purely academic, a fascination with procedural world-building in general. Every time I tinker with this kind of thing, I am reminded how my library is actually kind of slow. This example script does a lot of stupid shit with copying and recopying buffers, iterating voxel arrays multiple times, not cacheing some values, etc... It could be faster, but even if I squeezed it as tight as I could it would still be somewhat slow.

I've always been a concepts guy. I have ideas for things I want to tinker with, I build a tool or a prototype to demonstrate it, I move on. I've spent a good bit of time on ANL, mungeing things around, re-organizing, etc... but I've never been very good at optimization, and so ANL has never really been tightened up. But lately, I've been thinking about how I can make things faster.

To start with, it seems like learning about SSE might be a good place to go. This sort of function wrangling is pretty computation-heavy (especially when dealing with the higher orders of noise; ANL does up to 6D noise, which can be slow as balls) so I'm thinking that between implementing many of the computations using SSE intrinsics and trying to take advantage of multiple cores, I might be able to squeeze some better juice out of this thing. I don't know how much time I'll be able to devote to it for now, but I'll probably at least start reading a few sources to learn the ins and outs of SSE to see if it's something I could pull off.

Thanks for the additional example script. blockworld.lua takes about 20 seconds on my Phenom (3.3 GHz). Doesn't sound all too bad for me. But I hear you. Noise screams for parallelization in one way or another.

As food for thought I recommend reading this dev blog. They're doing amazing stuff, but it's a whole team and one guy even juggles with tensors.

PS: Oh, and thanks for making me rediscover de Carpentier. (For other readers: ANL has built-in support for this kind of noise).

That was an interesting link, thanks. It looks like they're considering basically a similar system to what I'm actually considering; that is, representing a module tree as an expression and parsing it down to a sort of sequential bytecode operation. That is the idea I've been considering. I would build an expression kernel from either a Lua table, an explicitly built module tree (similar to how I build them now), or some other means, and turn it into a flat array of operations that can be executed by a task handler to generate a value. This way, I would eliminate all of the virtual calls that occur now, and solve problems of requiring a thread-unfriendly cache in each module to prevent multiple duplicate calculations. Each stage of the tree would be an operation that could refer backwards to operations earlier in the sequence, then cache their results locally for subsequent operations. By giving a copy of the expression sequence to multiple tasks (each with its own result array for cacheing) I can multi-thread it fairly easily.

Looking at most of my use-cases (in particular, the island generators for GC plus a slew of procedural textures), I'm not sure my idea of compiling down to a sequential array of bytecode would really be the best thing, A lot of my stuff uses a step function, or a variant of a step function that includes a fade boundary. If the tree is evaluated from the root up, this provides an early-out for entire branches of the tree, whereas if I compile it to a sequential array and evaluate from the start, it will evaluate all leaves first, then the next branch up, and so forth until root. The early-out provided by the non-selected branch of a step function in this case won't happen, so I'd be getting lots of redundant processing.

Thanks, unbird. I made a couple changes, adding an option in CMake to enable/disable SSE2 (it makes a difference on GCC build as well) as well as an option to choose double or float precision. The double/float conversion is still kind of questionable in my mind given that it was a search/replace with a macro definition for the float type and while I fixed some compile bugs it created, I haven't fully checked for unanticipated side-effects yet. Still, I see a difference of about .5 seconds on multicolor.lua between the GCC double no-SSE build and the GCC float SSE build, so I reckon on more complicated things the savings might be better still. I haven't propagated most of the new changes into the Goblinson Crusoe codebase, so I can't yet test it there. I'll wait until it all stabilizes a bit further before I allow it in to my game.

Is it just me or does the second island bear a strong resemblance to a chocolate icecream cone?

I think SSE would be a huge improvement, as well as optimizing memory transfers (copying stuff back and forth sounds fast but it gets really slow really quickly). Going GPU would be arguably faster but that is probably not realistic for a general-purpose library. Make sure to use the intrinsics, they are easier to use and still allow the compiler to reason about your code, which it can't do if you're writing raw assembly, not as effectively anyway.

That's Sierra Chocolate. Cream Peak is at another chunk, though.must be really hot where you are.

For my MSVC the switch to single precision was surprisingly a huge step back in performance. Take this with a grain of salt, I compared against an earlier version and probably missed something else. Maybe I got some time this weekend to play with a profiler and some compiler flags.

Do you have any links or could maybe give a little description on what "twistiness" is? I really want to move onto 3D noise for terrain but I'm having trouble getting it to be less....blobby and bubbly, for lack of better words...

@studentTeacher: Here are some images I made for an article explaining all of this stuff. They are 2D, but 3D is just a natural extension of it. (And it's hard to visualize this in 3D, without a bunch of volumetric imaging tools.)

1) The divide between solid and open is a step function. Any coordinates with a Y component above the ground threshold are open, any coordinates below it are solid. Without any kind of perturbation, this creates a flat-topped volume, with the ground surface comprising the top of the volume. In 2D, it creates a region of solid below, open above like this:

2) If a 2D noise function is used to perturb the 3D ground plane up or down (heightmap style) then hills are formed. In essence, any input point that has a given X and Z coordinate, regardless of Y, is "pushed" up or down by some value. All points at that X,Z are pushed by the same amount. This deforms the plane up and down, but does not allow for overhangs or sheer cliffs. In 2D, this is similar to deforming the ground line by a 1D noise function, like this:

3) In order to generate a ground surface with overhangs and cliffs, you need to perturb the 3D surface by a 3D noise function, rather than a 2D function. This means that each input coordinate is pushed up or down by its own amount, and this amount is not necessarily the same as other points that share X and Z with the point. In 2D, this is similar to perturbing the 2D surface with a 2D noise function. The result is a "frothy" terrain that can have cliffs, overhangs, and even disconnected floating islands:

4) Of course, while overhangs are nice, floating islands really aren't. In the blockworlds example, I take the distortion fractal (called elev/elevscale) and scale its input coordinate by another fractal, twistiness. This fractal is corrected to output values in the range 0.25 to 0.5. When applytwist is called, the effect is that this fractal is called and the value it returns is multiplied by the Y coordinate of the input point before calling the elevation fractal. As this value approaches 0 it has the effect of gradually "morphing" the elevation fractal toward a 2D version. That is, if you scale the input Y coordinate of a point by 0 before calling the function, it is tantamount to calling noise(X,0,Z), which results in effectively a 2D result. As demonstrated above, a 2D fractal won't result in overhangs, cliffs or floating islands. So the closer this scaling value gets to 0, the lower the likelihood of generating these kinds of features. So by scaling down the "twistiness" you tame the fractal and eliminate some of the weirdness. The use of a fractal for scaling here provides some terrain variation, so that some areas are "twistier", or more frothy, than others.

Eliminating the blobbiness and bubbliness is also a factor of lowering the frequency of the elevation function. By lowering the frequency, the features are stretched out across more area.

As promised I've played a bit with the flags. I'm still pretty clueless, though. No matter what flags I use, double is always faster. Guess GCC is better at auto-vectorizing - and general optimization at that. VerySleepy tells me the hotspot is - not surprisingly - at anl::gradient_noise/cellular either way (map.lua). Still can't tell you the reason for the difference, sorry (cache issues, alignment ?).

Anyway: I don't mind. I'm happy with a scriptable noise library to play with.