Frustum vs sphere tests are significantly faster than frustum vs OOBB. By rejecting objects that fail sphere culling first, we have fewer objects to process in the more expensive OOBB pass.

Why go over all objects brute force instead of using some sort of spatial partition data structure? We like to keep things simple and with the current setup we have yet to encounter a case where we've been bound by the culling. Brute force sphere culling followed by OOBB culling is fast enough for all cases we've encountered so far. That might of course change in the future, but we'll take care of that when it's an actual problem.

The brute force culling is pretty fast, because:

The sphere and the OOBB culling use SIMD and only load the minimum amount of needed data.

The workload is distributed over several threads.

In this post, we will first look at the single threaded SIMD code and then at how the culling is distributed over multiple threads.

I'll use a lot of code to show how it's all done. It's mostly actual code from the engine, but it has been cleaned up to a certain extent. Some stuff has been renamed and/or removed to make it easier to understand what's going on.

Data structures used

If you go back to my previous post about state reflection (http://bitsquid.blogspot.ca/2016/09/state-reflection.html), you can read that each object on the main thread is associated with a render thread representation via a render_handle. The render_handle is used to get the object_index, which is the index of an object in the _objects array.

For culling, MeshObjects and other cullable types are represented by culling::Objects that are used to populate the culling data structures. As can be seen in the code, they are _cullable_objects, _cullable_shadow_casters and _occluders, and they are all represented by an ObjectSet:

When an object is added to, e.g., _cullable_objects, the culling::Object data is added to the ObjectSet. The ObjectSet flattens the data into a structure-of-arrays representation. The arrays are padded to the SIMD lane count to make sure there's valid data to read.
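As a rough illustration of the flattening and padding, here is a minimal sketch. It is not the engine's ObjectSet: `CullObject`, `ObjectSetSketch`, the member names and the lane count of 4 are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for culling::Object (field names are assumptions).
struct CullObject {
    float x, y, z;  // world space center of the bounding sphere
    float radius;   // bounding sphere radius
};

// Hypothetical structure-of-arrays container padded to the SIMD lane count.
struct ObjectSetSketch {
    static constexpr uint32_t LANES = 4;  // assumed SIMD width

    uint32_t n_objects = 0;
    std::vector<float> ws_pos_x, ws_pos_y, ws_pos_z, radius;

    void add(const CullObject &o) {
        assert(o.radius >= 0.0f);
        // When every slot is in use, grow all arrays by one full SIMD block.
        // New slots are zero-filled, so padding lanes are always safe to read.
        if (n_objects == ws_pos_x.size()) {
            const std::size_t padded = ws_pos_x.size() + LANES;
            ws_pos_x.resize(padded, 0.0f);
            ws_pos_y.resize(padded, 0.0f);
            ws_pos_z.resize(padded, 0.0f);
            radius.resize(padded, 0.0f);
        }
        ws_pos_x[n_objects] = o.x;
        ws_pos_y[n_objects] = o.y;
        ws_pos_z[n_objects] = o.z;
        radius[n_objects] = o.radius;
        ++n_objects;
    }

    // Number of slots including padding; always a multiple of LANES.
    uint32_t padded_size() const { return static_cast<uint32_t>(ws_pos_x.size()); }
};
```

With five objects added, the arrays hold eight slots, so a loop that reads four objects at a time never runs off the end of the data.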

Frustum-sphere culling

The world space positions and sphere radii of objects are represented by the following members of the ObjectSet:

The frustum-sphere intersection code tests one plane against several spheres using SIMD instructions. The ObjectSet data is already laid out in a SIMD friendly way. To test one plane against several spheres, the plane's data is splatted out in the following way:
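The idea can be sketched in portable C++ as follows. This is an illustration only, not the engine's simd_sphere_culling: the inner lane loop stands in for a single SIMD instruction, and `Plane`, `SplatPlane`, `sphere_culling_sketch` and the inward-pointing normal convention are assumptions.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t LANES = 4;     // assumed SIMD width
constexpr uint32_t N_PLANES = 6;  // frustum planes

// Plane n.p + d = 0 with the normal pointing into the frustum (assumed convention).
struct Plane { float nx, ny, nz, d; };

// One plane with every component replicated ("splatted") across all lanes.
struct SplatPlane { float nx[LANES], ny[LANES], nz[LANES], d[LANES]; };

SplatPlane splat(const Plane &p)
{
    SplatPlane s;
    for (uint32_t i = 0; i < LANES; ++i) {
        s.nx[i] = p.nx; s.ny[i] = p.ny; s.nz[i] = p.nz; s.d[i] = p.d;
    }
    return s;
}

// Writes 0xffffffff for spheres that touch the frustum and 0 for the rest.
// `n_padded` must be a multiple of LANES (the arrays are padded).
void sphere_culling_sketch(const Plane planes[N_PLANES],
    const float *x, const float *y, const float *z, const float *radius,
    uint32_t n_padded, uint32_t *visibility_flag)
{
    assert(n_padded % LANES == 0);
    SplatPlane sp[N_PLANES];
    for (uint32_t p = 0; p < N_PLANES; ++p)
        sp[p] = splat(planes[p]);

    for (uint32_t base = 0; base < n_padded; base += LANES) {
        uint32_t mask[LANES] = {0xffffffffu, 0xffffffffu, 0xffffffffu, 0xffffffffu};
        for (uint32_t p = 0; p < N_PLANES; ++p) {
            // This loop over lanes is what one SIMD operation would do at once.
            for (uint32_t i = 0; i < LANES; ++i) {
                const uint32_t o = base + i;
                const float dist = sp[p].nx[i] * x[o] + sp[p].ny[i] * y[o]
                                 + sp[p].nz[i] * z[o] + sp[p].d[i];
                // The sphere is outside if it lies entirely behind the plane.
                if (dist < -radius[o])
                    mask[i] = 0;
            }
        }
        for (uint32_t i = 0; i < LANES; ++i)
            visibility_flag[base + i] = mask[i];
    }
}
```

A real implementation would replace the lane loop with vector loads, multiplies and compares, which is why the per-object data is kept in separate x, y, z and radius arrays.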

After the simd_sphere_culling call, the visibility_flag array contains 0 for all objects that failed the test and 0xffffffff for all objects that passed. We chain this together with the OOBB culling by doing a compaction pass over the visibility_flag array and populating an indirection array:
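Such a pass could look roughly like the sketch below. The function name `compact_visible_sketch` and its signature are assumptions for the example, not the engine's code.

```cpp
#include <cassert>
#include <cstdint>

// Writes the index of every object whose flag is non-zero to `indirection`
// and returns how many indices were written. `indirection` must have room
// for `n_objects` entries.
uint32_t compact_visible_sketch(const uint32_t *visibility_flag,
    uint32_t n_objects, uint32_t *indirection)
{
    assert(visibility_flag && indirection);
    uint32_t n_visible = 0;
    for (uint32_t i = 0; i < n_objects; ++i) {
        if (visibility_flag[i])
            indirection[n_visible++] = i;
    }
    return n_visible;
}
```

The next culling pass then only needs to visit `indirection[0]` through `indirection[n_visible - 1]` instead of every object.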

More specifically, we use Method 2: Transform box vertices to clip space, test against clip-space planes, which both Fabian and Arseny write about. We also go with Method 2b: Saving arithmetic ops, which Fabian mentions. I won't delve into how the culling actually works; to understand that, please read their posts.

The code is SIMDified to process several OOBBs at the same time. The same corner of multiple OOBBs is tested against one frustum plane as a single SIMD operation.

To be able to write the SIMD code in a more intuitive form a few data structures and functions are used:

The final call to remove_not_visible populates the indirection array with the objects that passed both the frustum-sphere and the frustum-OOBB culling. indirection together with n_oobb_visible is all that is needed to know what objects should be rendered.

Distributing the work over several threads

In Stingray, work is distributed by submitting jobs to a pool of worker threads -- conveniently called the ThreadPool. Submitted jobs are put in a thread safe work queue from which the worker threads pop jobs to work on. A task is defined as:

For the purpose of this article, the interesting methods of the ThreadPool are:

class ThreadPool
{
public:
    // Adds `count` tasks to the work queue.
    void add_tasks(const TaskDefinition *tasks, uint32_t count);
    // Tries to pop one task from the queue and do that work. Returns true if any work was done.
    bool do_work();
    // Will call `do_work` while `*signal == value`.
    void wait_atomic(std::atomic<uint32_t> *signal, uint32_t value);
};

The ThreadPool doesn't dictate how to synchronize when a job is fully processed, but usually a std::atomic<uint32_t> signal is used for that purpose. The value is 0 while the job is being processed and set to 1 when it's done. wait_atomic() is a convenience method that can be used to wait for such values:
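To make the pattern concrete, here is a minimal single-threaded sketch. `MiniPool` is a hypothetical stand-in for the ThreadPool (no worker threads, no locking, and `std::function` instead of a TaskDefinition), and `process_in_ranges` is an invented example job that splits its work into ranges and lets the last finished task flip the signal.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Single-threaded stand-in for the ThreadPool. Only the waiting thread
// executes tasks here; a real pool also has worker threads doing so.
class MiniPool {
public:
    void add_task(std::function<void()> task) { _queue.push_back(std::move(task)); }

    // Pops one task and runs it. Returns true if any work was done.
    bool do_work() {
        if (_queue.empty())
            return false;
        std::function<void()> task = std::move(_queue.front());
        _queue.pop_front();
        task();
        return true;
    }

    // Helps out with queued work for as long as `*signal == value`.
    void wait_atomic(std::atomic<uint32_t> *signal, uint32_t value) {
        while (signal->load() == value)
            do_work();
    }

private:
    std::deque<std::function<void()>> _queue;
};

// Splits `n_items` into `n_tasks` ranges, waits for all of them and returns
// the number of processed items. `done` is 0 while the job is in flight.
uint32_t process_in_ranges(MiniPool &pool, uint32_t n_items, uint32_t n_tasks)
{
    assert(n_tasks > 0);
    std::atomic<uint32_t> processed{0};
    std::atomic<uint32_t> remaining{n_tasks};
    std::atomic<uint32_t> done{0};
    const uint32_t per_task = (n_items + n_tasks - 1) / n_tasks;
    for (uint32_t t = 0; t < n_tasks; ++t) {
        const uint32_t begin = std::min(t * per_task, n_items);
        const uint32_t end = std::min(begin + per_task, n_items);
        pool.add_task([&, begin, end] {
            processed += end - begin;  // stand-in for culling [begin, end)
            if (remaining.fetch_sub(1) == 1)
                done.store(1);         // the last task signals completion
        });
    }
    pool.wait_atomic(&done, 0);
    return processed.load();
}
```

The point of having the waiting thread call do_work() is that it contributes to the job instead of idling, which also means progress is made even if all workers are busy elsewhere.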

The same pattern is used to multi-thread the frustum-OOBB culling. That is "left as an exercise for the reader" ;)

Conclusion

This type of culling is done for all of the objects that can be rendered, i.e. meshes, particle systems, terrain, etc. We also use it to cull light sources. It is used both when rendering the main scene and for rendering shadows.

I've left out a few details of our solution. One thing we also do is something called contribution culling. In the frustum-OOBB culling step, the extents of the OOBB corners are projected to the near plane and from that the screen space extents are derived. If the object is smaller than a certain threshold in any axis, the object is considered culled. Special care needs to be taken if any of the corners intersects or is behind the near plane, so we don't have to deal with "external line segments" caused by the projection. If you don't know what that is, see: http://www.gamasutra.com/view/news/168577/Indepth_Software_rasterizer_and_triangle_clipping.php. In our case, the contribution culling is disabled by expanding the extents to span the entire screen when any corner intersects or is behind the near plane.
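The decision logic can be sketched as below. The types and function names are hypothetical, and the sketch assumes screen space extents in normalized device coordinates where the screen spans [-1, 1] on both axes.

```cpp
#include <cassert>

// Screen space extents of the projected OOBB corners (NDC assumed).
struct ScreenExtents { float min_x, min_y, max_x, max_y; };

// Expands the extents to the entire screen when any corner intersects or is
// behind the near plane, which effectively disables contribution culling.
ScreenExtents conservative_extents(const ScreenExtents &projected,
    bool any_corner_behind_near_plane)
{
    if (any_corner_behind_near_plane)
        return ScreenExtents{-1.0f, -1.0f, 1.0f, 1.0f};
    return projected;
}

// An object is culled if it is smaller than `min_size` in any axis.
bool contribution_culled(const ScreenExtents &e, float min_size)
{
    assert(min_size >= 0.0f);
    return (e.max_x - e.min_x) < min_size || (e.max_y - e.min_y) < min_size;
}
```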

For our cascaded shadow maps, the extents are also used to detect if an object is fully enclosed by a cascade. If that is the case, then that object is culled from the later cascades. Let me illustrate with some ASCII:

The squares are the different cascades. The top left square is the first cascade, the top right is the second cascade, the bottom left is the third and the bottom right is the fourth cascade. In this case, the weird triangle shaped object is fully enclosed by the first cascade. What that means is that the object doesn't need to be rendered to any of the later cascades, since the shadow contribution from that object will be fully taken care of by the first cascade.
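In code, the selection could be sketched like this. It is a simplified 2D illustration with hypothetical names, assuming the object's extents are available per cascade in that cascade's clip space, where the cascade covers [-1, 1] on both axes.

```cpp
#include <cassert>
#include <cstdint>

// Object extents projected into one cascade's clip space (assumed layout).
struct CascadeExtents { float min_x, min_y, max_x, max_y; };

bool overlaps_cascade(const CascadeExtents &e)
{
    return e.max_x >= -1.0f && e.min_x <= 1.0f && e.max_y >= -1.0f && e.min_y <= 1.0f;
}

bool fully_enclosed(const CascadeExtents &e)
{
    return e.min_x >= -1.0f && e.max_x <= 1.0f && e.min_y >= -1.0f && e.max_y <= 1.0f;
}

// Sets render_in[i] for each cascade, in order, and returns how many cascades
// the object is rendered into. Once a cascade fully encloses the object, all
// later cascades are skipped.
uint32_t select_cascades(const CascadeExtents *extents, uint32_t n_cascades, bool *render_in)
{
    assert(extents && render_in);
    uint32_t n_selected = 0;
    bool enclosed_by_earlier = false;
    for (uint32_t i = 0; i < n_cascades; ++i) {
        render_in[i] = !enclosed_by_earlier && overlaps_cascade(extents[i]);
        if (render_in[i]) {
            ++n_selected;
            enclosed_by_earlier = fully_enclosed(extents[i]);
        }
    }
    return n_selected;
}
```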

Wednesday, September 7, 2016

Overview

The Stingray engine has two controller threads -- the main thread and the render thread. These two threads build up work for our job system, which is distributed on the remaining threads. The main thread and the render thread are pipelined, so that while the main thread runs the simulation/update for frame N, the render thread is processing the rendering work for the previous frame (N-1). This post will dive into the details of how state is propagated from the main thread to the render thread.

I will use code snippets to explain how the state reflection works. It's mostly actual code from the engine but it has been cleaned up to a certain extent. Some stuff has been renamed and/or removed to make it easier to understand what's going on.

The main loop

Here is a slimmed down version of the update loop which is part of the main thread:

while (!quit())
{
    // Calls out to the mandatory user supplied `update` Lua function. Lua is used
    // as a scripting language to manipulate objects. From Lua, worlds, objects, etc.
    // can be created, manipulated, destroyed, etc. All these changes are recorded
    // on a `StateStream` that is a part of each world.
    _game->update();
    // Flush state changes recorded on the `StateStream` for each world to
    // the rendering world representation.
    unsigned n_worlds = _worlds.size();
    for (uint32_t i = 0; i < n_worlds; ++i) {
        auto &world = *_worlds[i];
        _render_interface->update_world(world);
    }
    // Begin a new render frame.
    _render_interface->begin_frame();
    // Calls out to the user supplied `render` Lua function. It's up to the script
    // to call render on worlds(). The script controls what camera and viewport
    // are used when rendering the world.
    _game->render();
    // Present the frame.
    _render_interface->present_frame();
    // End frame.
    _render_interface->end_frame(_delta_time);
    // Never let the main thread run more than 1 frame ahead of the render thread.
    _render_interface->wait_for_fence(_frame_fence);
    // Create a new fence for the next frame.
    _frame_fence = _render_interface->create_fence();
}

First thing to point out is the _render_interface. This is not a class full of virtual functions that some other class can inherit from and override as the name might suggest. The word "interface" is used in the sense that it's used to communicate from one thread to another. So in this context the _render_interface is used to post messages from the main thread to the render thread.

As said in the first comment in the code snippet above, Lua is used as our scripting language and from Lua things such as worlds, objects, etc can be created, destroyed, manipulated, etc.

State is very rarely shared between the main thread and the render thread. Instead, each thread has its own representation, and when state is changed on the main thread, that state is reflected over to the render thread. E.g., the MeshObject, which is the representation of a mesh with vertex buffers, materials, textures, shaders, skinning data, etc. to be rendered, is the main thread representation and RenderMeshObject is the corresponding render thread representation. All objects that have a representation on both the main and render thread are set up to work the same way:

The render_handle is an ID that identifies the corresponding object on the render thread. state_reflection is a stream of data that is used to propagate state changes from the main thread to the render thread. type is an enum used to identify the type of render objects.

Object creation

In Stingray a world is a container of renderable objects, physical objects, sounds, etc. On the main thread, it is represented by the World class, and on the render thread by a RenderWorld.

When a MeshObject is created in a world on the main thread, there's an explicit call to WorldRenderInterface::create() to create the corresponding render thread representation:

There is a recycling mechanism for the render handles, and a similar pattern recurs in several places in the engine. The release_render_handle function together with the new_render_handle function should give the complete picture of how it works.
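A free-list based recycler is one common way to implement this, sketched below. The class `RenderHandlePool` and its members are assumptions, and the sketch ignores the question of when a released handle is safe to reuse with respect to messages still in flight to the render thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical handle recycler: released handles are reused before new
// ones are minted, which keeps the handle range (and any lookup table
// indexed by handle) compact.
class RenderHandlePool {
public:
    uint32_t new_render_handle() {
        if (!_free_handles.empty()) {
            const uint32_t handle = _free_handles.back();
            _free_handles.pop_back();
            return handle;
        }
        return _next_handle++;
    }

    void release_render_handle(uint32_t handle) {
        assert(handle < _next_handle);  // only handles we handed out
        _free_handles.push_back(handle);
    }

private:
    uint32_t _next_handle = 0;
    std::vector<uint32_t> _free_handles;
};
```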

There is one WorldRenderInterface per world which contains the _state_reflection that is used by the world and all of its objects to communicate with the render thread. The StateReflection in its simplest form is defined as:

What happens here is that alloc_message will allocate enough bytes to make room for a MessageHeader together with the size of ObjectManagementPackage in a buffer owned by the StateStream. The StateStream is defined as:
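The header-plus-package layout can be illustrated with a small byte stream. This is a sketch under assumptions, not the engine's StateStream: `StateStreamSketch`, `push_message`, the header fields and the package fields are invented for the example, and data is copied with memcpy to sidestep alignment concerns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

struct MessageHeader { uint32_t type; uint32_t size; };  // size of the package in bytes

// Hypothetical package; the real ObjectManagementPackage is not shown here.
struct ObjectManagementPackage { uint32_t object_type; uint32_t render_handle; };

struct StateStreamSketch {
    std::vector<char> buffer;     // messages stored back to back
    std::size_t read_offset = 0;  // consumer position
};

// Main thread side: makes room for a header plus the package and fills both in.
template <typename Package>
void push_message(StateStreamSketch &s, uint32_t type, const Package &package)
{
    static_assert(std::is_trivially_copyable<Package>::value, "packages are raw bytes");
    const MessageHeader header = {type, static_cast<uint32_t>(sizeof(Package))};
    const std::size_t offset = s.buffer.size();
    s.buffer.resize(offset + sizeof(header) + sizeof(Package));
    std::memcpy(s.buffer.data() + offset, &header, sizeof(header));
    std::memcpy(s.buffer.data() + offset + sizeof(header), &package, sizeof(Package));
}

// Render thread side: reads the next header and points `payload` at its
// package. Returns false when the stream has been fully consumed.
bool get_message(StateStreamSketch &s, MessageHeader *header, const char **payload)
{
    if (s.read_offset + sizeof(MessageHeader) > s.buffer.size())
        return false;
    std::memcpy(header, s.buffer.data() + s.read_offset, sizeof(MessageHeader));
    *payload = s.buffer.data() + s.read_offset + sizeof(MessageHeader);
    s.read_offset += sizeof(MessageHeader) + header->size;
    return true;
}
```

Because each header carries the package size, the consumer can walk the stream without knowing every package type up front.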

This is the necessary code on the main thread to create an object and populate the StateStream which will later on be consumed by the render thread. A very similar pattern is used when changing the state of an object on the main thread, e.g:

Getting the recorded state to the render thread

Let's take a step back and explain what happens in the main update loop during the following code excerpt:

// Flush state changes recorded on the `StateStream` for each world to
// the rendering world representation.
unsigned n_worlds = _worlds.size();
for (uint32_t i = 0; i < n_worlds; ++i) {
    auto &world = *_worlds[i];
    _render_interface->update_world(world);
}

When Lua is done creating, destroying, manipulating, etc. objects during update(), each world's StateStream, which contains all the recorded changes, is ready to be sent over to the render thread for consumption. The call to RenderInterface::update_world() does just that. It roughly looks like:

void RenderInterface::update_world(World &world)
{
    UpdateWorldMsg uw;
    // Get the render thread representation of the `world`.
    uw.render_world = render_world_representation(world);
    // The world's current `state_stream` that contains all changes made
    // on the main thread. (`world` is a reference, so members are accessed with `.`)
    uw.state_stream = world._world_reflection_interface.state_stream;
    // Create and assign a new `state_stream` to the world's `_world_reflection_interface`
    // that will be used for the next frame.
    world._world_reflection_interface.state_stream = new_state_stream();
    // Post a message to the render thread to update the world.
    post_message(UPDATE_WORLD, &uw);
}

This function will create a new message and post it to the render thread. The world being flushed and its StateStream are stored in the message and a new StateStream is created that will be used for the next frame. This new StateStream is set on the WorldRenderInterface of the World, and since all objects being created got a pointer to the same WorldRenderInterface they will use the newly created StateStream when storing state changes for the next frame.

Render thread

The render thread is spinning in a message loop:

void RenderInterface::render_thread_entry()
{
    while (!_quit) {
        // If there's no message -- put the thread to sleep until there's
        // a new message to consume.
        RenderMessage *message = get_message();
        // Named `payload` so the variable doesn't shadow the `data()` accessor.
        void *payload = data(message);
        switch (message->type) {
            case UPDATE_WORLD:
                internal_update_world((UpdateWorldMsg*)(payload));
                break;
            // ... And a lot more case statements to handle different messages. There
            // are other threads than the main thread that also communicate with the
            // render thread. E.g., the resource loading happens on its own thread
            // and will post messages to the render thread.
        }
    }
}

The UPDATE_WORLD handler, internal_update_world(), calls update() on the RenderWorld with the StateStream, and when that is done the StateStream is released to a pool.

void RenderWorld::update(StateStream *state_stream)
{
    MessageHeader *message_header;
    StatePackageHeader *package_header;
    // Consume a message and get the `message_header` and `package_header`.
    while (get_message(state_stream, &message_header, (void**)&package_header)) {
        switch (package_header->object_type) {
            case RenderWorld::TYPE:
            {
                auto omp = (WorldRenderInterface::ObjectManagementPackage*)package_header;
                // The call to `WorldRenderInterface::create` created this message.
                if (message_header->type == WorldRenderInterface::CREATE)
                    create_object(omp);
                break;
            }
            case RenderMeshObject::TYPE:
            {
                if (message_header->type == MeshObject::SET_VISIBILITY) {
                    auto svp = (MeshObject::SetVisibilityPackage*)package_header;
                    // The `render_handle` is used to do a lookup in `_object_lut`
                    // to get the `object_index`.
                    uint32_t object_index = _object_lut[package_header->render_handle];
                    // Get the `render_object`.
                    void *render_object = _objects[object_index];
                    // Cast it since the type is already given from the `object_type`
                    // in the `package_header`.
                    auto rmo = (RenderMeshObject*)render_object;
                    // Call update on the `RenderMeshObject`.
                    rmo->update(message_header->type, svp);
                }
                break;
            }
            // ... And a lot more case statements to handle different kind of messages.
        }
    }
}

The above is mostly infrastructure to extract messages from the StateStream. It can be a bit involved since a lot of stuff is written out explicitly but the basic idea is hopefully simple and easy to understand.

On to the create_object call done when (message_header->type == WorldRenderInterface::CREATE) is satisfied:

So the takeaway from the code above lies in the general usage of the render_handle and the object_index. The render_handle of an object is used to do a lookup in _object_lut to get the object_index and type. Let's look at an example: the same RenderWorld::update code presented earlier, but this time the focus is on the case where the message is MeshObject::SET_VISIBILITY:

void RenderWorld::update(StateStream *state_stream)
{
    StateStream::MessageHeader *message_header;
    StatePackageHeader *package_header;
    while (get_message(state_stream, &message_header, (void**)&package_header)) {
        switch (package_header->object_type) {
            case RenderMeshObject::TYPE:
            {
                if (message_header->type == MeshObject::SET_VISIBILITY) {
                    auto svp = (MeshObject::SetVisibilityPackage*)package_header;
                    // The `render_handle` is used to do a lookup in `_object_lut`
                    // to get the `object_index`.
                    uint32_t object_index = _object_lut[package_header->render_handle];
                    // Get the `render_object` from the `object_index`.
                    void *render_object = _objects[object_index];
                    // Cast it since the type is already given from the `object_type`
                    // in the `package_header`.
                    auto rmo = (RenderMeshObject*)render_object;
                    // Call update on the `RenderMeshObject`.
                    rmo->update(message_header->type, svp);
                }
                break;
            }
        }
    }
}

The state reflection pattern shown in this post is a fundamental part of the engine. Similar patterns appear in other places as well and having a good understanding of this pattern makes it much easier to understand the internals of the engine.

Tuesday, September 6, 2016

The current Stingray localization system is based around the concept of
properties. A property is any period separated part of the file name
before the extension. Consider the following three files:

trees/larch_03.unit

trees/larch_03.fr.unit

trees/larch_03.ps4.unit

These three files all have the same type (.unit), and the same name
(trees/larch_03), but their properties differ. The first one has
no properties set. The second one has the property .fr and the last
one has the property .ps4. (Note that resources can have more than
one property.)

Properties are resolved in slightly different ways, depending on
the kind of property. Platform properties are resolved at compile
time, so if you compile for PS4, you will get the PS4 version of
the resource (or the default version if there is no .ps4 specific
version).

Other properties are resolved at resource load time. When you load
a bunch of resources, which property variant is loaded
depends on a global property preference order set
from the script. A property preference order of ['.fr', '.es'] means
that resources with the property .fr are preferred, then resources
with the property .es (if no .fr resource is available), and finally
a resource without any properties at all.
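The load time resolution can be sketched as a simple lookup in preference order. The function `resolve_property_variant` and the idea of passing the set of available files are assumptions for the example, and it only handles a single property per file.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the file to load for `name` + `type`, trying each preferred
// property in order and falling back to the variant without properties.
// Returns an empty string if no variant exists.
std::string resolve_property_variant(const std::set<std::string> &available,
    const std::string &name, const std::string &type,
    const std::vector<std::string> &preference_order)
{
    assert(!name.empty());
    for (const std::string &property : preference_order) {
        const std::string candidate = name + property + type;
        if (available.count(candidate))
            return candidate;
    }
    const std::string fallback = name + type;
    return available.count(fallback) ? fallback : std::string();
}
```

For example, with only trees/larch_03.unit and trees/larch_03.es.unit on disk and a preference order of ['.fr', '.es'], the .es variant is chosen.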

This single mechanism is used for localizing strings, sounds, textures,
etc. Strings, for example, are stored in .strings files, which are
essentially just key-value stores:

file = "File"
open = "Open"
...

To create a French localized version of this menu.strings resource, you
just create a menu.fr.strings resource and fill it with:

file = "Fichier"
open = "Ouvert"
...

This basic localization system has served us well for many years, but
it has some drawbacks that are starting to become more pronounced:

It doesn't allow file names with periods in them. Since we always
interpret periods as properties, periods can't be a part of the
regular file name. This isn't a huge problem when users name their own
files, but as we are increasing the interoperability between Stingray
and other software packages we more and more run into software that has,
let's say peculiar, ways of naming its files. Renaming things by hand
is cumbersome and can also break things when files cross-reference
each other.

Switching language requires reloading the resource packages. This seems
overly complicated. We have more memory these days than when we started
building Stingray. In many cases,
especially for strings, it makes more sense to keep them in memory all
the time, so we can switch between them easily.

Just switching on platform isn't enough. Mobile devices range
from very low-end to at least mid-end. Rather than having .ios and .android
properties, we might want .low-quality and .high-quality and
select which one to use based on the actual capabilities of the
hardware.

Making editors work well with the property system has been surprisingly
complicated. For example, when the editor runs on Windows,
what should it show if there is a .win32 specialization of a
resource -- the default version or the .win32 one? How would you
edit a .ps4 resource when those are normally stripped out of the
Windows runtime?

We used to have this wonky thing where you could sort of
cross-compile the resources and say that "I want to run on Windows,
but as if I was running on PS4". But to be honest, that system
never really worked that well and in the new editor we have
gotten rid of it.

Interestingly, out of all these problems, it is the first one -- the
most stupid one -- that is the main impetus for change.

The New System

The new system has several parts. First, we decided that for systems
that deal with localization a lot, such as strings and sounds it makes
sense to have the system actually be aware of localization. That way,
we can provide the best possible experience.

All the languages are stored in the same file and to switch language
you just call Localizer.set_language("fr"). We keep all the different
languages in memory at all times. Even for a game with ridiculous
amounts of text this still doesn't use much memory and it means we
can hot-swap languages instantly.

This is a nice approach, but it doesn't work for all resources. We
don't want to add this deep kind of integration to resources that are
normally not localized, such as .unit and .texture. Still, there
sometimes is a need to localize such resources. For example, a .texture
might have text in it that needs to be localized. We may need a low-poly
version of a .unit for a less capable platform. Or a less gory version
of an animation for countries with stricter age ratings.

To make things easier for the editor we decided to ditch the property
system altogether, and instead go for a substitution strategy. There
are no special magical parts of a resource's path -- it is just a name
and a type. But if you want to, you can say to the engine that all
instances of a certain resource should be replaced with another resource:

trees/larch_03.unit → trees/larch_03_ps4.unit

Note here that there is nothing special or magical about the
trees/larch_03_ps4.unit. There is no problem with displaying it on Windows.
You just edit it in the editor, like any other unit. However, when you play
the game -- any time a trees/larch_03.unit is requested by the engine, a
trees/larch_03_ps4.unit is substituted. So if you have authored a level
full of larch_03 units, when the override above is in place, you will instead
see larch_03_ps4 units.
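Conceptually, the substitution is just a lookup performed whenever a resource is requested. A minimal sketch, where the class `ResourceOverrides` and its methods are hypothetical and not the engine's API:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical override table consulted every time a resource is requested.
class ResourceOverrides {
public:
    void set_override(const std::string &resource, const std::string &replacement) {
        assert(resource != replacement);
        _overrides[resource] = replacement;
    }

    void clear_override(const std::string &resource) { _overrides.erase(resource); }

    // Returns the resource that should actually be used for `requested`.
    std::string resolve(const std::string &requested) const {
        const auto it = _overrides.find(requested);
        return it == _overrides.end() ? requested : it->second;
    }

private:
    std::unordered_map<std::string, std::string> _overrides;
};
```

Since the replacement is an ordinary resource with an ordinary name, nothing in the table needs to know about platforms or languages.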

There are many ways for this scheme to go wrong. The gameplay script might
expect to find a certain node branch_43 in the unit -- a node that exists in
larch_03.unit, but not in larch_03_ps4.unit and this may lead to unexpected
behavior. The same problem existed in the old property system. We don't try to
do anything special about this, because it is impossible. In the end, it is only
the gameplay script that can know what it means for two things to be similar
enough to be used interchangeably. Anyone working with localized resources just
has to be careful not to break things.

Note that this is a much more powerful system than the old property system.
Any resource can be set to override any other -- we are not restricted to work
within the strict naming scheme required by the property system. Also, the
override is dynamic and can be determined at runtime. So it can be based on dynamic
properties, such as measured CPU or GPU performance -- or a user setting
for the amount of gore they are comfortable with.

It can even be used for completely different things than localization or platform
specific resources -- such as replacing the units in a level for a night-time
or psychedelic version of the same level. And I'm sure our users will find many
other ways of (ab)using this mechanism.

But this dynamic system is not quite enough to do everything we want to do.

First, since the override is dynamic and only happens at runtime, our packaging
system can't be aware of it. Normally, our packaging system figures out all
resource dependencies automatically. So when you say that you want a package with
the forest level, the packaging system will automatically pull in the
larch_03 unit that is used in that level, any textures used by that unit, etc.
But since the packaging system can't know that at runtime you will replace
larch_03 with larch_03_ps4, it doesn't know that larch_03_ps4 and its
dependencies should go into the package as well.

You could add larch_03_ps4 to the package manually, since you know it will be
used. That might work if you only have one or two overrides. However,
even with a very small number of overrides micromanaging packages in this way
becomes incredibly tedious and error prone.

Second, we don't want to burden the packages with resources that will never
be used. If we are making a game for digital distribution on iOS or Android
we don't want to include large PS4-only resources in that game.

So we need a static override mechanism that is known by the package manager
to make sure it includes and excludes the right resources. The simplest thing
would be a big file that just listed all the overrides. For example, to
override larch_03 on PS4 we would write something like:

This would work, but could again get pretty tedious if there are a lot of overrides.
It would be nice with something that was a bit more automatic.

Since our users are already used to using name suffixes such as .fr and .ps4
for localization, we decided to build on the same mechanism -- creating overrides
automatically based on suffix rules:

resource_overrides = [
{suffix = "_ps4", platforms = ["ps4"]}
]

This rule says that when we are compiling for the platform PS4, if we find a resource
that has the same name as another resource, but with the added suffix _ps4, that
resource will automatically be registered as an override for that resource:

This defines the _fr suffix for French localization. A 4K suffix _4k for high-quality
versions of resources suitable for 4K monitors. And a _noblood suffix that selects
resources without blood and gore.
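The platform-based suffix matching described above can be sketched as follows. `SuffixRule` and `build_overrides` are hypothetical names, resources are assumed to be plain "name.type" paths, and flag-based rules are left out but would follow the same pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory form of one entry in resource_overrides.
struct SuffixRule {
    std::string suffix;                  // e.g. "_ps4"
    std::vector<std::string> platforms;  // platforms the rule applies to
};

// For every resource "<name><suffix>.<type>" where "<name>.<type>" also
// exists, registers the suffixed resource as an override of the plain one.
std::map<std::string, std::string> build_overrides(
    const std::set<std::string> &resources,
    const std::vector<SuffixRule> &rules, const std::string &platform)
{
    std::map<std::string, std::string> overrides;
    for (const SuffixRule &rule : rules) {
        const bool applies = std::find(rule.platforms.begin(), rule.platforms.end(),
                                       platform) != rule.platforms.end();
        if (!applies || rule.suffix.empty())
            continue;
        for (const std::string &path : resources) {
            const std::size_t dot = path.rfind('.');
            if (dot == std::string::npos || dot < rule.suffix.size())
                continue;
            const std::string name = path.substr(0, dot);
            const std::size_t stem = name.size() - rule.suffix.size();
            if (name.compare(stem, rule.suffix.size(), rule.suffix) != 0)
                continue;  // the name doesn't end with the suffix
            const std::string base = name.substr(0, stem) + path.substr(dot);
            if (resources.count(base))
                overrides[base] = path;
        }
    }
    return overrides;
}
```

Because the result is known at compile time, the packaging step can use it both to pull in the overriding resources and to leave out the ones they replace.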

The flags can be set at compile time with:

--compile --resource-flag-true 4K

This means that we are compiling a 4K version of the game, so when bundling only the
4K resources will be included and the other versions will be stripped out. Just as if
we were compiling for a specific platform.

But we can also choose to resolve the flags at runtime:

--compile --resource-flag-runtime noblood

With this setting, both the regular resource and the _noblood resource will be included
in the package and loaded into memory. And we can hot swap between them with:

Application.set_resource_flag("noblood", true)

I have not decided yet whether in addition to these two alternatives we should also
have an option that resolves at package load time. I.e., both variants of the resource
would be included on disk, but only one of them would be loaded into memory and if you
wanted to switch resource you would have to unload the package and load it back into memory
again.

I can see some use cases for this, but on the other hand adding more options complicates
the system and I like to keep things as simple as possible.

A nice thing about this suffix mapping is that it can be configured to be backwards
compatible with the old property system:

Whenever we change something in Stingray we try to make it more flexible and data-driven,
while at the same time ensuring that the most common cases are still easy to work with.
This rewrite of the localization is a good example:

It fixes the problem with periods in file names. Periods are now only an issue if you have
made an explicit suffix mapping that matches them.

We can switch language (or any other resource setting) at runtime.

The new system is more flexible -- it doesn't just handle localization and platform
specific resources, we can set up whatever resource categories we want.
And we can even dynamically override individual resources.

The editor no longer needs to do anything special to deal with the concept of "properties".
Resources that are used to override other resources can be edited in the editor just
like any other resource.

And the system can easily be configured to be backwards compatible with the old
localization system.

I still feel slightly queasy about using name matching to drive parts of this system.
Name matching is a practice that can go horribly wrong. But in this case, since the
name matching is completely user controlled I think it makes a good compromise between
purity and usability.

Tuesday, August 16, 2016

The rendering pipe in Stingray is completely data-driven, meaning that everything, from which GPU buffers (render targets, etc.) are needed to compose the final rendered frame to the actual flow of a frame, is described in the render_config file, a human-readable json file. I have covered this in various presentations [1,2] over the years so I won’t be going into more details about it in this blog post. Instead, I’d like to focus on a new feature that we are rolling out in Stingray v1.5: Render Config Extensions.

As Stingray is growing to cater to more industries than game development we see lots of feature requests that don’t necessarily fit in with our ideas of what should go into the default rendering pipe that we ship with Stingray.
This has made it apparent that we need a way of doing deep integrations of new rendering features without having to duplicate the entire render_config file.

This is where the render_config_extension files come into play. A render_config_extension is very similar to the main render_config, except that instead of having to describe the entire rendering pipe it appends and inserts different json blocks into the main render_config.

When the engine starts, the boot ini-file specifies what render_config to use as well as an array of render_config_extensions to load when setting up the renderer.

The array describes the initialization order of the extensions, which makes it possible for the project author to control how the different extensions stack on top of each other. It also makes it possible to build extensions that depend on other extensions.

A render_config_extension consists of two root blocks: append and insert_at:

append

The append block is used for everything that is order independent and allows you to append data to the following root blocks of the main render_config:

insert_at

The insert_at block allows you to insert layers and modifiers into already existing layer_configurations and resource_generators, either belonging to the main render_config file or a render_config_extension listed earlier in the render_config_extensions array of the engine boot ini-file.

The object names under the insert_at block refer to extension_insertion_points listed in the main render_config file or one of the previously loaded render_config_extension files. We’ve chosen not to allow extensions to inject anywhere they like (using line numbers or similar craziness). Instead, we expose a bunch of extension “hooks” at various places in the main render_config file. By doing this we hope to have a somewhat better chance of not breaking existing extensions as we continue to develop and potentially do bigger refactorings of the default render_config file.

Future work

This extension mechanism is somewhat of an experiment and we might need to rethink parts of it in a later version of Stingray. We’ve briefly discussed a potential need for dealing with versioning, i.e. allowing extensions to explicitly list what versions of Stingray they are compatible with (and maybe also allow extensions to have deviating implementations depending on version). Some kind of enforced name spacing and more aggressive validation to avoid name collisions have also been debated.

In the end we decided to ignore these potential problems for now and instead push for getting a first version out in 1.5 to unblock plugin developers and internal teams wanting to do efficient “deep” integrations of various rendering features. Hopefully we won’t regret this decision too much later on. ;)

Note: This article isn't an introduction to volumetric cloud rendering but more of a small log of the development process of the plugin. Also, you can try it out for yourself or look at the code by downloading the Stingray plugin. Feel free to contribute!

I was really impressed at the shapes that can be created from such simple building blocks. While you can definitely see cases where some tiling occurs, it’s not as bad as you would imagine. Once the textures are generated the tough part is to find the right sampling spaces and scales at which they should be sampled in the atmosphere. It's difficult to get a good balance between tiling artifacts vs getting enough high frequency details for the clouds. On top of that cache hits are greatly affected by the sampling scale used so it's another factor to consider.

Finding good sampling scales for all of these textures and choosing by how much the extrusion texture should affect the low frequency clouds is very time consuming. With some time you eventually build intuition for what will look good in most scenarios but it’s definitely a difficult part of the process.

We also generate some curl noise which is used to perturb and animate the clouds slightly. I've found that adding noise to the sampling position also reduces linear filtering artifacts that can arise when ray marching these low resolution 3d textures.

One thing that often bothered me is the oddly shaped cumulus clouds that can arise from tiled 3d noise. Those cases are particularly noticeable for distant clouds. Adding extra cloud coverage for lower altitude sampling positions minimizes this artifact.

Raymarching the volume at full resolution is too expensive even for high end graphics cards. So as suggested by Real-time Volumetric Cloudscapes of Horizon: Zero Dawn we reconstruct a full frame over 16 frames. I've found that to retain enough high frequency details of the clouds, we need a fairly high number of samples. We are currently using 256 steps when raymarching. We offset the starting position of the ray by a 4x4 Bayer matrix pattern to reduce banding artifacts that might appear due to undersampling. Mikkel Gjoel shared some great tips for banding reduction while presenting The Rendering Of Inside and encouraged the use of blue noise to remove banding patterns. While this gives better results there is a nice advantage of using a 4x4 pattern here: since we are rendering interleaved pixels it means that when rendering one frame we are rendering all pixels with the same Bayer offset. This yields a significant improvement in cache coherency compared to using a random noise offset per pixel. We also use an animated offset which allows us to gather a few extra samples through time. We use a 1d Halton sequence of 8 values and instead of using 100% of the 16ᵗʰ frame we use something like 75% to absorb the Halton samples.
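The interleaved update and the Bayer offset can be sketched roughly as follows. This is a minimal illustration, not the plugin's shader code: the function names, the order in which the 16 cells are visited, and the scaling of the offset by one step length are all assumptions.

```python
# Standard 4x4 ordered-dither (Bayer) matrix, values 0..15.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def pixel_for_frame(frame_index):
    """Return the (x, y) cell inside every 4x4 block that is rendered this frame.

    Assumption: cells are visited in Bayer order, so consecutive frames
    update pixels that are spread out within the block.
    """
    slot = frame_index % 16
    for y in range(4):
        for x in range(4):
            if BAYER_4X4[y][x] == slot:
                return x, y
    raise ValueError("unreachable: every slot 0..15 is in the matrix")


def ray_start_offset(x, y, step_length):
    """Offset of the ray start, as a fraction of one raymarch step (assumed scaling)."""
    return (BAYER_4X4[y % 4][x % 4] / 16.0) * step_length
```

Because a frame only touches pixels whose coordinates are equal modulo 4, every ray marched in that frame gets the same start offset. This is the coherency benefit described above.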

To re-project the cloud volume we try to find a good approximation of the cloud's world position. While raymarching we track a weighted sum of the absorption position and generate a motion vector from it.
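As a rough sketch of that idea (the names are hypothetical, and the two projection callbacks stand in for the current and previous frame's view-projection transforms):

```python
def weighted_absorption_position(samples):
    """Average the raymarch sample positions, weighted by absorption.

    samples: iterable of ((x, y, z), weight) pairs, where weight is how
    much the step contributed to the absorption (an assumed definition).
    Returns None when the ray passed through no cloud at all.
    """
    total = 0.0
    acc = [0.0, 0.0, 0.0]
    for (x, y, z), weight in samples:
        acc[0] += x * weight
        acc[1] += y * weight
        acc[2] += z * weight
        total += weight
    if total <= 0.0:
        return None
    return (acc[0] / total, acc[1] / total, acc[2] / total)


def motion_vector(world_pos, project_now, project_prev):
    """Screen-space motion of a world position between the previous and current frame."""
    now_x, now_y = project_now(world_pos)
    prev_x, prev_y = project_prev(world_pos)
    return (now_x - prev_x, now_y - prev_y)
```

Pixels for which no position can be found (the `None` case) have nothing to track and would need to be treated as invalidated.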

This allows us to reproject clouds with some degree of accuracy. Since we only build one full resolution frame every 16 frames it’s important to track the samples as precisely as possible. This is especially true when the clouds are animated. Finding the right number of temporal samples you want to integrate over time is a compromise between getting a smoother signal for trackable pixels vs having a more noisy signal for invalidated pixels.

Lighting

To light the volume we use the "Beer-Powder" term described by Real-time Volumetric Cloudscapes of Horizon: Zero Dawn. It's a nice model since it simulates some of the out-scattering that occurs at the edges of the clouds. We discovered early on that it was going to be difficult to find terms that looked good for both close and distant clouds. So (for now anyways) a lot of the scattering and extinction coefficients are view dependent. This proved to be a useful way of building intuition for how each term affects the lighting of the clouds.
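For reference, here is a minimal sketch of the Beer-Powder term in the form it is usually quoted from that presentation. The view-dependent scattering and extinction coefficients mentioned above are not modeled here, and `density` stands for the accumulated optical depth along the light ray (an assumed parameterization).

```python
import math


def beer(density):
    """Beer's law: transmittance falls off exponentially with optical depth."""
    return math.exp(-density)


def powder(density):
    """'Powder' term: darkens thin regions to approximate in-scattering loss at cloud edges."""
    return 1.0 - math.exp(-2.0 * density)


def beer_powder(density):
    """Combined term; the factor 2 roughly renormalizes the brightness."""
    return 2.0 * beer(density) * powder(density)
```

Unlike plain Beer's law, the combined term is zero at zero density and peaks at a moderate density before falling off again. This is what gives the darker creases at the cloud edges.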

The ambient function described takes three parameters: sampling altitude, bottom color and top color. Instead of using constant values, we calculate these values by sampling the atmosphere at a few key locations. This means our ambient term is dynamic and will reflect the current state of the atmosphere. We use two pairs of samples perpendicular to the sun vector and average them to get the bottom and top ambient colors respectively.
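A small sketch of how such a dynamic ambient term could be put together. The linear blend over a normalized altitude is an assumption, since the exact ambient function is not given here, and the sample colors are placeholders for values fetched from the atmosphere.

```python
def average(color_a, color_b):
    """Average two RGB colors, e.g. one pair of atmosphere samples."""
    return tuple((a + b) * 0.5 for a, b in zip(color_a, color_b))


def ambient(altitude01, bottom_color, top_color):
    """Blend bottom and top ambient colors by normalized altitude (0 = cloud base, 1 = cloud top)."""
    t = min(max(altitude01, 0.0), 1.0)
    return tuple(b + (c - b) * t for b, c in zip(bottom_color, top_color))
```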

Since we already calculated an approximate absorption position for the reprojection, we use this position to change the absorption color based on the absorption altitude.

Finally, we can reduce the alpha term by a constant amount to skew the absorption color towards the overlaid atmospheric color. By default this is disabled but it can be interesting to create some very hazy skyscapes. If this hack is used, it's important to protect the scattering highlight colors somewhat.

Animation

The animation of the clouds consists of a 2d wind vector, a vertical draft amount and a weather system.

We dynamically calculate a 512x512 weather map which consists of 5 octaves of animated Perlin noise. We remap the noise value differently for each rgb component. This weather map is then sampled during the raymarch to update the coverage, cloud type and wetness terms of the current cloud sample. Right now we resample this weather term for each ray step, but a possible optimization would be to sample the weather data at the start and end positions of the ray and interpolate these values at each step. All of the weather terms come in sunny/stormy pairs so that we can lerp them based on a probability of rain percentage. This allows the weather system to have storms coming in and out.
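The sunny/stormy blend and the endpoint-interpolation idea can be sketched like this. The `WeatherTerms` structure and the function names are hypothetical. Only the three terms named above are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class WeatherTerms:
    coverage: float
    cloud_type: float
    wetness: float


def lerp_terms(a, b, t):
    """Linearly interpolate every weather term between a and b."""
    t = min(max(t, 0.0), 1.0)
    return WeatherTerms(
        a.coverage + (b.coverage - a.coverage) * t,
        a.cloud_type + (b.cloud_type - a.cloud_type) * t,
        a.wetness + (b.wetness - a.wetness) * t,
    )


def blend_weather(sunny, stormy, rain_probability):
    """Pick a point between the sunny and stormy setups based on the chance of rain."""
    return lerp_terms(sunny, stormy, rain_probability)


def weather_along_ray(start, end, step, step_count):
    """Possible optimization: interpolate two endpoint samples instead of
    resampling the weather map at every raymarch step."""
    t = step / (step_count - 1) if step_count > 1 else 0.0
    return lerp_terms(start, end, t)
```

The second function trades accuracy for speed. It assumes the weather varies smoothly along the ray, which may not hold for long rays near the horizon.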

The wetness term is used to update a structure of terms which defines how the clouds look based on how much humidity they carry. This is a very expensive lerp which happens per ray march and should be reduced to the bare minimum (the raymarch is instruction bound so each removed lerp is a big win optimization wise). But for the current exploratory phase it’s proving useful to be able to tweak a lot of these terms individually.

Future work

I think that as hardware gets more powerful realtime cloudscape solutions will be used more and more. There is tons of work left to do in this area. It is absolutely fascinating, challenging and beautiful. I am personally interested in improving the sense of scale the rendered clouds can have. To do so, I feel that the key is to reveal more and more of the high frequency details that shape the clouds. I think smaller cloud features are key to put in perspective the larger cloud features around them. But extracting higher frequency details usually comes at the cost of increasing the sampling rate.

We also need to think of how to handle shadows and reflections. We've done some quick tests by updating a 512x512 opacity shadow map which seemed to work ok. Since it is not a view frustum dependent term we can absorb the cost of updating the map over a much longer period of time than 16 frames. Also, we could generate this map by taking fewer samples in a coarser representation of the clouds. The same approach would work for generating a global specular cubemap.

I hope we continue to see more awesome presentations at GDC and Siggraph in the coming years regarding this topic!

Friday, April 1, 2016

The Poolroom

Figure 1 : Poolroom Pool Table

The poolroom was my first attempt at creating a truly rich
environmental experience with Stingray.
Most architectural visualization scenes you see are
antiseptically clean and uncomfortably modern. I wanted to break away from that.
I wanted an environment I would feel at home with, not one that a movie star
would buy for sheer resale value to another movie star. I also wanted the
challenge of working with natural and texturally rich materials. Not white on
white, as is generally the case.

Figure 2 : Poolroom Clock

To this end, I started looking for cozy but luxurious spaces
on Google and eventually came across a nice reference photo I could work with. Warm
rich woods, lots of games, a bar, and well... those all speak to me. For better
or worse, I felt this room was one I would personally feel comfortable in. So I
took on the challenge of re-creating that environment in 3D inside Stingray.

The challenges

The poolroom gave me some major challenges. Some I knew
would be trouble from the start, but some I didn’t realize until I started
rendering lightmaps. Most of my difficulties came down to handling materials
properly.

Figure 3 : Poolroom Bar

Coming to grips with physically based shaders

In addition to being my first complete Arch-Viz scene in
Stingray, this was also my first real stab at using physically based shading
(PBS). Although physically based shading is similar in many regards to traditional
texturing, it has its own set of tricks and gotchas. I actually had to re-do
the scene’s materials more than once as I learned the proper way to do things.

For example, my scene was
predominantly dark woods. With dark woods, you really have to be sure you get
the albedo material in the correct luminosity range or you end up with
difficulties when you light the scene. In my first attempts, I found my light
being just eaten up by the darkness of the wood’s color map. I kept cranking up
the light intensities, but this would flood the scene and lead to harsh and
broken light bakes.

Figure 4 : Arcade Game

Eventually, once I understood the effect of the color map’s
luminosity and got the values in line, I started getting great results with
normalized light intensities. My lighting began responding favorably with deep,
rich lightmap bakes. When you get the physical properties of the materials
right, Stingray’s light baker is both fast and very good. But I can’t stress
enough: with PBS, you must ensure that your luminosity values are accurate.

Reference photo was HDR

When I was building out the scene and trying to mimic the
reference photo’s lighting, I realized that the original image was made using
some high-dynamic range techniques. I couldn’t seem to get the same level of
exposure and visual detail in the shadowed areas of my scene.

Figure 5 : Before Ambient Fills

Figure 6 : After Ambient Fills

Because of this, I had to do
some pretty fun trickery with my scene lighting. In the end, I got it by placing
some subtle, non-shadow casting lights in key areas to bring up the brightness
a little in those areas.

Figure 7 : Soft Controlled Lighting

All in all, the scene took a lot of lighting work to get
just right. I have to say that I was very happy with how closely I was able to
match the lighting, given that the original photo was HDR.

Lived-in but not dirty

The last big challenge was also related to materials. I had
to find that fine balance of a room that is clean and tidy but also obviously
lived-in. So often I find Arch-Viz work feels unnaturally smooth and clean, which
can destroy the believability of the space. I really wanted my scene to
break through the uncanny valley and feel real.

I handled this mostly by creating some very simple grunge
maps, and applying them to the roughness maps using a simple custom shader.
This was easy to build in Stingray’s node-based shader graph:

I have this shader set up so I can control the tiling of the
color map, normals and other textures. The grunge map, on the other hand, is
sampled using UV coordinates from the lightmap channel. This helps to hide the
tiling over large areas like the walls, because the grunge value that gets multiplied
into the roughness is different each time the other textures repeat.
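In code form, the idea behind the shader graph is roughly the following. This is a sketch only: the real shader is a Stingray node graph, and the function name and the texture callbacks here are stand-ins.

```python
def shade_roughness(sample_roughness, sample_grunge, uv, lightmap_uv, tiling):
    """Roughness = tiled roughness map * grunge map sampled in lightmap space.

    sample_roughness and sample_grunge are stand-ins for texture lookups:
    each takes a (u, v) pair and returns a value in 0..1.
    """
    # The regular textures repeat 'tiling' times across the surface.
    tiled_uv = ((uv[0] * tiling) % 1.0, (uv[1] * tiling) % 1.0)
    # Lightmap UVs are unique per surface point, so the grunge never repeats.
    return sample_roughness(tiled_uv) * sample_grunge(lightmap_uv)
```

Two points on a wall that land on the same spot of the tiled roughness map still end up with different final roughness, because their lightmap coordinates differ.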

Balancing the grunge properly was the biggest challenge
here, but in the end, some still shots even get me doing a double-take. When
that happens, I know I’m doing well. I also posted progress along the way on my
Facebook page — when I had friends saying, “whoa, when can I come visit?” I
knew I was nailing it.

3D modeling

Figure 9 : Record Player Model in Maya LT

I don’t have much that’s
special to say about the 3D modeling process. I simply modeled all my assets
the same way anyone would. Attention to detail is really the trick, and making
sure that I created hand-made lightmap UVs for every object was critical to
ensure the best light baking. Otherwise it was just simple modeling.

Figure 10 : Poolroom Model in MayaLT

One thing to note, however, is that I only used 3D tools
that came with the Stingray package, except for Substance Designer and a little
Photoshop. I did the entire scene’s modeling in MayaLT. Sometimes people think
cheap is not good, but I believe this proves otherwise. MayaLT is incredible. I
am super happy with the results and speed at which you can work with it. Best
of all, it’s part of the package, so no additional costs.

Material design

Laying out the materials in
the scene was pretty straightforward for the most part. At one point, I
experimented with using more species of wood, but the different parts of the room
started to feel disconnected. I started removing materials from my list, and
eventually when I ended up with only a small handful the room came together as
you see it.

Figure 11 : Record Player Material Design in Substance

I guess something else I should mention is performance
shaders. Stingray comes with a great, flexible standard shader, but I wanted to
eke out every little bit of performance I could on this scene while keeping the
quality very high. Without much trouble, I created a library of my own purpose-built
shaders (like the one mentioned earlier). I used these for various tasks. Simple
colors, RMA (roughness-metallic-ambient occlusion), RMA-tiling shaders and a
few others came together really quickly. From this handful of shaders, I was
able to increase performance while simplifying my design process. I find it
comforting how Stingray deals with shaders… it is just very easy to iterate and
save a version. Much better usability than other systems I have tried.

Figure 12 : Shader Library

Fun stuff

Well, most game dev is hard
work; the fun comes at the end, when you finally get to relax and see your efforts
pay off. But there were definitely some really fun parts of making the
poolroom.

One was the clock. It’s a
small, almost easter-egg kind of thing, but I programmed the clock fully. Meaning,
its hands move, the pendulum swings, and it also rings the hour. So if you are
exploring the poolroom and it happens to be when the hour changes in your
system clock, the clock in the game rings the hour for you. So two o’clock
rings two times, four o’clock rings four
times, etc. The half-hour always strikes once. I modeled the clock after one
that my father gave me, so I put some extra love into it. It is basically
exactly the clock that hangs in my living room.
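The chime rule boils down to something like this. It is a sketch, not the project's actual script. The function name is made up, and mapping the system clock's 24-hour value onto a 12-hour dial is my assumption.

```python
def strikes(hour, minute):
    """Number of chimes for a given system-clock time (hour in 0..23)."""
    if minute == 30:
        return 1  # the half-hour always strikes once
    if minute == 0:
        return hour % 12 or 12  # 0:00 and 12:00 both strike twelve (assumed)
    return 0
```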

Figure 13 : Clock Model in MayaLT

Figure 14 : Clock Model in Stingray

I also gave the record player
some extra attention, because my good friend Mathew Harwood was kind enough to
do all the audio for the project. I felt the music really set the scene, and he
even worked on it over my twitch stream so we could get feedback from some
people who were watching. So yeah, press +
or - in the game to start and stop
the record player, complete with animated tone arm. Nothing super crazy, just a
nice little touch.

Figure 15 : Record Player in Stingray

Community effort

One thing I found really neat about this project was that I
streamed the entire creation process on my Twitch channel. I have never
streamed much before this project, but it made the process much more fun. I had
people to talk with, and often my viewers were helpful to me in suggesting
ideas and noticing things I had not noticed. It was very collaborative and a
great learning exercise for me and for my viewers. We got to learn from each
other, which is the dream!

For example, the record player likely would not have reached
this level of detail had one of my viewers not pushed me to make a really
detailed player. Because of this push, it ended up being a focus of the level,
and even has some animation and basic controls a user can interact with.