Wednesday, February 1, 2017

Stingray Renderer Walkthrough #2: Resources & Resource Contexts


Render Resources

Before any rendering can happen we need a way to reason about GPU resources. Since we want all graphics API specific code to stay isolated, we need some kind of abstraction on the engine side; for that we have an interface called RenderDevice. All calls to graphics APIs like D3D, OGL, GNM, Metal, etc. stay behind this interface. We will be covering the RenderDevice in a later post, so for now just know that it is there.

We want to have a graphics API agnostic representation for a bunch of different types of resources and we need to link these representations to their counterparts on the RenderDevice side. This linking is handled through a POD-struct called RenderResource:

Any engine resource that also needs a representation on the RenderDevice side inherits from this struct. It contains a single member, render_resource_handle, which is used to look up the correct graphics API specific representation in the RenderDevice.

The most significant 8 bits of render_resource_handle hold the type enum; the lower 24 bits are simply an index into an array for that specific resource type inside the RenderDevice.
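As a rough illustration of the handle layout described above, here is a minimal sketch. Only RenderResource and render_resource_handle come from the post; the ResourceType enum values and the helper functions make_handle, handle_type and handle_index are hypothetical names chosen for this example and are not Stingray's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type tags; the real enum in Stingray is not shown in the post.
enum class ResourceType : std::uint8_t {
    Texture = 1,
    RenderTarget,
    VertexStream,
    IndexStream,
};

// POD link between an engine resource and its RenderDevice counterpart.
struct RenderResource {
    std::uint32_t render_resource_handle;
};

constexpr std::uint32_t INDEX_BITS = 24;
constexpr std::uint32_t INDEX_MASK = (1u << INDEX_BITS) - 1u;

// Pack an 8-bit type tag (upper bits) and a 24-bit array index (lower bits).
inline std::uint32_t make_handle(ResourceType type, std::uint32_t index) {
    assert(index <= INDEX_MASK);  // the index must fit in 24 bits
    return (static_cast<std::uint32_t>(type) << INDEX_BITS) | index;
}

// Recover the type tag from the most significant 8 bits.
inline ResourceType handle_type(std::uint32_t handle) {
    return static_cast<ResourceType>(handle >> INDEX_BITS);
}

// Recover the per-type array index from the lower 24 bits.
inline std::uint32_t handle_index(std::uint32_t handle) {
    return handle & INDEX_MASK;
}
```

With this layout a lookup inside the device amounts to selecting the array for handle_type() and indexing it with handle_index(), which caps each resource type at 2^24 live entries.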

Various Render Resources

Let’s take a look at the different render resources that can be found in Stingray:

Texture - A regular texture. This object wraps all the various texture layouts such as 2D, Cube, and 3D.

RenderTarget - Basically the same as Texture but writable from the GPU.

DependentRenderTarget - Similar to RenderTarget but with logic for inheriting properties from another RenderTarget. This is used for creating render targets that need to be reallocated when the output window (swap chain) is resized.

BackBufferWrapper - Special type of RenderTarget created inside the RenderDevice as part of the swap chain creation. Almost all render targets are explicitly created by the user; this is the only exception, as the back buffer associated with the swap chain is typically created together with the swap chain.

ShaderConstantBuffer - Shader constant buffers designed for explicit update and sharing between multiple shaders, mainly used for “view-global” state.

VertexStream - A regular Vertex Buffer.

VertexDeclaration - Describes the contents of one or many VertexStreams.

IndexStream - A regular Index Buffer.

RawBuffer - A linear memory buffer that can be set up for GPU writing through a UAV (Unordered Access View).

Shader - For now just think of this as something containing everything needed to build a full pipeline state object (PSO). Basically a wrapper over a number of shaders, render states, sampler states etc. I will cover the shader system in a later post.

Most of the above resources have a few things in common:

They describe a buffer populated either by the CPU or by the GPU.

CPU-populated buffers have a validity field describing their update frequency:

STATIC - The buffer is immutable and won’t change after creation. Most buffers coming from DCC assets are STATIC.

UPDATABLE - The buffer can be updated but changes less than once per frame, e.g. UI elements, post processing geometry and similar.

DYNAMIC - The buffer changes frequently, at least once per frame but potentially many times in a single frame, e.g. particle systems.

They have enough data for creating a graphics API specific representation inside the RenderDevice, i.e. they know about strides, sizes, view requirements (e.g. whether a UAV should be created or not), etc.
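To make the shared properties above a little more concrete, here is a small sketch of what such a description could look like. The three validity names come from the list above; the BufferDesc struct, its fields, and the two helper functions are assumptions made up for illustration and do not reflect Stingray's real types.

```cpp
#include <cassert>
#include <cstdint>

// Update frequency of a CPU-populated buffer, as listed above.
enum class Validity { STATIC, UPDATABLE, DYNAMIC };

// Hypothetical description carrying what a backend would need to create
// its own representation: stride, element count and view requirements.
struct BufferDesc {
    Validity validity;
    std::uint32_t stride;   // bytes per element
    std::uint32_t count;    // number of elements
    bool needs_uav;         // should a UAV be created for GPU writes?
};

// Total allocation size implied by the description.
inline std::uint32_t size_in_bytes(const BufferDesc& desc) {
    assert(desc.stride > 0);
    return desc.stride * desc.count;
}

// STATIC buffers are immutable after creation; the other two may be updated.
inline bool allows_cpu_update(Validity validity) {
    return validity != Validity::STATIC;
}
```

A backend could use the validity value as a hint when choosing a memory pool or usage flag, since an immutable buffer and one rewritten several times per frame usually want different placement.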

Render Resource Context

With the RenderResource concept sorted, we’ll go through the interface for creating and destroying the RenderDevice representation of the resources. That interface is called RenderResourceContext (RRC).

We want resource creation to be thread safe and while the RenderResourceContext in itself isn’t, we can achieve free threading by allowing the user to create any number of RRCs they want, and as long as they don’t touch the same RRC from multiple threads everything will be fine.

Similar to many other rendering systems in Stingray, the RRC is basically just a small helper class wrapping an abstract “command buffer”. On this command buffer we put what we call “packages” describing everything that is needed for creating/destroying RenderResource objects. These packages have variable length depending on what kind of object they represent. In addition to that, the RRC can also hold platform specific allocators that allow allocating/deallocating GPU mapped memory directly, avoiding any additional memory shuffling in the RenderDevice. This kind of mechanism allows for streaming e.g. textures and other immutable buffers directly into GPU memory on platforms that provide that kind of low-level control.
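The idea of a command buffer holding variable-length packages can be sketched roughly as follows. Everything here (PackageType, PackageHeader, ResourceContextSketch, count_packages) is a hypothetical stand-in invented for this example; the post does not show the real RenderResourceContext interface or its package layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical package kinds.
enum class PackageType : std::uint32_t { Alloc = 1, Dealloc = 2 };

// Fixed-size header preceding each variable-length payload.
struct PackageHeader {
    PackageType type;
    std::uint32_t handle;        // render_resource_handle the package refers to
    std::uint32_t payload_size;  // bytes of payload following the header
};

// Not thread safe by design: use one instance per thread.
class ResourceContextSketch {
public:
    void alloc(std::uint32_t handle, const void* desc, std::uint32_t desc_size) {
        push(PackageType::Alloc, handle, desc, desc_size);
    }
    void dealloc(std::uint32_t handle) {
        push(PackageType::Dealloc, handle, nullptr, 0);
    }
    const std::vector<std::uint8_t>& bytes() const { return buffer_; }

private:
    // Append header and payload back to back in the byte buffer.
    void push(PackageType type, std::uint32_t handle,
              const void* payload, std::uint32_t size) {
        assert(size == 0 || payload != nullptr);
        PackageHeader header{type, handle, size};
        const auto* h = reinterpret_cast<const std::uint8_t*>(&header);
        buffer_.insert(buffer_.end(), h, h + sizeof(header));
        if (size > 0) {
            const auto* p = static_cast<const std::uint8_t*>(payload);
            buffer_.insert(buffer_.end(), p, p + size);
        }
    }
    std::vector<std::uint8_t> buffer_;
};

struct PackageCounts { int allocs = 0; int deallocs = 0; };

// Walk the buffer the way a consumer might, skipping each payload by its size.
inline PackageCounts count_packages(const std::vector<std::uint8_t>& buffer) {
    PackageCounts counts;
    std::size_t offset = 0;
    while (offset + sizeof(PackageHeader) <= buffer.size()) {
        PackageHeader header;
        std::memcpy(&header, buffer.data() + offset, sizeof(header));
        if (header.type == PackageType::Alloc) ++counts.allocs;
        else ++counts.deallocs;
        offset += sizeof(header) + header.payload_size;
    }
    return counts;
}
```

Because each package records its own payload size, the consumer can step through the buffer without knowing every package type up front, which is what makes variable-length packages practical.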

Handing a filled RRC over directly to the RenderDevice requires the caller to be on the controller thread for rendering, as RenderDevice::dispatch() isn’t thread safe. If the caller is on any other thread (e.g. one of the worker threads or the resource streaming thread), RenderInterface::dispatch() should be used instead. We will cover the RenderInterface in a later post, so for now just think of it as a way of piping data into the renderer from an arbitrary thread.
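The two hand-over paths can be pictured with the sketch below. It is only an assumption about how such piping might work: DeviceStub, InterfaceStub, ResourceContextStub and the flush() function are invented for this example, and the mutex-protected queue is a generic technique, not a description of how RenderInterface is actually implemented.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Stand-in for a filled resource context.
struct ResourceContextStub { int id; };

// Stand-in for the device: only safe to call from the controller thread.
struct DeviceStub {
    std::vector<int> dispatched;
    void dispatch(const ResourceContextStub& rrc) { dispatched.push_back(rrc.id); }
};

// Stand-in for a thread-safe entry point: any thread may enqueue, and the
// controller thread later drains the queue into the device.
class InterfaceStub {
public:
    void dispatch(ResourceContextStub rrc) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(rrc);
    }

    // Hypothetical drain step; call from the controller thread only.
    void flush(DeviceStub& device) {
        std::vector<ResourceContextStub> local;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            local.swap(pending_);  // hold the lock only while swapping
        }
        for (const auto& rrc : local) device.dispatch(rrc);
    }

private:
    std::mutex mutex_;
    std::vector<ResourceContextStub> pending_;
};
```

The point of the split is that the non-thread-safe call stays cheap and lock-free on the controller thread, while every other thread pays for synchronization only when it hands a context over.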

Wrap up

The main reason for having the RenderResourceContext concept instead of exposing allocate()/deallocate() functions directly in the RenderDevice/RenderInterface interfaces is efficiency. We need to allocate and deallocate lots of resources, sometimes in parallel from multiple threads. Decoupling the interface for doing so makes it easy to schedule when in the frame the actual RenderDevice representations get created. It also makes the code easier to maintain, as we don’t have to worry about thread-safety of the RenderResourceContext.

In the next post we will discuss the RenderJobs and RenderContexts which are the two main building blocks for creating and scheduling draw calls and state changes.

One "detail" I am curious about, is how you give RenderResource::render_resource_handle a correct value. One option I see, is that it only receives the correct index/value while the RenderResourceContext is being dispatched. But that would mean the RenderResource cannot be used to build commands on a RenderContext before the RenderResourceContext's dispatch is finished, which would hinder parallelism. Alternatively, each RenderResourceContext gets ranges of indices it can use for each type of resource. But that also seems inconvenient. Maybe there is an elegant solution I am missing here? Any thoughts would be greatly appreciated!

I know it's a bit late, but I've been thinking about this quite a bit myself. One of my personal solutions to the problem you state would be to return "local" handles per render resource; that is, each RenderResourceContext would hold its own handles per resource type, starting from zero. Each resource of the same type that you allocate from that particular context would increment the handle for that type, and then when the context gets dispatched each of the local handles (the ones starting from zero) gets converted into the "actual" handle inside the render device. For example, if you allocated 3 textures from within a single context, the first texture would be render_resource_handle 0, the 2nd one would be 1, and the 3rd one 2, respectively. Then say the render device already has 10 textures in memory: when the context gets dispatched, the 0, 1, and 2 get converted into 11, 12, and 13. This lets you share render resources between objects so long as they are created with the same context, because after dispatch the local handles are meaningless. Again, I understand that was probably worded pretty poorly, but I hope I got the gist across. The design still isn't perfect, though; there are a couple of pressing issues that need to be addressed before it is.
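A rough sketch of the local-to-device remapping described in the comment above, under the assumption that the device tells the context which index its first resource of a given type will receive. LocalHandleContext, remap_on_dispatch and first_device_index are hypothetical names used only for this illustration, and this is not a claim about how Stingray itself assigns handles.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical context handing out local texture indices starting at zero.
class LocalHandleContext {
public:
    std::uint32_t allocate_texture() { return next_local_++; }
    std::uint32_t local_count() const { return next_local_; }

private:
    std::uint32_t next_local_ = 0;
};

// At dispatch time, convert each local index to a device index by offsetting
// from the first index the device has set aside for this context.
inline std::vector<std::uint32_t> remap_on_dispatch(
        const LocalHandleContext& context, std::uint32_t first_device_index) {
    // The result must still fit in the 24-bit index part of a handle.
    assert(first_device_index + context.local_count() <= (1u << 24));
    std::vector<std::uint32_t> device_indices;
    device_indices.reserve(context.local_count());
    for (std::uint32_t local = 0; local < context.local_count(); ++local) {
        device_indices.push_back(first_device_index + local);
    }
    return device_indices;
}
```

Using the numbers from the comment, three local handles 0, 1 and 2 dispatched with a first device index of 11 map to 11, 12 and 13; any render commands recorded against the local handles would need the same translation applied before they reach the device.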