So my second thought was to create a module for every node, and have it output the relevant data as a string as well as render it. Cumbersome, bleh again...

Then the last thought: oh, I've got the dx11 source code (since I wrote it), so I should be able to traverse the graph and store data instead of rendering (and then serialize to any format, like JSON; no XML please ;)

Issues:

I don't want to store vertices/indices, that takes too much space. I already know what geometry I use, so it's better to just record how to generate it again.

Most 3D formats suck for batching. In my case I can have the same geometry sent 10 times to the shader, and most 3D software would replicate the geometry (come on, please start doing a proper format one day). Instead it's much easier to store a geometry -> transform list.

The problem is that, for the moment, once geometry is created I have no way to know how it was built.

So I simply added this to the geometry provider:

Code Snippet

string PrimitiveType { get; set; }
object Tag { get; set; }

Then for each type of geometry I create a descriptor; here is the base class:

Code Snippet

public abstract class AbstractPrimitiveDescriptor
{
    public abstract string PrimitiveType { get; }
    public abstract void Initialize(Dictionary<string, object> properties);
    public abstract IDX11Geometry GetGeometry(DX11RenderContext context);
}

And here is the sphere:

Code Snippet

public class Sphere : AbstractPrimitiveDescriptor
{
    public float Radius { get; set; }
    public int ResX { get; set; }
    public int ResY { get; set; }
    public float CyclesX { get; set; }
    public float CyclesY { get; set; }

    public override string PrimitiveType { get { return "Sphere"; } }

    public override void Initialize(Dictionary<string, object> properties)
    {
        this.Radius = (float)properties["Radius"];
        this.ResX = (int)properties["ResX"];
        this.ResY = (int)properties["ResY"];
        this.CyclesX = (float)properties["CyclesX"];
        this.CyclesY = (float)properties["CyclesY"];
    }

    public override IDX11Geometry GetGeometry(DX11RenderContext context)
    {
        return context.Primitives.Sphere(this);
    }
}

That's it: instead of storing 1000 vertices I can just store 5 parameters, how cool is that?

When I create geometry, I just assign the descriptor to the geometry, so I can know at any time how things were created.
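As a sketch of what that assignment amounts to inside the provider (BuildSphere is a hypothetical internal builder; PrimitiveType and Tag are the real properties added above):

Code Snippet

public IDX11Geometry Sphere(Sphere descriptor)
{
    // Hypothetical internal builder producing the actual vertices/indices
    IDX11Geometry geom = this.BuildSphere(descriptor);

    // Attach the recipe to the geometry, so we always know how it was built
    geom.PrimitiveType = descriptor.PrimitiveType;
    geom.Tag = descriptor;
    return geom;
}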

Now that's cool, but I still need to grab the geometry, so here is how it works.
Layers have RenderSettings, which carry some info on how to render, so I simply added a collector as a render hint.

Code Snippet

public enum eRenderHint { Forward, MRT, Shadow, Overlay, Collector }

Then settings carry:

Code Snippet

public class DX11ObjectGroup
{
    public DX11ObjectGroup()
    {
        this.RenderObjects = new List<DX11RenderObject>();
    }

    public string ShaderName { get; set; }
    public List<DX11RenderObject> RenderObjects { get; set; }
}

public class DX11RenderObject
{
    public string ObjectType { get; set; }
    public object Descriptor { get; set; }
    public Matrix[] Transforms { get; set; }
}

So the renderer only has to set its hint to Collector (this tells the shader that we don't want to render, but collect information).

The shader can then just retrieve the descriptor and push it with the list of associated transforms (huhu, batch friendly ;)

Here's a little snippet for the shader node; if the render hint is set to Collector, it just appends object/transforms instead of rendering.

Code Snippet

IDX11Geometry g = this.FGeometry[0][context];
if (g.Tag != null)
{
    // The geometry carries its descriptor, so collect it instead of drawing
    DX11RenderObject o = new DX11RenderObject();
    o.ObjectType = g.PrimitiveType;
    o.Descriptor = g.Tag;

    // One world transform per slice: batch friendly
    o.Transforms = new Matrix[this.spmax];
    for (int i = 0; i < this.spmax; i++)
    {
        o.Transforms[i] = this.mworld[i % this.mworldcount];
    }

    group.RenderObjects.Add(o);
    settings.ObjectCollector.Add(group);
}

The object list is then serialized as a JSON object.
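Roughly, a collected group looks like this (an illustrative sketch whose shape simply follows DX11ObjectGroup / DX11RenderObject above; the values are made up):

Code Snippet

{
  "ShaderName": "PhongDirectional",
  "RenderObjects": [
    {
      "ObjectType": "Sphere",
      "Descriptor": { "Radius": 0.5, "ResX": 15, "ResY": 15, "CyclesX": 1.0, "CyclesY": 1.0 },
      "Transforms": [ [ 1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1 ] ]
    }
  ]
}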

So next was just doing a Loader node for testing, pretty simple: retrieve the descriptor, create the geometry, associate it with its transforms and render. Please note that you could do an importer for anything, it's not only 4v related (technically I got a use case for it, top secret ;)
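A minimal sketch of what such a loader does per object, assuming a hypothetical registry mapping PrimitiveType strings to descriptor factories (Initialize and GetGeometry are the real methods from the base class above):

Code Snippet

// Hypothetical registry: PrimitiveType string -> descriptor factory
Dictionary<string, Func<AbstractPrimitiveDescriptor>> registry =
    new Dictionary<string, Func<AbstractPrimitiveDescriptor>>
    {
        { "Sphere", () => new Sphere() }
    };

IDX11Geometry LoadObject(DX11RenderContext context, DX11RenderObject obj)
{
    // Rebuild the descriptor from its serialized properties
    // (assuming the JSON deserializer produced a property dictionary)
    AbstractPrimitiveDescriptor desc = registry[obj.ObjectType]();
    desc.Initialize((Dictionary<string, object>)obj.Descriptor);

    // Let the descriptor recreate the geometry; render it with obj.Transforms
    return desc.GetGeometry(context);
}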

The obviously great thing is that it's totally transparent: I only need to enable collect and I can retrieve the whole scene. No need for any fancy stuff, and it works for any part I've already done as well, no modifications needed.

Here are a few examples of it:

That's it. Of course it will need improvements, but I'm pretty excited about the possibilities. Having the source code for the render system is quite invaluable; without it I would have done the dirty module way, brrr... scary!!

Sunday, 26 May 2013

Most post processing in DirectX11 can now be done in Forward (HBAO / Depth of Field use depth only), but of course as soon as you include shading in the mix, you need normals and a bit of extra data.

So you need some MRT setup; for my lights I use only 2 targets:

Code Snippet

struct PS_OUT
{
    float4 albedo : SV_Target0;
    float4 ns : SV_Target1;
};

albedo.xyz is the color, and the alpha channel is used for roughness;
ns.xyz is the normal (view space), and ns.a is reflectivity.
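A minimal sketch of a pixel shader filling those targets (the input signature and the texture/roughness/reflectivity sources are assumptions, not the post's actual shader):

Code Snippet

PS_OUT PS_FillGBuffer(float4 screenpos : SV_Position,
    float3 viewnormal : NORMAL, float2 uv : TEXCOORD0)
{
    PS_OUT o;
    float3 color = DiffuseTexture.Sample(LinearSampler, uv).xyz; // hypothetical texture/sampler
    o.albedo = float4(color, Roughness);                 // alpha channel = roughness
    o.ns = float4(normalize(viewnormal), Reflectivity);  // view space normal + reflectivity
    return o;
}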

I could add a 3rd target for specular albedo, but I've never had the use for it; maybe I'll add it one day (I could use the 4th target for some shadow bits as well).

OK, so now we have our shiny normal target filled up with fancy geometry; we need to add some shading to it.

A directional light is full screen; there aren't many ways to cull that (except some stencil culling if possible).

For point lights you have plenty of techniques; I remember doing light volumes in DX9 a few years ago, it's pretty easy.

Of course now there's a shiny new technique, compute tiled lighting (I based my implementation on Andrew Lauritzen's, which also has some nice techniques using the Geometry Shader if your device is not SM5.0 capable).

So on the list of techniques:

Compute Tiles: send the lights in a structured buffer, cull them against tiles with mini frustums, and only process the lights that pass the test.

Clip Quad (low): some devices don't support raw/structured buffers, so instead you copy your lights into a Vertex Buffer and do more or less the same (you change the structure input instead of doing a lookup).

Here is how the VS input looks for the Clip Quad:

Code Snippet

StructuredBuffer<PointLight> LightBuffer : LIGHTBUFFER;

struct vsInput
{
    uint iv : SV_VertexID;
};

And the Vertex Buffer version:

Code Snippet

struct vsInput
{
    float3 lightpos : LIGHTPOSITION;
    float attenuationBegin : LIGHTATTENUATIONBEGIN;
    float3 color : LIGHTCOLOR;
    float attenuationEnd : LIGHTATTENUATIONEND;
};

So both techniques are mostly equivalent, but the Structured Buffer one is quite nice in case you have the 10.1 feature set (since you can use a Compute Shader to move your lights, for example).

In the case of a Vertex Buffer, you can still use Stream Output to animate lights on the GPU, which is also fine, but adds a bit of extra coding around it.

Of course, please note that the Stencil Test can also be used for the GS techniques.

Now that we have a decently efficient way to draw multiple lights, there's still one part missing: light equations ))

So I'm now using a few BRDFs (depending on my scene):

Phong (never using it)

Oren Nayar (I really like it for splines)

Cook Torrance (for more or less all the rest)

Ward (used it before, but now I mostly use the 2 above)

So one option is to use Techniques, but in that case it implies a lot of code duplication: not too nice.

That leaves 2 other options:

Shader Linkage

Shader Macros

Unless you have Feature Level 11, you're stuck with macros, since you can't use dynamic linkage.

It's pretty simple: you use defines and #ifdef / #if, then you specify the defines when you compile the shader, so basically you compile one shader instance per BRDF.

Code Snippet

#define LIGHT_PHONG 0
#define LIGHT_WARD 1
#define LIGHT_COOKTORRANCE 2
#define LIGHT_OREN_SIMPLE 3
#define LIGHT_OREN_COMPLEX 4

//Override macros on compile
#ifndef LIGHT_TYPE
#define LIGHT_TYPE LIGHT_COOKTORRANCE
#endif

/* Here we can't use interfaces in case we haven't got sm5 support, so default to 0 */
#ifndef USE_INTERFACE
#define USE_INTERFACE 0
#endif

#if USE_INTERFACE == 1
#include "Brdf.fxh"
#else
#include "Brdf_Funcs.fxh"
#endif

So when you compile the shader you also send the LIGHT_TYPE define, and then you switch shaders depending on the equation.

It's of course handy since you have a single code base; any improvement impacts all BRDFs at once.
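As a sketch of the compile side (using SharpDX's D3DCompiler here purely as an example; shaderSource is assumed to hold the fx text, and the original code base may compile differently):

Code Snippet

using SharpDX.D3DCompiler;
using SharpDX.Direct3D;

// One shader instance per BRDF: just swap the LIGHT_TYPE define at compile time
ShaderMacro[] defines = new ShaderMacro[]
{
    new ShaderMacro("LIGHT_TYPE", "LIGHT_COOKTORRANCE")
};

CompilationResult result = ShaderBytecode.Compile(shaderSource, "PS", "ps_4_0",
    ShaderFlags.OptimizationLevel3, EffectFlags.None, defines, null);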

Now with Shader Model 5 you also have a nifty feature called Shader Linkage, so our code looks like this:
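Roughly, HLSL dynamic linkage has this shape (the interface and class names below are illustrative, not the actual contents of Brdf.fxh):

Code Snippet

interface iBRDF
{
    float3 Shade(float3 normal, float3 lightdir, float3 viewdir, float3 diffcolor);
};

class cCookTorrance : iBRDF
{
    float3 Shade(float3 normal, float3 lightdir, float3 viewdir, float3 diffcolor)
    {
        /* Cook Torrance evaluation would go here; trivial placeholder */
        return diffcolor * saturate(dot(normal, lightdir));
    }
};

// The application picks which class instance to bind at draw time (class linkage)
iBRDF brdf;

float3 ApplyLight(float3 normal, float3 lightdir, float3 viewdir, float3 diffcolor)
{
    return brdf.Shade(normal, lightdir, viewdir, diffcolor);
}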

Pretty cool, no? And with shader reflection there's no need to change any code; it's all done automatically ))

Now of course one important question is: what about performance?

So please note that using shader linkage in this case implies a performance hit, which is quite non-negligible if you're not on high-end hardware (I noticed it was quite significant on a laptop, not as bad on a desktop).

So shader linkage adds flexibility (it's very easy to avoid lots of permutations, or compiling shaders to handle all possibilities), but that comes at a cost. It's then up to you to abuse timestamp queries and decide if the hit is worth taking ;)

From the vertex id (a simple GPU counter), we retrieve the cell position.

To normalize it (i.e. fit it in the 0..1 range), we simply multiply by the inverse resolution.

We flip the Y axis (for the uv only).

We recenter the position (so our grid center is at 0,0 instead of 0.5,0.5).

Done, that was so hard... :) Put together, it looks something like the sketch below.
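A sketch of that position shader (the output struct and the colcount/rowcount constants are assumed, chosen to match the index shader further down; vsInput is the same uint iv : SV_VertexID input used there):

Code Snippet

struct vsOutputPosition
{
    float3 pos : POSITION;
    float2 uv : TEXCOORD0;
};

vsOutputPosition VS_Positions(vsInput input)
{
    vsOutputPosition o;

    // Cell position from the vertex id (simple GPU counter)
    int j = input.iv / colcount;
    int i = input.iv % colcount;

    // Normalize into 0..1 by multiplying by the inverse resolution
    float2 cell = float2(i, j) / float2(colcount - 1, rowcount - 1);

    // Flip the Y axis (for the uv only)
    o.uv = float2(cell.x, 1.0f - cell.y);

    // Recenter so the grid center sits at 0,0 instead of 0.5,0.5
    o.pos = float3(cell - 0.5f, 0.0f);
    return o;
}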

Now we need to generate our indices (since some vertices are shared).

We have a few options here:

Set our Stream Output as triangles: this is not convenient at all, since the grid geometry is quad based, so we'd need to replicate a lot of calculations and flip triangles depending on the vertex id, which is not very friendly.

Set our Stream Output as quads: instead of generating one triangle at a time in the Vertex Shader, we generate 2, which fits a cell expansion perfectly and reuses calculations.

So, obvious choice: quads.

Here is our Vertex Shader output:

Code Snippet

struct vsOutputIndices
{
    int3 t1 : TRIANGLE0;
    int3 t2 : TRIANGLE1;
};

And the (extremely) hardcore Vertex Shader:

Code Snippet

vsOutputIndices VS_Indices(vsInput input)
{
    vsOutputIndices o;

    // Cell coordinates: one invocation emits the two triangles of one quad cell
    int j = input.iv / (colcount - 1);
    int i = input.iv % (colcount - 1);

    // First vertex index of the cell's lower and upper rows
    int rowlow = j * colcount;
    int rowup = (j + 1) * colcount;

    o.t1 = int3(rowlow + i, rowup + i, rowlow + i + 1);
    o.t2 = int3(rowlow + i + 1, rowup + i, rowup + i + 1);
    return o;
}

Here we do more or less the same as when generating positions, except we need to take care of the row/column index stride.

Here we are: generating a grid without the CPU.

The nice thing about it is that, since other geometries can also be interpreted as parametric surfaces, we can easily modify the shader to generate them (only the vertex-generating shader needs modification; the index buffer builder stays the same). For example, the sketch below maps the same grid onto a sphere.
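As an illustration (my own sketch, not from the original post), mapping the normalized cell position to a sphere's parametric form:

Code Snippet

// Same vsInput / counters as the grid version; only the position math changes
vsOutputPosition VS_SpherePositions(vsInput input)
{
    vsOutputPosition o;
    int j = input.iv / colcount;
    int i = input.iv % colcount;
    float2 cell = float2(i, j) / float2(colcount - 1, rowcount - 1);

    // Parametric sphere: u drives longitude, v drives latitude
    float phi = cell.x * 6.28318530718f;    // 0..2pi
    float theta = cell.y * 3.14159265359f;  // 0..pi
    o.pos = float3(sin(theta) * cos(phi), cos(theta), sin(theta) * sin(phi));
    o.uv = float2(cell.x, 1.0f - cell.y);
    return o;
}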