I commented on this post asking a further question, but I think it works better as its own post with the question expanded. I've done a few toy landscape render programs in the past, but until now I always used a "traditional" OOP approach of making models, landscape segments, UI elements etc. all implement an IDrawable interface, something like this -

where the render function uses the D3D11 device to directly draw itself.
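A minimal sketch of that kind of interface (names hypothetical, with a stand-in for the D3D11 context so it is self-contained):

```cpp
#include <vector>

// Stand-in for the real ID3D11DeviceContext; in the actual code this
// would be the D3D11 immediate context.
struct RenderContext { int draw_calls = 0; };

// The "traditional" approach: every drawable knows how to draw itself.
struct IDrawable {
    virtual ~IDrawable() = default;
    virtual void Render(RenderContext& ctx) = 0;
};

struct LandscapeSegment : IDrawable {
    void Render(RenderContext& ctx) override {
        // would bind its own shaders/buffers and issue the draw here
        ctx.draw_calls++;
    }
};
```

The problem discussed below is exactly that this couples every drawable to the rendering API.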

After reading the post above I entirely understand how it might be better to instead make drawable objects produce a Mesh to draw, which can then be drawn separately by the rendering system. It simplifies the code, separates concerns much better, and potentially makes the code much more reusable without lots of unnecessary abstraction.

Instead of directly drawing things, the scene node items will put the data for a rendering operation onto a queue, including mesh, camera and so on.
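A rough sketch of what such a queue might look like (all names hypothetical; a real render operation would carry much more state):

```cpp
#include <cstddef>
#include <vector>

// Plain-data render operation: the scene graph fills these in,
// the renderer consumes them later. No drawing happens here.
struct Mesh   { int   vertex_count = 0; };
struct Camera { float view[16]     = {}; };

struct RenderOp {
    const Mesh*   mesh   = nullptr;
    const Camera* camera = nullptr;
};

struct RenderQueue {
    std::vector<RenderOp> ops;
    void Submit(const RenderOp& op) { ops.push_back(op); }
    // Flush() would sort the ops and translate them into API calls.
    std::size_t Pending() const { return ops.size(); }
};
```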

Now the problem I have is this: inside my Model class the constructor would previously have obtained references to the shaders and textures it used, and when the render function is called, it knows which shaders it's using, knows what parameters those shaders require, and can fill in a constant buffer and set it directly. Nothing outside the Model class need have any knowledge of the shaders it uses, or of what data they require. Which is a good thing.

Except that it doesn't fit with the new way of drawing things.

I'm thinking I could create a structure for the parameters of each "class" of shader and pass that into the render function as opaque data to give to the shader, but that seems ugly. I could create a constant buffer for each class of shader and get my geometry classes to fill that in when they create the render operations, but that feels ugly too.

How do people suggest that the part of the code that wants to draw a model passes the shader parameters (and the material in general, I guess) to the renderer in an elegant way? Although I want my code to be for D3D11, I'd like to keep an eye on doing things in a way that would make it easy to change in future, so some form of abstraction is needed here, even if it's only passing a handle to a constant buffer or something...

Does this even make any sense?

edit: To explain further, what makes this hard is that there is no real common structure. Each shader requires different numbers of vectors, values, textures etc. And the IRender interface shouldn't really have to know anything about the shader... I could pass a big map of name/value pairs for them all, but doing dynamic memory allocation for a map in a renderer doesn't seem like a great idea.

Hmh, if you have gotten this far I don't see any reason why you couldn't do it. I would do it in a slightly more complex way, using inheritance and a base interface for managing certain effects. (I assume effects for D3D9, D3D11 and OpenGL are different, although I've only been using D3D9 and have no idea how the others work.) Each mesh/node can keep a pointer to a shader, and the data which is going to be passed. Ah, let me show you:
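A rough sketch of that idea (hypothetical names): each node keeps a pointer to its shader plus an opaque block of parameter data to hand over at draw time.

```cpp
#include <cstddef>
#include <vector>

struct IShader {
    virtual ~IShader() = default;
    // Applies an opaque block of parameter data; the concrete shader
    // knows its own layout.
    virtual void Apply(const void* data, std::size_t size) = 0;
};

// Example concrete shader that just records what it was given.
struct RecordingShader : IShader {
    std::size_t last_size = 0;
    void Apply(const void*, std::size_t size) override { last_size = size; }
};

struct SceneNode {
    IShader*          shader = nullptr;
    std::vector<char> shader_data;  // filled in by whoever owns the node

    void Draw() {
        if (shader) shader->Apply(shader_data.data(), shader_data.size());
        // ... then issue the actual draw call ...
    }
};
```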

When you create your model it gets a reference to a material structure (assuming C++, this could simply be a Material class pointer), and when the draw call is set up the pointer is placed into the structure.

Internally the material will hold all the data needed to work which isn't set up per-instance (colour data, textures which are the same for all instances of the model, etc.).

So, for example, if you had a 'house' material, all any object which uses it needs to do is indicate to the renderer that it uses that material, and that's all. It doesn't need to care about textures, parameters or any of that stuff. If you need a house which has a different texture then you'd create a different material (so you might have wood_house and brick_house materials) with the same shaders but different textures; internally you deal with not duplicating the shader setting and only change the textures between draw calls.
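A minimal sketch of that material idea (hypothetical names): two materials can share the same shaders while differing only in their textures.

```cpp
#include <string>
#include <unordered_map>

// The material bundles the shaders plus the data that is the same for
// every instance (textures, colours). Objects only hold a pointer to it.
struct Texture       { std::string name; };
struct ShaderProgram { std::string name; };

struct Material {
    const ShaderProgram* shaders = nullptr;
    std::unordered_map<std::string, const Texture*> textures;
};

// e.g. wood_house and brick_house: same shaders, different diffuse map.
inline Material MakeHouseMaterial(const ShaderProgram* sp,
                                  const Texture* diffuse) {
    Material m;
    m.shaders = sp;
    m.textures["diffuse"] = diffuse;
    return m;
}
```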

Per-instance data is a little trickier and it does depend somewhat on how your engine is set up.

For example, if you have a simple update-draw loop on a single thread then you can get away with passing a chunk of memory along with the rest of the draw data which contains the per-instance data. This could be something as simple as an array of [handle, DataType] where 'handle' was something you asked for from the material for a per-instance parameter and 'DataType' could well be a union with a 'type' flag so that the same data could be used for float4 parameters and texture data. (Or even a 'known' constant block layout if you wanted to expose it.)
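A sketch of that [handle, DataType] layout (hypothetical names), with the 'type' flag as a tagged union:

```cpp
#include <cstdint>

struct Texture;  // opaque, stands in for a real texture resource

// Tagged union so the same slot can carry a float4 or a texture pointer.
struct DataType {
    enum class Kind : std::uint8_t { Float4, TexturePtr } kind;
    union {
        float          f4[4];
        const Texture* tex;
    };
};

struct InstanceParam {
    std::uint32_t handle;  // obtained from the material for a named parameter
    DataType      value;
};

inline InstanceParam MakeFloat4Param(std::uint32_t handle,
                                     float x, float y, float z, float w) {
    InstanceParam p;
    p.handle     = handle;
    p.value.kind = DataType::Kind::Float4;
    p.value.f4[0] = x; p.value.f4[1] = y;
    p.value.f4[2] = z; p.value.f4[3] = w;
    return p;
}
```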

A solution for a threading setup would be more involved as you'd need a 'game' and 'renderer' side abstraction and then data gets passed across via a message queue from one to the other and stored locally.

The key point really is that the material needs to have a concept of 'per-instance' data, and each mesh instance needs to store its own copy of that data somehow. The handle-based system is probably simplest.

So when the object is created it gets a handle to the 'foo' parameter and sets the data to a pointer to the texture named 'cow'.

Once the draw call is created, this per-instance data is added to it, and in the renderer it is checked and the resulting data set on the correct variables for the draw.
(Internally the material knows how much per-instance data to expect; on creation this would be queried to size the array correctly, instead of hard-coding it as in the example above.)
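A rough sketch of that handle flow (hypothetical names): the material is queried for a parameter handle and for how many per-instance slots to allocate, rather than hard-coding the array size.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// The material maps per-instance parameter names to handles and reports
// how many slots an instance needs.
struct Material {
    std::vector<std::string> per_instance_names;  // e.g. {"foo", "bar"}

    std::uint32_t GetParamHandle(const std::string& name) const {
        for (std::uint32_t i = 0; i < per_instance_names.size(); ++i)
            if (per_instance_names[i] == name) return i;
        return UINT32_MAX;  // not found
    }
    std::size_t PerInstanceCount() const { return per_instance_names.size(); }
};

struct ModelInstance {
    const Material*          material = nullptr;
    std::vector<const void*> param_data;  // sized from the material query

    explicit ModelInstance(const Material* m)
        : material(m), param_data(m->PerInstanceCount(), nullptr) {}

    // e.g. set the 'foo' handle to point at the 'cow' texture.
    void SetParam(std::uint32_t handle, const void* data) {
        param_data[handle] = data;
    }
};
```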

(Note: This is only a rough sketch of the idea. It needs fleshing out but I hope you get the idea).

This is the essence of data-driven design/architecture. But you can make your implementation far more robust than this pseudo-code-ish example; in fact, you must if you want to get the most out of the design. But you should have the general idea. There are a lot of great threads around here about this, including some of mine. I'm sure swiftcoder or Radikalizm can chime in with much better examples and information if they have time. If you need to work out any specifics from this just ask!

I think that even if you intend to use only DirectX 10 or 11, one version of OpenGL, whatever (or to program for only one platform, e.g., Windows), you should still try to design things to be platform- and API-agnostic from the start. Use virtual/abstract interfaces to define all of these things and inherit from them to implement the platform-specific bits of your engine/game. Then you can dynamically switch rendering back-ends at runtime and easily port your games to other platforms by making only a few changes in the engine. Keep your designs loosely coupled and each component focused only on the interface of the engine; nothing platform-specific.

Looking at ATC's post, that is exactly what I want to do. My problem really is understanding how to define "Material" in a useful way.

I see "Material" as being a fairly dumb collection of data items: which shaders to use, and the parameters to send to those shaders.
The problem is that the parameters are different for each shader.

For example, if I'm calling Render::Draw to draw a segment of a landscape, the shader would require a heightmap, some textures, and a texture blend map.
If I was calling Render::Draw to draw a 3D model it would require a texture, and perhaps some animation data to send to the shader.
If I was calling Render::Draw to draw a bit of water it would require textures, wave heights, and frequencies to send to the shader (perhaps...)

The problem is that Render::Draw shouldn't have to look at which shader it is being told to use by the material and pick out the appropriate parameters to set for that shader. My first thought was standard OO design: Material could have a virtual function called "setShaderParameters" which could do that for the Draw function, but that seems both ugly and highly inefficient.

My other thought is that material could contain a set of type/value pairs that tell "draw" to set *this* value in *this* shader constant slot; then the draw function wouldn't have to know anything about what the data means.
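A sketch of that slot/value idea (hypothetical names): the draw side applies the pairs blindly, without knowing what the data means.

```cpp
#include <cstdint>
#include <vector>

// A (slot, value) pair carried by the material.
struct SlotBinding {
    std::uint32_t slot;      // shader constant slot to write to
    float         value[4];  // payload for that slot
};

// Stand-in for a constant buffer: 4 floats per slot.
struct ConstantTable {
    std::vector<float> slots;
    void Write(std::uint32_t slot, const float v[4]) {
        if (slots.size() < (slot + 1) * 4) slots.resize((slot + 1) * 4);
        for (int i = 0; i < 4; ++i) slots[slot * 4 + i] = v[i];
    }
};

// Draw-side code: apply everything without interpreting it.
inline void ApplyBindings(const std::vector<SlotBinding>& bindings,
                          ConstantTable& table) {
    for (const auto& b : bindings) table.Write(b.slot, b.value);
}
```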

I'm just going to point out that ATC's rough version is broken with regards to per-instance data.

IMesh containing mesh data is no good if you need that mesh data to vary per mesh instance; that data belongs at the per-model level, and then potentially at the per-RenderOp level (although instancing muddies the water somewhat), so that it can be varied without affecting the underlying Mesh data, which will be shared.

In my engine we use a base interface "IRenderingOperation". That interface is implemented by the "RenderOp" and "InstanceOp" classes. The Renderer accepts IRenderingOperation instances to build sorted rendering batches; batches, in turn, can be sorted/layered as well. When the rendering queue is flushed the Renderer can easily determine if it's a plain RenderOp or an InstanceOp and make the appropriate API calls.
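A minimal sketch of such an interface hierarchy (hypothetical names; the real classes would carry mesh, material and batch data):

```cpp
#include <cstdint>

// Base interface: the renderer batches these and, when flushing the
// queue, distinguishes plain draws from instanced ones.
struct IRenderingOperation {
    virtual ~IRenderingOperation() = default;
    virtual bool          IsInstanced() const = 0;
    virtual std::uint64_t SortKey()     const = 0;
};

struct RenderOp : IRenderingOperation {
    std::uint64_t key = 0;
    bool          IsInstanced() const override { return false; }
    std::uint64_t SortKey()     const override { return key; }
};

struct InstanceOp : IRenderingOperation {
    std::uint64_t key            = 0;
    int           instance_count = 1;
    bool          IsInstanced() const override { return true; }
    std::uint64_t SortKey()     const override { return key; }
};
```

The reply below argues against exactly this kind of virtual dispatch at the bottom of the renderer, so treat it as one option rather than a recommendation.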

(I'm assuming you are only talking about the segment of my reply where I mention instancing and not the incorrect placement of instance data)

While I don't doubt that it works for you I, personally, am not happy with the concept of 'interfaces' and 'working things out' via them at that level.

Instinctively I feel that once you are getting down to this level, simple command streams/tokens are a better fit with lower overhead, and while there is still a degree of 'working out', it would be within the context of an 'if' statement rather than via interface dispatch.

Indeed it could be argued that you should have your batching fixed BEFORE any rendering ops are issued/created and AFTER vis-testing has been performed.

Just throwing this out there as I'm considering it as I write - the vis-testing system processes the renderable geometry, and the output from this is a list of objects which are considered visible for the current camera. The list is then passed over and a 'renderable key' is generated for each instance (a simple 64-bit number would do), which would indicate the mesh and material required (among other data), and is placed into a list. This list is then sorted (bucket sort) and those buckets processed to generate a single draw call and the 'per-instance' data which would have been requested from each object and placed in a buffer(s).
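A sketch of the 64-bit key idea (field widths arbitrary here; a real engine would budget bits for pass, depth and the rest of the "other data"):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Pack material and mesh ids into one number so a plain sort groups
// identical state together: material varies slowest, mesh fastest.
inline std::uint64_t MakeRenderKey(std::uint32_t material_id,
                                   std::uint32_t mesh_id) {
    return (static_cast<std::uint64_t>(material_id) << 32) | mesh_id;
}

// Sorting the key list clusters draws by material first, then mesh;
// a bucket/radix sort would do the same job with fixed cost.
inline void SortKeys(std::vector<std::uint64_t>& keys) {
    std::sort(keys.begin(), keys.end());
}
```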

There is the underlying problem of deciding how that data is instanced (vertex stream, direct cbuffer lookup, tbuffer lookup) and thus how to pull it from the object, although the object probably shouldn't have to care how it ultimately gets to the shader. There is also a good argument in there for making ALL render ops instancing ones so as to reduce the complexity of data setup - but that does create GPU overhead you might not want so...

Setting of the data for drawing itself could also be considered another command stream: a series of tokens indicating what parameters need to be set up.

In fact that could side-step the setup issue above; if you assume each instance writes a compact set of data to the 'instance buffer', then internally the material/shader's command stream would know how to unpack that data (metadata in the material/shader indicating which data goes where). As each instance MUST write the same data there is no fear of 'holes' and/or odd step sizes between data, and the format it writes (order, data, etc.) is fixed by your design so must be followed.

In fact you could flip it so that the per-instance buffer contains pointers to each instance's data buffer; then the material setup code could walk each instance and copy the data from that location directly into the correct destination location. (So it might unpack positional information into a cbuffer array, maybe some animation data into a tbuffer, and some misc data into a vertex stream.) The source & destination, as mentioned above, would be controlled by metadata and could be extracted from the material/shader at load time.
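A sketch of that metadata-driven routing (hypothetical names): the material's copy rules say which bytes of each instance's blob go where, and the setup code walks the pointers copying blindly.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// One routing rule, extracted from the material/shader at load time.
struct CopyRule {
    std::size_t src_offset;  // offset into the instance's data blob
    std::size_t dst_offset;  // offset into the destination buffer
    std::size_t size;        // bytes to copy
};

// Walk each instance pointer and route its data into the destination
// buffer (one dst_stride-sized region per instance).
inline void RouteInstanceData(const std::vector<const std::uint8_t*>& instances,
                              const std::vector<CopyRule>& rules,
                              std::uint8_t* dst, std::size_t dst_stride) {
    for (std::size_t i = 0; i < instances.size(); ++i)
        for (const auto& r : rules)
            std::memcpy(dst + i * dst_stride + r.dst_offset,
                        instances[i] + r.src_offset, r.size);
}
```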

So...

Vis-test objects
process 'pass' list to extract 'render key'
bucket sort resulting list
for each bucket
    create drawcall data
    for each instance in bucket, get pointer to instance data and place into buffer
for each drawcall
    execute material command list
    copy per-instance data from source buffers to destination
    execute draw call

Grossly simplified list but that's the basic idea...
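The list above, sketched as a runnable skeleton (hypothetical names, with the actual API calls stubbed out as a draw-call count):

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct Renderable {
    std::uint32_t material_id = 0;
    bool          visible     = true;
};

inline std::size_t DrawScene(const std::vector<Renderable>& objects) {
    // Vis-test, then extract a render key per surviving object
    // (here the key is just the material id).
    std::vector<std::uint32_t> keys;
    for (const auto& o : objects)
        if (o.visible) keys.push_back(o.material_id);

    // Bucket by key (an ordered map stands in for the bucket sort).
    std::map<std::uint32_t, std::size_t> buckets;
    for (auto k : keys) ++buckets[k];

    // One draw call per bucket: execute the material command list,
    // copy per-instance data, issue the draw. Here we only count.
    std::size_t draw_calls = 0;
    for (const auto& bucket : buckets) {
        (void)bucket;
        ++draw_calls;
    }
    return draw_calls;
}
```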

Man, I wish I had a renderer up and running at home to test this with...

I might have done a poor job of expressing the idea in pseudo-code (I did it in all of about 2 mins) and maybe another poor job of explaining how my renderer currently works. I'm also using C#, so that's another key difference. But I'm pretty satisfied with how I have things set up now. In C# references are like "smart pointers" behind the scenes, so the strongly-typed interface design I'm using now is very efficient and the Renderer can piece things together quickly and pump out very complex scenes from simple code. It can accept new types of vertex data it's never seen before, dynamically generate input elements and input layouts for D3D, handle "foreign" types of meshes and formats, and more. However, I think my "Shader" and "Material" implementations are a bit weak and need revisiting...

The way I currently approach things is that I have a special "InstanceData<T>" class to which per-instance data can be written and rapidly serialized. InstanceData is written to an "InstanceBuffer", which is typically bound to the GPU in vertex buffer slot 1. The "InstanceBuffer" class contains an "InstanceDataDescription" structure which describes the contents of the buffer for the engine; which in turn can tell Direct3D, OpenGL (or another API) what it is supposed to do with the information. A lot of the work you're talking about I defer to a "SceneManager"; e.g., determining what's on/off camera, finding LOD levels and mip-map levels, etc.

If you have any suggestions on how I could improve this system I'm all ears. I could especially use more work on my "Shader" and "Material" implementations, as I said earlier. For instance, what is the right time/place to bind variable values to the underlying effect code the "Shader" class wraps? Should a "RenderOp" contain shader variables, or should they be bound to the shader prior to pushing an op to the Renderer?

Having read quite a few of your posts I'm aware that you are using C#, I'm also more than aware of how C# works so no need to explain that ;)

However, anywhere the runtime has to make a choice based on virtual type information isn't going to be 'very efficient' - this is not a commentary on C# or the .NET JIT, as the same thing applies to doing a virtual call in C++; there is going to be overhead where the runtime has to figure out just where it is going to jump to and what it is going to execute (and with it come the associated cache misses and the like).

This is not to say your design won't work and won't serve you well, but don't think that interfaces and virtual calls aren't costing you anything; this is why AAA-class renderers avoid virtual calls and precook data and types as much as possible. The same goes for dealing with any type of data; it might seem like a good idea, but the cost of doing it (both design- and runtime-wise) might not seem worth it when you realise just how little you end up using such things at run time. (A good engine can of course deal with any vertex data layout it is passed, but the layout will be a known quantity at load time and fixed, rather than having to care about dynamic setups. Even for runtime-generated data this will end up fixed and won't have to be figured out on a per-frame basis.)

The point is you should be making very very few choices at the sharp end of a renderer; your data should be pre-cooked as much as you can, any choices should be simple (virtual calls != simple) and your data well laid out.

A lot of the work you're talking about I defer to a "SceneManager"; e.g., determining what's on/off camera, finding LOD levels and mip-map levels, etc.

I guess the split of work depends on how you design the system. The system above is the middle section of a sandwich.

The 'scene manager' does exist; it contains the scenes and maintains information about what goes into each scene, cameras, etc., but it doesn't do any direct work on the scene. It is queried by the renderer, and for each scene (and thus each camera, due to how our renderer works) it compiles a list of potentially visible objects, which are then processed (more or less) as detailed above. The main difference from the above is that we currently don't support instanced draw calls (and I don't understand why; the choice was made before I moved to the department), so once the vis-list is completed we have a list of per-scene draw calls. These draw calls are then executed on the deferred contexts before being pushed to the main context for final rendering.

Scene Manager ===> Renderer ===> 'device' via contexts

So the renderer acts as a compiler of data and a feeding system; the scene manager simply holds the scene and makes no choices about what is sent where. (LOD levels etc. are decided, currently, per-object at vis-test; i.e. if you pass the vis-test we do a distance test to see which LOD should be rendered, and that is the draw call created.)

If you have any suggestions on how I could improve this system I'm all ears. I could especially use more work in my "Shader" and "Material" implementations as I said earlier. For instance, the right time/place to bind variable values to the underlying effect code the "Shader" class wraps. For instance, should a "RenderOp" contain shader variables or should they be bound to the shader prior to pushing an op to the Renderer?

Shaders should take their values from a combination of the material being used and runtime data; aside from sorting your draw calls, shaders really shouldn't factor in that much, and even then they are only useful to know about for building a 'sort key' to ensure you aren't doing things like 'A-B-A-B' drawing.

Material data itself should mostly be static, more than likely pushed into a buffer of some sort (a constant buffer on D3D, the OpenGL equivalent elsewhere) and never touched; just rebound each time the draw call is about to happen.

Per-instance data is, as you've got, compiled into a buffer and attached at draw time. The buffer is going to be per-material (well, depending on the material it might have a few per-instance buffers for different drawing techniques; for example, if you have a shadow-map pass and a colour pass you might want two per-instance buffers, one for the colour pass only and one which is shared by both passes). This depends on shader/data requirements, but it would be configured via data and split as needed at runtime (see the copying example from my earlier post for data routing).

+1 for phantom... Very informative and well thought-out post, for which I must thank you...

I'm aware that there is indeed a penalty for virtual calls in both C# and C++. I know about v-tables and the like, and how they are resolved by the CLR in C#. However, these "hierarchies" of mine are very shallow and the penalty is very small. So small, in fact, that it is one of the least costly things the engine does each frame. The engine takes a brief fraction of a second to load up and then it's full speed ahead after the first frame, with everything JIT'ed and ready to rock.

Pretty much everything you say is correct and I'm with you on it. For instance, having to reflect on vertex data each frame would be very inefficient. Therefore, when a new "Mesh" instance is created it pre-processes everything for the underlying renderer... when I call, say, Mesh10.CreateFromMeshData( ... ), the HLSL and GLSL input bindings are automatically resolved and stored in an easily-accessible array that can be bound to the device / input assembler immediately. The process of reflecting on vertex data and figuring out how to describe it to the underlying API and GPU happens only when a mesh is loaded/created, or if the application suddenly wants to change to another, completely different rendering API (e.g., a software renderer or something 3rd-party).

Thank you again for your post; there is much wisdom in it. I will be looking over some of our code this coming week and will try to apply your advice!