I've been reading some interesting news about Vulkan: they're adding extensions to make transitions from legacy code easier, especially VK_KHR_push_descriptor and VK_KHR_descriptor_update_template (the good thing is that these are KHR extensions, ratified by Khronos, not vendor-specific ones).

robmar wrote:Was DX12 really so bad we needed a completely new driver? ;(

It's not so much bad as different. Up to DX11/GL, the video card is modeled as a state machine whose parameters can be changed constantly (the GL driver has a state cache or something like it to keep those changes to a minimum). Multithreading would mean accessing this state machine from several threads while making sure there are no data races, which is complex on a state machine.

From VK/DX12/Metal on (I am assuming Apple's Metal is similar...) there are no state machines, but command buffers, which represent the flow of a render. There are pipelines, which are static objects that can be swapped quickly and that represent a whole set of states of the previous GL/DX11 machine (Z read/write, alpha blending, polygon winding order and the like). There are framebuffers, which represent the render targets, more or less. There are descriptor sets, which are the resources needed for each render/task in the form of uniform buffers and textures; they can't be said to be exactly Irrlicht materials, because part of a material's functionality lives in the pipelines as well. And there are buffers (vertex, index and instance buffers), which are what they look like, and more.

These objects are read-only most of the time; the only changes are made either at creation time or outside the command recording. Because every operation is recorded into command buffers, every rendering operation can be reduced to recording the appropriate commands (bind this pipeline, these buffers, these descriptors, and issue a rendering command) and then playing those buffers back.

You can record several command buffers at a time and synchronize their execution, so some may finish before others if necessary, or they *might* run in parallel. Multithreading becomes easier because you don't alter the objects during recording, and you can give each thread its own command buffer, so there are no data races when recording them.

tl;dr From VK on, everything works on objects. The objects are used by the command buffers, which are recorded ahead of execution, and played back, and reused when needed or reset and recorded again. To use threading without mutually excluding the threads from the resources, every thread can be provided its own command buffer, and every object is pretty much read only. In the end, the command buffers may run in parallel all along, or they can be synced properly so some end before others and so on, and that's it.
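To make the record/playback idea a bit more concrete, here is a minimal sketch in plain C++. It does not use Vulkan at all: `FakeQueue`, `CommandBuffer`, `recordInParallel` and `submitAll` are hypothetical names made up for illustration, and the "GPU" is just a log of strings. It only tries to show the shape of the thing: every thread records into its own buffer without locks, and playback happens later in one place.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GPU queue: it only logs what it is asked to do.
struct FakeQueue {
    std::vector<std::string> log;
};

// A command buffer is just a list of recorded operations to replay later.
class CommandBuffer {
public:
    void bindPipeline(const std::string& name) {
        commands_.push_back([name](FakeQueue& q) { q.log.push_back("bind " + name); });
    }
    void draw(int vertexCount) {
        assert(vertexCount > 0);
        commands_.push_back([vertexCount](FakeQueue& q) {
            q.log.push_back("draw " + std::to_string(vertexCount));
        });
    }
    // Playback: nothing touches the queue until this point.
    void execute(FakeQueue& q) const {
        for (const auto& cmd : commands_) cmd(q);
    }

private:
    std::vector<std::function<void(FakeQueue&)>> commands_;
};

// Each worker thread records into its own buffer, so recording needs no locks.
// The vector is sized up front and never resized while the threads run.
std::vector<CommandBuffer> recordInParallel(std::size_t threadCount) {
    std::vector<CommandBuffer> buffers(threadCount);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < threadCount; ++i) {
        workers.emplace_back([&buffers, i] {
            buffers[i].bindPipeline("pipeline" + std::to_string(i));
            buffers[i].draw(3 * static_cast<int>(i + 1));
        });
    }
    for (auto& w : workers) w.join();
    return buffers;
}

// Submission happens on one thread, in a fixed order.
FakeQueue submitAll(const std::vector<CommandBuffer>& buffers) {
    FakeQueue queue;
    for (const auto& cb : buffers) cb.execute(queue);
    return queue;
}
```

Calling `submitAll(recordInParallel(3))` would replay the three buffers in index order, whatever order the threads happened to finish recording in. A recorded buffer can also be executed again without re-recording, which is the "reused when needed" part.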

The problem would be mapping every element of Irrlicht onto this scheme, and that is not a trivial problem. For instance: begin/end scene would be ambiguous, because now you'd need to say where you are beginning the scene, i.e. in which framebuffer. SetTransform would not work as expected, because some tasks within a command buffer may end before others. Changing a single parameter of a material would now involve switching from one pipeline to another, or even creating the appropriate pipeline on the fly, something that could end up causing a combinatorial explosion of pipelines. The efficiency achieved comes at the cost of planning ahead what you will do. The good news, though, is that once you know how things are done, everything becomes a matter of programming outside the video driver.
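One common way to handle the "create the pipeline on the fly" case is a cache keyed by the render state. The following is only a sketch under my own assumptions: `RenderState`, `Pipeline`, `PipelineCache` and `maxPipelines` are hypothetical names, the state has just three on/off flags, and building a pipeline is faked with a counter instead of a real Vulkan call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical subset of the fixed state that a pipeline bakes in.
struct RenderState {
    bool depthWrite;
    bool alphaBlend;
    bool backfaceCull;
};

// Orders states so they can be used as std::map keys.
inline bool operator<(const RenderState& a, const RenderState& b) {
    return std::tie(a.depthWrite, a.alphaBlend, a.backfaceCull) <
           std::tie(b.depthWrite, b.alphaBlend, b.backfaceCull);
}

// Stand-in for an immutable pipeline object; a real one is expensive to build.
struct Pipeline {
    std::uint32_t id;
};

class PipelineCache {
public:
    // Returns the pipeline for this state, building it only on first use.
    const Pipeline& get(const RenderState& state) {
        auto it = cache_.find(state);
        if (it == cache_.end()) {
            it = cache_.emplace(state, Pipeline{nextId_++}).first;
        }
        return it->second;
    }
    std::size_t size() const { return cache_.size(); }

private:
    std::map<RenderState, Pipeline> cache_;
    std::uint32_t nextId_ = 0;
};

// Upper bound on distinct pipelines for n independent on/off states: 2^n.
inline std::uint64_t maxPipelines(unsigned toggles) {
    assert(toggles < 64);
    return std::uint64_t{1} << toggles;
}
```

With three flags the cache can never hold more than `maxPipelines(3)`, i.e. 8 entries, but every extra independent toggle doubles that bound, which is the explosion mentioned above. A cache like this only pays for the combinations a scene actually uses.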

So I'd say yes: finish a proper GL driver and port as much as possible to DX11, and then create a whole different driver, or even engine (Irrlicht 2? Command buffers could be emulated for GL/DX11), for Vulkan, and don't bother with DX12. Almost nobody seems eager to work with it, and if you look at it, that makes sense: Windows Phones aren't exactly top notch lately, while Android is going 100% with Vulkan, and it is a really wide platform. And it's not only phones: the Nintendo Switch uses VK (there is even an extension written by Nintendo and NVidia, and it is in the VK SDK already), and to some extent the PS4 could use VK. Besides, some of the OpenVR standards are trying to go for Vulkan, so it looks like a promising choice. It has matured a lot during the year it has been out, so it is not a bad choice at all; it has a future.

"While the use of a device context (ID3D11DeviceContext) is not thread-safe, the use of a Direct3D 11 device (ID3D11Device) is thread-safe. Because each ID3D11DeviceContext is single threaded, only one thread can call a ID3D11DeviceContext at a time. If multiple threads must access a single ID3D11DeviceContext, they must use some synchronization mechanism, such as critical sections, to synchronize access to that ID3D11DeviceContext."

Then it would seem from Microsoft's tech docs that we can use one D3D11 device, and then use a DeviceContext for each thread.

Do the states not relate to each device context, rather than the device itself?

"Multithreading requires some form of synchronization. For example, if multiple threads that run in an application must access a single device context (ID3D11DeviceContext), that application must use some synchronization mechanism, such as critical sections, to synchronize access to that device context. This is because processing of the render commands (generally done on the GPU) and generating the render commands (generally done on the CPU through object creation, data loading, state changing, data processing) often use the same resources (textures, shaders, pipeline state, and so on). Organizing the work across multiple threads requires synchronization to prevent one thread from modifying or reading data that is being modified by another thread."
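As a rough illustration of what that quote is describing, here is a small sketch in portable C++ rather than real D3D11 code. `SharedContext` and `drawFromThreads` are hypothetical names, and a `std::mutex` plays the role of the critical section: setting a state and drawing with it must happen as one unit, or another thread could change the state in between.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a single, non-thread-safe device context.
class SharedContext {
public:
    // Sets a state and "draws" with it as one locked unit.
    void setStateAndDraw(int state) {
        std::lock_guard<std::mutex> lock(mutex_);  // acts as the critical section
        currentState_ = state;
        // Without the lock, another thread could overwrite currentState_ here.
        drawn_.push_back(currentState_);
    }
    std::vector<int> drawn() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return drawn_;
    }

private:
    mutable std::mutex mutex_;
    int currentState_ = 0;
    std::vector<int> drawn_;
};

// Several threads share one context; each draws with its own state value.
std::vector<int> drawFromThreads(int threadCount, int drawsPerThread) {
    assert(threadCount > 0 && drawsPerThread > 0);
    SharedContext ctx;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&ctx, t, drawsPerThread] {
            for (int i = 0; i < drawsPerThread; ++i) ctx.setStateAndDraw(t);
        });
    }
    for (auto& w : workers) w.join();
    return ctx.drawn();
}
```

This is correct but the threads spend their time queueing on the lock, so it is a sketch of why sharing one context between threads is unlikely to buy any speed on its own.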

So which is it? I guess all of this is irrelevant unless there is a speed increase.

AMD have "only" 8000 staff, but forum posts suggest they had limited multithreading support in their DX11 driver...

"Concurrent operations do not necessarily lead to better performance. For example, creating and loading a texture is typically limited by memory bandwidth. Attempting to create and load multiple textures might be no faster than doing one texture at a time, even if this leaves multiple CPU cores idle."

With the DX11 extensions, the CPU can directly access video memory, so splitting the memory copy across CPU cores using, say, TBB would increase speed, or at least it does so in my TBB tests here.

Okay so we can load textures faster with TBB, but how do we get greater FPS with parallel processing on the CPU?

I googled this topic and there is little, if anything, listed...

Given the nature of GPU rendering to FB, with depth sequence rendering..., how can rendering even be well shared?

We can do some preparatory work, textures, etc., but I'm not clear what can be shared.

Is there a reference textbook anywhere on multithreaded GPU programming, maybe from AMD?

IrrlichtBAW has multiple contexts for OpenGL, so you can create and play around with buffers in a separate thread.

Even if your buffer is persistently mapped (the CPU can access GPU memory directly), your memory transfer is bound by PCIe bandwidth; throwing more cores at it will make it slower per core, as the bandwidth has to be shared. A single core is already too fast even for writes to RAM; the latency screws with it.

However, two operations can happen concurrently, so one core can pull data from the GPU, another can push data, and a different one can handle the driver interactions for the actual rendering (this is essentially why i7s rarely give you an FPS boost in games over i5s, unless you have a multi-GPU rig and the game is programmed to take advantage of it).

The only reason to use more than 2 threads would be if serializing your code would create stuttering. Sure, streaming resources from the main thread might give you lower average frame times, higher bandwidth and less driver overhead, but whenever you load or save something you get a 500ms stall in the frame.
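A minimal sketch of moving such a load off the main thread, using only the standard library. `loadTexture` and `renderWhileLoading` are hypothetical names, the "load" is simulated with a sleep, and the "frame" is simulated with another sleep; a real engine would read a file and render instead.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical loader: pretends to read a file by sleeping, then returns bytes.
std::vector<unsigned char> loadTexture(const std::string& path,
                                       std::chrono::milliseconds cost) {
    std::this_thread::sleep_for(cost);
    return std::vector<unsigned char>(path.size(), 0xFF);
}

// Keeps "rendering" frames while the load is in flight. Returns how many
// frames were drawn before the data became available.
int renderWhileLoading(std::future<std::vector<unsigned char>>& pending,
                       std::chrono::milliseconds frameTime,
                       std::vector<unsigned char>& out) {
    assert(pending.valid());
    int frames = 0;
    // Poll without blocking; the main loop never waits on the loader.
    while (pending.wait_for(std::chrono::milliseconds(0)) !=
           std::future_status::ready) {
        std::this_thread::sleep_for(frameTime);  // stand-in for rendering one frame
        ++frames;
    }
    out = pending.get();
    return frames;
}
```

The load would be started with something like `std::async(std::launch::async, loadTexture, path, cost)`. The total load time does not get shorter this way, but the stall is spread over frames that keep rendering instead of freezing one of them.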

I think it is hard to find anything of the sort out there; until now, the topic of using multithreading with GPUs has been some sort of "forbidden magic". But I think the old June 2010 DXSDK had an example of multithreaded rendering on DX11... I just can't really tell what it was about, though...

I've read that NVidia didn't have full multithreaded support in their DX11 driver, and jumped to DX12, where their implementation was also much poorer than AMD's, so we may have problems continuing along this line with DX11.

There is this Scottish blogger who recently published a very detailed overview titled "Ryzen - The Tech Press Loses The Plot", which cleared up the false and negative press aimed at helping Intel. It seems that too many powerful people have invested heavily in Intel and NVidia, and are using their powers (such as Intel bribing reviewers and distributors against AMD... again!), including Goldman Sachs, who yesterday pushed out their "opinion" to buy NVidia stock, as the 1080ti was going to "hurt" AMD, failing to mention Vega will outgun the 1080 at less $$!

Anyway, this same guy has published what seems to be an excellent review of DX12, including a neat overview of multi-tasking, where the GPU asynchronously runs geometry, physics and rendering tasks.

Def worth a look, if only for the comparisons of NVidia's and AMD's DX12 driver implementation levels.

I would opt for Vulkan and DX12 to have a common code base; most VK concepts map almost 1:1 to DX12 concepts.

NVidia has been sloppy, to say the least, in these last years. They were so comfortable with DX10 and DX11 and with dominating the PC market that they just stopped innovating and focused on optimizing. AMD, on the other hand, started working on their own API, Mantle, to test new things that were too different from GL/DX, collaborated with Sony to create the PS4, which was the perfect testbed for their research (the PS4 architecture is quite similar to the architecture offered by Vulkan), later donated their API to the Khronos Group to create Vulkan, and in the end it turned out that their work was so close to DX12 that they had half of the work done.