I am not so worried about streaming, as I can load everything at startup, but everything else is very high on my list of dreams!

I would also agree that we should merge the PSO branch & metal branch into 2.1 and do an official release, and then start on a 2.2. I am literally delivering a system as I type this using Ogre 2.1, and although I know it is very stable, convincing my QA department was a struggle!

My main blocker is that we're not providing any sort of utility to cache PSOs for those who need to write low level shaders (mostly porting 3rd party GUI solutions).

It isn't hard to write one, it's just time consuming. It would do something similar to what Hlms does. Basically build an HlmsPso from the needed parameters, find in a map if it's already been created; if so, return that PSO. Otherwise create one.

dark_sylinc wrote:
It isn't hard to write one, it's just time consuming. It would do something similar to what Hlms does. Basically build an HlmsPso from the needed parameters, find in a map if it's already been created; if so, return that PSO. Otherwise create one.

Sounds perfect for a community contribution! I have not got round to looking at PSOs in Ogre yet, so I'm not really sure how they function, but I am sure I'll look at it in the near future. Please share any other details you can so someone here can help implement it and give you more time elsewhere.

From an engine design point of view this means PSOs should be created on load. From an API perspective, PSOs give the driver a lot of room for performance optimizations: the driver can see everything that can and will be used (and how each piece relates to the others), perform heavy optimizations, and encapsulate the optimized state into the PSO.

However, for immediate-mode style rendering (very common in most GUIs out there), this paradigm sucks (unless the GUI has been designed with PSOs in mind).

Ogre (and therefore the Hlms) separates the PSO into two segments: PSO data that rarely changes (such as RTT formats, depth buffer format, stencil test settings), and PSO data that may change often (macroblock, blendblock, vertex layout, and shaders).
For that, Ogre has HlmsPso & HlmsPassPso (HlmsPassPso is part of HlmsPso).

getPso() would check if it's dirty; if it's not, just return the same PSO as before. If it is, it will look for an already created HlmsPso in its cache. If none is found, it creates a new one (by calling _hlmsPipelineStateObjectCreated; when destroying the entire cache, don't forget to call _hlmsPipelineStateObjectDestroyed).

Naturally, the dev using the cache should optimize as much as possible (e.g. if the vertex format is always the same, then call cache.setVertexFormat outside the loop).
Also, if the user provides macro & blendblocks by pointer created from HlmsManager, checking whether they're different is just a pointer compare, e.g. if( oldBlendblock != newBlendblock ) mDirty = true;

Overall it's simple, and it would greatly simplify porting GUI tools (Gorilla, CEGUI, etc.) to Ogre 2.1-pso.
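To make the idea above more concrete, here is a rough sketch of such a cache. It deliberately does not use the real Ogre headers: Macroblock, Blendblock, PsoKey, Pso and PsoCache are all hypothetical stand-ins, and the vertex layout and shaders are reduced to plain integer IDs. Treat it as an illustration of the dirty flag plus map lookup, not as the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical stand-ins: the real Ogre blocks and HlmsPso hold far more state.
struct Macroblock {};
struct Blendblock {};

// The frequently changing part of the state, used as the lookup key.
struct PsoKey
{
    const Macroblock *macroblock = nullptr;
    const Blendblock *blendblock = nullptr;
    int vertexFormatId = 0;  // stand-in for the vertex layout
    int shaderId = 0;        // stand-in for the set of shaders

    bool operator<( const PsoKey &o ) const
    {
        std::less<const void *> ptrLess;  // well-defined order for unrelated pointers
        if( macroblock != o.macroblock ) return ptrLess( macroblock, o.macroblock );
        if( blendblock != o.blendblock ) return ptrLess( blendblock, o.blendblock );
        return std::tie( vertexFormatId, shaderId ) < std::tie( o.vertexFormatId, o.shaderId );
    }
};

struct Pso { PsoKey key; };  // stand-in for HlmsPso

class PsoCache
{
public:
    // Setters only flag the cache dirty when the value really changes.
    void setMacroblock( const Macroblock *m ) { update( mKey.macroblock, m ); }
    void setBlendblock( const Blendblock *b ) { update( mKey.blendblock, b ); }
    void setVertexFormat( int id ) { update( mKey.vertexFormatId, id ); }
    void setShader( int id ) { update( mKey.shaderId, id ); }

    const Pso *getPso()
    {
        if( !mDirty && mCurrent )
            return mCurrent;  // nothing changed since the last call

        auto it = mCache.find( mKey );
        if( it == mCache.end() )
        {
            // Real code would fill an HlmsPso and call
            // _hlmsPipelineStateObjectCreated here.
            it = mCache.emplace( mKey, std::unique_ptr<Pso>( new Pso{ mKey } ) ).first;
        }
        mCurrent = it->second.get();
        mDirty = false;
        return mCurrent;
    }

    // Real code would call _hlmsPipelineStateObjectDestroyed on each entry first.
    void clear()
    {
        mCache.clear();
        mCurrent = nullptr;
        mDirty = true;
    }

    std::size_t size() const { return mCache.size(); }

private:
    template <typename T>
    void update( T &slot, T value )
    {
        if( slot != value )
        {
            slot = value;
            mDirty = true;
        }
    }

    PsoKey mKey;
    bool mDirty = true;
    const Pso *mCurrent = nullptr;
    std::map<PsoKey, std::unique_ptr<Pso>> mCache;
};
```

In a GUI render loop the idea would be to set the state that never changes once, update only what differs per draw call, and call getPso() right before drawing; switching back to a previously seen combination then hits the map instead of creating a new PSO.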

I follow 75% of what you have explained, but I think I should tackle and update one of the GUIs to get a better idea, maybe MyGUI. As I currently understand it though, this PSOCache is not required to get the GUIs to work, but they would greatly benefit from it?

It's a 2.2 branch. Does that mean a release candidate for 2.1 is on its way? In that case, shouldn't the license text "Copyright (c) 2000-2014 Torus Knot Software Ltd" be updated (or a supplemental 2015-2017 copyright added)?

Hello! I see a lot of movement in the branch, I hope it's doing great! You are an effing god, Matias!
I have a couple of requests regarding this new system:
1) Would it be possible to easily load only up to n mips into VRAM? That way I could have an option to use lower quality versions of the textures that actually use less VRAM.
2) And I would like an easy way to load a texture from a specific (relative or absolute) path, without using the resource manager.
Thanks!

xrgo wrote:
1) Would it be possible to easily load only up to n mips into VRAM? That way I could have an option to use lower quality versions of the textures that actually use less VRAM.

I can't comment on this because there are a lot of factors involved that make this complex and hard. Something like this is within the goals, but right now we're too far from that.

The short story is that if you have a material with three 2048x2048 textures and they're all in the same texture array, that's all well and good. But if you change only one of those textures from 2048x2048 to 1024x1024, it's going to end up in a different texture array and generate a new shader. And new shader = hiccup while compiling it.

If you downsize all 3 textures and if they end up in the same array again, then this can proceed without hiccups. But to be hiccup-free we have to guarantee that:

The texture you want is downsized

The other textures used by that material are also downsized

The downsized textures are put in the same arrays (to be able to use the same shader).

If one of these textures is also used by another material (which uses other textures), it may cause a domino effect

Ensuring that all of these conditions are met has its own overhead, which could outweigh just recompiling the shader.

But if memory consumption is your top priority you may just want to ignore that and pay the price of recompiling the shader. Btw, the shader may already be in the microcode cache, in which case the price will be very small. But because the number of texture permutations could be huge, we have to make a big effort to keep it from exploding, or else the shader is unlikely to be in the cache.

That's the short version. There are more details at play.

xrgo wrote:
2) And I would like an easy way to load a texture from a specific (relative or absolute) path, without using the resource manager.
Thanks!

After a long time, it's finally done: Yesterday I pushed a set of several commits that introduced the Texture Metadata Cache.

I opted to use JSON instead of a binary format because the metadata cache file on disk is easier to inspect that way. Not to mention the metadata can also be used to manipulate the new feature of texture pool IDs, which can be very important for some engines that take advantage of it.
Pool IDs are basically a way to ensure textures with the same pool ID get grouped together (as long as they have the same format & resolution), or rather... a way to prevent completely different textures from accidentally being grouped together.
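As a rough illustration of the grouping rule just described, the sketch below buckets textures by resolution, format and pool ID. TextureDesc, PoolKey and groupIntoPools are hypothetical names made up for this example (the pixel format is just an int), so this only shows the rule, not how the texture manager actually stores its pools.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical texture description; field names are illustrative only.
struct TextureDesc
{
    std::string name;
    std::uint32_t width;
    std::uint32_t height;
    int pixelFormat;  // stand-in for a pixel format enum
    std::uint32_t poolId;
};

// Textures share a pool only if every element of this key matches.
using PoolKey = std::tuple<std::uint32_t, std::uint32_t, int, std::uint32_t>;

std::map<PoolKey, std::vector<std::string>> groupIntoPools(
    const std::vector<TextureDesc> &textures )
{
    std::map<PoolKey, std::vector<std::string>> pools;
    for( const TextureDesc &t : textures )
        pools[PoolKey( t.width, t.height, t.pixelFormat, t.poolId )].push_back( t.name );
    return pools;
}
```

With this rule, two 2048x2048 textures of the same format land in the same pool only if they also share a pool ID, and a texture with a matching pool ID but a different resolution still gets its own pool.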

The metadata cache still needs testing, but I can already notice fewer fps hitches in OpenGL when the textures finish streaming and appear on screen. I have yet to test D3D11, though. I'd expect D3D11 to benefit much more from the metadata cache.

The code also handles the case where the metadata was out of date (or just intentionally lied...). If the cache was out of date, loading times will be higher because we have to redo, from scratch, a few of the loading steps for that texture. To keep thread safety, the cache-missed texture needs to go back to the main thread and then back again to the worker thread.
While optimizing this corner case could be possible, it only complicates the code and design, and we have to work under the assumption that the cache will be correct 99% of the time, because it's rare to modify the width/height/pixel format/texture type of a texture even during development. And when that happens, the performance hit is definitely acceptable (it's a small 'hitch').
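The hit/miss logic described above can be sketched roughly as follows. This is a single-threaded toy model: TextureMetadata, MetadataCache and validate are hypothetical names, and the round trip between the main and worker threads is not modelled at all. It only shows the decision "cached metadata matched, carry on" versus "cached metadata was wrong, fix the entry and redo the work".

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical subset of what a metadata entry might record.
struct TextureMetadata
{
    std::uint32_t width;
    std::uint32_t height;
    int pixelFormat;  // stand-in for a pixel format enum
    int textureType;  // stand-in for a texture type enum

    bool operator==( const TextureMetadata &o ) const
    {
        return width == o.width && height == o.height &&
               pixelFormat == o.pixelFormat && textureType == o.textureType;
    }
};

class MetadataCache
{
public:
    void set( const std::string &name, const TextureMetadata &md ) { mEntries[name] = md; }

    // Compares the cached entry with what was actually found on load.
    // Returns true on a hit. On a miss (absent or stale entry) the cache is
    // corrected and false is returned, meaning the caller has to redo the
    // setup that was based on the wrong metadata.
    bool validate( const std::string &name, const TextureMetadata &actual )
    {
        auto it = mEntries.find( name );
        if( it != mEntries.end() && it->second == actual )
            return true;
        mEntries[name] = actual;
        return false;
    }

private:
    std::map<std::string, TextureMetadata> mEntries;
};
```

The slow path only runs when validate returns false, which matches the assumption that the cache is right almost all of the time.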

Test TextureGpu::scheduleTransitionTo (3x, one for each GpuPageOutStrategy option):

Resident -> OnSystemRam

Resident -> OnStorage

OnSystemRam -> OnStorage

OnSystemRam -> Resident

I don't think any of these work too well, if at all. The only well tested path is OnStorage -> Resident, and maybe Resident -> OnStorage.
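For whoever ends up writing those tests, the matrix above (four transitions, three strategies each) could be enumerated with something like the sketch below. The enums are local stand-ins rather than the Ogre ones; the residency names and AlwaysKeepSystemRamCopy come from this post, while SaveToSystemRam and Discard are written from memory and should be checked against the headers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Local stand-ins; verify the names against the actual Ogre enums.
enum class Residency { OnStorage, OnSystemRam, Resident };
enum class PageOutStrategy { SaveToSystemRam, Discard, AlwaysKeepSystemRamCopy };

struct TransitionCase
{
    Residency from;
    Residency to;
    PageOutStrategy strategy;
};

// Builds every (transition, strategy) pair that still needs testing.
std::vector<TransitionCase> buildTransitionCases()
{
    const std::vector<std::pair<Residency, Residency>> transitions = {
        { Residency::Resident, Residency::OnSystemRam },
        { Residency::Resident, Residency::OnStorage },
        { Residency::OnSystemRam, Residency::OnStorage },
        { Residency::OnSystemRam, Residency::Resident },
    };
    const PageOutStrategy strategies[] = { PageOutStrategy::SaveToSystemRam,
                                           PageOutStrategy::Discard,
                                           PageOutStrategy::AlwaysKeepSystemRamCopy };

    std::vector<TransitionCase> cases;
    for( const auto &t : transitions )
        for( PageOutStrategy s : strategies )
            cases.push_back( { t.first, t.second, s } );
    return cases;
}
```

Each returned case would then drive one call to scheduleTransitionTo on a texture created with the given strategy.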

Implement going Resident with AlwaysKeepSystemRamCopy. Right now TextureGpu demands the sysram copy to be provided with _transitionTo when going Resident, otherwise exceptions/asserts are triggered. Now that I've had time to think, this makes little sense. The pointer must be provided before or while TextureGpu::notifyDataIsReady gets called. There is no need to require it while going Resident. Back then, when I started, I had the notion that a TextureGpu being Resident meant it was ready to display, which is not the same thing. Hence it asks for a memory pointer when using AlwaysKeepSystemRamCopy. This is wrong.

Better error handling. Right now if there is an exception in the worker thread, the thread terminates abruptly and textures stop streaming, and the main thread will likely deadlock or livelock.
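One possible direction, sketched with plain standard C++ rather than Ogre's threading code: catch everything inside the worker's job loop, stash the exception, and let the main thread pick it up later. WorkerErrors and runJobs are hypothetical names; the real streaming loop is structured differently, so this only shows the pattern of keeping the worker alive after a failed job.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// Thread-safe mailbox for exceptions raised on the worker thread.
class WorkerErrors
{
public:
    void push( std::exception_ptr e )
    {
        std::lock_guard<std::mutex> lock( mMutex );
        mErrors.push_back( e );
    }

    // Meant to be called from the main thread; returns and clears all errors.
    std::vector<std::exception_ptr> takeAll()
    {
        std::lock_guard<std::mutex> lock( mMutex );
        std::vector<std::exception_ptr> out;
        out.swap( mErrors );
        return out;
    }

private:
    std::mutex mMutex;
    std::vector<std::exception_ptr> mErrors;
};

// Intended as the body of the worker loop: a throwing job is recorded and
// skipped, so the remaining jobs still run. Returns how many jobs succeeded.
std::size_t runJobs( const std::vector<std::function<void()>> &jobs, WorkerErrors &errors )
{
    std::size_t completed = 0;
    for( const auto &job : jobs )
    {
        try
        {
            job();
            ++completed;
        }
        catch( ... )
        {
            errors.push( std::current_exception() );
        }
    }
    return completed;
}
```

The main thread could call takeAll() once per frame and use std::rethrow_exception to log the failure or mark the affected texture as failed, instead of waiting forever on a texture that will never arrive.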

Once that's done, that's basically it... the WIP label could officially be dropped and it's 2.2. There could be rough edges to polish, but the big ones are these. And it's not that much work actually.