Has anyone used D3D11 occlusion queries (D3D11_QUERY_OCCLUSION) for occlusion culling and had them work?

I must be doing something wrong, because it drops me from 70 FPS (frustum culling only, 10,000 objects) to 1 FPS and only half works (only one block acts as an occluder). If there is a better method for occlusion culling, please let me know. This appears to work well when used correctly, but I'm not sure how to use it correctly, and I find Microsoft's reference guide useless for learning.

[source lang="cpp"]
// Setup how the query functions
queryDesc.Query = D3D11_QUERY_OCCLUSION;
queryDesc.MiscFlags = 0;

// Create the query
m_D3D->GetDevice()->CreateQuery(&queryDesc, &pQuery);

// Go through all the models and render them only if they can be seen by the camera view.
for(index=0; index<modelCount; index++)
{
    // Get the position and color of the object model at this index.
    m_ModelList->GetData(index, positionX, positionY, positionZ, color);

    // Texture number that holds the low resolution occlusion texture
    texture = 4;

    // Start the query
    m_D3D->GetDeviceContext()->Begin(pQuery);

    // Matrix translation
    D3DXMatrixTranslation(&worldMatrix, positionX, positionY, positionZ);

    // Render the object's occlusion texture
    result = m_LightShader->Render(m_D3D->GetDeviceContext(), m_Model->GetIndexCount(), worldMatrix, viewMatrix, projectionMatrix,
                                   m_Model->GetTexture(texture), m_Light->GetDirection(), m_Light->GetAmbientColor(), m_Light->GetDiffuseColor());

    // End the query
    m_D3D->GetDeviceContext()->End(pQuery);

    // Get the data from the query and determine if object should be rendered
    // Returns whether or not a object is in view. If 0, then the object is not in view.
    while ( S_OK != m_D3D->GetDeviceContext()->GetData(pQuery, &queryData, sizeof(UINT64), 0 ) )
    {
        // If object is not in view
        if (queryData == 0)
            // Should not be rendered
            renderModel = false;
        // If object is in view
        else
            // Should be rendered
            renderModel = true;
    }

    // Render the model if it was in view
    if(renderModel)
    {
        // Move the model to the location it should be rendered at.
        D3DXMatrixTranslation(&worldMatrix, positionX, positionY, positionZ);

        // Put the model vertex and index buffers on the graphics pipeline to prepare them for drawing.
        m_Model->Render(m_D3D->GetDeviceContext());

        // Random texture thing (worked before this occlusion culling. now the culling texture takes over)
        if (positionY == 0)
            texture = 3;
        else if (positionY >= -16)
            texture = 2;
        else
            texture = 1;

        // Render the model using the light shader.
        result = m_LightShader->Render(m_D3D->GetDeviceContext(), m_Model->GetIndexCount(), worldMatrix, viewMatrix, projectionMatrix,
                                       m_Model->GetTexture(texture), m_Light->GetDirection(), m_Light->GetAmbientColor(), m_Light->GetDiffuseColor());

        // Reset to the original world matrix.
        m_D3D->GetWorldMatrix(worldMatrix);

        // Since this model was rendered then increase the count for this frame.
        renderCount++;
    }
}
[/source]

The reason is GPU latency: because the GPU is a parallel processor, it's allowed to just queue up commands and data and get round to actually drawing in its own sweet time. That may be anything up to (typically) 3 frames after the command is issued.

By fetching the result of the query immediately after it's been run, and doing it for every single model, you're breaking this parallelism. Instead of nice fast rendering you get a huge pipeline stall each time you fetch the results. The more models you have the worse it will be.

To compound the misery you're creating new query objects at runtime each frame (and you don't seem to be destroying them so you've got a resource leak too). This is all over the docs and recommendations - resource creation is expensive, don't do it at runtime, do it once only during startup.

Back to the queries.

There are two possible approaches here. The first is to create n query objects (one for each model), then go through all of your models, begin query, draw bounding geometry, end query, next model. That will give the queries some time to issue and run; the theory is that hopefully by the time the last query is issued you'll have the first one near ready (you won't, but it's nowhere near as bad as what you currently have) so then you go through all the models again, fetch the results and conditionally draw.

Because of the up to 3 frame latency you're still going to break CPU/GPU parallelism, but it won't be as bad as your current method. At least you'll get something.
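As a rough illustration of that first approach, here is a minimal sketch of the two passes. It is written against a hypothetical `Gpu` wrapper rather than the real device context so it stays self-contained: `beginQuery`, `endQuery` and `tryGetResult` are assumed names standing in for `Begin`, `End` and `GetData`, and `drawBounds` / `drawModel` stand in for your own draw calls. `MockGpu` is a toy backend that only exists so the sketch can run.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pass 1 issues every query back to back; pass 2 reads results and draws.
// Gpu is a hypothetical wrapper type, not a D3D11 interface.
template <typename Gpu>
std::size_t drawWithBatchedQueries(Gpu& gpu, std::size_t modelCount)
{
    // Pass 1: one query per model, no result fetches in this loop.
    for (std::size_t i = 0; i < modelCount; ++i) {
        gpu.beginQuery(i);
        gpu.drawBounds(i);  // cheap bounding geometry only
        gpu.endQuery(i);
    }

    // Pass 2: fetch each result and conditionally draw the real model.
    std::size_t drawn = 0;
    for (std::size_t i = 0; i < modelCount; ++i) {
        std::uint64_t samplesPassed = 0;
        while (!gpu.tryGetResult(i, samplesPassed)) {
            // Still a stall, but the GPU had the whole first pass to catch up.
        }
        if (samplesPassed > 0) {
            gpu.drawModel(i);
            ++drawn;
        }
    }
    return drawn;
}

// Toy backend that simulates latency: each result needs two polls to arrive.
struct MockGpu {
    std::vector<std::uint64_t> visibleSamples;  // what each query will report
    std::vector<int> pollsUntilReady;           // simulated latency per query
    std::vector<std::size_t> drawnModels;       // record of real draws

    void beginQuery(std::size_t) {}
    void drawBounds(std::size_t) {}
    void endQuery(std::size_t i) { pollsUntilReady[i] = 2; }
    bool tryGetResult(std::size_t i, std::uint64_t& out)
    {
        if (pollsUntilReady[i] > 0) { --pollsUntilReady[i]; return false; }
        out = visibleSamples[i];
        return true;
    }
    void drawModel(std::size_t i) { drawnModels.push_back(i); }
};
```

The important part is simply that no result is read inside the first loop; the spin in the second loop is the stall that remains.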

The second method is a little more sophisticated in that it takes advantage of so-called "temporal coherence", i.e. the fact that this kind of visibility probably won't change too much between individual frames. So each frame you fetch the results from the previous frame's set of queries, then issue a new set for the next frame.

A variation involves testing the query to see if the results are ready yet (in D3D11, GetData returns S_FALSE while the result is still pending) and - if not - using the last valid result. If there is no last valid result (it might be the first frame, or the model might have been frustum culled on the previous one) then you must assume that the model is visible (alternatively you could force the result fetch).
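The "use the last valid result, otherwise assume visible" rule could be sketched per model roughly like this. `ModelVisibility` and `resolveVisibility` are hypothetical names; `resultReady` and `samplesPassed` stand for whatever your non-blocking poll of the query returned this frame.

```cpp
#include <cstdint>

// Per-model bookkeeping for the temporal-coherence variation.
struct ModelVisibility {
    bool queryPending  = false;  // a query was issued and not yet read back
    bool hasLastResult = false;  // false on the first frame or after invalidation
    bool lastVisible   = true;
};

// Decide whether to draw the model this frame without ever blocking.
bool resolveVisibility(ModelVisibility& v, bool resultReady, std::uint64_t samplesPassed)
{
    if (v.queryPending && resultReady) {
        // A fresh result arrived: remember it for this and later frames.
        v.lastVisible   = samplesPassed > 0;
        v.hasLastResult = true;
        v.queryPending  = false;
    }
    // No valid result at all: be conservative and treat the model as visible.
    return v.hasLastResult ? v.lastVisible : true;
}
```

When a model gets frustum culled you would reset `hasLastResult` to false, so that it is drawn unconditionally the next time it comes back into view.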

Some final notes.

I mentioned n queries above, but what if you don't know what value n should have? You could just create an array (or other container) of query objects sized at some hypothetical maximum and pull from that, or you could dynamically create new query objects on-demand and store them in a list; the key though is to re-use objects that were previously created, don't create new ones if you don't have to.
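One possible shape for the "create on demand, but re-use" container is sketched below. `QueryPool` is a hypothetical helper, not part of D3D; `Query` would typically be a query pointer and the `create` callback would wrap your one-off query creation call.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Grows on demand but never creates a query it already owns.
template <typename Query>
class QueryPool {
public:
    explicit QueryPool(std::function<Query()> create) : create_(std::move(create)) {}

    // Hand out the next query for this frame, creating one only if the pool is exhausted.
    Query acquire()
    {
        if (next_ == queries_.size())
            queries_.push_back(create_());
        return queries_[next_++];
    }

    // Call once per frame, after last frame's results have been read.
    void resetForNextFrame() { next_ = 0; }

    std::size_t created() const { return queries_.size(); }

private:
    std::function<Query()> create_;
    std::vector<Query> queries_;
    std::size_t next_ = 0;
};
```

After the first few frames the pool stops growing, so the creation cost is only paid while the scene is still warming up. Releasing the pooled objects at shutdown is left out of the sketch.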

You'll also find that there is some cutoff point - depending on shader complexity/etc - below which it's going to be cheaper to not bother with a query but just always draw the model instead. You'll need to experiment to find that, but it will depend on number of vertices in the model, number of indices, and other such factors.

Finally, when using bounding volumes, you're going to have cases where your viewpoint is inside the bounding volume of a model - don't run a query if that happens; the model is visible, just draw it.
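The viewpoint check can be as simple as a point-in-box test, assuming axis-aligned bounding boxes. `Vec3`, `Aabb` and `cameraInsideBounds` are hypothetical names, and the `margin` parameter is my own addition: a small value around the near plane distance is a reasonable guess, since the near plane can clip the box faces slightly before the eye is strictly inside.

```cpp
struct Vec3 { float x, y, z; };
struct Aabb { Vec3 lo, hi; };  // minimum and maximum corners

// True when the eye is inside the box, grown by `margin` on every side.
// If this returns true, skip the query and just draw the model.
bool cameraInsideBounds(const Vec3& eye, const Aabb& box, float margin)
{
    return eye.x >= box.lo.x - margin && eye.x <= box.hi.x + margin &&
           eye.y >= box.lo.y - margin && eye.y <= box.hi.y + margin &&
           eye.z >= box.lo.z - margin && eye.z <= box.hi.z + margin;
}
```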

And before I go, one perfectly valid point is this: you've already got 70fps, it's mission accomplished, you're fast enough, move on to the next problem. You may however have yet to add physics, sound, networking, etc, or you may want additional headroom for more complex scenes, so I'm assuming that's why you want to go faster.

Phew! All of this sounds a hell of a lot more complex than just drawing models, doesn't it? And yes, it is, so you may even find that alternative techniques - such as instancing - give you perfectly adequate performance with a whole lot less complexity than using occlusion queries.

Edited by mhagain, 11 October 2012 - 03:14 PM.

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.

I see what you mean about fetching results immediately, and I fixed that. I also now create the array of query objects prior to runtime. Both improved performance; however, I'm still doing something wrong, as it does not actually do any culling.

Here's my new code. The render count always shows up as 0, so nothing is being set as OK to render with the correct texture. However, all of the objects are rendered with the 20x20 texture, so the first loop is working.

[source lang="cpp"]
// Go through all the models
for(int index = 0; index < modelCount; index++)
{
    // Get the position of the model
    m_ModelList->GetData(index, positionX, positionY, positionZ, color);

    // Leftovers
    radius = 1.0f;

    // Check if the model is in the frustum
    renderModel = m_Frustum->CheckCube(positionX, positionY, positionZ, radius);

    // If it can be seen...
    if(renderModel)
    {
        // Move the model to the location it should be rendered at.
        D3DXMatrixTranslation(&worldMatrix, positionX, positionY, positionZ);

        // Prime the buffers
        m_Model->Render(m_D3D->GetDeviceContext());

        // Start the query
        m_D3D->GetDeviceContext()->Begin(m_pPredicate[index]);

        // Render the model to the world using the 20x20 texture
        result = m_LightShader->Render(m_D3D->GetDeviceContext(), m_Model->GetIndexCount(), worldMatrix, viewMatrix, projectionMatrix,
                                       m_Model->GetTexture(2), m_Light->GetDirection(), m_Light->GetAmbientColor(), m_Light->GetDiffuseColor());

        // End the query
        m_D3D->GetDeviceContext()->End(m_pPredicate[index]);

        // Reset the world matrix
        m_D3D->GetWorldMatrix(worldMatrix);
    }
}

// Go through all the models
for (int index = 0; index < modelCount; index++)
{
    // Stores the result of the query
    bool renderM = 0;

    // Get the result of the query and store it to renderM
    m_D3D->GetDeviceContext()->GetData(m_pPredicate[index], &renderM, sizeof(BOOL), 0);

    // If the query said the object was OK to render... (IS NOT WORKING AT ALL)
    if (renderM == true)
    {
        // Reset the status to false
        m_D3D->GetDeviceContext()->SetPredication(m_pPredicate[index], FALSE);

        // Get the model information
        m_ModelList->GetData(index, positionX, positionY, positionZ, color);

        // Move the model to the right location
        D3DXMatrixTranslation(&worldMatrix, positionX, positionY, positionZ);

        // Prime the buffers
        m_Model->Render(m_D3D->GetDeviceContext());

        int texture;
        if (positionY == 0)
            texture = 3;
        else if (positionY >= -16)
            texture = 2;
        else
            texture = 1;

        // Render the model to the screen with the right texture
        result = m_LightShader->Render(m_D3D->GetDeviceContext(), m_Model->GetIndexCount(), worldMatrix, viewMatrix, projectionMatrix,
                                       m_Model->GetTexture(texture), m_Light->GetDirection(), m_Light->GetAmbientColor(), m_Light->GetDiffuseColor());

        // Reset to the original world matrix.
        m_D3D->GetWorldMatrix(worldMatrix);

        // Since this model was rendered then increase the count for this frame.
        renderCount++;
    }
}
[/source]

And now I have to ask, how is everyone else handling this? It doesn't appear to be a common question on the internet at all! Right now my testing is done with 1 model rendered in 10,000 different locations, but that will increase massively once I find a good method for occlusion. The FPS performance is already far less than I want for what I currently have.

The reason you can't find much information is because not many people are doing it. Plenty have tried, and failed. It's really hard to make occlusion queries work for a general visibility system, due to the latency and batching problems. The more "en vogue" technique at the moment is software depth buffer rasterization + occlusion testing.
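To give a feel for the software approach, here is a deliberately simplified sketch. It assumes occluders and test volumes have already been projected to screen-space rectangles with a single depth value each (0 = near, 1 = far), which is a much cruder model than the triangle rasterizers real engines use. `ScreenRect` and `SoftwareDepthBuffer` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct ScreenRect {
    int x0, y0, x1, y1;  // pixel bounds, x1/y1 exclusive
    float depth;         // occluders: farthest depth; test volumes: nearest depth
};

// Low-resolution CPU depth buffer, cleared to the far plane.
class SoftwareDepthBuffer {
public:
    SoftwareDepthBuffer(int width, int height)
        : width_(width), height_(height),
          depth_(static_cast<std::size_t>(width) * height, 1.0f) {}

    // Write an occluder, keeping the nearest depth seen at each pixel.
    void rasterizeOccluder(const ScreenRect& r)
    {
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height_); ++y)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width_); ++x) {
                float& d = depth_[index(x, y)];
                d = std::min(d, r.depth);
            }
    }

    // A volume is visible if any covered pixel is not hidden by a nearer occluder.
    // Rectangles entirely off screen report false.
    bool isVisible(const ScreenRect& r) const
    {
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height_); ++y)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width_); ++x)
                if (r.depth <= depth_[index(x, y)])
                    return true;
        return false;
    }

private:
    std::size_t index(int x, int y) const
    {
        return static_cast<std::size_t>(y) * width_ + x;
    }

    int width_, height_;
    std::vector<float> depth_;
};
```

Because everything happens on the CPU, the answer is available immediately and there is no readback latency to work around, which is the main attraction over GPU queries.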

The reason you can't find much information is because not many people are doing it. Plenty have tried, and failed. It's really hard to make occlusion queries work for a general visibility system, due to the latency and batching problems.

+1; there is wisdom in this.

Occlusion queries are one of those features that look great on paper, but when you actually start implementing something and dealing with all of the nasty edge cases you quickly realise that they're not exactly all they're cracked up to be.

What they may be good for is pre-processing a map to determine visibility info, but at runtime there are just too many places where they can fail to give you a good result to make them of much utility for a general case solution.