This is an update to the Software Occlusion Culling sample. It adds new features and optimizations that reduce the total cull time by a factor of 4 and the total frame time by a factor of 2. Below is a screenshot of the updated sample.

Here is a list of the new features / updates that are included in this version of the sample:

New set of Occluders

New depth buffer view

VS2012 support

Rasterizer optimizations

Pipelining

New set of Occluders:
In the previous version, all castle walls, along with the wooden pillars, tiny wooden trim and decorations, were used as occluders to avoid special pre-processing of the art assets. However, narrow pillars and tiny decorations are poor occluder candidates. In this version, we chose only objects that are large enough to occlude other objects in the scene. As shown in the image below, only the castle walls (without the pillars and the wooden decorations) and the ground plane are used as occluders. This reduces the number of occluders that have to be rasterized to the depth buffer from 1628 in the previous version to 115. Below is a screenshot of the occluders in the scene.

VS2012 support:
The sample can now be compiled in VS2012. Two projects are provided for each of VS2010 and VS2012. The project with ‘AVX’ in its name (SoftwareOcclusionCullingDX_2012_AVX / SoftwareOcclusionCullingDX_2010_AVX) is compiled with the /arch:AVX flag and is intended only for systems that support AVX. Use the other project (SoftwareOcclusionCullingDX_2012 / SoftwareOcclusionCullingDX_2010) on systems that do not support AVX.
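As a small illustration of how code can tell the two project variants apart at compile time, the sketch below checks the `__AVX__` macro, which MSVC predefines when /arch:AVX is set (GCC and Clang define it under -mavx). The function name `BuildVariant` is hypothetical and is not taken from the sample.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the sample): reports which variant this
// translation unit was built as, based on the compiler-defined __AVX__ macro.
std::string BuildVariant()
{
#if defined(__AVX__)
    return "AVX";      // built with /arch:AVX (MSVC) or -mavx (GCC/Clang)
#else
    return "non-AVX";  // built without AVX code generation
#endif
}
```

This is only a compile-time check. It says how the binary was built, not whether the machine running it supports AVX.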

Rasterizer Optimizations:
Fabian Giesen has been optimizing this sample on GitHub and documenting the work on his blog. Most of those optimizations have been integrated into this version of the sample.

Pipelining:
When software occlusion culling is enabled, the occluders are rasterized to the depth buffer on the CPU once every frame. The occludees' axis-aligned bounding boxes (AABBs) are then rasterized and depth tested against the CPU-rasterized depth buffer to generate the list of visible models that is sent to the GPU for rendering. When pipelining is enabled, the sample does not wait for the software occlusion culling algorithm to complete and generate that list. Instead, occlusion culling is kicked off in frame n, and the resulting list of visible models is sent to the GPU for rendering in frame n+1.
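The frame n / frame n+1 hand-off can be sketched roughly as follows. This is a minimal illustration built on `std::async`, not the sample's task system. `CullModels`, `PipelinedCuller` and `BeginFrame` are hypothetical names, and the "culling" is reduced to comparing each model's depth against a single occluder depth instead of testing AABBs against a rasterized depth buffer.

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for the culling pass: returns the indices of models
// whose depth is nearer than one occluder depth. The real sample tests
// occludee AABBs against a CPU-rasterized depth buffer instead.
std::vector<int> CullModels(const std::vector<float>& modelDepths, float occluderDepth)
{
    assert(occluderDepth >= 0.0f);  // depths are assumed non-negative in this sketch
    std::vector<int> visible;
    for (int i = 0; i < static_cast<int>(modelDepths.size()); ++i)
        if (modelDepths[i] < occluderDepth)
            visible.push_back(i);
    return visible;
}

// Pipelined culling: the work for frame n runs asynchronously, and its result
// is picked up when frame n+1 begins.
class PipelinedCuller
{
public:
    // Collects the visible list produced by the previous frame's culling
    // (empty on the first frame), then kicks off culling for this frame.
    std::vector<int> BeginFrame(std::vector<float> modelDepths, float occluderDepth)
    {
        std::vector<int> previous;
        if (mPending.valid())
            previous = mPending.get();  // result of the culling started last frame
        mPending = std::async(std::launch::async, CullModels,
                              std::move(modelDepths), occluderDepth);
        return previous;                // what the renderer would submit now
    }

private:
    std::future<std::vector<int>> mPending;
};
```

In this sketch the list returned by `BeginFrame` was always computed from the previous frame's inputs, so rendering never stalls on the current frame's culling but works with a visible list that is one frame old.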

Performance:

The performance of the updated Software Occlusion Culling sample was measured on a 2.3 GHz 3rd gen Intel® Core™ processor (Ivy Bridge) system with 4 cores / 8 threads and Intel® HD Graphics 4000. We set the rasterizer technique to SSE, the occluder size threshold to 1.5, the occludee size threshold to 0.01, and the number of depth test tasks to 20. We enabled frustum culling and multi-tasking and disabled vsync.
The castle scene has 115 occluder models and 48700 occluder triangles. It has 27025 occludee models (occluders are treated as occludees) and ~1.9 million occludee triangles.

The time taken to rasterize the occluders to the depth buffer on the CPU was ~0.71 milliseconds, and the time taken to depth test the occludees was ~0.67 milliseconds. The total time spent on software occlusion culling was therefore ~1.38 milliseconds.

Comments (10)

Cool stuff here! I believe I found a slight issue, it's an edge case though but I thought I'd share my findings :) If there are no occluders, the SSE implementation of the depth rasteriser doesn't update the depth summary (aka the hi-z).

i.e. DepthBufferRasterizerSSEST::RasterizeBinnedTrianglesToDepthBuffer() in the allBinsEmpty case, needs to break rather than return, so CreateCoarseDepth() is always called.
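The control-flow issue described in this comment can be illustrated with a heavily simplified sketch. Everything below is hypothetical apart from the names `allBinsEmpty` and `CreateCoarseDepth`, which come from the comment. The data layout, the `Fragment` type and the depth convention (1.0 is far, smaller is nearer) are assumptions made for the example and are not taken from the sample's source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int   kPixelsPerTile = 4;      // assumed tile size for this sketch
constexpr float kFar           = 1.0f;   // assumed "cleared" depth value
constexpr float kStale         = -1.0f;  // marks a summary that was never written

struct Fragment { int pixel; float z; }; // hypothetical binned occluder fragment

struct TileRasterizer
{
    std::vector<float> depth;   // full-resolution depth, kPixelsPerTile per tile
    std::vector<float> coarse;  // one summary value per tile (the "hi-z")

    explicit TileRasterizer(int tileCount)
        : depth(static_cast<std::size_t>(tileCount) * kPixelsPerTile, kFar),
          coarse(static_cast<std::size_t>(tileCount), kStale) {}

    // Summary = farthest depth in the tile, a conservative bound for depth tests.
    void CreateCoarseDepth(int tile)
    {
        auto first = depth.begin() + tile * kPixelsPerTile;
        coarse[tile] = *std::max_element(first, first + kPixelsPerTile);
    }

    void RasterizeTile(int tile, const std::vector<Fragment>& bin)
    {
        assert(tile >= 0 && static_cast<std::size_t>(tile) < coarse.size());
        auto first = depth.begin() + tile * kPixelsPerTile;
        std::fill(first, first + kPixelsPerTile, kFar);  // clear the tile

        std::size_t next = 0;
        for (;;)
        {
            const bool allBinsEmpty = (next == bin.size());
            if (allBinsEmpty)
                break;  // a `return` here would skip CreateCoarseDepth() below
            const Fragment& f = bin[next++];
            first[f.pixel] = std::min(first[f.pixel], f.z);  // keep nearest depth
        }
        CreateCoarseDepth(tile);  // reached even when the bin was empty
    }
};
```

With `break`, a tile that receives no occluder fragments still gets a fresh summary equal to the cleared depth. With `return`, its summary would keep whatever value it had before.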

Archive: softwareocclusionculling.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of softwareocclusionculling.zip or
softwareocclusionculling.zip.zip, and cannot find softwareocclusionculling.zip.ZIP, period.