Motivation

As creators of digital 3D content, we want our creations to look and perform their best. In some cases we may be attempting to push rendering to the limit because we are in control of the hardware on which the experience will run. In other cases, we may be building an experience that has to run on a variety of hardware. Either way, we can pick and choose from a variety of performance optimizations to improve both CPU and GPU running time: occlusion culling, texture atlasing, static and dynamic batching, GPU instancing, shader fallbacks, multi-threading, lightmapping, optimizing garbage collection / scripts, and many more techniques. One technique that has been tried-and-true in 3D graphics is using varying levels of detail (LOD) for meshes.

LOD0 is traditionally the original mesh. Each additional LOD is a decimation or reduction of the previous LOD, which reduces the polygon count.
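To make the LOD chain concrete, here is a minimal sketch in Python (not Unity's C# API) that tracks only triangle counts. The per-level `ratio` of 0.5 is an assumption for illustration. Real decimators usually target a quality or error metric rather than an exact fraction.

```python
from __future__ import annotations


def build_lod_chain(lod0_triangles: int, lod_count: int, ratio: float = 0.5) -> list[int]:
    """Triangle counts for LOD0..LOD(n-1); each LOD decimates the previous one.

    `ratio` is an assumed per-level reduction factor (0.5 keeps half the triangles).
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be between 0 and 1 (exclusive)")
    counts = [lod0_triangles]  # LOD0 is the original mesh
    for _ in range(1, lod_count):
        # Reduce the *previous* LOD, not LOD0, so reductions compound down the chain.
        counts.append(max(1, int(counts[-1] * ratio)))
    return counts


print(build_lod_chain(10_000, 4))  # [10000, 5000, 2500, 1250]
```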

If LOD is tried-and-true, then why talk about it? I’d venture to say it has something to do with how readily available LODs are.

Digital productions may not be taking full advantage of LOD for any of the following reasons:

Requires an artist to process LODs (e.g. individually, process them in a batch via UI, etc.)

Requires a custom pipeline to be set up

Involves some software engineering effort

Involves evaluating a variety of products for use at different stages in the pipeline, each of which may take a different approach

Ultimately, may require estimating the benefits and trade-offs of using LOD

This past summer I was looking to challenge some of these barriers to LOD usage in an experimental project I’ve called AutoLOD. I was joined in Labs by Yangguang Liao, a Ph.D. student from the University of California, Davis, to assist with the project. Parts of this project were originally started at Hack Week 11 (2016) with Elliot Cuzzillo and continued during Hack Week XII (2017) with Jake Turner.

The above video shows rendering with traditional LOD [left] at an average of 30 fps compared to rendering with SceneLOD (part of the AutoLOD package) [right] at an average of 42 fps. At full zoom, traditional LOD uses ~9 ms / 7 ms (CPU/GPU) compared to ~1 ms / 0.5 ms (CPU/GPU) for SceneLOD. Note: The color disparities are due to the shader currently used for SceneLOD, which can be customized.

Underneath each playback window is a recording of the profiler window. On the left, you can see rendering cost balloon as more of the scene is shown. On the right, the rendering cost stays relatively constant once Hierarchical LOD kicks in (more on this later). Notice also the minimal CPU usage on the right, which is due to the reduced draw call count.

Vision

The vision of AutoLOD was to explore what an automatic, extensible, and pluggable level of detail (LOD) system might look like in Unity, which could support rendering-intensive projects and serve as a testbed for continuing LOD research. Let’s define these terms:

Automatic in that sensible defaults are used in order to auto-generate LODs, which will generally make projects run with better performance

Pluggable in that third parties can create their own LOD generators that can be used in place of a default LOD generator

Goals

Our initial goals for this experimental project were:

LOD generation on model import with sensible defaults

Project-wide and per-model LOD import settings

GPU-accelerated default LOD generator*1

Asynchronous, pluggable LOD generation framework

Hierarchical LOD support via SceneLOD*

Extensible runtime that can be paired with LOD generators for alternative techniques (e.g. continuous, view-dependent, etc.)1

“Workbench” scene that allows for LOD generator comparison1

Not all goals were reached due to time constraints. However, we felt that the experiment was a success in that parts of the vision proved out. Let’s dig into some of the details.

LOD Generation

Tying into the vision of LOD generation being automatic, our goal was to have sensible defaults that would work for most projects. Any professional LOD package comes with plenty of sliders and toggles, and ideally those would only be necessary when an automatically generated LOD looked terrible enough to warrant tuning it by hand. That being said, there are project-wide settings that can be specified in Edit -> Preferences…

If any of the generated LODs are not correct, it’s possible to override them per model file:

It’s possible to change the simplifier/batcher combo for a single file or simply turn off automatic generation on import and supply the LODs manually. You can even add additional LODs in the LOD chain if you prefer. The LOD chain will get included in the imported version of the model file in the project, so no separate prefab is needed in order to set up a LODGroup.
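The override behavior can be sketched as a merge of project-wide defaults with a per-model dictionary of changes. The field names here (`generate_on_import`, `simplifier`, `batcher`, `max_lod`) are assumptions loosely modeled on the options described above, not the exact AutoLOD settings.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LODImportSettings:
    # Hypothetical field names; AutoLOD's actual preferences may be named differently.
    generate_on_import: bool = True
    simplifier: str = "default"
    batcher: str = "default"
    max_lod: int = 2


def resolve_settings(project: LODImportSettings, overrides: dict) -> LODImportSettings:
    """Per-model overrides win; every field not overridden keeps the project-wide value."""
    return replace(project, **overrides)  # unknown keys raise TypeError, catching typos early


project = LODImportSettings()
model = resolve_settings(project, {"simplifier": "InstaLOD"})
manual = resolve_settings(project, {"generate_on_import": False})  # supply LODs by hand
print(model)
```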

SceneLOD

SceneLOD is inspired2 by the work of Erikson, C., D. Manocha, and W. Baxter in a 2001 I3D paper. We decided to create an implementation that would work with the existing LODGroup component in Unity, so that a custom build of Unity would not be required. A bounding volume hierarchy (currently an octree) of LODGroup components controls which LOD is used to render the scene.
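As a rough sketch of how such an octree might be populated, assume that each object is indexed by a single point (for example, the center of its bounds) and that a leaf splits into eight octants once it holds more than a hypothetical `max_objects`. SceneLOD's actual partitioning may differ. In particular, objects that straddle octant boundaries would need their full bounds taken into account.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OctreeNode:
    center: tuple
    half_size: float
    depth: int = 0
    objects: list = field(default_factory=list)   # (name, position) pairs stored at a leaf
    children: list = field(default_factory=list)  # empty for a leaf, otherwise 8 octants

    def insert(self, name, pos, max_objects=4, max_depth=3):
        if self.children:
            self._octant(pos).insert(name, pos, max_objects, max_depth)
            return
        self.objects.append((name, pos))
        # Split an overfull leaf unless the (hypothetical) depth limit has been reached.
        if len(self.objects) > max_objects and self.depth < max_depth:
            h = self.half_size / 2
            cx, cy, cz = self.center
            self.children = [
                OctreeNode((cx + dx, cy + dy, cz + dz), h, self.depth + 1)
                for dx in (-h, h) for dy in (-h, h) for dz in (-h, h)
            ]
            pending, self.objects = self.objects, []
            for n, p in pending:
                self._octant(p).insert(n, p, max_objects, max_depth)

    def _octant(self, pos):
        cx, cy, cz = self.center
        # Same x/y/z nesting order as the list comprehension that created the children.
        return self.children[4 * (pos[0] >= cx) + 2 * (pos[1] >= cy) + (pos[2] >= cz)]

    def leaves(self):
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


root = OctreeNode(center=(0.0, 0.0, 0.0), half_size=100.0)
positions = [(-50, -50, -50), (50, 50, 50), (-50, 50, 10), (60, -20, -5), (5, 5, 5)]
for i, pos in enumerate(positions):
    root.insert(f"LODGroup_{i}", pos)
print(len(root.children), sum(len(leaf.objects) for leaf in root.leaves()))  # 8 5
```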

Why HLOD?

As a performance optimization, Hierarchical Level of Detail (HLOD) partitions individual meshes in a scene in order to replace those meshes with a grouped representation. Traditional Level of Detail (LOD) selects an appropriate mesh representation according to screen size, distance, viewpoint, or some other metric. Each rendered mesh typically adds a draw call, regardless of which LOD is selected. A limitation of traditional LOD is that draw calls are not optimized in aggregate, because each object’s LOD chain is evaluated individually. Static batching only solves part of this problem, since it aggregates by shared material. Draw calls typically burden the CPU, so reducing them will generally improve CPU performance.
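Traditional LOD selection can be sketched as a per-object test of how much of the screen the object covers, in the spirit of the per-LOD screen-relative transition heights on Unity's LODGroup. The projection formula below is the standard perspective approximation. Unity's own computation (LOD bias, how object size is measured) may differ, and the thresholds are made-up example values.

```python
from __future__ import annotations

import math


def screen_relative_height(object_size: float, distance: float, fov_y_degrees: float) -> float:
    """Approximate fraction of the viewport height covered by an object (perspective camera)."""
    half_fov = math.radians(fov_y_degrees) / 2
    return object_size / (2 * distance * math.tan(half_fov))


def select_lod(relative_height: float, transition_heights: list[float]) -> int | None:
    """Pick the first LOD whose threshold the object still meets; None means culled.

    `transition_heights` is ordered LOD0..LODn and decreasing, e.g. [0.6, 0.3, 0.1].
    """
    for lod, threshold in enumerate(transition_heights):
        if relative_height >= threshold:
            return lod
    return None  # smaller on screen than the last threshold: not drawn at all


thresholds = [0.6, 0.3, 0.1]  # made-up example values
for distance in (5, 25, 100):
    h = screen_relative_height(object_size=10, distance=distance, fov_y_degrees=90)
    print(distance, round(h, 2), select_lod(h, thresholds))
```

Every object that passes its test still issues its own draw call, which is exactly the aggregate cost that HLOD targets.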

HLOD can aid in reducing draw calls by combining all objects within a specific volume into a single mesh and potentially a single material by utilizing a texture atlas. For games that wish to display large sweeping views of a whole scene, HLOD can benefit performance greatly. In other cases, HLOD may also yield better quality than individual LODs when a group of combined meshes is decimated together. The drawback of HLOD is the extra memory required to store an HLOD mesh at every node in the BVH.
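Combining meshes is mostly bookkeeping. The vertex arrays are concatenated, and each triangle's indices are shifted by the number of vertices already emitted. This sketch assumes the vertices are already in world space and omits UVs. With a texture atlas, each source mesh's UVs would also need remapping into its atlas region.

```python
def combine_meshes(meshes):
    """Merge (vertices, triangles) pairs into a single mesh that can be drawn in one call.

    vertices: list of (x, y, z) already in world space (assumed);
    triangles: list of (i, j, k) indices into that mesh's own vertex list.
    """
    combined_vertices = []
    combined_triangles = []
    for vertices, triangles in meshes:
        # Indices must skip past the vertices contributed by earlier meshes.
        offset = len(combined_vertices)
        combined_vertices.extend(vertices)
        combined_triangles.extend((i + offset, j + offset, k + offset) for i, j, k in triangles)
    return combined_vertices, combined_triangles


house = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
shed = ([(5, 0, 0), (6, 0, 0), (5, 1, 0)], [(0, 1, 2)])
vertices, triangles = combine_meshes([house, shed])
print(len(vertices), triangles)  # 6 [(0, 1, 2), (3, 4, 5)]
```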

Performance Analysis

A slightly modified version of the demo scene provided by the POLYGON – City Pack was used. A camera was animated using Timeline to zoom from a close view to the entire view of the city in 5 seconds. Tests were performed on a Razer Blade laptop3.

Let’s take a closer look at the profiler views for traditional LOD and HLOD:

Traditional LOD (above) shows growing CPU and rendering cost as the camera zooms out and reveals more of the scene.

For the HLOD version, traditional LOD is active when playback initially starts, which explains the rendering cost at the beginning. Eventually the performance moves into near constant CPU and GPU costs once HLOD is fully utilized. BVH evaluation (i.e. determining which HLODs should render) has some CPU cost, too.

Additionally, the following experiments were run with the entire scene in view (camera stationary) in the GameView and the Stats window on:

Configuration         CPU (ms)  % vs static  Render (ms)  % vs static  Triangles (M)  % vs static  Batches  % vs static
Default                   17.9          -32         13.3          -73            5.5            0     7658          -91
Static batching           12.2            0          3.6            0            5.5            0      695            0
LOD                        8.1           51          6.7          -46            0.8          588     1487          -53
GPU Instancing            18.4          -34          8.6          -58            5.5            0     2047          -66
HLOD                       0.8        1,425          0.5          620            1.4          293        6       11,483
Static + Instancing         12            2            4          -10            5.5            0      694            0
LOD + Instancing           8.3           47          4.3          -16            0.8          588      691            1
HLOD + Instancing          0.8        1,425          0.6          500            1.4          293        4       17,275

Static batching involved marking all of the objects in the scene as static and then hitting play. Instancing involved enabling GPU Instancing in all materials used.
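The "% vs static" columns are not defined explicitly, but every value in the table is consistent with measuring improvement over static batching as (static / value - 1) x 100. Lower raw values are better for every metric, so positive percentages mean "better than static batching". This also shows that the 1,425% figure compares HLOD against static batching, not against traditional LOD. A quick check against the HLOD row:

```python
def percent_vs_static(value: float, static_value: float) -> int:
    """Improvement over static batching when lower is better: (static / value - 1) * 100."""
    return round((static_value / value - 1) * 100)


# Raw numbers copied from the Static batching and HLOD rows of the table.
STATIC = {"cpu_ms": 12.2, "render_ms": 3.6, "triangles_m": 5.5, "batches": 695}
HLOD = {"cpu_ms": 0.8, "render_ms": 0.5, "triangles_m": 1.4, "batches": 6}

for metric, value in HLOD.items():
    # Prints 1425, 620, 293 and 11483 for the four metrics, matching the table.
    print(metric, percent_vs_static(value, STATIC[metric]))
```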

As you can see, with a large city scene HLOD cuts CPU time from 8.1 ms with traditional LOD to 0.8 ms (roughly 10x faster, or 1,425% relative to static batching) and reduces the draw call count from 1487 to only 6!

However, where HLOD really takes off is when you build scenes that traditional LOD would normally not be able to handle:

This is an example scene with four copies of the original scene, for a total of 6.2M triangles and 11655 batches. At 83.3 ms / 43.9 ms (CPU/GPU), or roughly 12 fps, this falls well below interactive frame rates.

Now, comparing this to an HLOD version of the same scene:

We’re still rendering at 1 ms / 0.4 ms (CPU/GPU) and only 6 batches, even though we’ve increased the triangle count to 7M. Keep in mind that although the copies are of the original scene, you could expect the same performance even if each part of the city were individually unique.

Storage Cost

In a build, SceneLOD would add to the static mesh and texture size, but this can be reduced if the BVH depth is also reduced.
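The growth of storage with depth can be estimated by counting nodes. In an octree, each level can hold up to eight times as many nodes as the one above it, and each node may store a combined HLOD mesh. This is a worst-case count for a full tree under that assumption. A real scene produces a sparser tree, and actual storage depends on the size of each node's mesh.

```python
def max_hlod_nodes(depth: int, branching: int = 8) -> int:
    """Worst-case node count of a full tree: 1 + 8 + 8**2 + ... + 8**depth for an octree."""
    return sum(branching ** level for level in range(depth + 1))


for depth in range(5):
    print(depth, max_hlod_nodes(depth))  # node counts: 1, 9, 73, 585, 4681
```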

HLOD:

Textures: 36.3 MB (5.3%)
Meshes: 599.9 MB (87.6%)

LOD:

Textures: 20.3 MB (20.9%)
Meshes: 30.0 MB (30.8%)

The uncompressed size on disk of the HLOD meshes is 1.1 GB.

Time Cost

One-time

There are some one-time costs for our HLOD implementation both in the generation of the BVH and for generating each HLOD, separate from LOD generation. These one-time computation costs will occur any time an object is added, moved, or removed in the scene. However, SceneLOD keeps track of these changes and updates the BVH and HLODs automatically in the background.
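One way to picture the incremental update is dirty-flag propagation. A change marks the affected leaf and every ancestor up to the root. Because each parent HLOD is composed from its children, the rebuild has to run leaf-to-root. The `Node` class and `mark_dirty` function are hypothetical illustrations, not SceneLOD's code.

```python
from __future__ import annotations


class Node:
    """Hypothetical BVH node; `dirty` marks an HLOD that must be regenerated."""

    def __init__(self, name: str, parent: Node | None = None):
        self.name = name
        self.parent = parent
        self.dirty = False


def mark_dirty(leaf: Node) -> list[str]:
    """Flag a leaf and all of its ancestors; return them in rebuild (leaf-to-root) order."""
    order = []
    node = leaf
    while node is not None:
        node.dirty = True
        order.append(node.name)
        node = node.parent
    return order


root = Node("root")
district = Node("district", root)
block = Node("block", district)
sibling = Node("other block", district)
print(mark_dirty(block))  # ['block', 'district', 'root']
print(sibling.dirty)      # False: untouched subtrees keep their existing HLODs
```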

Dynamic

Each time a camera renders, it is necessary to walk the BVH and determine which LODGroup components should be enabled before rendering.4
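Here is a minimal sketch of that per-camera walk. It assumes that each node already has a screen-relative size computed for the current camera and that a single hypothetical `threshold` applies to every node. A node that is small enough on screen renders its combined HLOD, and its subtree is skipped. Otherwise the walk descends, and leaves that are still large fall back to their individual LODGroups.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BVHNode:
    name: str
    screen_height: float  # screen-relative size for the current camera (assumed precomputed)
    children: list[BVHNode] = field(default_factory=list)


def select_nodes(node: BVHNode, threshold: float, out: list | None = None) -> list:
    """Walk the BVH once per camera and decide what to draw for each visited node."""
    if out is None:
        out = []
    if node.screen_height < threshold:
        out.append((node.name, "hlod"))       # far away: one combined mesh for the subtree
    elif not node.children:
        out.append((node.name, "lodgroups"))  # close up: enable the leaf's own LODGroups
    else:
        for child in node.children:
            select_nodes(child, threshold, out)
    return out


far_district = BVHNode("far district", 0.05)
near_block = BVHNode("near block", 0.5)
city = BVHNode("city", 0.9, [far_district, near_block])
print(select_nodes(city, threshold=0.1))
# [('far district', 'hlod'), ('near block', 'lodgroups')]
```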

Conclusion

We’ve found that Automatic LOD can remove some of the pain points of getting LOD into a digital production. Sensible defaults can get projects most of the way there, and any problem meshes can be overridden on a case-by-case basis. SceneLOD provides an example implementation of HLOD that can be used with the current version of Unity on large scenes. If you are willing to trade storage cost for performance, you might be able to improve rendering performance by an order of magnitude for extremely large scenes that have many static elements.

We hope this experimental project provides some insight into your own project’s performance challenges and/or gives you the ability to build more elaborate scenes. Certainly, there are many avenues for future work, such as support for dynamic objects, better compression for HLODs on disk, a default LOD generator, different shader profiles for HLOD rendering, and of course, optimization!

Please check out the code on GitHub and post any comments / issues you have directly to the project!

Did you install Simplygon’s Unity plugin and the accompanying Simplygon SDK, which has the local server? (FYI: If you haven’t already downloaded the Unity plugin it currently isn’t available for download)

Yes, I have; however, it still will not show. Also, how do you use the grid placement utility? For some reason it is not creating grids everywhere in my scene, only on some models. Is there documentation on the grid placement utilities?

Hey, I was just wondering what the general workflow is after you have imported the Unity package for Simplygon and the GitHub files? I have a large scene with buildings; do I have to generate LODs for all the meshes in the scene? Sorry in advance for the noobish questions.

If you have imported the Simplygon package, then you should see SimplygonMeshSimplifier show up in Edit->Preferences->AutoLOD that you can pick. You would also need to have the “Generate On Import” option selected. Then you will simply need to reimport your models to have the LODs generated.

Yes, you need to be logged into Simplygon. If you have the local version the default username/password is user/user. If you have any further questions about Simplygon, please take a look at their forums: https://simplygon.freshdesk.com/support/discussions

The blog post mentions: “There are some one-time costs for our HLOD implementation both in the generation of the BVH and for generating each HLOD, separate from LOD generation. These one-time computation costs will occur any time an object is added, moved, or removed in the scene.”

First, when you say “any time an object is added, moved, or removed in the scene” do you mean at run-time, or at edit-time?

Is this a significant computation cost? If you moved a mailbox or a car to a new position in that scene while the game was running (assuming it works that way), how long would it take for the HLOD system to catch up with the change?

Currently, this cost is at edit-time because run-time support does not exist in the current implementation. Nonetheless, if an object moved to a new location, it would cause a rebuild of that HLOD leaf node all the way up the chain to the root node. With the TimedEnumerator included in the project, this change could be propagated up the tree over many frames. Currently, we simply combine child HLODs to compose the parent HLODs, so it is relatively quick, although I don’t have exact numbers for you. It would also be possible to simplify the combined HLOD at each node, though that would likely need to run on a separate thread and/or be completed over multiple frames.
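The TimedEnumerator mentioned above lives in the project's C# code. As a hypothetical Python analog (not its actual API), a generator can be advanced step by step until a per-frame time budget runs out and then resumed on the next frame. The injectable `clock` parameter exists only to make the behavior easy to test, and `rebuild_path` is an illustrative stand-in for the real HLOD work.

```python
import time
from typing import Callable, Iterator


def run_with_budget(work: Iterator[None], budget_s: float,
                    clock: Callable[[], float] = time.perf_counter) -> bool:
    """Advance `work` until this frame's time budget is spent; True once all work is done."""
    start = clock()
    for _ in work:
        if clock() - start >= budget_s:
            return False  # out of time: the generator stays paused and resumes next frame
    return True


def rebuild_path(node_names):
    """Hypothetical job: rebuild one HLOD per step, leaf first, then up to the root."""
    for name in node_names:
        # ... combine the child HLODs of `name` here (omitted) ...
        yield


job = rebuild_path(["block", "district", "root"])
frames = 1
while not run_with_budget(job, budget_s=0.002):
    frames += 1  # in an engine, this would be the next rendered frame
print("finished after", frames, "frame(s)")
```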

There are integrations for both Simplygon and InstaLOD or you could integrate any other package by implementing an ISimplifier wrapper. Simplygon still has a free solution called Simplygon Connect that only requires registration. Once you register you can download a Unity plugin that works with Unity 5.5 and up. I used this version for my tests that are shown above.

We had started work on a default LOD generator that was GPU accelerated, and it worked for some cases, but it wasn’t as robust as Simplygon, so we didn’t include it in the repository. Only a simulated LOD generator is there, which can help give you estimates. There were only a few of us working on this, so we weren’t expecting to have something as complete as a professional LOD package in only a few months. We still made some attempts at it, though.

I see what you mean now — Simplygon has removed their Unity plugin and plans to release a newer version at some point. I’m unsure why they would remove the old version as it was working for me in 2017.3. I’ve posted a comment to their forum to find out more info. It’s also possible for you to use InstaLOD if you want to get an evaluation version of that solution.

Considering that auto LOD generation is a feature Unity should have had a decade ago, I did not see this coming. Maybe Unity will become a more complete game engine out of the box in another decade. Better late than never, though.

Could you also provide a “runtime API” for procedurally generated stages (for example, Minecraft), exactly as you did with NavMesh?

Why I propose this:
1) Taking up 600 MB of the user’s hard disk is a very expensive requirement. Ouch! The equivalent procedural LOD generation would cost 0 bytes of disk space; it would cost RAM, which is free.
2) Since RAM is faster than a hard disk, and new computers will start to have 100 GB of RAM, wouldn’t it be wiser not to make the user wait through “yet another evil loading screen”?

Yes, ideally LOD generation could be procedural if the content itself were procedural and/or discrete LODs could be generated at runtime. Runtime generation has trade-offs though in terms of quality and running time.

Hello! Very interesting. When do you plan to add this via packman, or is that not the goal? It seems like a very soft release for one of the most essential pipeline improvements. Thanks for doing it, but I would love to see some momentum at Unity behind making this a first-class citizen. Thanks for the work so far, everyone.

Yes, I believe that is the claim for DX12/Vulkan. For the demo scene, the cars and pedestrians would not be static, as they would be moving / animating. HLOD can support dynamic elements, but that wasn’t added to this implementation.

When I get a model with LODs, the first thing I do is delete the LODs. I don’t even bother to mess with it. I’d probably rather my game lag than deal with LODs. But of course I don’t want lag, so I’d just make changes to the game so it doesn’t lag.

The goal was to provide a default generator, but it wasn’t robust enough for release. There are conditional compilation hooks for Simplygon and InstaLOD, though, if you have access to one of those solutions.

It is amazing that we will get AutoLOD.
On behalf of devs working with procedural/user-generated content, I’m asking you to think through the runtime API: the ability to bake LODs for user/procedural content. An example would be the Cities: Skylines asset editor. A user creates a group of objects (a building, with props added to it), and it would be lovely to be able to generate an HLOD for the whole thing and then cache it on disk.