
On DOTS: C++ & C#

This is a brief introduction to our new Data-Oriented Tech Stack (DOTS), sharing some insights into how and why we got to where we are today, and where we’re going next. We’re planning on posting more about DOTS on this blog in the near future.

Let’s talk about C++, the language Unity is written in today.

At the end of the day, one of the problems an advanced game programmer faces is having to provide an executable containing instructions that the target processor understands and that, when executed, run the game.

For the performance critical part of our code, we know what we want the final instructions to be. We just want an easy way to describe our logic in a reasonable way, and then trust and verify that the generated instructions are the ones we want.

In our opinion, C++ is not great at this task. I want my loop to be vectorized, but a million things can happen that might make the compiler not vectorize it. It might be vectorized today, but not tomorrow if a new seemingly innocent change happens. Just convincing all my C/C++ compilers to vectorize my code at all is hard.

We decided to make our own “reasonably comfortable way to generate machine code” that checks all the boxes we care about. We could spend a lot of energy trying to bend the C++ design train a little bit more in a direction where it would work a little bit better for us, but we’d much rather spend that energy on a toolchain where we can do all of the design, and that we design exactly for the problem that game developers have.

What checkboxes do we care about?

Performance is correctness. I should be able to say “if this loop for some reason doesn’t vectorize, that should be a compiler error, not a ‘oh code is now just 8x slower but it still produces correct values, no biggy!’”

Cross-architecture. The input code I write should not have to be different when I target iOS than when I target Xbox.

We should have a nice iteration loop where I can easily see the machine code that is generated for all architectures as I change my code. The machine code “viewer” should do a good job at teaching/explaining what all these machine instructions do.

Safety. Most game developers don’t have safety very high on their priority list, but we think that the fact that it’s really hard to have memory corruption in Unity has been one of its killer features. There should be a mode in which we can run this code that will give us a clear error with a great error message if I read/write out of bounds or dereference null.
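To make the safety checkbox a bit more concrete, here is a minimal C++ sketch of a checked access mode: an array view that reports an out-of-bounds read with a descriptive message instead of silently corrupting memory. The names CheckedView and read_or_describe_error, and the wording of the message, are hypothetical and chosen only for illustration; this is not how Unity's containers are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical illustration: a non-owning view that validates every index.
// A production toolchain would typically compile these checks out of
// release builds and keep them in a dedicated "safety" mode.
template <typename T>
class CheckedView {
public:
    CheckedView(T* data, std::size_t length) : data_(data), length_(length) {
        // Debug-only guard against wrapping a null buffer.
        assert(data != nullptr || length == 0);
    }

    T& at(std::size_t index) const {
        if (index >= length_) {
            throw std::out_of_range(
                "CheckedView: index " + std::to_string(index) +
                " is out of range for length " + std::to_string(length_));
        }
        return data_[index];
    }

private:
    T* data_;
    std::size_t length_;
};

// Reads one element and returns it as text, or returns the error message
// if the index is out of bounds.
std::string read_or_describe_error(const std::vector<int>& values,
                                   std::size_t index) {
    CheckedView<const int> view(values.data(), values.size());
    try {
        return std::to_string(view.at(index));
    } catch (const std::out_of_range& error) {
        return error.what();
    }
}
```

Reading index 3 of a three-element buffer yields a message naming both the index and the length, which is the kind of "clear error" this checkbox asks for.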

Ok, so now that we know what things we care about, the next step is to decide on what the input language for this machine code generator is. Let’s say we have the following options:

Custom language

Some adaptation/subset of C or C++

Subset of C#

Say what? C#? For our most performance critical inner loops? Yes. C# is a very natural choice that comes with a lot of nice benefits for Unity:

It’s the language our users already use today.

It has great IDE tooling, for editing/refactoring as well as debugging.

A C#->intermediate IL compiler already exists (the Roslyn C# compiler from Microsoft), and we can just use it instead of having to write our own.

I quite enjoy writing code in C# myself. However, traditional C# is not an amazing language from a performance perspective. The C# language team, standard library team, and runtime team have been making great progress in the last two years. Still, when using the C# language, you have no control over where/how your data is laid out in memory. And that control is exactly what we need to improve performance.

On top of that, the standard library is oriented around “objects on the heap”, and “objects having pointer references to other objects”.

That said, when working on a piece of performance critical code, we can give up on most of the standard library (bye LINQ, StringFormatter, List, Dictionary), disallow allocations (= no classes, only structs), reflection, the garbage collector and virtual calls, and add a few new containers that you are allowed to use (NativeArray and friends). Then, the remaining pieces of the C# language are looking really good. Check out Aras’s blog for some examples from his path tracer toy project.

This subset lets us comfortably do everything we need in our hot loops. Because it’s a valid subset of C#, we can also run it as regular C#. We can get errors on out of bounds access, with great error messages, debugger support and compilation speeds you forgot were possible when working in C++. We often refer to this subset as High-Performance C# or HPC#.

Burst compiler: Where are we today?

We’ve built a code generator/compiler called Burst. It’s been available since Unity 2018.1 as a preview package. We have a lot of work ahead, but we’re already happy with it today.

We’re sometimes faster than C++ and still sometimes slower. We consider the latter cases performance bugs that we’re confident we can resolve.

Only comparing performance is not enough though. What matters equally is what you had to do to get that performance. Example: we took the C++ culling code of our current C++ renderer and ported it to Burst. The performance was the same, but the C++ version had to do incredible gymnastics to convince our C++ compilers to actually vectorize. The Burst version was about 4x smaller.

To be honest, the whole “you should move your most performance critical code to C#” story also didn’t result in everybody internally at Unity immediately buying it. For most of us, it feels like “you’re closer to the metal” when you use C++. But that won’t be true for much longer. When we use C# we have complete control over the entire process from source compilation down to machine code generation, and if there’s something we don’t like, we just go in and fix it.

We will slowly but surely port every piece of performance critical code that we have in C++ to HPC#. It’s easier to get the performance we want, harder to write bugs, and easier to work with.

Here’s a screenshot of Burst Inspector, which lets you easily see what assembly instructions were generated for your different Burst hot loops:

Unity has a lot of different users. Some can enumerate the entire arm64 instruction set from memory, others are happy to create things without getting a PhD in computer science.

All users benefit as the parts of their frame time that are spent running engine code (usually 90%+) get faster. The parts that are running Asset Store package runtime code get faster as Asset Store package authors adopt HPC#.

Advanced users will benefit on top of that by also being able to write their own high-performance code in HPC#.

Optimization granularity

In C++, it’s very hard to ask the compiler to make different optimization tradeoffs for different parts of your project. The best you have is per file granularity on specifying optimization level.

Burst is designed to take a single method of a program as input: the entry point to a hot loop. It will compile that function and everything that it invokes (which is guaranteed to be known: we don’t allow virtual functions or function pointers).

Because Burst only operates on a relatively small part of the program, we set optimization level to 11. Burst inlines pretty much every call site. That lets it remove if checks that otherwise would not be removed, because in inlined form we have more information about the arguments of the function.
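As a rough illustration of why inlining exposes removable checks, consider the C++ sketch below. The function names are hypothetical. The helper carries a defensive bounds check; once it is inlined into the loop, an optimizer can see from the loop condition that the index is always in range, so the check is provably dead. Whether a particular compiler actually removes it depends on the compiler and its settings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper with a defensive check on every call.
inline float load_or_zero(const float* data, std::size_t length,
                          std::size_t index) {
    if (index >= length) {
        return 0.0f;  // out-of-range reads yield zero
    }
    return data[index];
}

// Hot loop. Seen in isolation, load_or_zero must keep its check. Inlined
// here, the optimizer knows `i < length` from the loop condition, so the
// `index >= length` branch can never be taken and may be eliminated.
float sum(const std::vector<float>& values) {
    const float* data = values.data();
    const std::size_t length = values.size();
    assert(data != nullptr || length == 0);

    float total = 0.0f;
    for (std::size_t i = 0; i < length; ++i) {
        total += load_or_zero(data, length, i);
    }
    return total;
}
```

The observable result is the same with or without the check; only the generated machine code differs, which is why a tool for inspecting that code is useful.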

How that helps with common multithreading problems

Neither C++ nor C# does much to help developers write thread-safe code.

Even today, more than a decade after consumer gaming hardware gained more than one core, it is very hard to ship programs that use multiple cores effectively.

Data races, nondeterminism and deadlocks are all challenges that make shipping multithreaded code difficult. What we want is features like “make sure that this function and everything that it calls never read or write global state”. We want violations of that rule to be compiler errors, not “guidelines we hope all programmers adhere to”. Burst gives a compiler error.

We encourage both Unity users and ourselves to write “jobified” code: splitting up all data transformations that need to happen into jobs. Each job is “functional”, as in side-effect free. It explicitly specifies the read-only buffers and read/write buffers it operates on. Any attempt to access other data results in a compiler error.

The job scheduler will guarantee that nobody is writing to your read-only buffer while your job is running. And we’ll guarantee that nobody is reading from your read/write buffer while your job is running.

If you schedule a job that violates these rules, you get a runtime error every time. Not just in your unlucky race condition case. The error message will explain that you’re trying to schedule a job that wants to read from buffer A, but that you already scheduled a job before that will write to A, so if you want to do this, you need to specify that previous job as a dependency.
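The scheduling rule described above can be modeled in a few lines. The following C++ sketch is a toy model, not Unity's job system: the names JobDesc, ToyScheduler and try_schedule are hypothetical, buffers are identified by plain strings, and every scheduled job is treated as still in flight. It shows how declaring read and write buffers up front lets a scheduler reject a conflicting job deterministically at schedule time, rather than only when a race happens to occur.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical job description: the buffers a job reads and writes, plus
// handles of earlier jobs it must run after.
struct JobDesc {
    std::string name;
    std::vector<std::string> reads;
    std::vector<std::string> writes;
    std::vector<int> dependsOn;
};

class ToyScheduler {
public:
    // Returns a handle that later jobs can list in dependsOn.
    // Throws std::logic_error if the job conflicts with an earlier job
    // that it does not (directly or transitively) depend on.
    int schedule(const JobDesc& job) {
        const int count = static_cast<int>(jobs_.size());
        for (int handle : job.dependsOn) {
            assert(handle >= 0 && handle < count);
            (void)handle;
        }
        for (int earlier = 0; earlier < count; ++earlier) {
            if (reaches(job.dependsOn, earlier)) continue;  // ordered: safe
            const JobDesc& other = jobs_[earlier];
            for (const std::string& buffer : job.reads) {
                if (contains(other.writes, buffer))
                    fail(job, "read from", buffer, other, "write to");
            }
            for (const std::string& buffer : job.writes) {
                if (contains(other.writes, buffer))
                    fail(job, "write to", buffer, other, "write to");
                if (contains(other.reads, buffer))
                    fail(job, "write to", buffer, other, "read from");
            }
        }
        jobs_.push_back(job);
        return count;
    }

private:
    static bool contains(const std::vector<std::string>& names,
                         const std::string& name) {
        return std::find(names.begin(), names.end(), name) != names.end();
    }

    // True if `target` is reachable through the dependency handles.
    bool reaches(const std::vector<int>& deps, int target) const {
        for (int handle : deps) {
            if (handle == target || reaches(jobs_[handle].dependsOn, target))
                return true;
        }
        return false;
    }

    [[noreturn]] static void fail(const JobDesc& job, const char* wants,
                                  const std::string& buffer,
                                  const JobDesc& other, const char* does) {
        throw std::logic_error(
            "Job '" + job.name + "' wants to " + wants + " buffer '" + buffer +
            "', but job '" + other.name + "' was already scheduled to " +
            does + " it; add '" + other.name + "' as a dependency.");
    }

    std::vector<JobDesc> jobs_;
};

// Returns the error text, or an empty string if scheduling succeeded.
std::string try_schedule(ToyScheduler& scheduler, const JobDesc& job) {
    try {
        scheduler.schedule(job);
        return "";
    } catch (const std::logic_error& error) {
        return error.what();
    }
}
```

In this model, scheduling a reader of buffer A after a writer of A fails every time unless the writer's handle is listed as a dependency, while two readers of the same buffer are always allowed.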

We find this safety mechanism catches a lot of bugs before they get committed and results in efficient use of all cores. It becomes impossible to code a deadlock or a race condition. Results are guaranteed to be deterministic regardless of how many threads are running, or how many times a thread gets interrupted by some other process.

Hacking the whole stack

By being able to hack on all these components, we can make them aware of each other. For example, a common reason for vectorization not happening is that the compiler cannot guarantee that two pointers do not point to the same memory (aliasing). We know two NativeArrays will never alias because we wrote the collection library, and we can use that knowledge in Burst, so it won’t have to give up on optimization because it’s afraid two array pointers might point to the same memory.
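The aliasing problem is easy to reproduce in C++. In the sketch below (function names are hypothetical), the first function must produce a specific result even when its output overlaps its input, which is what stops a compiler from blindly processing several elements at once. The second function uses `__restrict`, a non-standard extension accepted by GCC, Clang and MSVC, to promise that the pointers never overlap. That promise is the kind of knowledge the post says Burst gets for free from NativeArray.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The compiler must assume `out` and `in` may overlap: a store to out[i]
// could change the value a later iteration loads from in[i + 1].
void add_one(float* out, const float* in, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = in[i] + 1.0f;
    }
}

// `__restrict` tells the compiler the two ranges do not overlap, so it is
// free to load and store several elements per iteration.
void add_one_no_alias(float* __restrict out, const float* __restrict in,
                      std::size_t count) {
    assert(out != in);  // cheap debug check of the caller's promise
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = in[i] + 1.0f;
    }
}

// Overlapping call: out starts one element after in, so each store feeds
// the next load and the result is a running count, not "all ones".
std::vector<float> overlapping_run() {
    std::vector<float> buffer(4, 0.0f);
    add_one(buffer.data() + 1, buffer.data(), 3);
    return buffer;
}

// Non-overlapping call: every output element is simply input + 1.
std::vector<float> separate_run() {
    std::vector<float> in(3, 0.0f);
    std::vector<float> out(3, 0.0f);
    add_one_no_alias(out.data(), in.data(), 3);
    return out;
}
```

The overlapping call has to produce 0, 1, 2, 3 because each iteration depends on the previous one. A compiler that cannot rule that case out has to either add a runtime overlap check or give up on vectorizing.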

Similarly, we wrote the Unity.Mathematics math library. Burst has intimate knowledge of it. It will (in the future) be able to do accuracy-sacrificing optimizations for things like math.sin(). To Burst, math.sin() is not just any C# method to compile: it will understand the trigonometric properties of sin(), understand that sin(x) ≈ x for small values of x (which Burst might be able to prove), and understand that it can be replaced by a Taylor series expansion for a certain accuracy sacrifice. Cross-platform and cross-architecture floating point determinism is also a future goal of Burst that we believe is possible to achieve.
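For readers who want to see the accuracy trade-off in numbers, here is a small C++ sketch of the two approximations mentioned above. The function names are hypothetical and this says nothing about how Burst will implement such optimizations; it only shows that sin(x) ≈ x is very accurate near zero and that a truncated Taylor polynomial trades accuracy for fewer operations.

```cpp
#include <cassert>
#include <cmath>

// Small-angle approximation: sin(x) ~= x, with error of about x^3 / 6.
inline double sin_small_angle(double x) {
    return x;
}

// Taylor polynomial of sin around 0 using `terms` terms:
//   x - x^3/3! + x^5/5! - ...
// Fewer terms means fewer multiplications and a larger error.
double sin_taylor(double x, int terms) {
    assert(terms > 0);
    double term = x;  // current term, starting with x^1 / 1!
    double sum = x;
    for (int k = 1; k < terms; ++k) {
        // Step from x^(2k-1)/(2k-1)! to -x^(2k+1)/(2k+1)!.
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum += term;
    }
    return sum;
}

// Absolute error of an approximation against the library sin().
double abs_error(double approx, double x) {
    return std::fabs(approx - std::sin(x));
}
```

At x = 0.001 the small-angle form is off by less than one part in a billion, while at x = 0.5 three Taylor terms are accurate to roughly six decimal places. A compiler that knows the value range of x and the accuracy the caller needs could choose between these forms automatically.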

The distinction between engine code and game code disappears

By writing Unity’s runtime code in HPC#, the engine and the game are written in the same language. We will distribute runtime systems that we have converted to HPC# as source code. Everyone will be able to learn from them, improve them, tailor them. We’ll have a level playing field, where nothing is stopping users from writing a better particle system, physics system or renderer than we write. I expect many people will. By having our internal development process be much more like our users’ development process, we’ll also feel our users’ pain more directly, and we can focus all our efforts on improving a single workflow, instead of two different ones.

In my next post, I’ll cover a different part of DOTS: the entity component system.

Nice idea, for future generations.
The IL2CPP creation – integration – optimization cycle took some years, yet it seems fast compared to the current ECS – Burst – DOTS – whatever thing that is still in the creation phase after >2 years. I understand the potential benefits but it takes, like, forever.

I respect the vision and the work of the people who are at this task, but: This compiler pipeline must come out of preview at last. No, I do not appreciate another blog post about the cool things that you are working on the compiler front, I would appreciate instead to see the compiler improvements out of preview and start using them in production. Preview features is something that makes sense if the preview phase is short and after some months you can use them in production – but this is not the case here.

The hub works, the packages work, many features are getting added but those two areas – rendering pipeline and compiler pipeline improvements – are in preview since the beginning of the subscription model. It goes significantly slower than what I would expect, when I see an upcoming feature in the blog, I would expect to be able to use it in production after some months – but we are talking years here.

The PBR rendering system is pretty stable. Perhaps Unity 2019 should have been called Unity 2020 LTD to make it clearer to new users that it is a WIP software. In this way, the LTD arrives when the year starts (and not a year later). So that beginners do not experiment with software in the evolution phase.

In my opinion, by extending it over and over, Microsoft has fallen into a pattern similar to C++. Each new version adds more features, but really doesn’t address the deficiencies of the language. After a few version cycles, it has become a pile of disjointed features with very little reasoning or consistency.

My biggest concern with the Unity subset and ECS in general is that Microsoft is moving in the same direction. However, instead of starting from a naked codebase and building up from scratch, they are adding higher performance alternatives to replace existing abstractions.

This includes additions to unsafe and ref, turning off the GC, value tasks, and if you need a multi-threaded job scheduler that runs on all CPUs there is thread channels.

Moreover these high-performance advancements also work in non Unity environments such as a server.

I have been coding this way for about 2 years now. Not only is my C# super fast and non-GC-allocating, but the constraints have really improved code readability.

This is kinda funny. I was discussing exactly the same thing with a friend just two days ago. Looks like Unity devs share the same outlook on HL programming. I think if you have infinite resources to optimize, C++ will always be faster, but for us limited-time people, HL languages do provide a good option.

C#++ :) I am glad Unity chooses C#, because C# 8.0, .NET Core, etc. keep getting better over time in every direction: ease of use, IDE integration, object-oriented and functional programming, and of course high performance that brings it very close to C++ without C++ problems.

Pretty much:
– C++ is very unpredictable when it comes to compiling. It can work sometimes, but at other times it won’t.
– DOTS C# is to replace the C++ side of things and Unity has managed to get close to the performance if not faster than C++ while having the advantages of C#.
– Eventually a lot of things, like particle system and maybe even the physics system, will fall in line with DOTS for everyone to see and improve upon. (Well the ECS versions of these things)

As Lucas Meijer explains:
The Unity editor runs on, and is written in, C++. Neither C++ nor C# helps developers write thread-safe code (code that uses more than one CPU core or thread).

We (users) use C# in Unity to make small scripts because it is much simpler. I agree that writing C# scripts can be enjoyable. C# is simple and more robust or reliable. Safety has been one of Unity’s killer features (as Lucas Meijer says). I agree, but it is missing a killer loop button for codesurfers.

Anyway, Lucas Meijer explains:
Unity software will migrate from C++ to C# to have complete control over the entire process from source compilation down to machine code, to get great error messages and to improve compile times. The Unity team says, “you should move your most performance critical code to C#”.
C# is not as performant a language as C++ (because of the compilers), and we all want C++ performance. To get there, Unity Technologies is creating a toolchain named HPC# and Burst as an “easy way to describe logic”, with a primary focus on performance through better memory layout.

So they will slowly port critical code from C++ to C#, to a more performant Unity C# called HPC#, and then from HPC# to machine code.
HPC# will be a C# sandbox [interfaces] that is highly optimizable and simple. Unity wants to produce final, optimized, performant code (drawing on previous experience). Scripts in HPC# will be converted to intermediate code or final machine code using Burst.

C# and C++ are just programming languages. Both are compiled to machine-readable code, as neither C++ nor C# code is readable by the processor. The difference is that when you compile C++ code, the compiler directly writes code the processor can read, while C# is compiled into an intermediate language that is only compiled into machine-readable code when executed on a target device.

C++ and C# need a compiler.
This compiler does a lot of optimizations when doing its job, like the “inlining” of code mentioned above. As far as I understand C++, there aren’t many options for making the C++ compiler guarantee that this will always happen on a particular piece of code. C# is different: it offers a lot more options to declare how to inline code and apply other optimizations. In addition to that, the most used C# compiler (called Roslyn) is open source, so the Unity team can easily modify this compiler to produce much better code for the engine, as they have direct access to the way the code is compiled.

I’m assuming it would not be viable to transfer a game already developed in “classic unity” to DOTS? Or would I be able to switch some parts but not others? (I’m personally mostly interested in the prefab streaming concept)

You can absolutely switch some parts of a game over to using the DOTS while leaving the rest in “classic Unity.” That’s actually what we recommend to people who want to use it today, because DOTS doesn’t yet have a bunch of the features that you’d need to develop a full game, so mixing in some ‘classic’ via our hybrid mode is somewhat necessary.

DOTS looks super promising. I’m hoping many of the performance improvements will be seen in the Editor. In particular, I’d like Unity to load my code changes a lot faster after recompilation and when I press play.

I read the part where you were saying that runtime ECS code will be open sourced, so that the whole community can contribute and make Unity better. It would be nice if there was a way Unity could release the engine source code as well. That would allow developers to write their own renderer / physics system, and then compile a Unity executable for their game’s specific needs. A racing game, for instance, isn’t going to use the same physics as a fighting game. I know developers can already pay a hefty fee for the engine source code, but it would be nice if everyone had access for free, and could freely contribute like you mentioned for runtime ECS packages.

This is true today but may not be true once they get everything running under DOTS and people adopt it. At that point, core engine (at least as far as DOTS is concerned) can be super slim and include mostly only Unity’s own code. Today, we still got tons of third party libraries running under the hood which is a licensing nightmare if one would want to open source the engine code of Unity.

I see few alternatives here:
1) Unity keeps core engine closed source (this is still most likely option)
2) Unity gives us more limited, slimmer core engine variant with source code for DOTS & pure ECS usage only (without Umbra, Enlighten, FMod etc)
3) Unity deprecates and removes third party libraries and does what’s mentioned in #2 for all (which is exactly what has happened to game engines that now offer source code access to all engine users). But since Unity is really keen on keeping backwards compatibility, if something like this happens it will only be because the majority of the user base has already moved to DOTS and they can safely remove the old stuff.

I can imagine Unity in this case would simply distribute all of the code written by themselves over the years, as well as libraries that are already open source (such as Mono). I can imagine all of the 3rd party tool-sets could be pre-compiled DLLs that act as dependencies to the Unity build chain. I think if they did this, they wouldn’t have too many issues with licensing, like rizu points out.

That is part of the reason why I’d like to see better defined contracts for each module from the engine, so that developers can democratize the engine code, swapping it out with their own, or other paid 3rd party solutions. Read more about my idea here

The only problem I have with jobs at the moment is that I have to copy data from a native array to a managed array before assigning it to the mesh. When can we expect a mesh API with native array support?

I rarely read through the whole blog posts, but in some cases (like this one – a somewhat deeper dive into the scripting aspect of Unity) I gladly make an exception.
I have to say Lucas – all you mention here sounds absolutely amazing, I am sold tenfold already! Can’t wait to use ECS and Burst on our next project.
But first, I NEED TO UNDERSTAND EVERYTHING. I want to make the best of it.

Great news for Unity developers!!!! Now I am very anxious to learn more advanced ECS features. Hope it can be available at GDC 2019 (and the Megacity demo too)!!! Thanks for improving your engine every day!!!