Investigating Unity hang on second run (multi-threading)

Background

The problem of Unity hanging on the second run can have multiple causes and can be difficult to debug. When searching for an answer I saw many people with the same problem. I am therefore sharing my debugging process in the hope that it can help others solve their own problem.

The problem

Unity has had a problem with background threads since I started using version 4.x.

Unity’s API, like most GUI APIs, is single-threaded: only the main thread may access it. On every call to the API the thread ID is checked, and if it’s not the main thread an exception is thrown. There are good technical reasons why Unity implements it this way, so I’m not going to wish for a thread-safe API. The disadvantage is that the main thread also does the rendering, so too much work on it lowers the FPS and can easily cause hiccups in the frame rate. You have to distribute work units conservatively between frames, which slows down whatever work you are doing. It also complicates the code, since you have to write it as coroutines or implement your own system to split the work into smaller units.

Since a modern computer usually has multiple cores, you may want to use the spare computing power. To do this you spin off new threads and do some of the work in the background. You then need to synchronize access to the GUI somehow, for example by adding actions to a queue that is processed by the main thread. In my case I do most of the game logic in background threads (communication, mesh calculation, etc.), and I have a system for invoking an action on the GUI thread and waiting for it to complete. This means the background thread sleeps while waiting for the GUI thread, but it also means I can write straightforward linear code in my background threads, without callbacks or similar.
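To make the invoke-and-wait idea concrete, here is a minimal sketch of such a queue. It is not my actual code and not a Unity API: the class name MainThreadQueue and its methods are made up for illustration, and it assumes something on the main thread (for example a MonoBehaviour's Update) calls Pump() once per frame.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper (assumption): not part of Unity, names are illustrative.
public sealed class MainThreadQueue
{
    private readonly Queue<Action> _pending = new Queue<Action>();
    private readonly object _gate = new object();

    // Called from a background thread: queue the action, then sleep
    // until the main thread has executed it.
    public void InvokeAndWait(Action action)
    {
        object done = new object();
        bool finished = false;
        Exception error = null;

        lock (_gate)
        {
            _pending.Enqueue(() =>
            {
                try { action(); }
                catch (Exception e) { error = e; }
                finally
                {
                    // Wake the waiting background thread.
                    lock (done) { finished = true; Monitor.Pulse(done); }
                }
            });
        }

        lock (done)
        {
            while (!finished) Monitor.Wait(done);
        }

        if (error != null)
            throw new InvalidOperationException("Main-thread action failed", error);
    }

    // Called on the main thread once per frame. Runs everything that is
    // queued and returns the number of actions executed.
    public int Pump()
    {
        int ran = 0;
        while (true)
        {
            Action next;
            lock (_gate)
            {
                if (_pending.Count == 0) return ran;
                next = _pending.Dequeue();
            }
            next(); // run outside the lock so actions can enqueue more work
            ran++;
        }
    }
}
```

Note that a background thread blocked in InvokeAndWait will sleep forever if the main thread stops pumping, so a real implementation would probably want a timeout or a shutdown signal.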

On first run this works fine, but once I stop the game and click “Play” in Unity for the second time Unity freezes and won’t continue.

I suspect that the problem is the use of background threads, because it started when I rewrote parts of the code to use them. I have also encountered this before with background threads. If I start the game but do not connect to a server (and hence don’t start any background threads), I can start it again and again without any problems.

All threads are started from a base class inherited by whatever class needs threads. I’ve done this for simplicity, even when the background task will run only once and then exit the thread. This makes it easy to get an overview of where I’m using background threads. It also allows me to create a destructor in the class to clean up any rogue threads when the instance is removed by the garbage collector. It also ensures that all threads have IsBackground=true and a name set. I even implemented a static list of all the threads so that I can clean them up globally when the game exits.
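A rough sketch of what such a base class can look like is shown here. The names (ThreadedBase, StartThread, StopAll, Ticker) are invented for this example and are not my real code; it also assumes worker loops cooperate by polling a shutdown flag. The destructor is deliberately left out of the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical base class (assumption): names are illustrative only.
public abstract class ThreadedBase
{
    // Static list of every thread started through this class.
    private static readonly List<Thread> AllThreads = new List<Thread>();
    private static readonly object ListLock = new object();
    private static volatile bool _shutdownRequested;

    // Worker loops should poll this and return when it becomes true.
    public static bool ShutdownRequested { get { return _shutdownRequested; } }

    // Every thread gets IsBackground = true and a name.
    protected Thread StartThread(string name, Action body)
    {
        var thread = new Thread(() => body()) { IsBackground = true, Name = name };
        thread.Start();
        lock (ListLock) { AllThreads.Add(thread); }
        return thread;
    }

    // Global cleanup, e.g. called from OnApplicationQuit.
    // Returns how many threads were still alive after the timeout.
    public static int StopAll(int joinTimeoutMs)
    {
        _shutdownRequested = true;
        Thread[] snapshot;
        lock (ListLock)
        {
            snapshot = AllThreads.ToArray();
            AllThreads.Clear();
        }
        int stillAlive = 0;
        foreach (Thread t in snapshot)
        {
            if (!t.Join(joinTimeoutMs)) stillAlive++;
        }
        return stillAlive;
    }
}

// Example subclass: a worker that idles until shutdown is requested.
public sealed class Ticker : ThreadedBase
{
    public Thread Begin()
    {
        return StartThread("Ticker", () =>
        {
            while (!ShutdownRequested) Thread.Sleep(5);
        });
    }
}
```

A non-zero return value from StopAll is a hint that some thread ignores the shutdown flag and needs its own shutdown handling.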

Investigation

Recreate the problem

Start the game in Unity. Let’s call this “First Run”.

Attach Visual Studio to Unity (using the play button in VS), pause the game in VS and look at Parallel Stacks (Debug->Windows->Parallel Stacks).

Continue execution (play in VS).

Stop game in Unity.

Confirm in debug output that all threads have been stopped. (I log all startup and shutdown of threads from my background thread base class.)

(Note that the name of one of the threads is not its given name; instead it shows some weird characters. This may not be relevant, though.)

Pause game in Visual Studio and confirm there are no threads running.

Start the game in Unity.

Unity now freezes. Let’s call this “Second Run”. Note that attaching a debugger to Unity with the play button in VS (using the VS Unity plugin) at this point won’t give any result, as Unity simply isn’t responding. Pausing after attaching will make VS freeze too (waiting for Unity).

Collect some data

There is a second way to attach a debugger to Unity, and that is through Debug->Attach to process… -> Choose “Unity.exe”. This will attach a debugger to Unity without going through the VS Unity plugin and therefore show you more of what is going on.

Attach a debugger to Unity.exe, pause and look at Parallel Stacks. This is our baseline.

Resume VS, run the game in Unity, pause in VS and look at Parallel Stacks again.

Resume VS, stop the game in Unity, pause in VS and look at Parallel Stacks again.

The only difference from the baseline is that we have 1 extra thread in “async_invoke_io_thread” and 1 more thread in “[External Code]”.

If I could see any of my own threads still running at this point, the answer would be simple: I need to implement a shutdown for that thread.

Resume VS, start the game in Unity (Second Run), pause in VS and look at Parallel Stacks again. Unity will now freeze (this is the problem).

A notable difference now is that one thread is in “mono_domain_try_unload” and one is in “mono_domain_finalize”, which seems to indicate that the hang is related to cleaning up after First Run.

Theory 1: Rogue background thread

I initially expected this to be a runaway thread I had to identify and ensure was shut down properly upon “OnApplicationQuit”. But the data seems to indicate that I have successfully shut down all my background threads.

Theory 2: Mutex lock in destructor

Since my code is written for multi-threading I do a lot of locking. And since Unity objects are considered unmanaged resources, I have implemented destructors to handle cases where I fail to dispose of the unmanaged resources properly. Seeing how one thread is stuck at “mono_domain_finalize”, it may indicate that one of my destructors is stuck in a deadlock. Sadly the debugger will not tell me where it is stuck. “Show External Code” shows that it’s stuck on the same address as other sleeping threads, but gives no further insight into what’s going on.

Using WinDbg to analyze this will not help much, as the code is running on Mono and hence the .NET CLR libraries for WinDbg won’t work. Double-clicking it shows that it’s in gc.c in Mono and that

Solution

I did a project-wide search for ~ (destructor) and removed all of the destructors (just for testing). This solved the problem. Testing shows that the game now starts and stops multiple times just fine.

So the problem was a mutex lock in a destructor that was being called by Mono upon unloading the appdomain before restarting the game for the second time.
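If you want to keep a destructor as a safety net instead of removing it, one option is to make sure it can never block. This is a sketch of that idea, not what I actually did: the class MeshHandle, the shared lock and the release counter are all hypothetical, and it assumes that skipping the cleanup is acceptable when the lock is busy.

```csharp
using System;
using System.Threading;

// Hypothetical resource wrapper (assumption): names are illustrative only.
public sealed class MeshHandle : IDisposable
{
    // Lock shared with other code that touches the resource.
    private static readonly object SharedLock = new object();
    private static int _releasedCount;
    private bool _released;

    // Number of handles released so far (for diagnostics).
    public static int ReleasedCount { get { return _releasedCount; } }

    // Normal path: called explicitly, so blocking on the lock is fine here.
    public void Dispose()
    {
        lock (SharedLock) { Release(); }
        GC.SuppressFinalize(this); // the destructor is no longer needed
    }

    // Safety net: runs on the finalizer thread, so it must never block.
    ~MeshHandle()
    {
        // TryEnter returns immediately instead of waiting for the lock.
        if (Monitor.TryEnter(SharedLock))
        {
            try { Release(); }
            finally { Monitor.Exit(SharedLock); }
        }
        // If the lock is held by someone else, skip the cleanup
        // rather than risk hanging the appdomain unload.
    }

    private void Release()
    {
        if (_released) return;
        _released = true;
        Interlocked.Increment(ref _releasedCount);
    }
}
```

The important part is that the destructor uses Monitor.TryEnter and not lock, so a thread that still holds the lock cannot stall the finalizer while Mono unloads the appdomain.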