Preventing Cycles Crashes after a CUDA Timeout

I recently had a problem with Blender freezing and Cycles crashing due to “Unknown” CUDA errors. I just want to put this solution out there for anybody experiencing similar issues.

Crash characteristics

I encountered crashes when rendering on my GPU using Cycles (on Windows), at first only at some point partway through the rendering process. Side effects were a Windows message saying “Display driver stopped responding and has recovered” and the screen turning black for a little less than a second.

Increasing the tile size caused the crashes to occur immediately with the first samples rendered.

If you have this problem on a device with Nvidia Optimus technology, first check whether Blender is using the right GPU. Selecting the wrong GPU device can crash the graphics card driver; in that case, forcing Blender to use the correct GPU device is the solution (as described in the BlenderArtists thread).

In my case, this did not help. My console output was the following:

[...] (rendering a few, in some cases thousands, of samples without a crash)

Cause of the crash

The problem is that Windows has a timeout detection and recovery (TDR) system. If a single GPU computation takes longer than a set limit (two seconds by default), Windows assumes the GPU has hung, resets it, and reinitializes the Windows Display Driver Model (WDDM) driver. This stops the rendering process. You will also notice that any displays attached to the GPU you use for rendering turn black for a short moment.

Normally, this system is useful because it prevents permanent screen freezes caused by malfunctioning drivers or games. In Cycles, however, one sample is treated as one computation, so if calculating a single sample takes longer than two seconds, Cycles (and the Blender UI, if you render from the UI) will crash.
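To make the per-sample nature of the limit concrete, here is a toy Python model. It is a sketch under the post's assumption that each sample is one GPU computation; `first_timeout_sample` is a hypothetical helper and does not reflect how Windows actually measures GPU time.

```python
# Hypothetical toy model, not Windows or Cycles code: the TDR watchdog
# limits each individual GPU computation, not the length of the render.

TDR_DELAY_S = 2.0  # Windows default TdrDelay, in seconds


def first_timeout_sample(sample_times_s, tdr_delay_s=TDR_DELAY_S):
    """Return the index of the first sample whose GPU time exceeds the
    TDR delay, or None if every sample finishes in time.

    Assumes one GPU computation per sample, as described in the post.
    """
    for index, seconds in enumerate(sample_times_s):
        if seconds > tdr_delay_s:
            return index  # Windows would reset the GPU at this sample
    return None


# 1000 samples at 1.5 s each: 25 minutes of GPU time in total, yet no
# single computation reaches 2 s, so no timeout occurs.
print(first_timeout_sample([1.5] * 1000))  # None

# The same render with one slow sample, e.g. a tile covering heavy geometry.
print(first_timeout_sample([1.5] * 500 + [2.4] + [1.5] * 499))  # 500

# With TdrDelay raised to 8 s, that slow sample finishes in time.
print(first_timeout_sample([1.5] * 500 + [2.4] + [1.5] * 499, tdr_delay_s=8.0))  # None
```

This also fits the console output above: a render can run for thousands of samples and then fail on the first one that happens to exceed the limit.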

Solutions

Changing the rendering settings

Because the computing time per sample depends mainly on the size of the tile and the complexity of the geometry and shaders inside it, reducing the tile size or simplifying your shaders will fix the issue. However, smaller tiles can increase rendering times on the GPU, and simple shaders might not be what you want to render. The following solution is therefore usually the better one.
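A rough way to pick a tile size is to time one sample at your current tile size and scale it down. The sketch below assumes, hypothetically, that per-sample time grows linearly with the number of pixels in a tile. Real tiles vary because geometry and shaders differ across the image, hence the arbitrary `safety` factor. `max_square_tile` is an illustrative helper, not a Blender setting.

```python
import math


def max_square_tile(measured_tile, measured_sample_s, tdr_delay_s=2.0, safety=0.5):
    """Estimate the largest square tile side (in pixels) whose per-sample
    GPU time stays below safety * tdr_delay_s.

    Hypothetical assumption: per-sample time is proportional to pixel count.
    """
    width, height = measured_tile
    seconds_per_pixel = measured_sample_s / (width * height)
    budget_s = safety * tdr_delay_s  # leave headroom for slower tiles
    max_pixels = int(budget_s / seconds_per_pixel)
    return math.isqrt(max_pixels)  # side length of the largest square that fits


# A 512x512 tile that needs 3.0 s per sample is over the 2 s default.
print(max_square_tile((512, 512), 3.0))  # 295
print(max_square_tile((512, 512), 3.0, tdr_delay_s=8.0))  # 591
```

The second call shows the trade-off discussed below: a longer TDR timeout allows much larger tiles for the same scene.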

Increasing the TDR timeout value

To prevent Cycles from crashing, you can also increase the TDR timeout. Before you do that, keep in mind that you should be careful when making any changes to your registry. Also, if a real screen freeze occurs, Windows will now wait longer before it restarts your driver, so remember to be patient in such a situation. I am not responsible for any damage caused by this, but it should be safe, and it worked for me.

How to increase the TDR timeout (on Windows 7; the Vista path should be very similar):

Open regedit.exe (Press the Windows key, type “regedit”, press enter, confirm that you want to open it).

a) If the folder contains a value named “TdrDelay”, right-click it, choose Modify, and change it to something high enough for your scene. Something between 8 and 16 should be fine; the number is interpreted as seconds. Be aware of the difference between decimal and hexadecimal input: select the Decimal base before typing, because 10 entered as hexadecimal is stored as 16.
b) If the folder does not contain a value named “TdrDelay”, right-click into the empty space below the existing values and create a new “DWORD (32-bit) Value”. Name it “TdrDelay” and set it as described in a).

Reboot your system for the changes to take effect.
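The decimal/hexadecimal pitfall also applies if you script the change instead of clicking through regedit. The Python sketch below only builds the text of a `.reg` file; it is an illustration, not a tested installer. `tdr_delay_reg` is a hypothetical helper, and the key path `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers` comes from Microsoft's TDR registry documentation rather than from this post.

```python
# Registry location of the TDR values, as documented by Microsoft
# (an assumption added here, not part of the original steps).
GRAPHICS_DRIVERS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_reg(seconds):
    """Return the text of a .reg file that sets TdrDelay to `seconds`.

    .reg files always store DWORD values as 8 hexadecimal digits, so
    10 seconds is written as dword:0000000a, not dword:00000010.
    """
    if not isinstance(seconds, int) or not 1 <= seconds <= 0xFFFFFFFF:
        raise ValueError("TdrDelay must be a positive 32-bit integer")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{GRAPHICS_DRIVERS_KEY}]\n"
        f'"TdrDelay"=dword:{seconds:08x}\n'  # decimal seconds -> hex digits
    )


print(tdr_delay_reg(10))
```

If you save the output as a `.reg` file, double-clicking it asks Registry Editor to import it. Review the file first, and reboot afterwards just as with the manual steps.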

Alternatively, creating a DWORD value “TdrLevel” and setting it to 0 (detection off) instead of the default 3 (recover on timeout) will also work. However, I do not recommend this: if a driver or application later hangs and freezes the screen, you may have to reboot your system, and you could lose data as a result.

Thank you, thank you, thank you for this post. More important than helping me resolve the issue, it was exceptionally clear and informative. Exceptional because the trial-and-error advice in forum posts homed in on the problem but was, understandably, not nearly as thorough. Thank you for taking the time to write this out.