Nvidia suspend/resume Quick fix

October 21st, 2017

Problem : Nvidia card doesn’t wake up properly

Recently, I moved my GPU to my home office, where the machine is suspended overnight rather
than being left on continuously. That introduced a problem: the Nvidia card
sometimes became unusable after the machine resumed.

Solution : Kill running GPU processes

Eventually, I noticed that the problem didn’t occur if there was nothing at all running
on the card during the suspend/resume cycle.

This matters even when the card isn’t doing any active computation: simply running
a Jupyter notebook that imports TensorFlow or PyTorch is enough
to create a process on the card, which causes the GPU to ‘lose connection’ after a
resume.

So : Before suspending, stop not only any active GPU machine learning jobs, but
also anything (like Jupyter) that may be keeping the card occupied.
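
One way to check what is still holding the card before suspending is to ask nvidia-smi for its list of compute processes. The following is only a minimal sketch, not part of the original fix: the function names `parse_compute_apps` and `list_gpu_processes` are made up for illustration, and it assumes that `nvidia-smi` is on the PATH and supports the `--query-compute-apps` option.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Turn 'pid, process_name' CSV lines into a list of (pid, name) tuples."""
    apps = []
    for line in csv_text.splitlines():
        pid, _, name = line.partition(",")
        pid = pid.strip()
        # Skip blank lines and anything that does not start with a numeric PID.
        if not pid.isdigit():
            continue
        apps.append((int(pid), name.strip()))
    return apps


def list_gpu_processes():
    """Ask nvidia-smi which compute processes currently occupy the card.

    Assumption: nvidia-smi is installed and on the PATH.
    """
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)
```

Calling `list_gpu_processes()` just before suspending should return an empty list if nothing is occupying the card. Anything it does return (a Jupyter kernel, for instance) is a candidate to shut down first. Stopping those processes is deliberately left as a manual step here.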

Older cards

Here’s the output of nvidia-smi if no process is running on a GTX 760 (look at the temperature, and the memory usage) :