Re: [BUG] nvidia crashes kernel with 'Xid 13' and attempted to yield the CPU while at

Sadly, the newly released 304.37 driver does not fix the problem (though, judging by the release notes, I think it is supposed to). The symptoms are slightly different, but X still locks up completely and requires a cold boot.

I didn't actually see the "GPU has fallen off the bus" message in the log, but the video froze after about a minute of gameplay in Crysis2 (the audio kept going, though). When I tried to close its window, its bumblebee X process froze, and the main X locked up shortly afterwards. I *was* able to ssh into the machine, which is new, but kill -9 had no effect on any of the locked processes, restarting lightdm failed, and so did a reboot command. Only a hard reset fixed it. FWIW, I was running the 3.6-rc1 kernel.

Re: [BUG] nvidia crashes kernel with 'Xid 13' and attempted to yield the CPU while at

I tried again with kernel 3.6-rc2, and nvidia 304.37 crashed about 30 seconds into the game. Again there were no Xid errors or "GPU has fallen off the bus" messages, but the kernel reported hung processes within the nvidia module. Eventually I had to hard reset the PC because it became completely unresponsive. Below is the kernel log for the hung processes:

Re: [BUG] nvidia crashes kernel with 'Xid 13' and attempted to yield the CPU while at

Switching to MSI (NVreg_EnableMSI=1 in a modprobe conf) and setting the priority of the wine task to -20 *might* help. I even managed to get 4 minutes of gameplay out of crysis2 (although the norm is still 60 seconds), and CoD seems less likely to crash as well.
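
For anyone who wants to try the same thing, the MSI setting would look roughly like this. The file name is only an example (any *.conf file under /etc/modprobe.d/ should do), and it assumes the driver is loaded under the module name nvidia, which may differ on some distro or bumblebee setups.

```
# /etc/modprobe.d/nvidia-msi.conf  (example file name, an assumption)
# Ask the nvidia module to use Message Signaled Interrupts.
options nvidia NVreg_EnableMSI=1
```

The option only takes effect once the module is reloaded (or after a reboot). The priority change can be done as root with renice -n -20 -p followed by the PID of the wine process.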

Re: [BUG] nvidia crashes kernel with 'Xid 13' and attempted to yield the CPU while at

Quote:

Originally Posted by rockob

Switching to MSI (NVreg_EnableMSI=1 in a modprobe conf) and setting the priority of the wine task to -20 *might* help. I even managed to get 4 minutes of gameplay out of crysis2 (although the norm is still 60 seconds), and CoD seems less likely to crash as well.

Scratch that, I don't think it helps. It was probably just luck that I managed to get 4 minutes of gameplay.

Re: [BUG] nvidia crashes kernel with 'Xid 13' and attempted to yield the CPU while at

OK, obviously such symptoms could be the result of different underlying problems, but...

in my case, what *completely* cured the problem of random freezes and Xid messages (after weeks of painful experimentation) was removing all kernel modules that have to do with thermal sensors. I have also disabled all the relevant plugins in gkrellm, so that such modules are not loaded again automatically.

The system is rock solid now, running for 2+ days straight under KDE+composite without a single glitch.

I suspect that one of these programs or kernel modules is periodically polling the hardware or generating interrupts under the nvidia driver's nose, messing up the interface. It would be nice if someone could debug this to the end, though.
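
To see which sensor modules are currently loaded, something like the following Python sketch could help. The module names in SENSOR_MODULE_CANDIDATES are only assumed examples of common hwmon drivers (the post above does not say which modules were removed), so adjust the list for your hardware. The script only prints suggested commands; it does not unload anything itself.

```python
from pathlib import Path

# Assumed examples of thermal/hwmon sensor drivers, not a definitive list.
SENSOR_MODULE_CANDIDATES = frozenset(
    {"coretemp", "k10temp", "k8temp", "it87", "lm75", "lm90", "w83627ehf"}
)


def loaded_sensor_modules(proc_modules_text, candidates=SENSOR_MODULE_CANDIDATES):
    """Return names of loaded modules that appear in `candidates`, in file order.

    `proc_modules_text` is the content of /proc/modules, where the first
    whitespace-separated field of each line is the module name.
    """
    found = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] in candidates:
            found.append(fields[0])
    return found


if __name__ == "__main__":
    text = Path("/proc/modules").read_text()
    for name in loaded_sensor_modules(text):
        # Print the command instead of running it; unloading needs root.
        print(f"modprobe -r {name}")
```

If unloading one of the listed modules makes the freezes go away, a blacklist line for it in a modprobe conf file should stop it from being loaded again at boot.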