Seapegasus Blog

Java, linguistics, scifi, steampunk, games

Latest news from my DIY PC adventure: My spiffy gaming PC had been running without a hitch for a month. Then, in the last week of January, games started crashing. :-C

First an error beep, then the screen goes blank (“Energy Star”) as if someone had ripped out the HDMI cable. I checked, it was well attached. Then another, different beep (it reminded me of the one you get when a new device is ready). The game’s background music kept playing in a loop, but the fighting NPCs etc. fell suspiciously silent. Was the OS still working, and I just couldn’t see what I was doing anymore (e.g. I could not force-quit the game via the task manager)? I can’t tell. If I had two screens, I would have attached the second screen to the on-board graphics card to check whether it would stay on. Without this option, all I can do is a hard shutdown.

I have the nagging suspicion that I should not have tried to install Windows 95 games in Windows 10… Instead of simply telling me that this is not supported, a Windows 10 wizard helpfully suggested it could try some unspecified fixes and workarounds. Frigg knows what that wizard actually did, but those old games never ran. (UPDATE: I reinstalled Windows 10 later, so these Win95 workarounds are gone.)

I installed two monitoring tools from a trustworthy source to see how the GPU temperature was doing. I chose “Open Hardware Monitor” and “CPU-Z”, which logs values in a file that you can check after a crash. In the worst cases, it went up to ~70°C, then my super sexy quiet fans came on, and kept it at that. Nothing looked out of the ordinary.
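Checking such a log after a crash can be scripted. Here is a minimal sketch, assuming the monitoring tool wrote a CSV file with a timestamp column and one temperature column. The column names "Time" and "GPU Core" and the sample values are made up for illustration; your tool's log will look different.

```python
import csv
import io


def peak_temperature(log_text, column="GPU Core"):
    """Return (timestamp, temperature) of the hottest row, or None if no row is usable."""
    reader = csv.DictReader(io.StringIO(log_text))
    hottest = None
    for row in reader:
        try:
            temp = float(row[column])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric reading
        if hottest is None or temp > hottest[1]:
            hottest = (row.get("Time"), temp)
    return hottest


# Hypothetical log excerpt; the column names are assumptions, not a real tool's format.
sample_log = """Time,GPU Core
20:01:00,45.0
20:02:00,62.5
20:03:00,70.0
20:04:00,69.5
"""

print(peak_temperature(sample_log))
```

If the peak sits right before the crash and far above the usual values, heat is a suspect. In my case it plateaued at a harmless level, so I could cross it off the list.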

The Event Viewer contained several warnings and errors:

The previous system shutdown was unexpected.
No, dear PC, I pressed that button on purpose. You switching off my display was unexpected.

The description for Event ID 56 from source Application Popup could not be found.
The message resource is present but the message is not found.
No idea what that means. I kinda picture a dialog box popping up on my currently blank screen, informing me: “Error displaying error. Error not found. Please install better errors. Hello? Are you listening?” Likely a side-effect and not the cause.

Display driver nvlddmkm stopped responding and has successfully recovered.
Really, that’s what a display driver looks like when it has successfully recovered? What does it look like when it fails? This message appears often, but not every time.

Let’s have a look at the driver:

I also have a Linux partition, and Linux does not blank out, so my bet is on the Windows video drivers. (Installing Linux drivers for GeForce is a whole different story…) Note on the other hand that I have no demanding 3D games for Linux, so I cannot reproduce the problem anyway. If I could reproduce the crash under Linux also, I would blame hardware.

The page http://www.geforce.com/drivers tells me that NVIDIA had released new drivers on January 25, but my first crash was before that. I updated the drivers, but that only made it worse. Instead of crashing once per day, it now crashed once per hour.

I searched inside Windows for “check for updates”. In the Update panel, I clicked “advanced options”, and then “view update history”. This showed me that a recent attempt at updating NVIDIA drivers had silently failed! The GeForce Experience app had not told me that. To be safe, I went to the GeForce Experience settings and disabled auto-update for video drivers.

Long story short, after trying a dozen different things unsuccessfully, I thoroughly uninstalled everything that had the word NVIDIA in it, twice. (Plus some poor innocent oversized Blizzard games that had failed to download because my SSD was full, sigh.) Then I reinstalled the driver from December that had previously worked fine. It was still in my Downloads folder, but I could also have re-downloaded it from the GeForce page. (UPDATE: Half a year later, this driver version has fallen off the bottom of the list. Keep your own copies!) It went well for another month… and then started crashing again.

(LATER UPDATE)

Other things I tried after the problem recurred: I reinstalled Windows 10 with only the on-board graphics card, before I stuck the GeForce back in and installed its drivers. This had the nice side-effect that the bootloader (that lets me choose between Linux and Windows) now appears within seconds after booting, and it no longer hangs in 50% of the cases! I also moved an SSD and some cables, and I didn’t fully close the PC case, to test whether any heat, weight, or pressure had been randomly dislodging the card. No luck. Finally I borrowed a friend’s old GTX 690: Zero crashes for two months since then!! I returned my defective card, summed up this whole story, and the shop gave me my money back with no hassle. :-)

Some general Windows 10 debugging tips:

I was shocked that for each error message, Windows-related forums contained at least one fake post saying “Download and run this .exe (link), it will magically fix everything”. Uh thank you, I’m cured! I now have a strict “no toolbars, no plugins, no speedtesters, no cleaner-uppers, no free games, no email, no nuthin” policy on this gaming PC.

Searching the web for anything plus “Windows 10” plus “64 bit” brings up a surprising amount of completely unrelated advice and downloads, even on microsoft.com. After a search, double check that what you are doing truly applies to your version.

You can only use logs for debugging if you compare them to normal conditions, so look at logs from times when it worked well, too. Otherwise you waste time hunting down irrelevant warnings.
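The comparison itself is easy to sketch in code. The following assumes you have somehow exported the event log entries for two periods of equal length, one healthy and one crashy, as lists of (source, event ID) pairs. The function name, the export format, and the "ExampleService" and "ExampleDriver" entries are all hypothetical; only "Application Popup" with ID 56 is taken from my own log.

```python
from collections import Counter


def unusual_events(baseline, incident):
    """Return events that occur more often in the incident period than in the baseline."""
    base_counts = Counter(baseline)
    incident_counts = Counter(incident)
    extra = {}
    for event, count in incident_counts.items():
        surplus = count - base_counts.get(event, 0)
        if surplus > 0:
            extra[event] = surplus  # how many more times it showed up
    return extra


# Made-up sample data: (source, event ID) pairs from a good week and a bad week.
good_week = [("ExampleService", 1), ("ExampleService", 1), ("Application Popup", 56)]
bad_week = [("ExampleService", 1), ("Application Popup", 56),
            ("Application Popup", 56), ("ExampleDriver", 7)]

print(unusual_events(good_week, bad_week))
```

Anything that shows up equally often in both periods (like the hypothetical "ExampleService" warning here) is probably background noise and not worth chasing.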

If you use advice from forums, make yourself aware of the context. Is this problem being discussed mostly in Microsoft forums, PC hardware forums, graphics card forums, specific game-related forums, PC or Mac related forums? Who talks about it gives you a hint what sources of error to include or exclude. It makes a huge difference whether many games are affected or one, or all platforms or one, or all graphics cards or one, etc.

Also, how old is the forum advice you are reading? If the same error message has been discussed for 5 years, and various forums are filled with contradictory advice that works for some but not for others, your alarm bells should ring. It means that this error message is very generic, and does not point to one specific problem. If you try every piece of advice given, you will mess up your PC even worse.

Speaking of which: Write down (or take screenshots of) all the fixes you try, so you have half a chance to undo them.

Try to reproduce the problem after each fix. Don’t apply several fixes at the same time, because then you can’t tell which one the solution was. In a case where crashes re-appear only after a month, this is tedious.

Oh, and, fun fact of the day:

If you install Windows in one language, and then switch to English later (say, to be able to quote English error messages), your event log will be bilingual… forever. :,-(

Unable to find Verbindungsschichterkennungsprotokoll. The Anwendungsspezifisch permission settings do not grant Lokalaktivierung from address LocalHost unter Verwendung von LRCP. The following corrective action will be taken: Neustart des Dienstes. Windows failed to install Sprachpaket für Englisch. Nicht verfügbar. Microsoft, are you kidding me?