How to diagnose and fix blue screen of death crashes

Debugging explained


There's nothing quite as frustrating. One moment you're working at your PC, the next your screen turns blue and your system reboots, destroying all unsaved work. Then, an hour or so later, it happens again. What's going on?

To diagnose and fix blue-screen crashes you need to know what is causing them. But don't expect Windows to help.

Head off to 'Problem Reports and Solutions' in Vista, for instance, and you'll typically see useless crash descriptions like 'Windows shut down suddenly'. Gee, thanks. Fortunately there's a free alternative: Microsoft's debugger, WinDbg.

Point this at the last crash dump and it can tell you the most likely file, DLL or driver behind the crash; list everything else that was running; warn you of potential memory leaks; and provide useful troubleshooting and diagnostic information.

If your PC is unstable then there's no better tool to find out the cause.

Launch KitSetup and you'll be presented with a list of various driver development options. Just check the box for 'Debugging Tools for Windows' and click 'OK'.

For WinDbg to work properly it must be able to download 'symbols': files that help the debugger convert raw binary information into the function and variable names used by Windows components. These can be saved locally – a good idea as it'll mean you only have to download them once. Create a folder for them – something like 'C:\Windows\Symbols' will be ideal.

You'll then need to tell the program where its symbols can be found and saved. Click Start, type WinDbg and click the 'WinDbg.exe' link to launch the debugger.

Click 'File | Symbol File Path' to see the current path. Next, enter a path like SRV*c:\windows\symbols*http://msdl.microsoft.com/download/symbols in the box, where 'c:\windows\symbols' is replaced by the path to your own local symbol folder. Click OK, close the program and click 'Yes' when asked if you want to save information for workspace – this will save the path you've just entered.
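The symbol path is easy to mistype, so here is a minimal Python sketch of how the string is put together: 'SRV', the local cache folder and the symbol server, separated by asterisks. The function name build_symbol_path is our own invention for illustration; only the path format and the server URL come from the steps above.

```python
def build_symbol_path(local_cache,
                      server="http://msdl.microsoft.com/download/symbols"):
    """Return a WinDbg-style symbol path that caches downloads locally.

    'local_cache' is whatever folder you created for symbols, for
    example C:\\Windows\\Symbols. The function name is hypothetical.
    """
    # Drop a trailing backslash so the result does not contain '\*'.
    cache = local_cache.rstrip("\\")
    if not cache:
        raise ValueError("local_cache must not be empty")
    # Format: SRV*<local folder>*<symbol server URL>
    return "SRV*{}*{}".format(cache, server)
```

Calling build_symbol_path(r"c:\windows\symbols") produces the same string as the example path given above, which you can paste into the 'Symbol File Path' box.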

It's important to check that Windows is configured to create memory dump files when your PC crashes, because WinDbg needs these to figure out what was happening at the time. To set this up, click Start, right-click 'Computer' and click 'Properties | Advanced system settings | Startup and Recovery Settings'. Ensure that the 'Write an event to the system log' box is checked to make sure that Windows collects information on your crashes.

DIAGNOSIS: Crash diagnostics starts with the blue-screen error itself – this will sometimes name the file that's most likely to have caused the crash.

Next, clear the 'Automatically restart' box so that you'll have a chance to read any on-screen error messages, and select 'Kernel memory dump' in the 'Write debugging information' list so that Windows saves the contents of kernel memory, where drivers and core components live, if it crashes.
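If you'd rather double-check these settings than trust the dialog, the sketch below shows one way to do it in Python. It assumes the options are stored under the CrashControl registry key as the values LogEvent, AutoReboot and CrashDumpEnabled, with 2 meaning a kernel memory dump; treat those names and numbers as assumptions to verify on your own system. Both function names are hypothetical.

```python
# Assumed meanings of CrashDumpEnabled; other values are reported as unknown.
DUMP_TYPES = {0: "none", 1: "complete memory dump",
              2: "kernel memory dump", 3: "small memory dump"}

def review_crash_settings(values):
    """Return a list of settings that differ from the ones recommended above."""
    problems = []
    if values.get("LogEvent") != 1:
        problems.append("'Write an event to the system log' is off")
    if values.get("AutoReboot") != 0:
        problems.append("'Automatically restart' is still on")
    if values.get("CrashDumpEnabled") != 2:
        kind = DUMP_TYPES.get(values.get("CrashDumpEnabled"), "unknown")
        problems.append("dump type is '{}', not 'kernel memory dump'".format(kind))
    return problems

def read_crash_settings():
    """Read the current values from the registry (Windows only)."""
    import winreg  # imported here so the checker above works anywhere
    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    result = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        for name in ("LogEvent", "AutoReboot", "CrashDumpEnabled", "DumpFile"):
            try:
                result[name], _type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                result[name] = None  # value not present on this system
    return result
```

On a Windows PC, review_crash_settings(read_crash_settings()) returns an empty list when everything matches the recommendations. The script only reads the registry and changes nothing.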

Make a note of the dump file name – it's probably 'Memory.dmp'. This is the crash dump file you'll need to locate later. Finally, click 'OK' to finish the job.
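When a PC has crashed several times, it helps to know which dump is the newest. This short Python sketch lists the main dump plus anything in a 'Minidump' subfolder, newest first. It assumes the default file and folder names, and find_crash_dumps is a name we made up for the example.

```python
from pathlib import Path

def find_crash_dumps(windows_dir):
    """Return existing crash dump files under windows_dir, newest first."""
    root = Path(windows_dir)
    # The full or kernel dump, assuming the default file name.
    candidates = [root / "Memory.dmp"]
    # Smaller per-crash dumps, if the Minidump folder exists.
    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        candidates.extend(minidump_dir.glob("*.dmp"))
    existing = [p for p in candidates if p.is_file()]
    # Sort by last-modified time so the latest crash comes first.
    return sorted(existing, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, find_crash_dumps(r"C:\Windows") returns a list of paths, and the first entry is the most recent dump. If you changed the dump file name in the settings dialog, adjust the script to match.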

Create a report

Once WinDbg has been set up, it's surprisingly easy to use at a basic level, and absolutely anyone can use it to find out more about their system's last crash.

To give this a try for yourself, click Start, type WinDbg and click the WinDbg link. Click 'File | Open Crash Dump', then navigate to and select your last crash dump file. This will probably be at '\Windows\Memory.dmp', although you may have additional files in '\Windows\Minidump'. Click 'Open', then wait as the file is analysed.

DEBUG-IT: Just open your crash dump file to make the Windows Debugging Tools identify the file that it feels caused the crash.

This can take a while – five minutes or more – depending on the complexity of the dump file and the speed of your PC, so be patient.

A '0: kd>' prompt appears at the bottom of the screen when it's done, and you can then scan the rest of the report to see what's on offer. Typically, near the bottom of the report, you'll see a line like 'Probably caused by : driver.sys', where 'driver.sys' is replaced by the name of the file that WinDbg believes was responsible for the crash. Perfect!
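If you save the report to a text file, picking out the suspect file can be automated. The Python sketch below is one possible approach: it searches for the 'Probably caused by' line and returns the file name that follows. The function name probable_cause is hypothetical, and the pattern assumes the line looks like the example above.

```python
import re

# Matches a line such as: Probably caused by : driver.sys
_CAUSE = re.compile(r"^Probably caused by\s*:\s*(\S+)", re.MULTILINE)

def probable_cause(report):
    """Return the file named on the 'Probably caused by' line, or None."""
    match = _CAUSE.search(report)
    return match.group(1) if match else None
```

The function returns None when the line is missing, which is a hint that the analysis didn't finish or the saved report is incomplete.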

If you don't recognise the name, Google it – maybe with additional keywords like blue screen – and you might immediately discover the app behind the instability, as well as some potential fixes.

Blue-screen crashes can be complicated, though, because the file that caused the crash isn't necessarily the one responsible for your problems. That sounds odd, but look at it this way: if a faulty driver gives Windows an incorrect memory location, then this may be passed on to several other Windows components. Eventually one may try to access the memory, triggering a crash in that file – but the real problem is in the driver.

If WinDbg names some core Windows component or another application that you're sure is working just fine, then it may be a problem like this. You'll need to do a little more research to figure out what's really going on.

Dig a bit deeper

Scan your WinDbg report again, looking for lines highlighted with '** ERROR', complaining that 'symbols could not be loaded' for a particular file.

If you've correctly configured WinDbg then it will be able to load symbols for Windows components with no problem, so you'll know that these must be third-party drivers that were active at the time of the crash. Anything named like this is a possible culprit: again, search the web for the filename and you may locate other crash reports.
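Scanning a long report by eye is tedious, so here is a hedged Python sketch that collects the file names from those error lines. It assumes each line starts with two or more asterisks followed by 'ERROR:' and ends with 'symbols could not be loaded for' plus the file name; check this against your own report, as the exact wording may differ. The function name is our own.

```python
import re

# Assumed line shape:
#   *** ERROR: ... symbols could not be loaded for somedriver.sys
_SYMBOL_ERROR = re.compile(
    r"^\*{2,}\s*ERROR:.*symbols could not be loaded for\s+(\S+)",
    re.MULTILINE)

def modules_without_symbols(report):
    """Return file names that lacked symbols, in first-seen order, no repeats."""
    seen = []
    for name in _SYMBOL_ERROR.findall(report):
        if name not in seen:
            seen.append(name)
    return seen
```

Each name in the returned list is a candidate to search for on the web, as described above.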

If that turns up nothing then click in the command line at the bottom of the WinDbg window, type !analyze -v (the '-v' means 'verbose') and press [Enter] for a more detailed analysis of your crash file.

The verbose report will be very much in developer-speak, with lots of figures, pointers, and development-related jargon, but you don't have to understand all of it. Just pick out the parts that provide more information.

You'll probably see an error message that spells out the crash reason, for instance. In one of our tests, the first report simply said our crash was 'probably caused by nvlddmkm.sys'. The verbose report explained that the crash occurred when an 'attempt to reset the display driver and recover from timeout failed', which is much more specific and useful.

If something similar appeared for you then you might install the latest driver updates for your display driver, and maybe that would solve the problem.

The verbose report may also contain details of the stack, essentially a list of the functions being called by Windows and your software immediately before the crash. This looks complicated – and to be honest, it is – but again, you don't have to understand every word. All you're looking to do is figure out what your PC was trying to do when the crash occurred, and the stack can offer very useful clues.
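If you have several dumps to check, the verbose analysis can also be run from a script rather than the WinDbg window. The Python sketch below builds a command line for kd.exe, the console kernel debugger that ships alongside WinDbg, using its -z (dump file), -y (symbol path) and -c (commands to run) options. It assumes kd.exe is on your PATH, and both function names are hypothetical.

```python
import subprocess

def build_analyze_command(dump_file, symbol_path, debugger="kd.exe"):
    """Return the argument list for a one-shot verbose crash analysis."""
    return [
        debugger,
        "-z", dump_file,         # open this crash dump
        "-y", symbol_path,       # where to find and cache symbols
        "-c", "!analyze -v; q",  # run the verbose analysis, then quit
    ]

def run_verbose_analysis(dump_file, symbol_path):
    """Run the debugger and return everything it printed (Windows only)."""
    result = subprocess.run(
        build_analyze_command(dump_file, symbol_path),
        capture_output=True, text=True, check=False)
    return result.stdout
```

The returned text is the same kind of report you'd see in the WinDbg window, so you can save it to a file and read through it at your leisure.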

Examine your system

If the standard and verbose reports can't explain your crashes, you should take a closer look at your PC's configuration at crash time. It's just a matter of choosing the right command. Typing !vm and pressing [Enter] will display comprehensive details on your system's memory use, for instance. Scroll down the report, looking for oddities.

For example, is there a warning of 'excessive usage' around the 'paged pool' or 'non-paged pool' details? This could mean that you have a resource leak somewhere, perhaps a driver that's allocating Windows resources but not releasing them. Is your paging file near its maximum size, maybe? If this is happening, it may also be caused by a resource leak, or perhaps you've manually set it to a size that's smaller than it needs to be.

Finally, below the general report is a list of Windows components and the RAM they were consuming at crash time. Does anything stand out?

The Process command can also be useful, as it shows you the system processes that were running at crash time. Type !process 0 0 (to clarify, those are zeroes) and press [Enter] to get the full list.

Look for the HandleCount number here – this shows you how many Windows objects a process has open. This is normally a few hundred, perhaps a few thousand in some cases, but if it's many thousands without an obvious good reason – it's not an antivirus tool scanning your entire system, for instance – then again this might indicate that there's a problem.
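With dozens of processes listed, high handle counts are easy to miss. This Python sketch pairs each 'HandleCount:' figure with the 'Image:' name that follows it and flags the large ones. It assumes the saved !process 0 0 output puts those two fields on consecutive lines within each entry, and the 10,000 threshold is simply our guess at 'many thousands'; both function names are made up.

```python
import re

def handle_counts(report):
    """Return (image name, handle count) pairs in the order listed."""
    counts = []
    pending = None  # handle count waiting for its Image line
    for line in report.splitlines():
        match = re.search(r"HandleCount:\s*(\d+)", line)
        if match:
            pending = int(match.group(1))
            continue
        match = re.search(r"^\s*Image:\s*(.+?)\s*$", line)
        if match and pending is not None:
            counts.append((match.group(1), pending))
            pending = None
    return counts

def suspicious_processes(report, threshold=10000):
    """Return only the processes at or above the (assumed) threshold."""
    return [(name, count) for name, count in handle_counts(report)
            if count >= threshold]
```

A process that turns up here isn't necessarily faulty, for the reasons given above, but it's a sensible place to start looking.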

For an in-depth report on exactly which modules were loaded when your PC crashed, type lmv and press [Enter]. The command is an abbreviation for 'Loaded Modules Verbose', and the report gives you a very long list of programs, drivers and Windows components that were active at crash time. Have a scroll through the list, and you'll probably find many drivers that you never knew you had.

If you spot any relating to applications that you no longer use, though, it's a good idea to uninstall them. There's no guarantee that it will stop your blue-screen crashes, but you'll free up a few system resources and simplify your system, and that's always a solid step forward.

Perhaps the most important point of all is not to give up. Crash dump analysis is tricky, and WinDbg won't always help with every crash, but you should keep digging anyway. It's likely that, before long, it will provide the clues you need to get your PC running smoothly again.
