Troubleshooting Stop Errors

The first time I saw a "Blue Screen of Death" (the colloquial name for a Windows Stop error), I was working with Windows NT 3.51 in a classroom. I was fascinated at first, but I quickly became frustrated by how little information my TechNet CD offered in those days to help me understand what I was seeing.

Things have changed. Now you can find plenty of information about Stop errors on TechNet, search the Microsoft Knowledge Base online, or simply Google an answer yourself. And although Stop errors can still be hard to troubleshoot despite this wealth of information, they have fortunately become rarer as the Windows platform has improved. In this short article, I'll give you five tips for resolving Stop errors, along with some of the tools you can use to troubleshoot them.

Tip #1: Ignore the Fluff

Stop screens generally contain a lot of information, most of it confusing. The items that really matter are:

The symbolic name of the error, for example DRIVER_IRQL_NOT_LESS_OR_EQUAL.

The hexadecimal error number, prefixed by "0x"; for example, "0x000000D1", which is usually abbreviated to "0xD1".

The name of any driver specifically mentioned as relating to the error.

You can usually ignore the rest unless you are a Microsoft support technician versed in the arcane lore of interpreting Stop screens. I usually start with the name or number and look up the error in one of the many available resources, which I'll discuss in a moment. First, though, make sure you write this information down so that you can look it up later.
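If you keep your notes in a file rather than on paper, a short script can pull those three items out of whatever you typed in. The following is a minimal Python sketch, assuming you transcribed the screen including the usual "*** STOP: 0x..." line. The function and class names, the small lookup table of common codes, and the sample driver name are illustrative only, not part of any Microsoft tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# A handful of well-known Stop codes, used only when the symbolic
# name wasn't copied down. Extend this table as needed.
STOP_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7B: "INACCESSIBLE_BOOT_DEVICE",
    0x7E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

STOP_RE = re.compile(r"STOP:\s*0x([0-9A-Fa-f]+)")              # "*** STOP: 0x000000D1"
NAME_RE = re.compile(r"\b([A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+)\b")   # "DRIVER_IRQL_NOT_..."
DRIVER_RE = re.compile(r"\b([\w.-]+\.sys)\b", re.IGNORECASE)    # "mydriver.sys"


@dataclass
class StopInfo:
    code: int
    name: Optional[str]
    driver: Optional[str]

    @property
    def long_code(self) -> str:
        return f"0x{self.code:08X}"   # e.g. 0x000000D1

    @property
    def short_code(self) -> str:
        return f"0x{self.code:X}"     # e.g. 0xD1


def parse_stop_screen(text: str) -> StopInfo:
    """Extract the items worth writing down from transcribed Stop screen text."""
    m = STOP_RE.search(text)
    if m is None:
        raise ValueError("no 'STOP: 0x...' line found")
    code = int(m.group(1), 16)
    # Prefer the symbolic name from the screen; fall back to the lookup table.
    name_match = NAME_RE.search(text)
    name = name_match.group(1) if name_match else STOP_NAMES.get(code)
    driver_match = DRIVER_RE.search(text)
    return StopInfo(code, name, driver_match.group(1) if driver_match else None)


if __name__ == "__main__":
    # Illustrative transcription; the driver name and addresses are made up.
    sample = """DRIVER_IRQL_NOT_LESS_OR_EQUAL
*** STOP: 0x000000D1 (0x00000000,0x00000002,0x00000000,0xF86B5A89)
***  mydriver.sys - Address F86B5A89 base at F86B5000"""
    info = parse_stop_screen(sample)
    print(info.name, info.short_code, info.driver)
```

The short form (0xD1) is usually what you'll see in Knowledge Base article titles, so the sketch keeps both forms available.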

Tip #2: What Did I Do?

Before I run to Google for information about the error, I try to pause and think, "What have I done recently that could have caused this problem?" Since most Stop errors are caused by flaky drivers or hardware problems, I ask myself questions like these:

Have I installed any new devices recently? Were they on the Hardware Compatibility List (HCL)?

Have I updated any device drivers since the last time I rebooted?

Could I have kicked the machine and unseated a card from its slot?

Has the machine's hard drive been making funny noises lately?

Did I install any new service packs, hotfixes, or other applications in the last few days?

Did I make any changes to the configuration of the operating system or my BIOS settings recently?

Asking yourself questions like these may bring to mind something that caused the Stop error. Armed with that fact and the information you wrote down, you're ready to proceed with troubleshooting the error.
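Some of these questions can be answered from the file system. As a rough sketch (a hypothetical helper, not a Windows tool), the Python below lists driver files in %SystemRoot%\System32\drivers that appear to have changed in the last few days. Treat the result as a hint only: installers often preserve a driver's original modified date, so on Windows the sketch also considers the file's creation time, and a driver that was installed elsewhere can still be missed.

```python
import os
import sys
import time
from pathlib import Path
from typing import List, Tuple


def _changed_time(st: os.stat_result) -> float:
    """Best guess at when a file landed on this machine."""
    if sys.platform == "win32":
        # On Windows the creation time usually reflects when the file was copied
        # here, while installers often keep the vendor's original modified time.
        created = getattr(st, "st_birthtime", None)  # st_birthtime: Python 3.12+
        if created is None:
            created = st.st_ctime  # creation time on older Pythons for Windows
        return max(st.st_mtime, created)
    return st.st_mtime


def recently_changed_drivers(driver_dir: str, days: float = 7.0) -> List[Tuple[str, float]]:
    """Return (file name, timestamp) for .sys files changed in the last `days` days, newest first."""
    cutoff = time.time() - days * 86400
    hits = []
    for entry in Path(driver_dir).iterdir():
        if entry.is_file() and entry.suffix.lower() == ".sys":
            changed = _changed_time(entry.stat())
            if changed >= cutoff:
                hits.append((entry.name, changed))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


if __name__ == "__main__":
    drivers = os.path.join(os.environ.get("SystemRoot", r"C:\WINDOWS"), "System32", "drivers")
    for name, changed in recently_changed_drivers(drivers, days=14):
        print(time.strftime("%Y-%m-%d %H:%M", time.localtime(changed)), name)
```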

Tip #3: Check Your Event Log

Next, I usually just reboot the machine, if it isn't already configured to restart itself when a Stop error occurs. To configure automatic restart after a Stop error, select Control Panel -> System -> Advanced -> Startup and Recovery -> Settings. This opens the Startup and Recovery dialog box (Figure 1), where you can configure boot options and what happens when a system failure occurs.

Figure 1. Startup and Recovery screen in Windows XP.
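The System failure settings in this dialog box are stored as values under the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl. The Python sketch below reads those values and translates them into the dialog's wording, which can be handy when you want to check several machines. It only reads the registry (use the dialog box to change the settings), the helper names are my own, and the winreg module it relies on exists only on Windows.

```python
import sys
from typing import Dict

# Registry key behind the Startup and Recovery "System failure" settings.
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Values of CrashDumpEnabled, as labelled in the "Write debugging information" list.
DUMP_TYPES = {
    0: "(none)",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump (64 KB)",
}


def describe_crash_control(values: Dict[str, object]) -> Dict[str, str]:
    """Translate raw CrashControl values into the dialog's wording."""
    def flag(name: str) -> str:
        return "yes" if values.get(name) == 1 else "no"

    return {
        "Write an event to the system log": flag("LogEvent"),
        "Send an administrative alert": flag("SendAlert"),
        "Automatically restart": flag("AutoReboot"),
        "Write debugging information": DUMP_TYPES.get(values.get("CrashDumpEnabled"), "unknown"),
    }


def read_crash_control() -> Dict[str, object]:
    """Read the CrashControl values from the local machine (Windows only)."""
    import winreg  # only available on Windows

    values: Dict[str, object] = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        for name in ("LogEvent", "SendAlert", "AutoReboot", "CrashDumpEnabled",
                     "DumpFile", "MinidumpDir"):
            try:
                values[name], _ = winreg.QueryValueEx(key, name)
            except OSError:
                pass  # value not present on this system
    return values


if __name__ == "__main__" and sys.platform == "win32":
    for setting, state in describe_crash_control(read_crash_control()).items():
        print(f"{setting}: {state}")
```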

By default in Windows XP, when the operating system encounters an exception it can't handle, the following actions occur:

An event describing the error is written to the System log, which you can view in Event Viewer.

An administrative alert is sent over the network if the Alerter and Messenger services are running (these are disabled by default).

A small memory dump of key information (including the Stop error details) is written to the %SystemRoot%\Minidump directory.
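Minidumps accumulate in that directory, one file per crash, so the first job is usually to find the most recent one. On Windows XP the files are typically named MiniMMDDYY-NN.dmp (for example, Mini072203-01.dmp). The sketch below assumes that naming scheme and sorts the files newest first; the function names are hypothetical.

```python
import os
import re
from datetime import date, datetime
from typing import List, Optional, Tuple

# Assumed Windows XP naming scheme: Mini<MMDDYY>-<sequence>.dmp
MINIDUMP_RE = re.compile(r"^Mini(\d{6})-(\d+)\.dmp$", re.IGNORECASE)


def parse_minidump_name(name: str) -> Optional[Tuple[date, int]]:
    """Return (crash date, sequence number) for a minidump file name, or None."""
    m = MINIDUMP_RE.match(name)
    if m is None:
        return None
    try:
        crash_date = datetime.strptime(m.group(1), "%m%d%y").date()
    except ValueError:  # digits that don't form a real date
        return None
    return crash_date, int(m.group(2))


def newest_first(names: List[str]) -> List[str]:
    """Sort minidump file names from the most recent crash to the oldest."""
    keyed = [(parse_minidump_name(n), n) for n in names]
    valid = [(key, n) for key, n in keyed if key is not None]
    valid.sort(reverse=True)
    return [n for _, n in valid]


if __name__ == "__main__":
    minidump_dir = os.path.join(os.environ.get("SystemRoot", r"C:\WINDOWS"), "Minidump")
    if os.path.isdir(minidump_dir):
        for name in newest_first(os.listdir(minidump_dir)):
            print(name)
```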

Windows Server 2003 is configured a little differently: by default it performs a complete memory dump, writing everything in RAM to the file %SystemRoot%\MEMORY.DMP so that a support technician can analyze it if needed. What actually happens is that when the system fails, the kernel writes the contents of RAM to the paging file (Pagefile.sys) on the boot volume. Then, when the system restarts, Windows extracts this information from Pagefile.sys to create MEMORY.DMP. If your paging file is smaller than physical memory, however, no MEMORY.DMP file will be created. If you have a lot of RAM, the dump can also take a long time to complete, and you need enough free disk space to save the resulting file. For these reasons, I usually prefer to configure servers to perform minidumps by default, and only switch to kernel or complete memory dumps if the failure keeps recurring. (Windows Server 2003 always performs a minidump anyway, regardless of this setting.)
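The paging file requirement is easy to check before a failure rather than after one. Microsoft's guidance for complete memory dumps is that the paging file on the boot volume be at least the size of physical RAM plus 1 MB, and the disk also needs room for a MEMORY.DMP of roughly the size of RAM. The following sketch encodes just those two checks; the function name and the sizes in the usage line are illustrative.

```python
from typing import List


def complete_dump_problems(ram_mb: int, boot_pagefile_mb: int, free_disk_mb: int) -> List[str]:
    """List reasons a complete memory dump might not be saved (empty list = looks OK)."""
    problems = []
    # The kernel writes RAM into the paging file on the boot volume, so that file
    # must be at least as large as physical memory plus 1 MB.
    needed_pagefile = ram_mb + 1
    if boot_pagefile_mb < needed_pagefile:
        problems.append(
            f"paging file on the boot volume is {boot_pagefile_mb} MB; "
            f"a complete dump needs at least {needed_pagefile} MB"
        )
    # MEMORY.DMP ends up roughly the size of physical memory.
    if free_disk_mb < ram_mb:
        problems.append(
            f"only {free_disk_mb} MB free; MEMORY.DMP needs roughly {ram_mb} MB"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical server: 4 GB RAM, 4 GB paging file, 20 GB free.
    for problem in complete_dump_problems(4096, 4096, 20480):
        print(problem)
```

Note that a paging file exactly the size of RAM fails the first check, which is an easy mistake to make when sizing it by hand.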

If you forgot to copy down the Stop message, or the machine rebooted before you had a chance, you can usually find the error by rebooting and checking the System log, which may even give you additional information that helps with troubleshooting. Of course, if you can't reboot the machine, or it fails again before you can log on, you're stuck. Almost, that is: if the machine is configured to write a memory dump (the default), you still have a way out.
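When the machine does come back up, the Stop code usually appears in an event whose description reads something like "The bugcheck was: 0x1000008e (...)". The sketch below pulls the code, its parameters, and the dump file path out of that text, for example after copying an event's description from Event Viewer. The exact wording is an assumption based on the Windows XP Save Dump event (ID 1001) and may differ on other versions. Codes with the 0x10000000 bit set, such as 0x1000008E, are documented as having the same meaning as the code without it (0x8E), so the sketch clears that bit.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed message format, e.g.:
# "... The bugcheck was: 0x1000008e (0xc0000005, ...). A dump was saved in: C:\...\Mini....dmp."
BUGCHECK_RE = re.compile(r"bugcheck was:\s*0x([0-9a-f]+)\s*\(([^)]*)\)", re.IGNORECASE)
DUMP_RE = re.compile(r"dump was saved in:\s*(.+?\.dmp)", re.IGNORECASE)


@dataclass
class BugcheckEvent:
    code: int                 # Stop code, with the 0x10000000 flag bit cleared
    params: List[int]         # the bugcheck parameters, in order
    dump_file: Optional[str]  # path of the saved dump, if the message names one


def parse_bugcheck_event(message: str) -> Optional[BugcheckEvent]:
    m = BUGCHECK_RE.search(message)
    if m is None:
        return None
    # 0x1000008E has the same meaning as 0x8E, so drop the flag bit.
    code = int(m.group(1), 16) & ~0x10000000
    params = [int(p.strip(), 16) for p in m.group(2).split(",") if p.strip()]
    dump = DUMP_RE.search(message)
    return BugcheckEvent(code, params, dump.group(1) if dump else None)
```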

Tip #4: Reading Memory Dump Files

In some cases a system might fail without writing an event to the System log. If Windows is configured to create a dump file, you can then use Debugging Tools for Windows to analyze the dump file manually and extract the error information you need to troubleshoot the failure. You can download these tools from Microsoft's Windows Hardware Developer Central (WHDC) web site and install them on your system. They take some getting used to, but as a bonus, the Help file included with the tools contains a complete reference for every Stop message, with suggestions on how to troubleshoot each one.
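The usual starting point with these tools is the !analyze -v command, which summarizes the failure and often names the driver it suspects. You can run it interactively in WinDbg, or with the command-line debugger kd.exe as in the Python sketch below. The sketch assumes kd.exe is on your PATH (or that you pass its full path), uses Microsoft's public symbol server with a local cache in C:\symbols, and picks out the "Probably caused by" line from the output; the helper names and the cache path are my own choices.

```python
import re
import subprocess
from typing import List, Optional

# Microsoft public symbol server, cached locally in C:\symbols (cache path is arbitrary).
SYMBOL_PATH = r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols"
CAUSED_BY_RE = re.compile(r"Probably caused by\s*:\s*(\S+)")


def kd_command(dump_file: str, kd_path: str = "kd.exe") -> List[str]:
    """Build a kd command line: -z opens the dump, -y sets symbols, -c runs commands."""
    return [kd_path, "-z", dump_file, "-y", SYMBOL_PATH, "-c", "!analyze -v; q"]


def probable_cause(analyze_output: str) -> Optional[str]:
    """Pull the module name from the 'Probably caused by' line of !analyze output."""
    m = CAUSED_BY_RE.search(analyze_output)
    return m.group(1) if m else None


def analyze_dump(dump_file: str, kd_path: str = "kd.exe") -> Optional[str]:
    """Run kd against a dump file and return the module it blames, if any."""
    result = subprocess.run(kd_command(dump_file, kd_path),
                            capture_output=True, text=True, check=False)
    return probable_cause(result.stdout)
```

The first run can be slow while symbols download into the cache. Treat the "Probably caused by" module as a lead rather than a verdict, and read the rest of the !analyze output alongside the Stop code reference in the Help file.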

Tip #5: Do the Obvious

At this point you're probably saying, "Gaah! I never wanted to be a kernel technician!" And you're right: you probably don't need to become one to resolve your problem. Go back to the answers you gave to the questions in Tip #2. Using them, together with the basic Stop message information you wrote down or read from the System log, you can probably fix the problem by taking a corrective action such as:

Uninstalling the last device you installed.

Rolling back a device driver you updated recently.

Opening up your box and reseating PCI cards and memory.

Running diagnostics on your hard drives.

Uninstalling a recently installed hotfix or service pack.

Rolling back your system configuration using Last Known Good Configuration or XP's System Restore.

Reconfiguring your BIOS settings to their original values.
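The link between this list and the questions in Tip #2 is simple enough to write down as a lookup table. The sketch below is purely illustrative: the change categories are hypothetical labels for your answers, and each one maps to one of the actions above.

```python
from typing import Iterable, List

# Hypothetical labels for "what changed recently", each mapped to a corrective action.
ACTIONS = {
    "new_device": "Uninstall the last device you installed.",
    "driver_update": "Roll back the device driver you updated recently.",
    "physical_disturbance": "Open the case and reseat PCI cards and memory.",
    "disk_noise": "Run diagnostics on your hard drives.",
    "hotfix_or_sp": "Uninstall the recently installed hotfix or service pack.",
    "os_config_change": "Roll back with Last Known Good Configuration or System Restore.",
    "bios_change": "Restore your BIOS settings to their original values.",
}


def suggest_actions(changes: Iterable[str]) -> List[str]:
    """Return corrective actions for the given changes, in order, without duplicates."""
    seen = set()
    suggestions = []
    for change in changes:
        action = ACTIONS.get(change)  # unknown labels are ignored
        if action and action not in seen:
            seen.add(action)
            suggestions.append(action)
    return suggestions
```

Keeping the order of your answers means the most likely culprit, usually the most recent change, gets tried first.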

But if worse comes to worst, you can always contact Microsoft Product Support Services (PSS) and talk to a technician. There is normally a charge, but it may be waived if the problem is a new one or can't be resolved using existing information in the Knowledge Base.

Additional Resources

Here are some additional online resources to help you troubleshoot Stop messages: