Kernel-Mode Debugging

The high-level requirements for kernel-mode debugging are similar to those of user-mode debugging, including the ability to control the target (break in, single-step, set breakpoints, and so on) and also manipulate its memory address space. The difference in the case of kernel-mode debugging is that the target is the entire system being debugged.

Architecture Overview

As with user-mode debugging, Windows provides an architecture that answers the system-level needs of kernel debuggers. In user-mode debugging, that support framework is built right into the OS kernel, where the debug port executive object provides the key to the interprocess communication channel between the debugger and target processes. In kernel debugging, the kernel itself is being debugged, so support for the communication channel is built lower in the architectural stack. This is done using Hardware Abstraction Layer (HAL) extensions that implement the low-level transport layer of the communication channel between the host and target machines.

There are different transport media you can use to perform kernel-mode debugging, and each one is implemented in its own transport DLL extension. In Windows 7, for example, kdcom.dll is used for serial cables, kd1394.dll is used for FireWire cables, and kdusb.dll is used for USB 2.0 debug cables. These module extensions are loaded by the HAL very early during the boot process when the target is enabled to support kernel-mode debugging. Because these modules sit very low in the architecture stack, they can’t depend on higher-level OS kernel components that might not yet be fully loaded or that might themselves be the code being debugged. For that reason, the KD transport extensions are fairly lightweight and interact directly with the hardware at the lowest possible level without taking any extra device driver dependencies, as demonstrated in Figure 3-5.

If you disregard for a second how the debugger commands are transmitted from the kernel debugger to the target, the conceptual model for how the kernel on the target processes the commands sent by the kernel-mode debugger is quite similar to how debug events are processed by the user-mode debugger loop:

The OS kernel periodically asks the transport layer (as part of the clock interrupt service routine) to check for break-in packets from the host debugger. When a new one is found, the kernel enters a break-in loop where it waits for additional commands to be received from the host kernel debugger.

While the system on the target machine is halted, the break-in loop checks for any new commands sent by the host kernel debugger. This enables the kernel debugger to read register values, inspect or change memory on the target, and perform many other inspection and control commands while the target is still frozen. These send/receive handshakes are repeated until the host kernel debugger decides to leave the break-in state and the target is instructed to exit the debugger break-in mode and continue its normal execution again.
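The two steps above can be pictured with a small toy model. This is only a sketch of the conceptual flow, not the actual kernel implementation: the class, the packet names, and the in-memory queue standing in for the transport layer are all hypothetical.

```python
from collections import deque


class ToyTarget:
    """Toy model of a debug target; every name here is hypothetical."""

    def __init__(self):
        self.memory = bytearray(16)
        self.registers = {"eip": 0x1000}
        self.transport = deque()  # packets "sent" by the host debugger
        self.log = []             # replies the target would send back

    def clock_tick(self):
        # Stand-in for the periodic poll made from the clock interrupt path.
        if self.transport and self.transport[0] == ("break-in",):
            self.transport.popleft()
            self.break_in_loop()

    def break_in_loop(self):
        # The target stays frozen here until the host sends "continue".
        while self.transport:
            command, *args = self.transport.popleft()
            if command == "read-register":
                self.log.append(self.registers[args[0]])
            elif command == "write-memory":
                offset, data = args
                self.memory[offset:offset + len(data)] = data
            elif command == "continue":
                return
        raise RuntimeError("host went silent while the target was frozen")


target = ToyTarget()
target.transport.extend([
    ("break-in",),
    ("read-register", "eip"),
    ("write-memory", 4, b"\x90\x90"),
    ("continue",),
])
target.clock_tick()
```

A real target would block waiting for the next packet instead of raising an error when the queue is empty; the toy version raises only so that the example always terminates.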

In addition to explicit break-in requests, the kernel can also enter the break-in loop in response to exceptions that get raised by the target machine, which allows the debugger to intervene and respond to them. This generic handling of exceptions is again used to implement single-stepping and setting code breakpoints inside the target OS during kernel-mode debugging.

Setting Code Breakpoints

Knowing how code breakpoints are implemented during kernel-mode debugging is important so that you can understand situations when you fail to hit breakpoints you insert using the host kernel debugger. There are many similarities between how code breakpoints are internally implemented in user-mode and kernel-mode debugging, but there are also several important differences.

As in the user-mode case, code breakpoints are inserted by overwriting the byte at the target virtual memory address with the debug break CPU instruction (int 3). When the target machine hits the inserted breakpoint, a CPU interrupt is raised and its OS interrupt handler is invoked. Where user-mode and kernel-mode debugging diverge is in how the handler dispatches the exception event to the host debugger. In the kernel-mode case, the target OS is halted and enters the break-in send/receive loop, allowing the host debugger to handle the breakpoint by putting the original byte back in the breakpoint’s code location before entering the break-in state.
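The byte-patching idea can be sketched in a few lines. This is a minimal illustration over a bytearray standing in for target memory; the BreakpointTable class and its method names are hypothetical, and only the 0xCC opcode for int 3 is a hardware fact.

```python
INT3 = 0xCC  # one-byte x86 opcode for the int 3 debug break instruction


class BreakpointTable:
    """Hypothetical bookkeeping for inserted code breakpoints."""

    def __init__(self, memory):
        self.memory = memory  # bytearray standing in for target memory
        self.saved = {}       # address -> original byte that was overwritten

    def insert(self, address):
        # Remember the original byte, then overwrite it with int 3.
        if address not in self.saved:
            self.saved[address] = self.memory[address]
            self.memory[address] = INT3

    def restore(self, address):
        # Put the original byte back, as done when the breakpoint is handled.
        self.memory[address] = self.saved.pop(address)


# push ebp; mov ebp, esp; pop ebp; ret
code = bytearray(b"\x55\x8b\xec\x5d\xc3")
table = BreakpointTable(code)
table.insert(0)
patched = bytes(code)  # first byte is now 0xCC
table.restore(0)       # original 0x55 is back in place
```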

Kernel-debugging code breakpoints also differ from their user-mode counterparts in that they might refer to memory that has been paged out to disk on the target machine. In that case, the target simply handles the breakpoint command from the host debugger by registering the code breakpoint as being “owed.” When the code page is later loaded into memory, the page fault handler (nt!MmAccessFault) in the kernel memory manager intervenes and inserts the breakpoint instruction into the global code page at that time, just as it would have done if the breakpoint had been in a memory location that wasn’t paged out at the time of the debugger break-in.
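The deferred ("owed") insertion can be modeled as follows. This is a sketch under simplifying assumptions: the ToyMemoryManager class, its page_in method, and the 4096-byte page size are illustrative stand-ins, not the real memory manager logic.

```python
INT3 = 0xCC
PAGE_SIZE = 4096  # assumed page size for the toy model


class ToyMemoryManager:
    """Hypothetical model of owed breakpoints; not the real page fault path."""

    def __init__(self):
        self.resident = {}  # page number -> bytearray of page contents
        self.owed = set()   # breakpoint addresses waiting for their page
        self.saved = {}     # address -> original byte

    def set_breakpoint(self, address):
        if address // PAGE_SIZE in self.resident:
            self._patch(address)
        else:
            self.owed.add(address)  # page is not in memory: remember it

    def page_in(self, page, contents):
        # Stand-in for the page fault path bringing a page into memory.
        self.resident[page] = bytearray(contents)
        for address in sorted(a for a in self.owed if a // PAGE_SIZE == page):
            self._patch(address)
            self.owed.discard(address)

    def _patch(self, address):
        page, offset = divmod(address, PAGE_SIZE)
        self.saved[address] = self.resident[page][offset]
        self.resident[page][offset] = INT3


mm = ToyMemoryManager()
mm.set_breakpoint(0x2010)           # page 2 is not resident yet
owed_before = set(mm.owed)
mm.page_in(2, b"\x90" * PAGE_SIZE)  # the owed breakpoint is applied here
```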

Finally, because the same user-mode virtual memory address can point to different private code depending on the user-mode process context, code breakpoints inserted during kernel debugging are always interpreted relative to the current process context. This point sometimes escapes developers who are new to kernel debugging because it isn’t a concern in user-mode debugging, and it is precisely why you should always invasively switch the process context in the host kernel debugger to the target process before setting breakpoints in user-mode code relative to that process.
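A toy model makes the pitfall concrete. Here each process has its own private bytes at the same virtual address, and a breakpoint lands in whichever address space is current. The ToyKernelDebugger class and the process name other.exe are hypothetical.

```python
INT3 = 0xCC


class ToyKernelDebugger:
    """Hypothetical model: breakpoints resolve against the current context."""

    def __init__(self, address_spaces, current):
        self.address_spaces = address_spaces  # process name -> private memory
        self.current = current                # current process context

    def switch_context(self, process):
        self.current = process

    def set_breakpoint(self, virtual_address):
        # The same virtual address means different code in each process.
        self.address_spaces[self.current][virtual_address] = INT3


spaces = {
    "notepad.exe": bytearray(b"\x55\x8b\xec\xc3"),
    "other.exe": bytearray(b"\x90\x90\x90\xc3"),
}
kd = ToyKernelDebugger(spaces, current="other.exe")

kd.set_breakpoint(0)  # intended for notepad.exe, but patches other.exe
notepad_untouched = spaces["notepad.exe"][0] == 0x55

kd.switch_context("notepad.exe")
kd.set_breakpoint(0)  # now the breakpoint lands in the intended process
```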

Single-Stepping the Target

Single-stepping the target in the host debugger is implemented using the same single-step CPU support and interrupt (int 1) that enables you to single-step the target process in a user-mode debugging environment. However, the fact that kernel-mode debuggers have global scope again introduces some interesting side effects you should be aware of so that you are better prepared to deal with them during your kernel-debugging experiments.

The most practical difference you’ll see when you try single-stepping the target in a host kernel debugger is that execution sometimes seems to jump to other random code on the system and away from your current thread context. This happens when the thread quantum expires while stepping over a function call and the OS decides to schedule another thread on the processor. When that happens, it seems as if the code you’re debugging just jumped to a random location. In reality, what happened is that the old thread got switched out and a new one is now running on the processor. This usually happens whenever you step over a long function or a Win32 API call that causes the thread to enter a wait state (such as a Sleep call). Fortunately, when single-stepping in a host kernel debugger, the target OS not only enables the CPU trace flag but cleverly also finds the next call and inserts an additional debug break instruction at that memory location every time you single-step. This means that by letting the target machine “go” again (using the g command) after it seemed you had jumped to an unrelated code location, you break right back at the next call from the original thread (once its wait is satisfied and the thread gets scheduled to run again), which allows you to continue single-stepping the thread you were examining prior to the context switch.
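The effect of the extra debug break can be sketched with a toy round-robin scheduler. This assumes, for illustration only, that the extra breakpoint sits at the address where the original thread resumes after the call; the go function, the thread names, and all addresses are hypothetical.

```python
from collections import deque


def go(run_queue, breakpoints):
    """Run threads round-robin, one instruction per turn, until a
    breakpoint address is executed. Returns (thread name, address)."""
    while run_queue:
        name, addresses = run_queue.popleft()
        address = addresses.popleft()
        if addresses:
            run_queue.append((name, addresses))  # thread still has work left
        if address in breakpoints:
            return name, address
    return None


# "original" is partway through a called function; 0x401005 is the
# instruction that follows the call in the code being stepped.
run_queue = deque([
    ("original", deque([0x7700, 0x7704, 0x401005])),
    ("other", deque([0x9000, 0x9004, 0x9008])),
])

# Extra debug break inserted by the step operation (assumed location).
hit = go(run_queue, breakpoints={0x401005})
```

Even though the other thread runs in between, letting the target go ends with a break in the original thread at the address that follows the call.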

Switching the Current Process Context

There are two ways to resolve symbols for the user-mode stacks of a process on the target machine of a kernel-debugging session. The first way, which you already used in Chapter 2, is to simply switch the current process view in the host debugger and reload the user-mode symbols for that process. This method’s main advantage is that it also works in live kernel-mode debugging, where it proves useful when you need to observe multiple user-mode processes during a debugger break-in. In the following live kernel-debugging session, the .process command is used with the /r (“reload user-mode symbols”) and /p (“target process”) options to illustrate this important approach. Make sure you start a new notepad.exe instance and that you use the values in bold text when you execute these commands because your values are likely to be different from the ones shown in this listing.

The second way to switch process views in the host debugger is to perform an invasive process context switch on the target machine by using the /i option of the .process command. This method is particularly useful when you need to set breakpoints in user-mode code locations, given they’re always interpreted relative to the current process context on the target machine, as you also learned back in Chapter 2. This method requires the target machine to exit the debugger break-in mode and run to complete the request.
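As a compact reminder of the two approaches, the sketch below only builds the debugger command strings. The helper function names and the placeholder address are hypothetical; the g and .reload /user steps after .process /i reflect a typical sequence and are an assumption here, not something the text prescribes. The !process 0 0 form is one common way to look up a process by image name.

```python
def find_process_command(image_name):
    """Command that lists processes matching an image name."""
    return f"!process 0 0 {image_name}"


def view_switch_commands(eprocess):
    """Switch the process view only; the target stays frozen."""
    return [f".process /r /p {eprocess}"]


def invasive_switch_commands(eprocess):
    """Invasive switch: the target must run to complete the request."""
    return [
        f".process /i {eprocess}",  # request the context switch
        "g",                        # let the target run until it breaks back in
        ".reload /user",            # assumed follow-up: reload user-mode symbols
    ]


address = "<EPROCESS address>"  # placeholder; take it from the !process output
lookup = find_process_command("notepad.exe")
view = view_switch_commands(address)
invasive = invasive_switch_commands(address)
```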

After the target is let go by the host debugger, the kernel on that side thaws the frozen processors and exits the break-in loop. Before it does so, however, it also schedules a high-priority work item to transition over to the new process context that was requested by the host debugger.

The work item that induced the previous debug break runs on a leased system thread that runs in the context of the requested process. The host debugger breaks right back in again before any of the target process's threads have a chance to continue executing past where they were at the time of the original break-in. You can also confirm that the current thread context is a kernel thread, and not a thread from the user-mode process itself. Notice that the thread is indeed owned by the system (kernel) process, which always has a PID value of 4, as reported by the Cid (client ID, displayed as the process ID followed by the thread ID) you get from the !thread command.

Nevertheless, this system thread is attached to the target process you requested, which you can confirm using the !process kernel debugger extension command with -1 to indicate that you want the current process context displayed.