Edit 13 July 2013: I’ve made a few updates to this post to clarify some points and resolve issues people have had.

fG! did a great write-up here on how to set up two-machine debugging with VMware on Leopard a couple of years ago, but since a few things have changed and I will probably refer to this topic in future posts, I thought it was worth revisiting.

Debugging kernel extensions can be a bit of a pain. printf()-debugging is the worst, and being in kernel-land, it might not be immediately obvious how to go about debugging your (or other people’s) code. Apple has long provided methods for kernel debugging via the Kernel Debugger Protocol (KDP), along with ddb, the in-kernel serial debugger. KDP is implemented in-kernel by an instance of IOKernelDebugger, and allows you to connect to the debug stub from an instance of gdb (Apple’s outdated fork only AFAIK) running on another machine connected via FireWire or Ethernet. ddb can be used to debug the running kernel from the target machine itself, but is pretty low-level and arcane. Apple suggests in the Kernel Programming Guide that you are better off using gdb for most tasks, so that’s what we’ll do.

Enter VMware

We don’t really want to use two physical machines for debugging, because who the hell uses physical boxes these days when VMs will do the job? With the release of Mac OS X 10.7 (Lion), Apple changed the EULA to allow running virtualised instances of Lion on top of an instance running on bare metal. Prior to this, only the “server” version of Mac OS X was allowed to be virtualised, and VMware ‘prevented’ the client version from being installed through some hardcoded logic in vmware-vmx (which some sneaky hackers patched). VMware Fusion 4 introduced the ability to install Mac OS X 10.7 into a VM without any dodgy hacks, just by choosing the Install Mac OS X Lion.app bundle as the installation disc.

So, the first step of the process is: install yourself a Mac OS X VM as per the VMware documentation.

Edit 13 July 2013: Once you’re done, it’s probably a good idea to take a snapshot of your VM in case there are problems installing the debug kernel. It’s generally not a problem, but recovering by hand is annoying, and rolling back to a VMware snapshot is much easier.

Install the debug kernel

Once we’ve got our VM installed, we need to install the Kernel Debug Kit. This contains a version of the XNU kernel built with the DEBUG flag set, which includes the debug stubs for KDP and ddb, and a second DEBUG version with a full symbol table to load in GDB so we can use breakpoints on symbol names and not go insane. The debug kits used to live here, but it seems Apple decided they only want ADC members to be able to access them, so now they’re here (requires ADC login). Download the appropriate version for the target kernel you’re debugging in the VM (not necessarily the same as the kernel version on your host debugger machine). In this case I’m using Kernel Debug Kit 10.7.3 build 11D50. Copy this image up to the target VM, and install the debug kernel as per the instructions in the readme file.

Hopefully your VM has successfully booted with the debug kernel and no magic blue smoke has been let out.

Edit 13 July 2013: If your VM has panicked at boot time, make sure you’ve allocated at least 4GB of RAM to the VM; newer OS X versions will not boot with less.

Next we need to set the kernel boot arguments to tell it to wait for a debugger connection at boot time. There are other options but, as fG! said previously, there isn’t an obvious way to generate an NMI within VMware (I haven’t really looked further into this - if there is, I’d like to hear about it). VMware Fusion 4 has proper NVRAM support, so we can specify boot-args in NVRAM rather than in the old com.apple.Boot.plist, using the nvram utility on the target VM like this:

macvm# nvram boot-args="-v debug=0x1"
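The debug boot-arg is a bitmask, and with debug=0x1 only the lowest bit (DB_HALT) is set, which is what makes the kernel stop and wait for gdb at boot. As a sketch, assuming the flag values listed in Apple’s Technical Note TN2118 (and defined in xnu’s osfmk/kern/debug.h), here is a small C decoder for the value. The table and decode_debug_flags() are my own illustrative names, and not every bit is honoured by every kernel version, so check the source for the kernel you’re debugging.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bits of the "debug" boot-arg as listed in TN2118; verify against the
 * osfmk/kern/debug.h of the xnu release you are actually running. */
struct debug_flag {
    unsigned int bit;
    const char *name;
    const char *meaning;
};

static const struct debug_flag kDebugFlags[] = {
    { 0x001, "DB_HALT",        "halt at boot and wait for a debugger to attach" },
    { 0x002, "DB_PRT",         "send kernel printf output to the console" },
    { 0x004, "DB_NMI",         "drop into the debugger on an NMI" },
    { 0x008, "DB_KPRT",        "send kprintf output to the serial port" },
    { 0x010, "DB_KDB",         "make ddb the default debugger" },
    { 0x020, "DB_SLOG",        "log extra diagnostics to the system log" },
    { 0x040, "DB_ARP",         "let the debugger stub ARP and route" },
    { 0x080, "DB_KDP_BP_DIS",  "compatibility mode for older gdb versions" },
    { 0x100, "DB_LOG_PI_SCRN", "disable the graphical panic screen" },
};

/* Store the names of the known flags set in `value` into `out` (at most
 * `max` entries, in ascending bit order) and return how many were stored.
 * Bits that are not in the table are ignored. */
static size_t decode_debug_flags(unsigned int value, const char **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof kDebugFlags / sizeof kDebugFlags[0]; i++) {
        if ((value & kDebugFlags[i].bit) != 0 && n < max) {
            out[n++] = kDebugFlags[i].name;
        }
    }
    return n;
}
```

For example, decode_debug_flags(0x144, names, 16) returns 3 and fills names with DB_NMI, DB_ARP and DB_LOG_PI_SCRN.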

Now we’ll do a bit of config on the debug host, then reboot the VM.

Debug host config

Traditionally, two-machine debugging would use either FireWire or Ethernet. We can simulate Ethernet with VMware’s network bridging.

Edit 13 July 2013: With newer versions of OS X (I’m not sure exactly when they introduced this but it definitely works on 10.8.4) you don’t actually need to do this static ARP trick any more. When the VM boots it will stop at “Waiting for remote debugger connection” after telling you its MAC and IP address. You should be able to skip the static ARP and just kdp-reattach (as below) directly to the IP address displayed here.

Now we should be able to reboot the VM and it will pause waiting for the debugger connection at the start of the boot process. It used to actually say Waiting for debugger connection… or something similar in previous kernel versions, but it seems to pause after [PCI configuration begin] on 10.7.

Fire up GDB

Now it’s time to actually start GDB and connect to the KDP debug stub. Assuming you’ve just mounted the Kernel Debug Kit dmg file, the following paths should be correct. On the debug host machine:

$ gdb /Volumes/KernelDebugKit/DEBUG_Kernel/mach_kernel
GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Thu Nov  3 21:59:02 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...

This is contrary to the instructions in the readme file for the Kernel Debug Kit, which tells you to target /Volumes/KernelDebugKit/mach_kernel with gdb. I haven’t been able to get this kernel to work correctly - symbols are not looked up properly and lots of addresses seem to be wrong, resulting in the kgmacros stuff not working, and breakpoints being set at the wrong addresses. If you load the kernel in the DEBUG_Kernel directory it works OK.

Next, source the kgmacros file. This contains a bunch of GDB macros that make kernel introspection and debugging much easier (particularly when you want to start looking at things like the virtual memory subsystem, and other fun stuff).

Note: if you’re attaching to a kernel running on a different arch (i.e. you created a 32-bit VM on a 64-bit machine), you’ll need to use the --arch flag.

The --arch=i386 option allows you to use a system running the 64-bit kernel to connect to a target running the 32-bit kernel. The --arch=x86_64 option allows you to go the other direction.
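If you’re not sure which architecture a kernel image is, its Mach-O header tells you: the magic number distinguishes 32-bit from 64-bit, and the cputype field identifies the CPU. The sketch below assumes a thin (single-architecture) little-endian header; a universal mach_kernel is not handled, and for files on disk lipo -info or file is the easier route. gdb_arch_for_header() is a hypothetical helper, and the constants are copied from <mach/machine.h> and <mach-o/loader.h> so the example builds on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from <mach/machine.h> and <mach-o/loader.h>. */
#define CPU_ARCH_ABI64   0x01000000u
#define CPU_TYPE_X86     7u
#define CPU_TYPE_X86_64  (CPU_TYPE_X86 | CPU_ARCH_ABI64)
#define MH_MAGIC         0xfeedfaceu
#define MH_MAGIC_64      0xfeedfacfu

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Given the first bytes of a thin, little-endian Mach-O image (magic,
 * then cputype), return the matching gdb --arch value, or NULL if the
 * header is too short or not an x86 kernel image. */
static const char *gdb_arch_for_header(const unsigned char *hdr, size_t len)
{
    if (len < 8)
        return NULL;
    uint32_t magic = read_le32(hdr);
    uint32_t cputype = read_le32(hdr + 4);
    if (magic == MH_MAGIC && cputype == CPU_TYPE_X86)
        return "i386";
    if (magic == MH_MAGIC_64 && cputype == CPU_TYPE_X86_64)
        return "x86_64";
    return NULL;
}
```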

Now we attach to the debug target machine:

gdb$ kdp-reattach 10.0.0.15
Connected.

Edit 13 July 2013: If you’re using a recent OS X you can kdp-reattach to the IP address that was printed when the debug kernel paused waiting for the debugger.

You can also attach using target remote-kdp and attach 10.0.0.15. Allow the kernel to continue execution:

gdb$ c

At this point the disk icon in VMware should be going blue with activity, and the VM should continue booting as normal.

Breaking into the debugger

Unfortunately, we can’t use the normal method of hitting ^C in the debugger to pause execution, so we have to rely on software breakpoints. The method fG! initially suggested was to break on tcp_connect() or something similar, so you can drop into the debugger by attempting to telnet somewhere. This proves to be a bit cumbersome in Lion with all the fancy (scary) network autodetect stuff - with agents opening connections all over the place, you end up constantly dropping into the debugger.

The method that I have primarily used is to set a breakpoint on the kext_alloc() function. This is called once during the initialisation of a kernel extension, so it can be a reasonably useful point at which to break if you want to debug the initialisation of the kext, and a good on-demand breakpoint for general kernel memory inspection.

Edit 13 July 2013: @chicagoben pointed me at a simple method of replicating the behaviour of an NMI and dropping into the debugger using the technique in this handy kernel module.

If you’re debugging a kernel extension that you are writing yourself (or have the code for), a better method of dropping into the debugger is to put an int 3 (software breakpoint) in your code at the point you want to break.
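A minimal sketch, assuming x86 and GCC/Clang-style inline assembly. In a kext you would only need the __asm__ volatile("int $3"); line at the point where you want to stop; with the debug kernel and gdb attached over KDP, the trap lands in the debugger. So that the sketch can be compiled and run outside the kernel, it is wrapped in a user-space harness where a SIGTRAP handler stands in for the debugger. debug_break() and install_trap_handler() are hypothetical names, not part of any Apple API. Remember to take the breakpoint out afterwards: without a debugger attached, a kernel-mode int 3 will most likely panic the machine.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Number of breakpoint traps this process has caught. */
static volatile sig_atomic_t trap_count = 0;

static void on_sigtrap(int sig)
{
    (void)sig;
    trap_count++;
}

/* Hypothetical helper: trigger a software breakpoint. On x86/x86_64 this is
 * the one-byte int 3 instruction (opcode 0xCC), the same thing you would drop
 * into a kext. Other architectures fall back to raising SIGTRAP directly. */
static inline void debug_break(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("int $3");
#else
    raise(SIGTRAP);
#endif
}

/* User-space only: catch SIGTRAP so the demo keeps running when no debugger
 * is attached. The handler returns to the instruction after the int 3. */
static void install_trap_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigtrap;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTRAP, &sa, NULL);
}
```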

Poking around in kernel memory

Let’s check out a few neat things in memory. The start of the Mach-O header for the kernel image in memory:

gdb$ x/x 0xffffff8000200000
0xffffff8000200000: 0xfeedfacf

This is the “magic number” indicating a 64-bit Mach-O executable. The 32-bit version is 0xfeedface.
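The magic values are easy to keep straight with a small lookup. This sketch uses the constants from <mach-o/loader.h> and <mach-o/fat.h> (copied in so it builds anywhere); describe_macho_magic() is an illustrative name. Because x86 is little-endian, x/x prints the 64-bit header magic as 0xfeedfacf, while a universal binary’s fat header, which is always stored big-endian, would read back as 0xbebafeca on the same machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Magic numbers from <mach-o/loader.h> and <mach-o/fat.h>. The *_CIGAM
 * values are what you see when the data's byte order differs from the
 * reader's. */
#define MH_MAGIC     0xfeedfaceu  /* 32-bit Mach-O */
#define MH_CIGAM     0xcefaedfeu
#define MH_MAGIC_64  0xfeedfacfu  /* 64-bit Mach-O */
#define MH_CIGAM_64  0xcffaedfeu
#define FAT_MAGIC    0xcafebabeu  /* universal (fat) binary, stored big-endian */
#define FAT_CIGAM    0xbebafecau

/* Describe the first 32-bit word of an image, as printed by gdb's x/x. */
static const char *describe_macho_magic(uint32_t word)
{
    switch (word) {
    case MH_MAGIC:    return "32-bit Mach-O";
    case MH_MAGIC_64: return "64-bit Mach-O";
    case MH_CIGAM:    return "32-bit Mach-O (byte-swapped)";
    case MH_CIGAM_64: return "64-bit Mach-O (byte-swapped)";
    case FAT_MAGIC:
    case FAT_CIGAM:   return "universal (fat) binary";
    default:          return "not Mach-O";
    }
}
```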

The “system verification code”:

gdb$ x/s 0xffffff8000002000
0xffffff8000002000: "Catfish "

On previous PowerPC versions of the OS this was located at 0x5000 and said "Hagfish ". Here is the corresponding assembly source from osfmk/x86_64/lowmem_vectors.s in the kernel source tree:

/*
 * on x86_64 the low mem vectors live here and get mapped to 0xffffff8000200000 at
 * system startup time
 */
	.text
	.align	12
	.globl	EXT(lowGlo)
EXT(lowGlo):
	.ascii	"Catfish "	/* +0x000 System verification code */

Interestingly, that comment appears to be incorrect - 0xffffff8000200000 is where the kernel image itself starts and the stuff in lowmem_vectors.s starts at 0xffffff8000002000 as we’ve seen.
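To put numbers on that discrepancy, here is a little arithmetic on the two addresses from the gdb session, written as C. It assumes, as I understand Apple’s assembler, that .align 12 means a 2^12 = 4096-byte boundary, so lowGlo should be page-aligned; the macro and function names are made up for illustration. The check confirms the alignment and shows the kernel image header sits 0x1FE000 bytes (just under 2MB) above the low-memory globals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Addresses read from gdb on a 10.7 x86_64 debug kernel. */
#define LOWGLO_ADDR       0xffffff8000002000ull  /* "Catfish " */
#define KERNEL_IMAGE_ADDR 0xffffff8000200000ull  /* Mach-O header, 0xfeedfacf */

/* True if addr is a multiple of 2^log2_align (how Apple's as reads .align). */
static bool is_aligned_pow2(uint64_t addr, unsigned log2_align)
{
    return (addr & ((1ull << log2_align) - 1)) == 0;
}
```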

If you’re interested in kernel internals (which you probably are if you’re reading this) then you might want to have a look at the kgmacros help at this point:

gdb$ help kgm
| These are the kernel gdb macros. These gdb macros are intended to be
| used when debugging a remote kernel via the kdp protocol. Typically, you
| would connect to your remote target like so:
| (gdb) target remote-kdp
| (gdb) attach <name-of-remote-host>
<snip>

Source-level debugging

Now that we’ve explored kernel memory a bit, it’s worth noting that you can use the kernel source for source-level debugging within GDB, or possibly even in Xcode (anybody done this?). Some of the documentation is a bit out of date on this: the Kernel Programming Guide references a .gdbinit file in the osfmk directory (the Mach part of the kernel) which no longer exists, and older documentation mentions creating a /SourceCache/xnu/... directory for source-level debugging, but this trick doesn’t seem to work any more. It seems that these days the kernel debug symbol information records only the filename and line number, not the full file path.