Remote Debugging of Loadable Kernel Modules with kgdb: a Knowledge-based Article for Getting Started

As many kernel developers and hackers
have known for years, loadable kernel modules (like
user-space applications) are almost never bug-free. As the use and
development of loadable modules continue to grow, thanks to the
obvious benefits of the mechanism (lean kernels, fewer kernel
recompiles and reboots, etc.), developers increasingly need robust
debugging tools that help identify problem code. Traditionally,
module developers have relied on various debugging techniques to
track down problematic code. These techniques have included:

printk statements around suspected areas of failure
(probably the most useful)

While these methods are useful, they may not be dynamic
enough to pinpoint module failures in all situations
(consider tricky device driver resource allocation/deallocation,
file operations, etc.). In many instances
coders would benefit from the ability to perform ordinary
application-style debugging on kernel modules. Unfortunately, it
is not inherently possible to single-step kernel code as in an
ordinary application. There is, however, a tool that gives
developers this functionality. That tool is
kgdb.

What Is kgdb?

kgdb is a kernel patch that,
once applied, allows for the use of the familiar
gdb interface for source-level
debugging of a running Linux kernel. The process requires the use
of two machines. One machine runs the kernel being debugged while
the other runs the gdb session. Communication between the running
kernel and gdb transpires via a serial cable connecting the two
machines.

The kgdb patch supplies the kernel with a debugging stub.
This stub uses the gdb remote serial protocol to communicate with
gdb through a serial driver interface (also supplied by the patch).
The patch is applied to the kernel source on the machine that will
run the gdb session (the development machine), where the kernel is
then recompiled. The newly compiled boot image is copied to the
other machine (the target machine), where it is configured as the
bootable kernel. Once the target machine has rebooted into the
transferred kernel, it can be configured to halt and await a remote
connection from a gdb session on the development machine. When this
connection is established, the target machine's kernel can be
debugged (single-stepping, setting breakpoints, examining data,
etc.) through gdb on the development machine as if it
were a user-space application.
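To give a feel for what travels over the serial cable, here is a minimal sketch of how a packet in the gdb remote serial protocol is framed: a dollar sign, the payload, a hash sign and a two-digit hex checksum (the sum of the payload bytes modulo 256). The function names rsp_checksum and rsp_frame are hypothetical and are not taken from the kgdb patch. Escaping of special characters and the acknowledgment characters exchanged after each packet are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Checksum used by the gdb remote serial protocol:
 * the sum of all payload bytes, modulo 256. */
unsigned char rsp_checksum(const char *payload)
{
    unsigned char sum = 0;

    while (*payload != '\0')
        sum = (unsigned char)(sum + (unsigned char)*payload++);
    return sum;
}

/* Frame a payload as "$<payload>#<two hex digits>".
 * Returns the packet length, or -1 if the buffer is too small.
 * (Hypothetical helper; payload escaping is not handled.) */
int rsp_frame(const char *payload, char *out, size_t outlen)
{
    size_t need;

    assert(payload != NULL && out != NULL);
    need = strlen(payload) + 4;      /* '$' + payload + '#' + 2 hex digits */
    if (outlen < need + 1)           /* +1 for the terminating NUL */
        return -1;
    snprintf(out, outlen, "$%s#%02x", payload,
             (unsigned int)rsp_checksum(payload));
    return (int)need;
}
```

Framing the reply OK, for instance, gives $OK#9a.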

Configuring kgdb

The first step is to download the kgdb patch for your kernel
version. A patch can be obtained at
http://kgdb.sourceforge.net/.
As of this writing, patches only exist for the following
kernels:

2.4.0-test9

2.4.0-test4 (kernel used for this article)

2.4.0-test1

2.3.99-pre6

2.2.17

2.2.12

Once you have obtained the patch, copy it to the
kernel source directory on the development machine and apply it:

patch -p1 < patchfile

(Remember, this is the kernel that will eventually run on the
target machine.)

The patched kernel must now be recompiled. It is assumed here
that /usr/src/linux/.config exists and accurately reflects your
current kernel configuration. Navigate to the source directory (if
you aren't there already), and do a make
menuconfig. From the main menu navigate to and select
kernel hacking. You should now see an option for
Remote (serial) debugging with gdb. Make sure
this option is selected and then exit, saving your configuration.
Next, do a make clean followed by a
make bzImage (or whatever image you usually
make).

The patch adds a documentation file called gdb-serial.txt
to your source tree. This file can be found in
/usr/src/linux/Documentation/i386 and includes a step-by-step
description of what needs to happen next. Here are
the highlights.

The newly compiled kernel image (e.g., bzImage) is copied to
the target machine where it is configured for boot. For example,
the image may be copied to /boot/vmlinuz-target (or whatever you
want to call it) followed by an added entry in lilo.conf:

On the development machine, navigate to
/usr/src/linux/arch/i386/kernel. Here you will find an executable
called gdbstart. Copy this program
to the target machine. gdbstart is responsible for configuring the
target machine's serial port (from user space) for communication
with gdb on the development machine. The program then issues an
ioctl that activates the serial driver interface to the
debugging stub. This driver effectively halts the target system
until gdb on the development machine issues a continue command to
resume execution.
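The user-space half of that job, preparing the serial line, can be sketched with the standard termios interface. The following is only an illustration under stated assumptions, not the gdbstart source: the function names baud_to_speed and serial_settings_8n1 are hypothetical, raw 8N1 settings are assumed, and the kgdb-specific ioctl is omitted because the request it uses is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <termios.h>

/* Map a numeric data rate to the termios speed constant.
 * Returns B0 for rates this sketch does not handle. */
speed_t baud_to_speed(long baud)
{
    switch (baud) {
    case 9600:  return B9600;
    case 19200: return B19200;
    case 38400: return B38400;
    default:    return B0;
    }
}

/* Fill in raw 8N1 settings at the given speed (assumed settings,
 * not necessarily the ones gdbstart uses). Returns 0 on success. */
int serial_settings_8n1(struct termios *t, speed_t speed)
{
    assert(t != NULL);
    memset(t, 0, sizeof *t);
    t->c_iflag = 0;                     /* no input translation or flow control */
    t->c_oflag = 0;                     /* no output post-processing */
    t->c_lflag = 0;                     /* no echo, canonical mode or signals */
    t->c_cflag = CS8 | CREAD | CLOCAL;  /* 8 data bits, receiver on, ignore modem lines */
    t->c_cc[VMIN] = 1;                  /* read returns after at least one byte */
    t->c_cc[VTIME] = 0;                 /* no inter-byte timeout */
    if (cfsetispeed(t, speed) != 0 || cfsetospeed(t, speed) != 0)
        return -1;
    return 0;
}
```

A real program would open the port, fill in the settings with serial_settings_8n1(&t, baud_to_speed(38400)), and apply them with tcsetattr(fd, TCSANOW, &t).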

Next, decide which serial port (e.g., ttyS0 or
ttyS1) is to be used, as well as a data rate for communication
(e.g., /dev/ttyS0 at 38,400 baud).

Connect the two machines with a null modem serial
cable. Be sure to connect the cable to the serial ports you have
designated in the above step.

Run the gdbstart program on the target machine with
the following parameters (or whatever port and data rate you decide
upon):

gdbstart -s 38400 -t /dev/ttyS0

The program will execute and pause, awaiting a remote connection
from the development machine.

Alternatively, the documentation suggests creating a script
on the target machine to deliberately call gdbstart with
user-defined parameters.

The documentation next instructs you to create a
.gdbinit file in /usr/src/linux on the development machine.
Included in this file is a macro (called
rmt) that is used to supply gdb
with the information it needs to initiate the remote protocol. Edit
this information to reflect the com port and data rate you have
designated for communication between gdb and the target
machine.

Now, navigate to /usr/src/linux on the development
machine, and run gdb vmlinux. Once you receive a
gdb prompt enter rmt, which informs gdb that it
is connecting to a remote target (via the serial port and data rate
specified in the .gdbinit file).

You can now issue step commands, set breakpoints, etc.
Issuing a continue to gdb will
return the target kernel to a running state. The kernel will
continue to run until it encounters a defined breakpoint, an
interrupt, a signal, a segment violation, etc., at which point
control is returned to gdb on the development machine.

How the Communication Works

In order for us to use kgdb to debug loadable modules, we
must understand how the remote kernel communicates with gdb on the
development machine. Remember, the mechanism's default
communication between gdb on the development machine and the
debugging stub on the target machine transpires via a serial driver
interface (called gdbserial.o). In order for the debugging process
to begin, two things must happen from within this driver. First,
set_debug_traps( ) is called. This function is defined in
the debugging stub and informs the remote kernel that all
breakpoints, error conditions and other exceptions are to be
intercepted and handled by gdb. Second, the serial driver must
call the function breakpoint( ). This function is also defined in
the debugging stub and is used to initiate the communication by
issuing a breakpoint interrupt:

asm( " int $3");

Since gdb is now configured to intercept such a condition
(i.e., the set_debug_traps( ) call), the kernel on the target
machine halts and transfers control to gdb on the development
machine. It is from this point that the user may begin normal
debugging such as single-stepping, issuing of breakpoints, stack
tracing, etc. However, if we were to begin stepping from this
initial point, the code we would be examining would be in gdbserial,
immediately following the call to the debugging stub's
breakpoint( ) (since this is where program execution has halted).
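The same two-step pattern can be tried safely in user space, where a SIGTRAP handler plays the part of set_debug_traps( ) and an int $3 instruction plays the part of breakpoint( ). This is only an analogy, not the stub's code: the names install_trap_handler, on_trap and trap_demo are hypothetical, and on non-x86 machines the sketch falls back to raise(SIGTRAP).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Number of breakpoint traps intercepted so far */
volatile sig_atomic_t traps_seen = 0;

/* Stand-in for the debugger: just record that a trap arrived */
void on_trap(int signo)
{
    (void)signo;
    traps_seen++;
}

/* User-space analog of set_debug_traps(): route traps to our handler */
int install_trap_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_trap;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTRAP, &sa, NULL);
}

/* User-space analog of breakpoint(): issue the breakpoint interrupt */
void breakpoint(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("int $3");
#else
    raise(SIGTRAP);   /* fallback where int $3 is not available */
#endif
}

/* Install the handler, trigger one breakpoint and report the trap count */
int trap_demo(void)
{
    int rc = install_trap_handler();

    assert(rc == 0);
    (void)rc;
    breakpoint();
    return (int)traps_seen;
}
```

Because the handler is installed first, the trap is intercepted and execution resumes on the line after breakpoint( ), which mirrors where a step lands in gdbserial.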

For example, Listing 2 is the excerpt of gdbserial in which
the calls to the two previously explained stub functions are
made.

Now, if we initiate another debug session and issue a
step command after gdb receives
the stub's breakpoint interrupt, we step to the next line of
code in gdbserial after the call to breakpoint( ) (which should be
gdb_null in the example in Listing 3).

As was mentioned before, gdb can return the remote kernel to
a running state by issuing a continue command. If this is done, gdb
patiently waits until the remote kernel returns control by issuing
some sort of exception (such as a user-defined breakpoint,
segmentation fault, etc.).

For a better understanding of how the entire process works,
review the debugging stub code found in
/usr/src/linux/arch/i386/kernel/gdbstub.c and the serial driver
interface, which can be found in
/usr/src/linux/drivers/char/gdbserial.c.

Preparing to Debug a Loadable Module on the
Target Machine

We now have almost all the information we need in order to begin
using the kgdb mechanism to debug loadable modules. Remember, the
important information we must retain from kgdb's communication
process in order to initiate module debugging is: 1) at the
beginning of the debug session, the debugging stub informs gdb on
the development machine that it is responsible for intercepting and
handling all exceptions from the remote target kernel (for example,
gdbserial's initial call to set_debug_traps( )); 2) the initial
debug process begins with the serial interface's call to the
debugging stub's breakpoint( ) function; 3) gdb can return the
remote kernel to a running state but will regain control once the
remote kernel issues any type of exception.

Additionally, we must consider that because the module will
be loaded on the target machine and the gdb session runs on another
machine, gdb will have no idea where in the target machine's memory
the module code will be loaded. We must therefore determine this
location and inform gdb of its whereabouts before the debugging of
the module can begin.

Locating Modularized Code in Memory

During the kernel-building process, the build produces a
file that maps memory addresses to the names of the functions and
variables compiled into the kernel image, including any drivers
built in during the compile. This file is usually placed in the
root of the kernel source directory
(i.e., /usr/src/linux/System.map) and is used to resolve kernel
addresses to symbol names. However, when that file is generated,
there is no way to know where in memory a particular module may be
loaded at a later time.
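System.map itself is plain text, one symbol per line in the form address, type letter, name. A minimal sketch of reading such a line follows. The struct and function names are hypothetical, and the sample line in the usage note uses illustrative values only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed System.map line (hypothetical type for this sketch) */
struct map_entry {
    unsigned long addr;   /* symbol address */
    char type;            /* nm-style type letter, e.g. 'T' for code */
    char name[128];       /* symbol name */
};

/* Parse one "address type name" line. Returns 1 on success, 0 otherwise. */
int parse_map_line(const char *line, struct map_entry *e)
{
    assert(line != NULL && e != NULL);
    return sscanf(line, "%lx %c %127s", &e->addr, &e->type, e->name) == 3;
}

/* Return 1 if the entry describes the named symbol, 0 otherwise. */
int entry_is(const struct map_entry *e, const char *name)
{
    return strcmp(e->name, name) == 0;
}
```

Given the line c0100000 T _stext (illustrative values), parse_map_line fills in the address 0xc0100000, the type T and the name _stext. No such line exists for a module that is loaded later, which is why a separate load map is needed.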

Fortunately, we can determine the memory location for
modularized code during its load process. This is accomplished by
using insmod with the -m parameter
that informs insmod to produce a load map. This map informs us of
where in memory the object-code sections reside. Ultimately, we
must locate this information in order to inform gdb on our
development machine of where our module's object code resides on
the target machine. To illustrate, let's consider the module code
shown in Listing 4, which we will refer to as the simple
module.

Record the hex address of the .text section
(0xc480004c in our example) for use later. This
section represents the beginning of the module code in memory. We
will use this value to inform the development machine running gdb
of the module's whereabouts in memory.

Unload the module:

rmmod simple

gdb Considerations

We are almost ready to begin debugging a module. But before
we proceed, we have to address another possible issue. In gdb we
will be using the add-symbol-file command
to inform the debugger of the memory location of our module's
object code on the target machine. However, gdb 5.0 and previous
versions have had problems in correctly calculating addresses using
the add-symbol-file command (the problem surrounds the issue of a
module's global variables). The problem has been corrected in
developmental versions of gdb. It is therefore recommended that you
use a developmental version of gdb to debug modules with kgdb. The
gdb version used for the remainder of this article is a
developmental gdb built for Red Hat 6.2. For more information
regarding this issue, visit
http://kgdb.sourceforge.net/.

Final Preparations

The last step we must account for is the configuration of our
module to communicate with the remote gdb session. This is a
relatively simple task. We now know that when gdb on the
development machine makes the initial contact with the target
machine, the gdbserial interface issues a call to the debugging
stub's set_debug_traps( ) function. This function, as you recall,
instructs gdb to perform all exception handling for the target
kernel. The serial interface then issues a call to the stub's
breakpoint( ) function, which turns control over to gdb. At this
point we can inform the remote kernel to resume normal operations
by issuing a continue command from gdb.

With the target kernel now configured to return control to
the remote gdb session whenever an exception is triggered, we can
modify our module code to guarantee that such an event will arise.
Consider the following modifications made to the simple module in
Listing 6.

As you can see by the code in Listing 6, we have added a
breakpoint interrupt that is to be called (in this example) when
the module is both loaded and unloaded from the kernel. These
interrupts will return control to the remote gdb session, thus
halting execution of the module at those points. Let's try it
out.
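A module instrumented this way might look like the following sketch. It is an illustration under stated assumptions, not a copy of Listing 6: the BREAKPOINT( ) macro, the message strings and the user-space fallback (which merely counts breakpoints so the file also compiles outside a kernel tree) are all hypothetical, and the kernel branch assumes __KERNEL__ and MODULE are defined when the module is built.

```c
/* simple.c: hypothetical module with breakpoints on load and unload */
#ifdef __KERNEL__
#include <linux/kernel.h>
#include <linux/module.h>
/* Breakpoint interrupt, the same instruction the stub's breakpoint() issues */
#define BREAKPOINT() __asm__ __volatile__("   int $3")
#else
/* User-space stand-ins so the sketch also compiles outside a kernel tree */
#include <assert.h>
#include <stdio.h>
#define printk printf
int breakpoints_hit = 0;
#define BREAKPOINT() (breakpoints_hit++)
#endif

/* Called when the module is loaded with insmod */
int init_module(void)
{
    BREAKPOINT();                        /* control passes to gdb here on load */
    printk("simple: module loaded\n");
    return 0;                            /* zero reports a successful load */
}

/* Called when the module is removed with rmmod */
void cleanup_module(void)
{
    BREAKPOINT();                        /* control passes to gdb here on unload */
    printk("simple: module unloaded\n");
}
```

Placing the breakpoint first in each function means gdb regains control before any of the module's own work runs, so the rest of the function can be single-stepped.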

On the development machine, recompile the module
after adding the BREAKPOINT code:

gcc -c -O2 -g simple.c

Copy the newly compiled object code to the target
machine.

Initiate a remote debug session on the target
machine by running the gdbstart program.

On the development machine, navigate to
/usr/src/linux and run gdb vmlinux. Remember to
use a developmental version of the debugger as described in the
previous section.

Once prompted in gdb, type rmt
to initiate contact with the remote machine.

With the previously recorded hex address, use
add-symbol-file to inform gdb of the module's
location in memory:

add-symbol-file /root/simple.o 0xc480004c

You may be wondering at this point how we can assign this
address space when the module is not currently loaded. Or a better
question may be, "How do we know that the kernel will load the
module into the exact same location in memory?" We can make this
assumption because the kernel very often will load the object code
into the same memory segment as before. While this is not written
in stone, it does happen frequently enough to render this method
quite reliable.

Return control to the remote kernel by issuing a
continue command from gdb.

On the target machine, install the module:

insmod simple.o

When insmod invokes the module's init_module function, our breakpoint
interrupt is triggered and returns control to gdb. This allows us to
step through the remainder of init_module as if it were a
user-space application, as shown in Listing 7.

Note that this sequence reflects an example of stopping the
module's load process. The module could be installed with the
insmod -m parameter again to verify memory placement of the object
code if other module functionality (other than the initialization
process) was to be debugged (e.g., file operation functions: open,
read, write, close, ioctl; driver resource allocation/deallocation,
etc.).

Return control to the remote kernel from gdb (i.e.,
continue).

On the target machine, remove the simple
module:

rmmod simple

This of course invokes the cleanup_module function, which in
turn triggers another of our breakpoints, returning control to gdb
on the development machine:

Although this simple example does not actually accomplish much
(except stepping around the printk function), it does illustrate
how we can halt execution of the module for debugging via kgdb. One
can readily imagine the benefits of using such a method in lieu of
traditional debugging methods (printks, Oops analysis, etc.),
especially where very problematic module code is concerned. Of
course, no debugging method is perfect for all situations, nor is it
a replacement for writing good code. For instance, this method may
not be the best one for single-stepping through a device driver on a
real-time system that depends on precise timing, but it does have
useful applications and seems to have
been well received by the kernel developer community.

James Lamphere
has a BA in music history from Eastern Washington University and is
currently working on an interdisciplinary master's degree in
computer science at Eastern Washington University specializing in
operating system-level development. He is a graduate
instructor/system administrator in the computer science department
at EWU.