People tend to have a narrow view of the problems they can solve
using GDB. Many think
that GDB is just for debugging segfaults
or that it's only useful with C or C++ programs. In reality, GDB is an
impressively general and powerful tool. When you know how to use it,
you can debug just about anything, including Python, Ruby, and other
dynamic languages. It's not just for inspection either—GDB can
also be used to modify a program's behavior while it's running.

When we ran our Capture The Flag
contest, a lot of people asked us about introductions to that kind of
low-level work. GDB can be a great way to get started. In order to
demonstrate some of GDB's flexibility, and show some of the steps
involved in practical GDB work, we've put together a brief example of
debugging Python with GDB.

Imagine you're building a web app in Django. The standard cycle
for building one of these apps is to edit some code, hit an error, fix
it, restart the server, and refresh in the browser. It's a little
tedious. Wouldn't it be cool if you could hit the error, fix the code
while the request is still pending, and then have the request complete
successfully?

As it happens, the Seaside
framework supports exactly this. Using one of Stripe's example
projects, let's take a look at how we could pull it off in Python
using GDB:

GDB Demo Screencast

Pretty cool, right? Though a little contrived, this example
demonstrates many helpful techniques for making effective real-world
use of GDB. I'll walk through what we did in a little more detail, and
explain some of the GDB tricks as we go.

For the sake of brevity, I'll show the commands I type, but elide
some of the output they generate. I'm working on Ubuntu 12.04 with GDB
7.4. The techniques should still work on other platforms, but you
probably won't get automatic pretty-printing of Python types. You can
get the same representation by hand by running p
PyString_AsString(PyObject_Repr(obj)) in GDB.
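
If you want a feel for what that expression does before trying it in GDB, here is a minimal sketch that makes the same kind of C API calls from inside a running interpreter using ctypes. It assumes CPython 3, so PyUnicode_AsUTF8 stands in for Python 2's PyString_AsString, and c_level_repr is a hypothetical helper name, not something from the setup described here.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
api = ctypes.pythonapi

# PyObject *PyObject_Repr(PyObject *o): returns a new reference to repr(o).
api.PyObject_Repr.argtypes = [ctypes.py_object]
api.PyObject_Repr.restype = ctypes.py_object

# const char *PyUnicode_AsUTF8(PyObject *unicode): assumed Python 3 stand-in
# for Python 2's PyString_AsString, which the GDB expression above uses.
api.PyUnicode_AsUTF8.argtypes = [ctypes.py_object]
api.PyUnicode_AsUTF8.restype = ctypes.c_char_p


def c_level_repr(obj):
    """Hypothetical helper: compute repr(obj) through the C API."""
    repr_obj = api.PyObject_Repr(obj)      # a Python str object
    raw = api.PyUnicode_AsUTF8(repr_obj)   # its UTF-8 bytes, copied by ctypes
    return raw.decode("utf-8")


print(c_level_repr({"status": 500}))
```

The GDB version does the same two steps, except that GDB makes the calls in the stopped process on your behalf and prints the resulting C string.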

Getting Set Up

First, let's start the monospace-django server with
--noreload so that Django's autoreloading doesn't get in
the way of our GDB-based reloading. We'll also use the
python2.7-dbg interpreter, which will ensure that less of
the program's state is optimized away.

As of version 7.0, GDB can be scripted in Python, and you can even
register your own code to pretty-print C types. Python comes with its
own hooks that can
pretty-print Python types (such as PyObject *) and
understand the Python stack. These hooks are loaded automatically if
you have the python2.7-dbg package installed on
Ubuntu.

Whatever you're debugging, you should look to see if there are
relevant GDB scripts available—useful helpers have been created
for many dynamic languages.

Catching the Error

The Python interpreter creates a PyFrameObject every
time it starts executing a Python stack frame. From that frame object,
we can get the name of the function being executed. It's stored as a
Python object, so we can convert it to a C string using
PyString_AsString, and then stop the interpreter only if
it begins executing a function called
handle_uncaught_exception.

The obvious way to catch the error would be to set a GDB breakpoint
on PyEval_EvalFrameEx, the function that evaluates each frame. A lot
of frames are allocated in the process of executing Python code,
though. Rather than tediously continue through hundreds of false
positives, we can set a conditional breakpoint that compares the
frame's function name with handle_uncaught_exception and breaks only
on the frame we care about.

Breakpoint conditions can be pretty complex, but it's worth noting
that a conditional breakpoint on a function that is called often (like
PyEval_EvalFrameEx) can slow the program down significantly, because
GDB has to stop the program and evaluate the condition on every call.
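
To make that cost concrete, here is a rough pure-Python analogue of a conditional breakpoint on frame evaluation. It is only a sketch of the idea, not how GDB implements breakpoints: a sys.settrace hook is consulted for every new frame, reads the function name from the frame's code object, and "fires" only for the name we care about. The helper and run_demo functions are hypothetical names made up for the illustration.

```python
import sys

TARGET = "handle_uncaught_exception"  # the function name we want to stop on


def make_tracer(target):
    seen = []  # every frame name the hook was asked about
    hits = []  # frames whose name satisfied the condition

    def tracer(frame, event, arg):
        if event == "call":
            name = frame.f_code.co_name  # same field the C-level frame exposes
            seen.append(name)
            if name == target:
                hits.append(name)
        return None  # no per-line tracing inside the new frame

    return tracer, seen, hits


def helper():  # hypothetical stand-in for ordinary application code
    return 1


def handle_uncaught_exception():  # stand-in for the function we are after
    return helper()


def run_demo():
    tracer, seen, hits = make_tracer(TARGET)
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        helper()
        helper()
        handle_uncaught_exception()
    finally:
        sys.settrace(previous)  # always restore the earlier hook
    return seen, hits


seen, hits = run_demo()
print(len(seen), "frames checked,", len(hits), "matched")
```

Every frame pays for the check even though only one matches, which is the same trade-off the conditional breakpoint makes, just with a much more expensive check on the GDB side.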

Generating the Initial Return Value

Okay, let's see if we can actually fix things during the next
request. We resubmit the form. Once again, GDB halts when the app
starts generating the internal server error response. While we
investigate more, let's disable the breakpoint in order to keep things
fast.

What we really want to do here is to let the app finish generating
its original return value (the error response) and then to replace
that with our own (the correct response). We find the stack frame
where get_response is being evaluated. Once we've selected that frame
with the up or frame command, we can use the finish command to run
until the selected stack frame finishes executing and returns.

Patching the Code

Now that we've gotten the interpreter into the state we want, we
can use Python's
internals to modify the running state of the application. GDB
allows you to make fairly complicated dynamic function invocations,
and we'll use lots of that here.

We use the C equivalent of the Python reload
function to reimport the code. We also have to reload the
monospace.urls module so that it picks up the new code in
monospace.views.
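
The reason for the second reload is easier to see at the Python level. Here is a small sketch, assuming the urls module holds a direct reference to the view function (as a from-import would give it). It uses Python 3's importlib.reload rather than Python 2.7's built-in reload, and demo_views and demo_urls are hypothetical stand-ins for monospace.views and monospace.urls.

```python
import importlib
import os
import sys
import tempfile


def write(path, source):
    with open(path, "w") as handle:
        handle.write(source)


def reload_demo():
    """Return (result before reloading urls, result after reloading urls)."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep stale .pyc files out of this demo
    with tempfile.TemporaryDirectory() as tmp:
        sys.path.insert(0, tmp)
        try:
            views_path = os.path.join(tmp, "demo_views.py")
            write(views_path, "def index():\n    return 'broken'\n")
            write(os.path.join(tmp, "demo_urls.py"),
                  "from demo_views import index\n")
            importlib.invalidate_caches()
            views = importlib.import_module("demo_views")
            urls = importlib.import_module("demo_urls")

            # "Fix" the view on disk while the process keeps running.
            write(views_path, "def index():\n    return 'fixed'\n")
            importlib.reload(views)
            stale = urls.index()  # urls still holds the old function object

            importlib.reload(urls)
            fresh = urls.index()  # re-running the import rebinds the name
            return stale, fresh
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_views", None)
            sys.modules.pop("demo_urls", None)
            sys.dont_write_bytecode = old_flag


print(reload_demo())
```

Reloading the views module replaces its functions, but any module that already copied a reference keeps calling the old one until it is reloaded too.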

One handy trick, which we use to invoke git in the video and curl
here, is that you can run shell commands from within GDB using its
shell command.

In the above snippet, we use GDB's set command to
assign values to variables.

Alright, we now have a new response. Remember that we stopped the
program right where the original get_response method
returned. At the C level, the interpreter function that evaluated that
frame returns a pointer to the Python return value. And so, to replace
that return value on x86, we just have to store the new return value in
the register that holds C return values ($rax on 64-bit x86) and then
allow the execution to continue.

GDB numbers the value of every expression you print, so you can refer
back to it later. In this case, we want
$5:

(gdb) set $rax = $5
(gdb) c
Continuing.

And, like magic, our web request finishes successfully.

GDB is a powerful precision tool. Even if you spend most of your
time writing code in a much higher-level language, it can be extremely
useful to have it available when you need to investigate subtle bugs
or complex issues in running applications.