Easier Python Debugging

Summary

Not yet implemented! This section is written based on the desired outcome, rather than the current status.

The gdb debugger has been extended so that it can report detailed information on the internals of the Python 2 and Python 3 runtimes. Backtraces involving Python will now by default show mixed C and Python-level information on what such processes are doing, without requiring expertise in the use of gdb.

We believe this ability is unique to Fedora, and will be valuable for Python developers seeking additional visibility into their CPython processes.

Still to be done: integrate the libpython hooks so that gdb runs them automatically.

I was stuck on this issue when getting at "PyFrameObject *f" from the current frame, but in my current implementation I've sidestepped this by simply writing a pretty-printer for PyFrameObject* which gdb successfully invokes during a backtrace.

Detailed Description

We ship Python wrappers for numerous libraries implemented in C and C++. Bugs (either in the libraries themselves, or in the usage of those libraries) can lead to complicated backtraces from gdb, and it can be hard to figure out what's going on at the Python level.

Walking through the stack frames, going up from the bottom (textually), or down from the top (numerically):

frames 26 and below show a pygtk application starting up.

An event comes in frame 24/25, and is dispatched into pulsecore (frames 23->18; pstream_packet_callback, pa_context_simple_ack_callback) which:

calls a Python callback (down to frame 15),

...which invokes python code down to frame 3.

...where it calls back into native code; whereupon the segfault happens, calling Py_DecRef on some object pointer.

Note that as it stands, all we see from the backtrace is that python code was run: we have no way as-is of telling what that python code was.

In the above example, it happens that there is a bug in the application's Python code, which is sufficiently serious to cause a SIGSEGV error. This example uses the ctypes module, which is designed to expose machine-level details. It's fairly easy to write a one-liner of Python code using this module which causes the Python process to immediately fail with either a SIGSEGV or SIGABRT.
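As a rough illustration of how little ctypes code is needed to crash the interpreter, the sketch below runs such a one-liner in a child process so that the parent survives. The one-liner used here (ctypes.string_at(0), which reads a C string from a NULL pointer) is an assumption chosen for illustration; it is not the code from the example backtrace, and the helper name run_crashing_child is hypothetical.

```python
import subprocess
import sys

# Illustrative one-liner (an assumption, not the code from the backtrace):
# asking ctypes to read a C string from address 0 dereferences a NULL pointer.
CRASHING_ONE_LINER = "import ctypes; ctypes.string_at(0)"


def run_crashing_child():
    """Run the one-liner in a child interpreter and return its exit status.

    On POSIX systems, subprocess reports a child killed by signal N as a
    negative return code (-N), so a SIGSEGV typically shows up as a
    negative value here.
    """
    result = subprocess.run(
        [sys.executable, "-c", CRASHING_ONE_LINER],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    print("child exit status:", run_crashing_child())
```

Running the one-liner directly, rather than in a child process, would terminate the interpreter that executes it.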

When using "native" C/C++ libraries, it's sadly common for bugs in the library to lead to SIGSEGV errors that immediately cause the whole Python process to terminate. Beyond that, poorly-designed error-handling in such libraries uses assert() or abort() at the C level, which immediately terminates the entire process. It's useful to be able to determine what was "really" going on when this happens.

A trickier problem is when a threading assertion fails: many libraries make assumptions about threads and locks, and allow the programmer to register callbacks, but impose conditions upon the kind of code run in those callbacks. When the threads and callback-registration hooks are wrapped at the Python level, these conditions continue to apply at the Python level, but mistakes here often lead to low-level error-handling that's difficult to debug.

For example, the GTK widget library requires that all communication with the X server happen within a GDK lock, to avoid garbling the single "conversation" between the process and the X server. The common way to implement this in a multi-threaded application is to restrict all calls to GTK to a single "primary" thread. See attachment 379251 to bug 543278 for an example where a secondary thread in an application violates this, which leads to a low-level gdk_x_error() failure in the main thread: frames 16 to 28 of this backtrace are running Python code, but it's not at all clear from the backtrace what that code is actually doing.

Current state-of-the-art for debugging CPython backtraces

Python already has a gdbinit file with plenty of domain-specific hooks for debugging CPython, and we ship it in our python-devel subpackage. If you copy this to ~/.gdbinit you can then use "pyframe" and other commands to debug things, and figure out where we are in Python code from gdb. I used it when deciphering the example backtraces referred to above.

Unfortunately:

this script isn't very robust; if the data in the "inferior" process is corrupt, attempting to print it can lead to a SIGSEGV within that process

you have to go into gdb manually and run these commands by hand, and it's hard to do this correctly; any mistakes when doing this will typically cause a SIGSEGV in the inferior process; see e.g. bug 532552

the script is written in the gdb language and is thus hard to work with and extend

Proposal

gdb should provide rich information on what's going on at the Python level automatically. I plan to hook this in using gdb-archer, and make it automatic.
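The pretty-printer approach mentioned in the Summary could look roughly like the following. This is a minimal sketch, not the actual libpython hooks: the names FramePrinter and lookup_frame_printer are hypothetical, matching on the type's string form is a simplification, and the output format is an assumption. The gdb module is only importable inside a gdb session built with Python support, so the import is guarded.

```python
try:
    import gdb  # only available inside a gdb session with Python support
except ImportError:
    gdb = None


class FramePrinter:
    """Hypothetical pretty-printer for a PyFrameObject* value."""

    def __init__(self, gdbval):
        self.gdbval = gdbval

    def to_string(self):
        # Converting a pointer-typed value to int yields the address in
        # the inferior process; nothing is dereferenced here.
        return "<PyFrameObject at remote 0x%x>" % int(self.gdbval)


def lookup_frame_printer(gdbval):
    """Return a printer for PyFrameObject* values, or None for anything else."""
    type_ = gdbval.type.unqualified()
    # Simplification (an assumption): match on the printed form of the type.
    if str(type_) == "PyFrameObject *":
        return FramePrinter(gdbval)
    return None


if gdb is not None:
    # gdb consults each callable in this list when it needs to print a value.
    gdb.pretty_printers.append(lookup_frame_printer)
```

Once a lookup function like this is registered, gdb would call it for each value it prints, including frame arguments shown during a backtrace.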

Benefit to Fedora

Backtraces from gdb (such as those from ABRT) that involve Python code will show what's going on at the Python level, as well as at the C level. This will make it much easier for developers to read backtraces when a library wrapped by Python (e.g. PyGTK) encounters a bug.

For Python developers, it should be possible to attach to a running Python process using gdb, then run "thread apply all backtrace" to get an overview of all C and Python code running in all threads within that process. I believe this ability would be unique to Fedora, and would be valuable for Python developers seeking additional visibility into their CPython processes.
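The attach-and-backtrace workflow can also be scripted. The sketch below assumes gdb is on the PATH and that the caller has permission to attach to the target process; the function names are hypothetical. It uses gdb's -p, -batch and -ex command-line options to attach, run the single command, and exit.

```python
import subprocess


def gdb_all_threads_command(pid):
    """Build the gdb command line that attaches to pid and dumps all threads."""
    return [
        "gdb",
        "-p", str(pid),   # attach to the running process
        "-batch",         # exit once the commands have been run
        "-ex", "thread apply all backtrace",
    ]


def dump_all_threads(pid):
    """Run gdb against pid and return whatever it printed to stdout."""
    result = subprocess.run(
        gdb_all_threads_command(pid),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero, e.g. if attaching is not permitted
    )
    return result.stdout
```

For example, dump_all_threads(1234) would return the backtrace text for a process with PID 1234, if such a process exists and attaching to it is allowed.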

Scope

This will require extensions to the python srpm, and analogous changes to the python3 srpm.

It may well require co-ordination with the gdb srpm (such as API changes), and with the glib2 changes written by Alex referred to above.

Note that before in frame 5, gdb merely reported argtuple=0xb7f3d02c. It is now able to tell us that we have a (long, int) 2-tuple: (3735928559L, -1) (this is actually (0xDEADBEEF, -1) but it has no way to know what base you want the number in).
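The correspondence between the decimal value and 0xDEADBEEF is easy to confirm from a Python prompt. The helper name as_hex below is hypothetical, and the snippet uses Python 3 syntax, so there is no trailing L on the integer.

```python
def as_hex(value):
    """Return the hexadecimal spelling of an integer, e.g. for reading addresses."""
    return hex(value)


# 3735928559 is the decimal form that gdb prints for 0xDEADBEEF.
print(as_hex(3735928559))
```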

Similarly, gdb is now telling us the types of the various objects. For example, in the baseline backtrace in frame 5 gdb merely reported restype=0x80f3dc4, but with this visualizer it is now able to tell us we have restype=<_ctypes.SimpleType at remote 0x80f3dc4>.
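The "<type at remote address>" notation can be produced with ordinary string formatting once the type name and address have been read from the inferior process. The sketch below only shows the formatting step; the function name format_remote is hypothetical and this is not the actual visualizer code.

```python
def format_remote(type_name, address):
    """Format a stand-in for an object living in the debugged process.

    type_name: the Python-level type name read from the inferior.
    address:   the object's address in the inferior, as an integer.
    """
    return "<%s at remote 0x%x>" % (type_name, address)


print(format_remote("_ctypes.SimpleType", 0x80F3DC4))
```

The word "remote" signals that the address belongs to the process being debugged, not to the Python interpreter running inside gdb.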

In the above frames, notice how gdb is now able to tell us that this instance of an old-style class is of type "Foo" and the current values of its attributes (I deliberately picked a mixture above in order to show support for dictionaries, lists, tuples, ints, longs etc).

Dependencies

This feature will require coordination with, and possible changes in, the gdb and glib2 packages.

Contingency Plan

The contingency plan would be to remove the additional .py files, deactivating the feature.

Documentation

See the "Detailed Description" section above; this feature page contains much information.

Release Notes

Not yet implemented! This section is written based on the desired outcome, rather than the current status.

Python: the gdb debugger has been extended so that it can report detailed information on the internals of the Python 2 and Python 3 runtimes. Backtraces involving Python will now by default show mixed C and Python-level information on what such processes are doing, without requiring expertise in the use of gdb.
