
Introduction

Is your Python program crashing or unresponsive, and you can't figure out why with traditional tools (printing, PDB)? Then this tutorial might help you!

GDB is a very powerful debugging tool, but it is hardly intuitive to use, and it doesn't understand Python data structures. Out of the box it can only print the system stack and the addresses of some variables, whereas what you need is the Python stack and the Python local variables in human-readable form.

To start, install GDB and a debug build of your Python interpreter (python2.x-dbg on Debian-like systems).

This last file contains a few macros for GDB that will enable it to print Python locals and stack.

To avoid any confusion: the most recent call comes first in this trace, unlike a traceback printed by Python, where it comes last.
In GDB, the most recent call is the active, or selected, frame by default. We can print the Python code and local variables in the selected frame:

At this point GDB behaves like PDB: it is good, but not very helpful.
If you were unable to figure it out with PDB, then you're probably dealing with a low-level problem in a Python internal, an external library or a system call. To understand what happens you will need to explore the system stack:

We can now see that our program is stuck in a call to select() in libc (you might not see exactly where the last call was made unless you have a debug version of that external library).
Now you should probably use the GDB commands finish and return to see whether the execution thread comes back into the Python interpreter. If it doesn't, it's probably a bug in an external library, which should be reproducible outside of Python.
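To practice on something harmless, a process blocked in select() can be reproduced with a few lines of Python. This is only a sketch, not the script used in this tutorial: the function name wait_for_data and the socket pair are assumptions made for illustration, and it assumes Python 3.

```python
import select
import socket


def wait_for_data(timeout=None):
    """Block in select() until the socket is readable or the timeout expires.

    Hypothetical helper: nothing is ever written to the socket pair, so with
    timeout=None the call never returns and the process looks "stuck".
    """
    reader, writer = socket.socketpair()
    try:
        # select() is where the process sits while it waits.
        readable, _, _ = select.select([reader], [], [], timeout)
        return bool(readable)
    finally:
        reader.close()
        writer.close()
```

Calling wait_for_data() with no argument blocks indefinitely, which gives you a process to attach GDB to. Passing a timeout in seconds makes it return False once the timeout expires.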

Hard case

You might not be able to trigger the bug systematically; it may happen something like once a day on one of your production servers. In this case you need to perform the analysis right on the production server where you found the unresponsive process.
As this process is running on an optimized and stripped build of the Python interpreter, the stack trace will give you very little information:

Only the public symbols of libpython are visible: we don't know where we are in the Python script, and we have no idea of the Python stack.
Let's install the debug version of Python; it will at least install GDB symbols for the Python interpreter:

This is better: we now know the module and the file, but nothing about local variables or the Python stack.
Do not try to use the py-* macros. They will not work, as almost all Python internals are "optimized out", and they will probably trigger a segmentation fault by trying to print Python objects with _PyObject_Dump.

The only chance you have to find exactly where the code is failing is to carefully inspect the internal Python variables. Some of them are still usable and can help you find out what's going on. For example:

Frame 2 was a call in timemodule.c and showed us that the argument of the function call was 10 seconds.

Frame 3 is in PyEval_EvalFrameEx() (the main Python bytecode interpretation routine), which brings us back into the interpreter. Almost all local variables were optimized out, but func tells us that the call was to the function sleep. Finally:

Insanely hard case

If these steps are still insufficient, you might try setting breakpoints on call_function() and letting the script run a little with continue or step.

The final and ultimate solution is to run PyEval_EvalFrameEx() step by step. Grab the source of CPython and go to the Python directory before launching GDB (it must be the source of the exact same version of the Python interpreter that runs your script):

Notice that it doesn't work until you cd to the Python directory of the CPython source tree. The same applies if you want to step through a Python module such as gevent: you will need the source code of the very same version that's running the script.

It is very time-consuming, and you'll probably need a Python bytecode reference to follow what's going on, but you'll eventually find the issue.
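To get a feel for the bytecode you will be stepping through, the standard dis module can list the instructions of a function. The sketch below assumes Python 3 (dis.get_instructions does not exist in Python 2), and the function nap is a made-up stand-in for the sleep call discussed above. Opcode names differ between interpreter versions, so always disassemble with the same version as the process you are debugging.

```python
import dis
import time


def nap():
    # Hypothetical function under investigation: a plain call to time.sleep.
    time.sleep(10)


def call_instructions(func):
    """Return the instructions of func whose opcode name starts with CALL."""
    return [ins for ins in dis.get_instructions(func)
            if ins.opname.startswith("CALL")]


# Print the full bytecode listing of nap, one instruction per line.
dis.dis(nap)
```

The call instructions are the ones to look for in the listing, since they are where the interpreter leaves the bytecode loop to run the called function.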

Conclusion

Even with a heavily optimized and stripped Python interpreter, it is possible to debug, or at least analyze, a buggy Python script.

Test suite

In order to study it and measure its performance, we (Hartok and I) decided to implement Sleep sort in Python with gevent.

Our program generates a set of random numbers, sorts it with Sleep sort and then with Python's built-in sort (Timsort), and records the timings.
It takes three parameters:

set size

maximum value in the set

the ratio between sleep time and values in the set (for example with 10 the thread for 42 sleeps for 42/10 seconds)

This last parameter is pretty tricky: the higher it is, the faster Sleep sort runs. But if it is too high, the time needed to start a new thread (or greenlet) exceeds the sleep time for low values, and the result will be incorrect. The right value seems to be strongly hardware-dependent and somewhat tied to the other parameters.
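The idea can be sketched with the standard threading module instead of gevent, so that the example runs without extra dependencies. This is not our actual test program: the names sleep_sort and timed are made up for illustration, and the sketch assumes Python 3 and non-negative values.

```python
import threading
import time


def sleep_sort(values, ratio):
    """Sort non-negative numbers: one thread per value sleeps value / ratio seconds."""
    result = []
    lock = threading.Lock()

    def worker(value):
        time.sleep(value / ratio)
        with lock:
            # Smaller values wake up first, so they are appended first.
            result.append(value)

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result


def timed(sort, *args):
    """Run sort(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    output = sort(*args)
    return output, time.perf_counter() - start
```

For example, timed(sleep_sort, [3, 1, 2], 10) lets the threads sleep 0.3, 0.1 and 0.2 seconds, so the whole call takes a bit more than 0.3 seconds, while timed(sorted, [3, 1, 2]) gives the timing of the built-in sort. With a much higher ratio the sleep times shrink towards the thread start-up time and the output may come back in the wrong order.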