Tracing python code

Sometimes it's helpful to watch what Python the low-level details of
what Python is doing. With an IDE like Wing IDE or Komodo you
can step through events in a GUI. But if you're a command-line sort
of person like me you can roll your own trace program using the same
run-time hooks used by those IDEs

The hook is set via the sys.settrace()
function. This takes a function which is called during any of several
events. One is the "line" event. Before the interpreter executes a
line it calls the trace function with the current frame and the string
"line". (The function gets a third argument which is None;
that fields is used by other events.

The frame contains the attribute "f_lineno" which is the line number
about to be run. Here is a simple example which only prints the
current line number. Note that the trace function returns itself. If
it doesn't then Python's trace mechanism interprets that as a request
to stop the tracing while inside the given scope.

line 11
In main
line 12
line 13
0 0
line 12
line 13
1 3
line 12
line 13
2 6
line 12
line 13
3 9
line 12
line 13
4 12
line 12
line 14
Done.

Stare at the output long enough and you can figure out what's going on
but it isn't easy. It's more helpful to also print the line. The
linecache
module is available for this but it needs to know the filename. In
this case I know all of the lines are from "traceit.py" so I'll
hard-code that into the program.

line 13: print "In main"
In main
line 14: for i in range(5):
line 15: print i, i*3
0 0
line 14: for i in range(5):
line 15: print i, i*3
1 3
line 14: for i in range(5):
line 15: print i, i*3
2 6
line 14: for i in range(5):
line 15: print i, i*3
3 9
line 14: for i in range(5):
line 15: print i, i*3
4 12
line 14: for i in range(5):
line 16: print "Done."
Done.

Much more readable. But what if the program uses another file so that
the hard-coded traceit.py doesn't work? Instead of saying
print i, i*3 I'll use print i, random.randrange(0,
10) to get a random integer in the range 0<=n<10.
I chose this because it is implemented in Python. I originally chose
random.random() but that's implemented in C so it wasn't
interesting. Here's the output:

line 14: print "In main"
In main
line 15: for i in range(5):
line 16: print i, random.randrange(0, 10)
0 line 141:
line 142:
line 144:
line 150:
line 151:
line 153:
line 166:
8
line 15: for i in range(5):
line 16: print i, random.randrange(0, 10)
1 line 141:
line 142:
line 144:
line 150:
line 151:
line 153:
line 166:
8
line 15: for i in range(5):
line 16: print i, random.randrange(0, 10)
2 line 141:
line 142:
line 144:
line 150:
line 151:
line 153:
line 166:
7
line 15: for i in range(5):
line 16: print i, random.randrange(0, 10)
3 line 141:
line 142:
line 144:
line 150:
line 151:
line 153:
line 166:
8
line 15: for i in range(5):
line 16: print i, random.randrange(0, 10)
4 line 141:
line 142:
line 144:
line 150:
line 151:
line 153:
line 166:
5
line 15: for i in range(5):
line 17: print "Done."
Done.

As you can see it isn't very helpful because it gets lines from the
wrong file.

The current filename is available from the module's globals dictionary
using the special key of __file__. Here's a trace function
that only displays the filename and line number.

The "<stdin>" is because I ran the code from Emacs in
python-mode. If I run it from the command-line I get the correct
filename.

file traceit.py line 14
In main
file traceit.py line 15
file traceit.py line 16

This means we'll need to treat the main program somewhat carefully.
What I'll do is force the correct filename if it's found to be
"<stdin>".

If you look at the filenames you'll see the ones from the random
module end with ".pyc". This means the code actually came from the
precompiled bytecode and not the original Python file. Getting the
right file means dropping the final "c"; or "o" for the ".pyo"
optimized bytecode files. For this example I don't want to see the
full filename so I'll use the module's __name__ instead.

Terry Brunck over at Daylight found a problem with my PyDaylight
library. At the end of a function after it was finished and Python
was doing garbage collection the code seemed to do nothing. Rather it
was doing something because the CPU was pegged but we couldn't figure
out what that was.

I wrote a trace function like the one above and put the following at
the end of the function that had the problem so it would be called
just before it finished:

import sys
sys.settrace(traceit)

This let me watch what the finalization code did. The problem was
tracked down to a dt_dealloc() call meaning the Daylight
toolkit was making the big demands on the CPU. That clue helped
narrowed down the problem space. It ended up being three different
things. Terry's code made heavy use of a PyDaylight function that
made Daylight streams but forgot to dealloc them once finished.
Normally that's not a problem because the toolkit keeps track of the
created streams and deletes them when the molecule is deleted. Except
in this case there were about 30,000 streams and the deallocation code
wasn't designed to handle that well.

Andrew Dalke is an independent consultant focusing on
software development for computational chemistry and biology.
Need contract programming, help, or training?
Contact me