Labels

So I've got a multi-threaded application and suddenly I notice there's one thread running away and using all CPU. Not good, probably a loop gone wrong. But where? One way to find this is revert history in the VCS and keep trying it out till you find the bad commit. Another way is to find out which thread is doing this, this is of course much more fun!

Using ps -p PID -f -L you'll see the thread ID which is causing the problems. To relate this to a Python thread I subclass threading.Thread, override it's .start() method to first wrap the .run() method so that you can log the thread ID before calling the original .run(). Since I was already doing all of this apart from the logging of the thread ID this was less work then it sounds. But the hard part is finding the thread ID.

Python knows of a threading.get_ident() method but this is merely a long unique integer and does not correspond to the actual thread ID of the OS. The kernel allows you to get the thread ID: getid(2). But this must be called using a system call with the constant name SYS_gettid. Because it's hard to use constants in ctypes (at least I don't know how to do this), and this is not portable anyway, I used this trivial C program to find out the constant value: