Infinity

I’m writing a replacement for libthread_db. It’s called Infinity.

Why? Because libthread_db is a pain in the ass for debuggers. GDB has to watch for inferiors loading thread libraries. It has to know that, for example, on GNU/Linux, when the inferior loads libpthread.so then GDB has to locate the corresponding libthread_db.so, load it into itself and use that to inspect libpthread’s internal structures. How does GDB know where libthread_db is? It doesn’t, it has to search for it. How does it know, when it finds it, that the libthread_db it found is compatible with the libpthread the inferior loaded? It doesn’t, it has to load it to see, then unload it if it didn’t work. How does GDB know that the libthread_db it found is compatible with itself? It doesn’t, it has to load it and, erm, crash if it isn’t. How does GDB manage when the inferior (and its libthread_db) has a different ABI to GDB? Well, it doesn’t.

libthread_db means you can’t debug an application in a RHEL 6 container with a GDB in a RHEL 7 container. Probably. Not safely. Not without using gdbserver, anyway–and there’s no reason you should have to use gdbserver to debug what is essentially a native process.

So. Infinity. In Infinity, inspection functions for debuggers will be shipped as bytecode in ELF notes in the same file as the code they pertain to. libpthread.so, for example, will contain a bunch of Infinity notes, each representing some bit of functionality that GDB currently gets from libthread_db. When the inferior starts or loads libraries GDB will find the notes in the files it already loaded and register their functions. If GDB notices it has, for example, the full set of functions it requires for thread support then, boom, thread support switches on. This happens regardless of whether libpthread was dynamically or statically linked.
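To make the "register functions, then switch on thread support" step a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the function names, the set of required functions and the way notes are represented are all invented for illustration, and none of it reflects the real note format or GDB's internals.

```python
# Hypothetical sketch: none of these names come from GDB or the Infinity note format.

# Assumed set of functions a debugger might want before enabling thread support.
REQUIRED_FOR_THREADS = {"thr_iterate", "map_lwp2thr", "thr_get_info"}


class InfinityRegistry:
    """Collects functions found in notes and reports when a feature is complete."""

    def __init__(self, required):
        self.required = set(required)
        self.functions = {}  # function name -> bytecode (opaque bytes here)

    def register_notes(self, notes):
        """Register every (name, bytecode) pair found in one loaded file."""
        for name, bytecode in notes:
            self.functions[name] = bytecode

    def thread_support_available(self):
        """True once every required function has been registered."""
        return self.required.issubset(self.functions)


registry = InfinityRegistry(REQUIRED_FOR_THREADS)

# Notes found in the main executable: not enough for thread support yet.
registry.register_notes([("thr_iterate", b"\x00")])
before = registry.thread_support_available()

# Notes found in libpthread; static or dynamic linking makes no difference here,
# because the registry only cares which file the notes were read from.
registry.register_notes([("map_lwp2thr", b"\x00"), ("thr_get_info", b"\x00")])
after = registry.thread_support_available()

print(before, after)
```

The point of the sketch is only that the check is driven by what the notes provide, not by which library happened to be loaded.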

(If you’re using gdbserver, gdbserver gives GDB a list of Infinity functions it’s interested in. When GDB finds these functions it fires the (slightly rewritten) bytecode over to gdbserver and gdbserver takes it from there.)

Concrete things I have are: a bytecode format (but not the bytecode itself), an executable with a couple of handwritten notes (with some junk where the bytecode should be), a readelf that can decode the notes, a BFD that extracts the notes and a GDB that picks them up.

What I’m doing right now is rewriting a function I don’t understand (td_ta_map_lwp2thr) in a language I’m inventing as I go along (i8) that’ll be compiled with a compiler that barely exists (i8c) into a bytecode that’s totally undefined to be executed by an interpreter that doesn’t exist.

(The compiler’s going to be written in Python, and it’ll emit assembly language. It’s more of an assembler, really. Emitting assembler rather than going straight to bytecode simplifies things (e.g. the compiler won’t need to understand numbers!) at the expense of emitting some slightly crappy code (e.g. instruction sequences that add zero). I’m thinking GDB will eventually JIT the bytecode so this won’t matter. GDB will have to JIT if it’s to cope with millions of threads, but jitted Infinity should be faster than libthread_db. None of this is possible now, but it might be sooner than you think with the GDB/GCC integration work that’s happening. Besides, I can think of about five different ways to make an interpreter skip null operations in zero time.)
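One of the simpler ways to skip null operations is to strip them once, when the bytecode is loaded, so the interpreter never sees them. The sketch below assumes an invented instruction encoding (tuples of opcode and operand) and a toy two-instruction interpreter. The real bytecode is still undefined, so treat this purely as an illustration of the idea.

```python
# Hypothetical instruction encoding: (opcode, operand) tuples. Not the real bytecode.

def is_null_op(insn):
    """True for an instruction that cannot change the result, e.g. adding zero."""
    opcode, operand = insn
    return opcode == "add" and operand == 0


def strip_null_ops(program):
    """Drop null operations once, at load time."""
    return [insn for insn in program if not is_null_op(insn)]


def run(program, value=0):
    """Toy interpreter for the two opcodes used in this sketch."""
    for opcode, operand in program:
        if opcode == "add":
            value += operand
        elif opcode == "mul":
            value *= operand
        else:
            raise ValueError("unknown opcode: " + opcode)
    return value


program = [("add", 8), ("add", 0), ("mul", 2), ("add", 0)]
cleaned = strip_null_ops(program)

# Both versions compute the same result; the cleaned one has fewer steps.
print(run(program, 1), run(cleaned, 1), len(program), len(cleaned))
```

This only works as written for straight-line code. A bytecode with jumps would need branch targets adjusted after instructions are removed.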

My preference right now is DWARF, for a number of reasons. I’m moderately familiar with it, most debuggers (and tools like readelf) already have parsers for it, and I think it will go down well with the glibc developers.

One point of the original libthread_db design was to isolate the debugger from the details of the libpthread implementation, and it does a good job at that. The debugger provides the required callbacks (ps_pglobal_lookup, ps_pdread, etc.), and can then use a well defined libthread_db API to extract the information it needs. The thread_db API was “open” and the implementation was “closed”. The idiom of a debug DLL paired with a library implementation has worked well for many years, and has been duplicated in other interfaces such as the MPI Message Queue Dumping interface. In practice, TotalView has not had the same problems locating the correct libthread_db to use that the blog post claims GDB has had.

My opinion is that you can’t get rid of libthread_db without breaking compatibility for a lot of consumers. You must continue to support libthread_db, otherwise older versions of the consumers will not work on newer systems. This would create a big problem for ISVs and their customers. I don’t care how libthread_db is implemented (use the byte code scheme or whatever else you can dream up), but the existing libthread_db API and callbacks must remain intact. I think it’s OK if a byte-code based implementation of libthread_db looks up additional symbols to read the byte-code out of the target process, but the details of how to parse and interpret the byte-code should be encapsulated inside the libthread_db implementation.

GDB and other consumers can do whatever they want: continue to use libthread_db or extract/interpret the byte-code directly. But any solution that removes libthread_db from the system pulls the rug out from under some unknown number of other tools. Extracting/interpreting the byte-code directly should be optional, not mandatory, and the decision should be made on a tool-by-tool basis.

I misinterpreted the meaning of the word “replacement” in the first sentence of your post where you say, “I’m writing a replacement for libthread_db. It’s called Infinity.” I assumed “replacement” implied removing libthread_db.

So, if I understand you correctly, you are writing an “alternative” to libthread_db, and the libthread_db library and the current API it implements will continue to exist for the foreseeable future. Correct?