a cosmological pedestrian

Hunting down a fd closing bug in Samba

In Samba I had a failing test suite. I have nss_wrapper compiled with debug messages turned on, so it showed me the following line:

NWRAP_ERROR(23052) - nwrap_he_parse_line: 3 Invalid line[TDB]: 'DB'

The file should parse a hosts file like /etc/hosts, but the debug line showed that it tried to parse a TDB (Trivial Database) file, Samba database backend. I’ve started to investigate it and wondered what was going on. This morning I called Michael Adam and we looked into the issue together. It was obvious that something closed the file descriptor for the hosts file of nss_wrapper and the samba binary opend a different file getting the same fd number assigned. The big question was, what the heck closes the fd. As socket_wrapper was loaded and it wraps the open() and close() call we started to add debug to the socket_wrapper code.

So first we added debug statements to the open() and close() calls to see when the fd was opened and closed. After that we wanted to see a stacktrace at the close() call to see what is the code path were it happens. Here is the code how to do this:

We found out that the code responsible for this created a pipe() to communitcate with the child and then forked. The child called close() on the second pipe file descriptor. So when another fork happend in the child, the close() on the pipe file descriptor was called again and we closed a fd of the process to a tdb, connection or something like that. So initializing the pipe fd array with -1 and only calling close() if we have a file description which is not -1, fixed the problem.

If you need a better stacktrace you should use libunwind. However socket_wrapper can be a nice little helper to find bugs with file descriptors 😉