Month: August 2005

I’ve been debugging some Nautilus crashers today. It involved decoding
backtraces, and since this is a useful thing to be able to do I
decided to do a writeup about it:

Many bugs that get reported contain backtraces, mostly thanks to
bug-buddy. However, many of these reports were made on a system where
the programs and libraries involved didn’t have debug
information. Having the reporter retry with a build that has debug
info (manually built or with debuginfo packages) helps tremendously
with debugging the problem.

However, it’s often hard to get this, as bug-buddy reports are rarely
followed up by the reporter, and the bug might be hard to
reproduce. Thus, it’s important to learn to read backtraces without
debug info. Such backtraces have several issues:

They contain no line number information, so you don’t know where in
a function something happened

You cannot see the values of arguments and local variables

You cannot trust the function names given in the backtrace, since
the debugger doesn’t know about static functions.

The first two issues you just have to accept, as there is no way to
extract such information. However, the third issue can often be worked
around. This means you can get a mostly accurate trace of what
happened before the crash, which can help you figure out the
problem.

To decode such a backtrace you have to know how the debugger generates
the backtrace. The debugger locates the active stack frame on the top
of the stack by looking at a register. Each such frame contains a
pointer to the invoking frame, plus the address where execution should
continue when that frame returns. Using these addresses, plus the current
instruction pointer, the debugger can figure out which function was
executing. There are two problems though:

If the last thing function foo() does is call function bar() and
return its return value (or bar() returns void) the compiler can do an
optimization so that the return from bar() immediately returns to the
function that called foo(). This means such functions will not be
visible in the backtrace.

The way gdb figures out what function is executing is by looking at
the program/library symbol tables, combined with knowledge about where
in memory the code was loaded. The last function symbol before the
executing address is selected. However, in our case the static
functions are not in the symbol table, so the result is the nearest
non-static function before the actual function.
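
To make the two steps above concrete, here is a minimal sketch in Python that simulates them. It is not how gdb is implemented; the addresses, the frame layout and all function names (exported_a, static_helper and so on) are made up for illustration. It walks a chain of saved frames and resolves each address to the last known symbol before it, once with a symbol table that lacks the static functions and once with the full list.

```python
import bisect

# Hypothetical symbol table of a stripped library: only exported
# (non-static) functions, as (start address, name).
SYMBOLS = [
    (0x1000, "exported_a"),
    (0x1400, "exported_b"),
    (0x2000, "exported_c"),
]

# Hypothetical ground truth, including static functions that the
# debugger cannot see without debug info.
ALL_FUNCTIONS = SYMBOLS + [
    (0x1200, "static_helper"),
    (0x1800, "static_callback"),
]

# Simulated stack: frame id -> (invoking frame, return address).
# The outermost frame has no invoking frame and no return address.
FRAMES = {
    "f2": ("f1", 0x1810),
    "f1": ("f0", 0x1010),
    "f0": (None, None),
}


def resolve(address, symbols):
    """Return the name of the last symbol starting at or before address."""
    table = sorted(symbols)
    starts = [start for start, _ in table]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return "??"
    return table[index][1]


def backtrace(frames, top, pc, symbols):
    """Follow the saved frame pointers and name every address found."""
    names = [resolve(pc, symbols)]
    frame = top
    while frame is not None:
        invoking_frame, return_address = frames[frame]
        if return_address is not None:
            names.append(resolve(return_address, symbols))
        frame = invoking_frame
    return names


if __name__ == "__main__":
    print(backtrace(FRAMES, "f2", 0x1250, SYMBOLS))
    print(backtrace(FRAMES, "f2", 0x1250, ALL_FUNCTIONS))
```

With the stripped table the two static functions show up under the name of the exported function that precedes them in memory, which is exactly the kind of misleading frame name discussed here.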

Armed with this knowledge and the code for the application you can often
figure out what functions were actually called. It’s important that the
code you look at is about the same version as the reporter’s, since
changes to the code affect the result you get.

As an example, let me take bug
302096, a nautilus crasher bug that was recently reported. There
are multiple duplicates, all without debug info, and with very vague
reports of how this actually happened.

Frames 0-3 are just the crash and bug-buddy handling it, so we ignore
those. Frame 4 tells us the crash was likely a NULL pointer or an
invalid pointer passed to some gobject type check. The interesting
parts start at frame 5.

Looking at the code we see that both
nautilus_window_open_location_full() and
nautilus_window_info_open_location() are followed immediately by
non-static functions. Also, nautilus_window_info_open_location() calls
nautilus_window_open_location_full(), so these are probably right.
However, fm_directory_view_confirm_multiple_windows() is followed by
multiple static functions, and it doesn’t call
nautilus_window_info_open_location(). We then search for a
nautilus_window_info_open_location() call below but before the next
non-static function. Fortunately we only get one hit, open_location().
Doing the same with #8, fm_directory_view_notify_selection_changed(), shows
that this must be activate_callback().
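
The search described above can be sketched as a small Python function. This is only an illustration of the procedure, assuming that functions end up in memory in the same order as in the source file. The file layout below is invented: only fm_directory_view_confirm_multiple_windows(), open_location() and nautilus_window_info_open_location() come from the real code, and every name starting with hypothetical_ is a placeholder.

```python
# Functions of one source file in file order, as
# (name, is_static, set of functions it calls).
# The layout and the hypothetical_* entries are made up.
FUNCTIONS = [
    ("fm_directory_view_confirm_multiple_windows", False,
     {"hypothetical_dialog_helper"}),
    ("hypothetical_static_one", True, set()),
    ("open_location", True, {"nautilus_window_info_open_location"}),
    ("hypothetical_static_two", True, {"open_location"}),
    ("hypothetical_next_exported", False,
     {"nautilus_window_info_open_location"}),
]


def candidates(functions, reported, callee):
    """List the functions that may hide behind the name `reported`.

    A candidate is the reported function itself or any static function
    between it and the next non-static function, provided it calls
    `callee` (the function named in the frame above).
    """
    names = [name for name, _, _ in functions]
    start = names.index(reported)
    region = [functions[start]]
    for entry in functions[start + 1:]:
        if not entry[1]:
            break  # the next non-static function ends the region
        region.append(entry)
    return [name for name, _, callees in region if callee in callees]


if __name__ == "__main__":
    print(candidates(FUNCTIONS,
                     "fm_directory_view_confirm_multiple_windows",
                     "nautilus_window_info_open_location"))
```

A single candidate means the frame is decoded. Several candidates mean you have to fall back on knowledge of the code, as with #12 below.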

#9 is a bit trickier since activate_callback() is a callback
function and won’t be called directly. However, it’s only used in
one place, where it’s passed as a callback to
nautilus_file_call_when_ready(). So, we start from
nautilus_directory_add_file_monitors() and look for a callback that
would result from such a call. There are not many functions to choose
from, and obviously the call must be from ready_callback_call().
#10 is found to be call_ready_callbacks() by a simple search. #11
is not followed by any static functions, so it must be right.

#12 is harder. It could be
right, since nautilus_directory_force_reload_internal() does call
nautilus_directory_async_state_changed(), but there are no fewer than
11 other such calls before the next non-static function. Here we have
to use our knowledge of the code, and the other information that the
bug reporters gave about what they were doing at the time of the
crash. One way forward is to just guess which call was right and work
from that. If you then get a backtrace that makes no sense you know
you picked wrong.

In the bug you can see that I initially guessed that #12 was
nautilus_directory_force_reload_internal() (although I now believe
this to be wrong). #13 is in gnome-vfs, which doesn’t call
nautilus_directory_force_reload_internal(), but there could be one or
more hidden stack frames here, so I grepped the code for calls and
found nautilus-vfs-directory.c::vfs_force_reload() as the only
caller. This function ends with a call to the other function and
returns void, so it’s a likely candidate for the return optimization,
meaning it makes sense that it’s not visible in the backtrace. I
continued a bit after that, but wasn’t able to follow the trace very
long, since there were too many possibilities.
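
The hunt for hidden frames can be sketched in the same way. The following Python snippet assumes you have already collected the call sites of interest (for example by grepping) and noted for each one whether the call is the last thing the caller does. Only the vfs_force_reload() entry reflects what is described above; the other entry is a hypothetical placeholder.

```python
# Call sites as (caller, callee, is_last_thing_the_caller_does).
# Only the first entry comes from the text; the second is made up.
CALL_SITES = [
    ("vfs_force_reload", "nautilus_directory_force_reload_internal", True),
    ("hypothetical_other_function", "hypothetical_unrelated_function", False),
]


def callers_of(call_sites, target):
    """Map each caller of `target` to whether it can be a hidden frame.

    A caller can vanish from the backtrace only if every call it makes
    to `target` is the last thing it does, so that the compiler may
    turn the call into a jump and reuse the stack frame.
    """
    result = {}
    for caller, callee, is_tail_call in call_sites:
        if callee != target:
            continue
        result[caller] = result.get(caller, True) and is_tail_call
    return result


if __name__ == "__main__":
    print(callers_of(CALL_SITES, "nautilus_directory_force_reload_internal"))
```

A caller marked True is a plausible missing frame between two visible ones. A caller marked False should have shown up in the trace, so it can be ruled out.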

When you’ve finally decoded the backtrace, or at least parts of it, you
need to figure out how this set of calls could have resulted in a
crash. For that, you’re on your own. But at least now you have a bit
more information that can help you.