Err, cryptic, isn't it? In many cases cr would simply have been a parameter of the current function. We would then have gone up to the caller with the up command and kept using p to find out where the value comes from. That is apparently not the case here: cr is not a parameter. Let's look around our guilty line inside the picoos_deallocate function (l is a shortcut for list):

Ouch again: c's value looks reasonable (we can even dereference it), but the value of c->size is completely wrong. It is even exactly 0x100000000...

A few side attempts

Where does this odd value come from? Either some computation is wrong, or something else overflows into it.

Luckily, only a few lines of code assign a value to size, so we can quickly add printfs next to them. However, we only ever get reasonable values, never 0x100000000, so the problem lies elsewhere... What can we do?
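For illustration, here is a minimal sketch of that kind of instrumentation. It assumes a hypothetical struct chunk with a long size field and a hypothetical SET_SIZE macro; pico's real code assigns the field directly. Each assignment logs its location and value, and the assert aborts as soon as 0x100000000 is written. When nothing like that shows up in the log, as happened here, the bad value must be arriving through some other write, which points to an overflow.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical chunk header for illustration; pico's real structure differs. */
struct chunk {
    long size;
};

/* Log every assignment to ->size with its source location, and abort
 * immediately if the suspicious value 0x100000000 is ever written.
 * A macro is used so that __FILE__ and __LINE__ name the assignment site. */
#define SET_SIZE(c, v)                                                  \
    do {                                                                \
        long set_size_v_ = (long)(v);                                   \
        fprintf(stderr, "%s:%d: %p->size = %ld\n", __FILE__, __LINE__,  \
                (void *)(c), set_size_v_);                              \
        assert((unsigned long long)set_size_v_ != 0x100000000ULL);      \
        (c)->size = set_size_v_;                                        \
    } while (0)

/* Hypothetical call site standing in for one of the real assignments. */
void chunk_init(struct chunk *c, long n)
{
    SET_SIZE(c, n);
}
```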

valgrind is a buffer overflow specialist and is usually very efficient at detecting pointer and allocation errors. It is ineffective here, however: apart from a lot of "Conditional jump or move depends on uninitialised value(s)" warnings, the run still ends on this cr->size line without giving any clue beforehand. This is because pico uses its own home-made allocator, whose individual chunks valgrind cannot track. Electric-Fence, a more lightweight buffer overflow detection tool, is ineffective here for the same reason.
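To see why an allocator-level bug can escape valgrind, here is a toy bump allocator. It is hypothetical and much simpler than pico's: it gets one block from malloc and carves chunks out of it, each preceded by a 32-bit size header (32 bits rather than pico's long, so that 4 stray bytes wipe the whole header whatever the endianness). Clearing 4 bytes too many in one chunk rewrites the next chunk's header, yet every byte written lies inside the block that malloc returned. memcheck tracks malloc'ed blocks, not the chunks inside them, so it has nothing to report.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy bump allocator (hypothetical): one malloc'ed block, chunks carved
 * out of it, each chunk preceded by a 32-bit size header. */
struct arena {
    unsigned char *base;
    size_t used, cap;
};

int arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = cap;
    return a->base != NULL;
}

/* Return a pointer to n bytes located right after a header holding n. */
unsigned char *arena_alloc(struct arena *a, uint32_t n)
{
    size_t need = sizeof(uint32_t) + n;
    if (a->cap - a->used < need)
        return NULL;
    unsigned char *hdr = a->base + a->used;
    memcpy(hdr, &n, sizeof n);          /* memcpy: header may be unaligned */
    a->used += need;
    return hdr + sizeof(uint32_t);
}

/* Read back the size header stored just before a chunk. */
uint32_t chunk_size(const unsigned char *p)
{
    uint32_t n;
    memcpy(&n, p - sizeof n, sizeof n);
    return n;
}

/* Reproduce the bug pattern: clear one 4-byte element too many. All the
 * written bytes are inside the single malloc'ed block, so memcheck sees
 * a valid write, yet the next chunk's header is gone. */
uint32_t demo_overrun(void)
{
    struct arena a;
    if (!arena_init(&a, 64))
        return UINT32_MAX;
    unsigned char *x = arena_alloc(&a, 16);
    unsigned char *y = arena_alloc(&a, 16);
    assert(x != NULL && y != NULL && chunk_size(y) == 16);
    memset(x, 0, 16 + sizeof(uint32_t));  /* 4 bytes past the end of x */
    uint32_t clobbered = chunk_size(y);   /* y's header now reads 0 */
    free(a.base);
    return clobbered;
}
```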

break, watch

There is still the watchpoint solution: when running the program several times with address space randomization disabled (echo 0 > /proc/sys/kernel/randomize_va_space), we notice that the address of c->size is always the same:

(gdb) p &c->size
$1 = (long int *) 0x7ffff736f9d8

We can simply ask gdb to stop when the value at this address changes. Let's first restart from scratch and stop at the beginning of the main function, before things go wrong (b is a shortcut for break):

(gdb) b main
Breakpoint 1 at 0x4012ef: file bin/pico2wave.c, line 73.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /tmp/svox-1.0+git20100205/pico/.libs/lt-pico2wave -w test.wav foo
Breakpoint 1, main (argc=4, argv=0x7fffffffdc58) at bin/pico2wave.c:73
73 char * wavefile = NULL;
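To make clear what watch automates, here is a crude software approximation, usable when no debugger is at hand: snapshot the bytes at the suspicious address and compare them at checkpoints of your choosing. The soft_watch helpers below are hypothetical. They only narrow the change down to the interval between two checks, whereas gdb's watch (using hardware watchpoints when the CPU provides them) stops right after the very instruction that modifies the memory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "software watchpoint": remembers the bytes at addr. */
struct soft_watch {
    const void *addr;
    size_t len;
    unsigned char saved[16];   /* enough for a long or a pointer */
};

/* Start watching len bytes at addr (len must fit in saved). */
void soft_watch_set(struct soft_watch *w, const void *addr, size_t len)
{
    assert(len <= sizeof w->saved);
    w->addr = addr;
    w->len = len;
    memcpy(w->saved, addr, len);
}

/* Call at checkpoints: if the bytes changed since the previous snapshot,
 * report it, take a new snapshot and return 1; otherwise return 0. */
int soft_watch_check(struct soft_watch *w, const char *where)
{
    if (memcmp(w->saved, w->addr, w->len) == 0)
        return 0;
    fprintf(stderr, "watched memory at %p changed before %s\n",
            (void *)w->addr, where);
    memcpy(w->saved, w->addr, w->len);
    return 1;
}
```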

By having gdb do a little computation, we can see that the write overran into the size field by just 4 bytes... This memset is wrapped in helper functions to look nicer, so we would have to use up several times to reach the caller, the mel_2_lin_lookup function (3rd in the call stack). We can get there faster with frame:

Here is the culprit that overwrites our field. We saw above that it overran by 4 bytes. Looking closer, the memset clears the XXr buffer from byte 4*m1 up to byte 4*(PICODSP_FFTSIZE + 1). Could the +1 be superfluous?! Let's look at the code that allocates this pointer (it actually comes from a wcep_pI field, which is in fact called int_vec28):

The +1 indeed seems inappropriate! Or is the allocation not big enough?! This is where we should talk with the author, but either way we have a clear culprit! Thanks, gdb!
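The arithmetic can be checked on a small model. The sketch below assumes the buffer holds PICODSP_FFTSIZE 4-byte elements, replaced here by a hypothetical FFTSIZE of 8, and places a guard word right after it to play the role of the neighbouring chunk's size field. Clearing up to index FFTSIZE + 1 (byte 4*(FFTSIZE + 1)) touches exactly one element, 4 bytes, past the end and wipes the guard, while stopping at FFTSIZE leaves it intact. Which side is wrong, the +1 or the allocation size, remains a question for the author.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PICODSP_FFTSIZE; the real value differs. */
#define FFTSIZE 8

/* Zero elements [m1, end) of xxr, i.e. bytes 4*m1 up to 4*end,
 * since sizeof *xxr == 4. */
void clear_range(int32_t *xxr, size_t m1, size_t end)
{
    assert(m1 <= end);
    memset(&xxr[m1], 0, (end - m1) * sizeof *xxr);
}

/* Lay out a FFTSIZE-element buffer followed by a guard word standing
 * for the next chunk's size field, clear [m1, end), and return what
 * is left of the guard. */
int32_t guard_after_clear(size_t m1, size_t end)
{
    int32_t mem[FFTSIZE + 1];
    for (size_t i = 0; i < FFTSIZE; i++)
        mem[i] = 1;
    mem[FFTSIZE] = 0x5A5A5A5A;   /* the "size field" next door */
    clear_range(mem, m1, end);
    return mem[FFTSIZE];
}
```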

Conclusion

You don't actually need to know many gdb commands. The first step (r, bt, l, and p) is enough in most cases! For most of the remaining ones, a few b and watch commands go a long way toward tracking bugs down. You can also try combining them with reverse-continue.

Appendix

Threads

When debugging a multithreaded program, it is useful to switch between threads:

Heisenbugs

A Heisenbug is a bug that disappears as soon as one tries to debug it. We mentioned optimisation issues earlier. With a multithreaded program, it often happens that merely starting it in gdb makes the bug disappear.

Either the program crashes, in which case one can use the core it dumps (if no core was dumped, run ulimit -c unlimited in the shell before running the program again):

$ gdb ./monprog core
...

One can then examine the whole process just as if it were alive; one just cannot resume execution.
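As for the ulimit -c unlimited advice, the same limit can also be raised from inside the program with the standard getrlimit/setrlimit calls and RLIMIT_CORE. The minimal sketch below (enable_core_dumps is a hypothetical helper, not something pico does) raises the soft limit to the hard limit. This matches ulimit -c unlimited only when the hard limit is itself unlimited, since an unprivileged process cannot go beyond it.

```c
#include <assert.h>
#include <sys/resource.h>

/* Raise this process's soft core-file size limit to its hard limit.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* cannot exceed rlim_max without privileges */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Note that on some distributions /proc/sys/kernel/core_pattern hands cores to a helper such as systemd-coredump, so the core file may not land in the current directory.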

Or the program hangs. One can then force a core dump by pressing control-\ (like control-C, except that it sends SIGQUIT, which also makes the program dump a core). One can also attach to the live process:

$ gdb ./monprog $(pidof monprog)
...

Misc tips

Here are a few things I use quite often:

To see the current context, one can use the list command, but one can also press control-x a to make gdb use ncurses and keep the listing at the top of the screen. Pressing control-x 2 cycles between different combinations of C source code, assembly code, and registers. Note that the up/down arrows then scroll the source instead of browsing the history; use control-p and control-n to browse the history instead.

Sometimes a tool tells me that the instruction at some address did something bad, or otherwise gives me an instruction address (e.g. 0x1234). To find out where that is in the source, use

(gdb) l * 0x1234

or use the addr2line tool.
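To obtain such addresses from inside a C program, glibc provides backtrace(3). The sketch below (print_return_addresses is a hypothetical, glibc-specific helper) prints the return addresses of the current call chain; each one can be fed to l * in gdb. Return addresses point just after the call instruction, so the line reported is usually that of the call itself.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>

/* Print the return addresses of the current call chain using glibc's
 * backtrace(3). Returns the number of frames captured. */
int print_return_addresses(void)
{
    void *frames[32];
    int n = backtrace(frames, 32);
    for (int i = 0; i < n; i++)
        fprintf(stderr, "#%d %p\n", i, frames[i]);
    return n;
}
```

For a position-independent executable, addr2line expects an address relative to the executable's load address, so the load base has to be subtracted first; gdb attached to the running process takes the runtime address directly.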

One can print several adjacent memory locations with @, for instance:

(gdb) p t[8]@16

prints the 16 elements starting at t[8]
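In plain C terms, p t[8]@16 shows the same values as the loop below, which formats count consecutive elements starting at t[start] in gdb's {a, b, ...} array style (gdb's "<repeats N times>" compression is not reproduced). format_range is a hypothetical helper, there only to make the meaning of @ concrete.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format count elements starting at t[start] the way gdb prints
 * p t[start]@count, e.g. "{8, 9, 10}".
 * Returns 0 on success, -1 if buf is too small. */
int format_range(char *buf, size_t size, const int *t,
                 size_t start, size_t count)
{
    assert(buf != NULL && t != NULL);
    int n = snprintf(buf, size, "{");
    if (n < 0 || (size_t)n >= size)
        return -1;
    size_t off = (size_t)n;
    for (size_t i = 0; i < count; i++) {
        /* separator before every element except the first */
        n = snprintf(buf + off, size - off, "%s%d", i ? ", " : "",
                     t[start + i]);
        if (n < 0 || (size_t)n >= size - off)
            return -1;
        off += (size_t)n;
    }
    n = snprintf(buf + off, size - off, "}");
    if (n < 0 || (size_t)n >= size - off)
        return -1;
    return 0;
}
```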

Reverse debugging is extremely powerful. In the case shown above, we could have used it to find the bug. First, we run the program with recording enabled:

Here we asked gdb to stop at main to enable recording, continue execution, and stop at _exit (the normal termination of the process) only to... restart from scratch. We also disable pagination so that it keeps looping: