== Running CCL under GDB ==
=== Overview ===
The lisp wants to handle most of the signals that can be raised to
indicate an exception. If the lisp's exception-handling code doesn't
know how to handle an exception, it enters the CCL kernel debugger
(there's no good and direct way to pass it to another debugger). When an
exception occurs in foreign code, the kernel debugger tries to
note that fact.
GDB's much more likely to be able to make at least some sense
out of the state of things in the exception-in-foreign-code case
than the lisp's kernel debugger is.
=== Loading GDB init file ===
Before doing anything with lisp in GDB, you need to load (or "source",
as a verb) the file `ccl/lisp-kernel/linuxx8664/.gdbinit` (replace
`linuxx8664` with whatever OS you're running). This file tells GDB
about signals that need to be passed to lisp for handling, and defines
some macros (most of which have to do with printing lisp object values).
GDB sources that file automatically if it (or a link to it)
is in the directory you start GDB from (typically the one containing
the executable) or in your home directory. Otherwise, once in GDB, just do:
{{{
shell> gdb
(gdb) file /path/to/ccl/lx86cl64
(gdb) source /path/to/ccl/lisp-kernel/linuxx8664/.gdbinit
}}}
=== Connecting GDB ===
When lisp is in the kernel debugger following an exception in foreign code:
(*) Note the PID, which is printed in brackets in the kernel debugger prompt; say it's `[1234]`.
(*) Use the `R` command to display raw (hex) register values and note the value in `RIP` (the program counter/instruction pointer); say it's `0x12345678`.
(*) If GDB is already running, drop into it (via !^C). Otherwise, get a shell and do:
{{{
shell> gdb /path/to/ccl/lx86cl64 # location of lisp kernel
(gdb) source /path/to/ccl/lisp-kernel/linuxx8664/.gdbinit
(gdb) attach 1234 # or whatever the PID is
}}}
(*) Set a breakpoint at the faulting instruction:
{{{
(gdb) br *0x12345678 # or whatever the RIP value is
}}}
The leading asterisk is necessary to prevent GDB from interpreting
the integer as a line number.
(*) Tell GDB to let lisp run:
{{{
(gdb) continue
}}}
The kernel debugger will likely still be waiting for input.
All other lisp threads should be suspended.
(*) Back in the kernel debugger, use the `x` command, which exits from the kernel debugger and resumes other threads.
{{{
[1234] Clozure CL kernel debugger: x
}}}
That should immediately break into GDB at the instruction that caused the
fault.
More generally, GDB will be entered the next time any thread reaches the
address of the breakpoint. There's no guarantee that the first thread
that reaches that point will be the one that got the exception,
but it's very likely (other threads usually need some
time to wake up after being suspended).
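Since it can't be guaranteed which thread hit the breakpoint, it may be worth checking before going further. A small sketch using standard GDB commands, assuming the `RIP` value noted earlier was `0x12345678`:
{{{
(gdb) info threads
(gdb) p/x $pc
}}}
`info threads` marks the current thread with an asterisk, and `p/x $pc` should print the same address that was noted in the kernel debugger.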
=== Debugging in GDB ===
{{{
(gdb) bt
}}}
will do a C backtrace (at least as far back as the foreign function
call from lisp).
{{{
(gdb) info registers
}}}
will show register values.
{{{
(gdb) x/i $pc
}}}
disassembles the instruction at the pc/%rip.
If the foreign code has symbolic debugging information and wasn't
heavily optimized, you can do a lot more (show argument and local
variable values, see argument names and values in backtrace, etc.)
at that point. If the problem is in some library code (either in
its behavior or in the parameters that lisp is passing it) and
it's possible to build the library with debugging enabled and
optimization toned down, you'll probably find the problem much
more quickly than you would otherwise.
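With debugging information available, inspecting a frame might look something like the following sketch. The frame number is an arbitrary example; all of the commands are standard GDB:
{{{
(gdb) bt
(gdb) frame 1
(gdb) info args
(gdb) info locals
(gdb) list
}}}
`frame 1` selects the caller of the innermost frame, `info args` and `info locals` print its arguments and local variables, and `list` shows the surrounding source if GDB can find it.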
Some Linux distributions provide debugging information and library source
for the standard libraries; on Fedora, this information is contained in
optional "debuginfo" packages. If it's available, the information is
often very useful.
As far as other tips and tricks ... I'm not sure what I could
say that'd be meaningful without a long explanation of how the
lisp is implemented.
[http://ccl.clozure.com/manual/chapter16.html#Implementation-Details-of-CCL The manual]
actually does explain quite a bit of that. If you want to use GDB
to step through/set breakpoints in compiled lisp code it's certainly
possible to do that (I do it all the time ...), but explaining the
issues and details might take a while. (From GDB's point of view,
this is like debugging machine code or debugging C code that you
don't have the source to and don't have symbolic information for;
it's OK at that and there isn't anything better at it widely available
under Linux, but that's not really its primary area of focus.)
Here are some hints for linuxx8664:
To find the address corresponding to a lisp symbol, first tell GDB to call the "find_symbol" function, which walks memory until it finds a symbol with a matching pname and returns the symbol tagged as a vector:
{{{
(gdb) call find_symbol("FIND-IF-NOT")
$1 = 52777632305533
}}}
You can then look at the slots of the symbol: a header followed by the pname, value and function. First subtract the tag (fulltag_misc, which is 13 on x8664) from the address returned by find_symbol. Pressing Return at an empty prompt repeats the `x` command for the next word; the `; ...` annotations below are not part of GDB's output:
{{{
(gdb) x/gx 52777632305533-13
0x300040069170: 0x0000000000000715 ; header
(gdb)
0x300040069178: 0x00003000000a995d ; pname
(gdb)
0x300040069180: 0x0000000000000012 ; value
(gdb)
0x300040069188: 0x000030004006970f ; function
}}}
You can set a breakpoint on entry to the function:
{{{
(gdb) br *0x000030004006970f
}}}
Note that you don't need to subtract any tags - the code starts right at the address of the function.
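The tag arithmetic above can also be kept in a GDB convenience variable. A sketch, using the addresses from the example above (`$sym` is just an illustrative name; each slot is 8 bytes on x8664, as the dump shows):
{{{
(gdb) set $sym = 52777632305533 - 13
(gdb) p/x $sym
(gdb) x/gx $sym + 24
}}}
Here `p/x $sym` should print `0x300040069170`, and `x/gx $sym + 24` reads the function slot at `0x300040069188`, which holds `0x000030004006970f` in this example.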
To enter GDB when lisp is starting up, set a breakpoint at *_SPfuncall, which is called soon after the image is loaded (and is rarely called thereafter, since funcall is inlined).
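In practice that might look like the sketch below, reusing the placeholder paths from earlier on this page:
{{{
shell> gdb /path/to/ccl/lx86cl64
(gdb) source /path/to/ccl/lisp-kernel/linuxx8664/.gdbinit
(gdb) br *_SPfuncall
(gdb) run
}}}
GDB should then stop soon after the image is loaded.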
To cause GC (including the EGC) to run integrity checks on entry, add `-DGC_INTEGRITY_CHECKING` to the CDEFINES in the kernel Makefile and rebuild the kernel. Alternatively, you can `(setq ccl::*gc-event-status-bits* 4)` at any time for the same effect.
If you look at the .gdbinit file, there are a number of useful lisp-related commands defined there. Try them...
=== Signal handling ===
The "handle" forms in the .gdbinit file enumerate
all of the signals that the lisp handles. The general
idea is to say something like:
{{{
handle SIGQUIT pass nostop noprint
}}}
which tells GDB that if the target process gets a SIGQUIT, it should
let the application handle it (GDB should "pass" it to the application)
without stopping or printing anything.
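To check how GDB is currently set to treat a signal (for instance, to confirm that the .gdbinit settings took effect), the standard `info signals` command can be used:
{{{
(gdb) info signals SIGQUIT
}}}
This prints a one-row table with Stop, Print and Pass to program columns; with the `handle` line above in effect they should read No, No and Yes.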
A SIGINT by default causes entry to GDB and is not passed to the
application. I sometimes find it useful to be able to interrupt
the lisp via SIGINT (after entering GDB). Doing something like
{{{
handle SIGINT pass stop print
}}}
causes GDB to ask for confirmation because "SIGINT is used by the
debugger". (It's not used in the same way that breakpoints and
single-step exceptions are used, so I usually just sigh and give
it the confirmation it craves.)