sbcl-devel

I can build sbcl from the latest source, but at the end I am told
WARNING! Some of the contrib modules did not build successfully or pass
their self-tests. Failed contribs:
asdf-install
sb-bsd-sockets
And indeed, the asdf-install build ends thusly:
; SYS:CONTRIB;SB-BSD-SOCKETS;SPLIT.FASL.NEWEST written
; compilation finished in 0:00:00.095
(SYS:CONTRIB;SB-BSD-SOCKETS;CONSTANTS.LISP.NEWEST
SYS:CONTRIB;SB-BSD-SOCKETS;CONSTANTS.FASL.NEWEST
/local/src/lisp/sbcl/sbcl-git/contrib/sb-bsd-sockets/constants.fasl
/local/src/lisp/sbcl/sbcl-git/contrib/sb-bsd-sockets/foo.c
/local/src/lisp/sbcl/sbcl-git/contrib/sb-bsd-sockets/a.out
/local/src/lisp/sbcl/sbcl-git/contrib/sb-bsd-sockets/constants.lisp-temp)
fatal error encountered in SBCL pid 77761:
deferrable signal 1 blocked
whereas the sb-bsd-sockets build ends on a seemingly unrelated problem:
unhandled SB-INT:SIMPLE-FILE-ERROR:
failed to find the TRUENAME of /local/src/lisp/sbcl/sbcl-git/contrib/sb-posix/constants.lisp-temp:
No such file or directory
and finally, testing ends here:
// Running /local/src/lisp/sbcl/sbcl-git/tests/interface.pure.lisp
#0A0 is an array of dimension ().
#(1 2 3) is a vector with 3 elements.
#2A((1 2) (3 4)) is an array of dimension (2 2).
::: Running :WITH-TIMEOUT-FORMS
fatal error encountered in SBCL pid 84201:
deferrable signal 1 blocked
(and no more tests are run).
And no, I checked, and I see no indication that I am doing anything to
block signal 1.
- Harald
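
One way to double-check the claim about signal 1 (SIGHUP) from outside SBCL: a process's signal mask is inherited across fork and exec, so a small script started from the same shell reports the mask that SBCL would start with. The following is a minimal sketch in Python, not part of SBCL; the helper names blocked_signals and is_blocked are made up for illustration. It only shows the mask at process start and says nothing about signals the SBCL runtime blocks itself later on.

```python
import signal


def blocked_signals():
    """Return the set of signals currently blocked in the calling thread."""
    # SIG_BLOCK with an empty list changes nothing and returns the old mask.
    return signal.pthread_sigmask(signal.SIG_BLOCK, [])


def is_blocked(signum):
    """True if signum is in the current signal mask."""
    return signum in blocked_signals()


if __name__ == "__main__":
    # Run this from the same shell that is used to build and test SBCL.
    print("SIGHUP blocked at startup:", is_blocked(signal.SIGHUP))
```

If this prints False, the shell environment is probably not the source of the blocked signal, which would point back at the runtime itself.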

FYI: I posted a response including backtraces and various debug
output, but my response is held up for moderation due to its size.
I don't know who the list moderator is, or how often he moderates.
If it takes too long, or if it is preferred, I can put the attachments
on the web and repost without them.
- Harald

On Tuesday 17 March 2009, Harald Hanche-Olsen wrote:
> Oh heck, let me bypass the moderator business.
> Here is my email again, sans attachments:
>
> + Gábor Melis <mega@...>:
> > Remove the --disable-debugger from tests/subr.sh for now.
>
> Thanks; that did the trick.
>
> So let me take it from the top: This is on macosx 10.5.6 running on
> ppc (12 inch powerbook g4 if you must know).
>
> git bisect tells me this:
>
> aa0ed5a420ea5295d586b3f323b5375d3b506860 is first bad commit
> commit aa0ed5a420ea5295d586b3f323b5375d3b506860
> Author: Gabor Melis <mega@...>
> Date: Mon Feb 16 22:01:15 2009 +0000
>
> 1.0.25.37: block deferrables when gc pending in PA
>
> After determining this, I ran "git bisect reset", so I am once again
> at 1.0.26.1. Following instructions as best I could, I ran the test
> and got a backtrace. I have made stdout and stderr output available
> as separate files; I hope that doesn't complicate things. See here:
>
> http://www.math.ntnu.no/~hanche/tmp/sbcl-test-stdout.txt
> http://www.math.ntnu.no/~hanche/tmp/sbcl-test-stderr.txt
>
> Basically, where you find this text in stderr
>
> fatal error encountered in SBCL pid 8564:
> deferrable signal 1 blocked
>
> is where I get into ldb as seen in stdout.
>
> To my great surprise, after ldb had printed its backtrace, lisp
> continued executing, so I couldn't attach with gdb. If you wish me
> to, I can redo the process and attach with gdb while the ldb> prompt
> is up. However, I should mention that I haven't enabled threading, so
> maybe that point is moot?
No, it's not moot. The C backtrace can be informative ('thread apply all
ba', which in your case is equivalent to 'ba'). What's not needed is the
'call backtrace_from_fp' thing if you are running single-threaded and
already have the backtrace from ldb.
> - Harald

+ Gábor Melis <mega@...>:
> No, it's not moot. The C backtrace can be informative ('thread apply all
> ba' which is in your case equivalent to 'ba'), what's not needed is the
> call backtrace_from_fp thing if you are running single threaded and
> have the backtrace from ldb.
Okay, then, here is the output:
#0 0x95f7ce0c in read$NOCANCEL$UNIX2003 ()
#1 0x95fc6144 in _sread ()
#2 0x95fc60b0 in __srefill ()
#3 0x95fc5e2c in fgets ()
#4 0x00009fd0 in ldb_monitor () at monitor.c:466
#5 0x00007328 in lose (fmt=0x18efc "deferrable signal %d blocked\n") at interr.c:71
#6 0x000074c8 in check_deferrables_unblocked_in_sigset_or_lose (sigset=0x7) at interrupt.c:221
#7 0x00007d98 in check_interrupt_context_or_lose (context=<value temporarily unavailable, due to optimizations>) at interrupt.c:412
#8 0x00009048 in maybe_defer_handler (handler=0x9560, data=0x37000, signal=14, info=0xbfff7448, context=0xbfff7488) at interrupt.c:983
#9 0x000098d4 in maybe_now_maybe_later (signal=14, info=0xbfff7448, void_context=0xbfff7488) at interrupt.c:1058
#10 <signal handler called>
warning: unrecognized length (0x1000720) for sigtramp context
#11 <signal handler called>
warning: unrecognized length (0x1000720) for sigtramp context
Previous frame identical to this frame (gdb could not unwind past this frame)
Note that this is not from the same run as the output previously
posted, but I did the exact same thing. sbcl is hanging in the ldb>
prompt at this point.
- Harald
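
For readers following the backtrace: signal=14 in frames #8 and #9 is SIGALRM, and the message passed to lose() reports signal 1 (SIGHUP) as blocked. Judging only from the function name in frame #6, the runtime asserts that no deferrable signal is blocked in a given signal set. A rough sketch of that kind of check, in Python rather than the runtime's C, might look like this. The DEFERRABLE tuple and the helper names are hypothetical; the real list and logic live in interrupt.c and are not shown in this thread.

```python
import signal

# Hypothetical subset for illustration; the runtime defines its own list.
DEFERRABLE = (signal.SIGHUP, signal.SIGINT, signal.SIGALRM)


def first_blocked_deferrable(mask, deferrable=DEFERRABLE):
    """Return the first deferrable signal found in mask, or None."""
    for sig in deferrable:
        if sig in mask:
            return sig
    return None


def check_deferrables_unblocked_or_lose(mask, deferrable=DEFERRABLE):
    """Raise if any deferrable signal is blocked in mask."""
    sig = first_blocked_deferrable(mask, deferrable)
    if sig is not None:
        raise RuntimeError("deferrable signal %d blocked" % int(sig))
```

Under this reading, the failure means that while the SIGALRM handler was deciding whether to defer, the signal mask it inspected still had SIGHUP set.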

Oh heck, let me bypass the moderator business.
Here is my email again, sans attachments:
+ Gábor Melis <mega@...>:
> Remove the --disable-debugger from tests/subr.sh for now.
Thanks; that did the trick.
So let me take it from the top: This is on macosx 10.5.6 running on
ppc (12 inch powerbook g4 if you must know).
git bisect tells me this:
aa0ed5a420ea5295d586b3f323b5375d3b506860 is first bad commit
commit aa0ed5a420ea5295d586b3f323b5375d3b506860
Author: Gabor Melis <mega@...>
Date: Mon Feb 16 22:01:15 2009 +0000
1.0.25.37: block deferrables when gc pending in PA
After determining this, I ran "git bisect reset", so I am once again
at 1.0.26.1. Following instructions as best I could, I ran the test
and got a backtrace. I have made stdout and stderr output available as
separate files; I hope that doesn't complicate things. See here:
http://www.math.ntnu.no/~hanche/tmp/sbcl-test-stdout.txt
http://www.math.ntnu.no/~hanche/tmp/sbcl-test-stderr.txt
Basically, where you find this text in stderr
fatal error encountered in SBCL pid 8564:
deferrable signal 1 blocked
is where I get into ldb as seen in stdout.
To my great surprise, after ldb had printed its backtrace, lisp
continued executing, so I couldn't attach with gdb. If you wish me to,
I can redo the process and attach with gdb while the ldb> prompt is
up. However, I should mention that I haven't enabled threading, so
maybe that point is moot?
- Harald

+ Gábor Melis <mega@...>:
> I wrote a section that may go into BUGS later on what's needed to
> diagnose these bugs. I guess, your test case is just:
>
> (handler-bind ((sb-ext:timeout #'continue))
>   (sb-ext:with-timeout 3
>     (sleep 2)
>     (sleep 2)))
Indeed, it seems so.
I built sbcl with ldb support plus QSHOW, QSHOW_SAFE and
QSHOW_SIGNAL. Now when I run it, it spews messages like these, but I
suppose that is to be expected?
Memory fault at: 0x100fcc9c, PC: 0x1006c780
heap WP violation? fault_addr=100fcc9c, page_index=252
Memory fault at: 0x10027344, PC: 0x1006c7c0
heap WP violation? fault_addr=10027344, page_index=39
Anyway, it seems to work more-or-less as normal otherwise (when you
can see the normal output through the thickets of debug spew).
If I cut down interface.pure.lisp to contain only the code above, then
run "sh run-tests.sh --break-on-failure foo.pure.lisp", the run ends with
Memory fault at: 0x103d6a74, PC: 0x10cfe1fc
heap WP violation? fault_addr=103d6a74, page_index=982
fatal error encountered in SBCL pid 76297:
deferrable signal 1 blocked
test failed, expected 104 return code, got 1
and sbcl quits, so there is nothing there to debug. OTOH, if I just
run sbcl as
./src/runtime/sbcl --core output/sbcl.core --no-sysinit --no-userinit
and paste the above code into it, the code seems to work just fine.
Again, there seems to be nothing to debug. It's a Heisenbug?
I don't know what to try next.
- Harald

+ Harald Hanche-Olsen <hanche@...>:
> Anyway, it seems to work more-or-less as normal otherwise (when you
> can see the normal output through the thickets of debug spew).
Duh. That minor problem is trivially solved by redirecting stderr into
a different tty, at which point what happens in the regular tty looks
perfectly normal. I still don't know how to get at something
backtraceable.
- Harald

+ Harald Hanche-Olsen <hanche@...>:
> I don't know what to try next.
I am currently going with plan B: Playing with git bisect.
It will take quite a while because it's been quite a while since I
compiled sbcl last.
- Harald

On Thursday 19 March 2009, Harald Hanche-Olsen wrote:
> + Gábor Melis <mega@...>:
> > Could you try this patch, rebuild and run the tests, please?
>
> Done. During the build, I get the same error as before when building
> asdf-install.
Thank you and damn. Can you make the log of sh make-target-contrib.sh
(when compiled with the QSHOW stuff) available as well?

On Friday 20 March 2009, Harald Hanche-Olsen wrote:
> + Gábor Melis <mega@...>:
> > I'm a ppc-darwin binary short of reproducing this. I have access to
> > an OS X 10.4 box, but the downloadable binary buserrors, which is
> > fixed in later version for which there is no binary ...
>
> Ouch. I don't suppose there is any way I could compile a binary on
> 10.5 that would run on 10.4, is there? Could you perhaps compile one
> using Clozure CL? (http://trac.clozure.com/openmcl#GettingClozureCL)
>
> - Harald
I think anything later than this:
1.0.23.47: binaries built on now Leopard run on Tiger as well
will run fine on 10.4.

+ Gábor Melis <mega@...>:
> I'm a ppc-darwin binary short of reproducing this. I have access to
> an OS X 10.4 box, but the downloadable binary buserrors, which is
> fixed in later version for which there is no binary ...
Ouch. I don't suppose there is any way I could compile a binary on
10.5 that would run on 10.4, is there? Could you perhaps compile one
using Clozure CL? (http://trac.clozure.com/openmcl#GettingClozureCL)
- Harald

Harald Hanche-Olsen <hanche@...> writes:
> Ouch. I don't suppose there is any way I could compile a binary on
> 10.5 that would run on 10.4, is there? Could you perhaps compile one
> using Clozure CL? (http://trac.clozure.com/openmcl#GettingClozureCL)
I think there are still some build bugs for us from CCL, which
complains if there are typecases with shadowed clauses. I had a patch
somewhere, but Brian Mastenbrook said there was more to do; I forget
the details.
On the other hand, I think recent sbcls, even built on OS X 10.5, will
run on 10.4. If nothing has been produced by the time I get in to
work, I'll try to build whatever's necessary.
Best,
Christophe

+ Gábor Melis <mega@...>:
> I think anything later than this:
> 1.0.23.47: binaries built on now Leopard run on Tiger as well
> will run fine on 10.4.
Okay. In an attack of too much multitasking, I managed to blow away my
working sbcl. I downloaded the 1.0.22 binary and am now using it to
build 1.0.25.36. I'll make it available for download afterwards.
- Harald

On Friday 20 March 2009, Harald Hanche-Olsen wrote:
> + Harald Hanche-Olsen <hanche@...>:
> > I downloaded the 1.0.22 binary and am now using it to
> > build 1.0.25.36. I'll make it available for download afterwards.
>
>
> http://www.math.ntnu.no/~hanche/tmp/sbcl-1.0.25.36-powerpc-darwin.tar.bz2
>
> - Harald
I managed to reproduce the bug and committed the fix as 1.0.26.16.
Thanks for the report and debugging,
Gabor