On Dec 13, 2007 12:35 PM, Nikodemus Siivola <nikodemus@...> wrote:
> On Dec 13, 2007 6:49 PM, Daniel Farina <drfarina@...> wrote:
> > Here is a backtrace when running with the new GET-MUTEX.
> >
> > Unfortunately, it seems everything is as one might expect with a naive
> > recursive lock.
>
> This is just too bizarre. I can see three possibilities (aside from
> another thread frobbing the mutex back and forth):
>
> * SBCL has a signal handling bug in the runtime that allows the
> thread to be interrupted after it has grabbed the mutex.
>
> But this doesn't explain how it is subsequently released.
>
> * Your kernel has a signal handling bug causing the above. Same
> problem -- doesn't explain what you see as far as I can tell.
>
> * You have bad memory / processor cache / other magic hardware
> bit that causes the thread to see its own write with a small
> latency. Quite unlikely, really, but this one actually explains
> what is observed...
>
I think you are not dense, and it is just as bizarre as you think it is.
However, keep in mind that it happens systematically on two machines of
completely different make but similar architecture.
I'm going to try to wrap up a test case that I can distribute and that
doesn't have quite so many dependencies, so that more data can be
gathered.
Another alternative is just to throw NOPs after load/stores and see
what happens.
fdr

Hi,
There are a few unexpected failures, unhandled errors, and invalid exit
statuses on Linux/ppc and Darwin/ppc; additionally, the tests do not run
to completion on the Darwin/ppc host I test on (the topmost SBCL dies in
GC), so I only know about the test files that fail on both Linux/ppc and
Darwin/ppc.
Some of the failures I think might as well be marked as expected, others
I don't know enough about to say. Can anybody familiar with these
issues say something about the following?
Failure: debug.impure.lisp / (UNDEFINED-FUNCTION BUG-353)
This is marked :fails-on on x86, x86-64, alpha, mips. It fails on
Linux/ppc and Darwin/ppc, too. I think this should be expected on ppc.
(Does it not fail on sparc?)
Unhandled error: dynamic-extent.impure.lisp
There are two tests in this file that fail on both Darwin/ppc and
Linux/ppc:
(assert-no-consing (dxclosure 42))
(assert-no-consing (nested-dx-conses))
This test fails only on Darwin/ppc:
(assert-no-consing (cons-on-stack 42))
I'm not at all familiar with dx issues. Are these to be expected?
Failure: hash.impure.lisp / (HASH-TABLE WEAKNESS REMOVAL)
This fails on Linux/ppc, Darwin/ppc. I think this is just a buggy test:
there's a comment saying that the test may not be bulletproof on gencgc,
and ppc uses gencgc now. Wouldn't it be better to call this an expected
error on builds with gencgc than to have it warn and report success?
Invalid exit status: room.test.sh
This fails on Linux/ppc, evidently by running out of heap space. I've
run this a half dozen times, and it fails in exactly the same way each
time, but I don't understand what's going on. A trace file is here:
http://www.progn.net/static/tmp/room.out
Thanks,
RmK

On Dec 13, 2007 6:49 PM, Daniel Farina <drfarina@...> wrote:
> Here is a backtrace when running with the new GET-MUTEX.
>
> Unfortunately, it seems everything is as one might expect with a naive
> recursive lock.
This is just too bizarre. I can see three possibilities (aside from
another thread frobbing the mutex back and forth):
* SBCL has a signal handling bug in the runtime that allows the
thread to be interrupted after it has grabbed the mutex.
But this doesn't explain how it is subsequently released.
* Your kernel has a signal handling bug causing the above. Same
problem -- doesn't explain what you see as far as I can tell.
* You have bad memory / processor cache / other magic hardware
bit that causes the thread to see its own write with a small
latency. Quite unlikely, really, but this one actually explains
what is observed...
Or maybe I'm just dense.
Cheers,
-- Nikodemus

Ken Olum <kdo@...> writes:
> Does the statistical profiler actually work? I always get this
> warning and nothing gets profiled.
>
> This is SBCL 1.0.12, an implementation of ANSI Common Lisp....
> * (require :sb-sprof)
>
> ("SB-SPROF")
> * (sb-sprof:with-profiling () (loop repeat 100000 collect t))
> WARNING: No sampling progress; possibly a profiler bug.
>
> This is in Mandrake 2007.0, kernel 2.6.17, on x86_64
Yes, it does actually work. Please see the SBCL manual for examples of
usage.
(The problem in your case is very probably that a single iteration of
the with-profiling body takes less time to execute than the
profiling interval.)
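A minimal sketch of a body that runs long enough for the sampler to catch it, assuming the :max-samples, :report and :loop options described in the sb-sprof chapter of the manual. BUSY-WORK and its iteration count are made up for illustration and are not part of the original report:

```lisp
(require :sb-sprof)

;; BUSY-WORK is a hypothetical stand-in for the code being profiled.
;; It does enough arithmetic that one call should span many sampling
;; intervals on typical hardware.
(defun busy-work (n)
  (let ((sum 0))
    (dotimes (i n sum)
      (setf sum (mod (+ sum (* i i)) 1000003)))))

;; Evaluate the body once (:loop nil) and print a flat report at the end.
(sb-sprof:with-profiling (:max-samples 10000 :report :flat :loop nil)
  (busy-work 50000000))
```

If the warning still shows up, the body is probably still too short; raising the iteration count is the simplest thing to try.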
--
Juho Snellman

On Dec 13, 2007 12:29 PM, Attila Lendvai <attila.lendvai@...> wrote:
> the regression manifests in code that is very unfriendly to
> debugging, so i can't easily provide a testcase (it's a yaclml
> template, a big lisp form is built and compiled at runtime based on an
> xml html template).
>
> i've tried to unpull stuff starting from 1.0.12.31, but i think the
> problem is closer to 1.0.12.20 (which works fine).
>
> unfortunately it's a pretty useless bug report, but i hope it helps a little,
Any more details? For instance, what does "the regression" mean?
Cheers,
-- Nikodemus

hi,
the regression manifests in code that is very unfriendly to
debugging, so i can't easily provide a testcase (it's a yaclml
template, a big lisp form is built and compiled at runtime based on an
xml html template).
i've tried to unpull stuff starting from 1.0.12.31, but i think the
problem is closer to 1.0.12.20 (which works fine).
unfortunately it's a pretty useless bug report, but i hope it helps a little,
--
attila

Daniel Barlow writes:
> Christophe Rhodes wrote:
>
> > I do have a question about whether SBCL should try to be a little
> > bit more clever than it currently is about
> > *DEFAULT-PATHNAME-DEFAULTS*: whether it should chdir() to the
> > directory part of that before doing any of the notional "filesystem
> > access" that's involved in executing a named executable, for
> > instance.
>
> Consider the case where *DEFAULT-PATHNAME-DEFAULTS* is a logical
> pathname that maps to different physical directories depending on file
> type (as might be the case if e.g. you are using LPNs to build fasls for
> multiple implementations from the same source)
>
> No, I don't have the right answer either, I just thought I'd throw it in.
Strictly speaking, LPNs "do not refer directly to filenames", we don't
translate them under NATIVE-NAMESTRING, and they're unusable with
SB-POSIX, so ISTM that we would not be overly inconsistent with
ourselves to either error or not chdir in case *D-P-D* is an LPN.
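A minimal sketch of the "error or don't chdir" choice, assuming SB-POSIX is loaded. CHDIR-TO-DEFAULTS and its :IF-LOGICAL argument are hypothetical names for illustration, not existing SBCL functionality:

```lisp
(require :sb-posix)

;; Hypothetical helper: chdir to the directory part of
;; *DEFAULT-PATHNAME-DEFAULTS*, unless that is a logical pathname.
(defun chdir-to-defaults (&key (if-logical :skip))
  (let ((dpd *default-pathname-defaults*))
    (cond ((typep dpd 'logical-pathname)
           ;; An LPN names no single native directory, so either
           ;; signal an error or quietly do nothing.
           (when (eq if-logical :error)
             (error "~S is a logical pathname; refusing to chdir." dpd))
           nil)
          ((pathname-directory dpd)
           ;; Keep only the directory part and hand the native
           ;; namestring to chdir().
           (let ((dir (make-pathname :name nil :type nil :version nil
                                     :defaults dpd)))
             (sb-posix:chdir (sb-ext:native-namestring dir))
             dir))
          ;; No directory component at all: nothing to do.
          (t nil))))
```

Whether :SKIP or :ERROR is the better default is exactly the open question above; the sketch only shows that both are cheap to express.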
--
Richard

Jean-Philippe Barrette-LaPierre <jpb@...> writes:
> I'm using sbcl-1.0.12.25-x86-darwin, taken from
> http://sbcl.static.net/builds/.
> When I evaluate the following code:
>
> (declaim (optimize (speed 0) (space 0) (compilation-speed 0) (debug 3)))
>
> (defun test ()
>   (break))
>
> (test)
>
> The "source" command within the debugger returns this:
> "Cannot find source location for #<COMPILED-CODE-LOCATION
> (SB-C::VARARGS-ENTRY BREAK)>"
>
> Is it a known problem? What am I doing wrong?
You're trying to show the source of BREAK. Assuming your goal was to
show the location of the *call* to BREAK, you need to navigate to the
debugger frame for TEST (with up/down) before using the command.
And frankly, I would suggest using the Slime debugger rather than the
built-in one.
--
Juho Snellman