[Sbcl-devel] Threads Stable on Darwin? Maybe!

As of just a minute-or-so ago I pushed a commit that switches the
runtime to use semaphores instead of condition variables for
wait_for_thread_state_change.
With that change in place I have so far been unable to make threaded
Darwin croak -- and I used to be able to do that pretty easily.
...so: go forth and give it a shake, and let me know how it goes.
Cheers,
-- Nikodemus
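
For readers unfamiliar with the pattern, here is a minimal sketch of waiting for a state change with a semaphore rather than a condition variable. It is not the code from the commit: the names demo_thread, demo_set_state, demo_wait_for_state_change and demo_waiter are hypothetical, and it assumes unnamed POSIX semaphores (sem_init), which as far as I know Darwin does not implement, so a port there would need a different semaphore primitive. The point it illustrates is that a post is remembered by the semaphore, so a wakeup sent before the waiter actually blocks is not lost.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical states, for illustration only. */
enum demo_state { DEMO_RUNNING, DEMO_STOPPED };

struct demo_thread {
    pthread_mutex_t lock;   /* protects state and waiters */
    enum demo_state state;
    int waiters;            /* registered waiters not yet posted to */
    sem_t changed;          /* posted once per registered waiter */
};

/* Returns 0 on success, -1 on failure. */
int demo_thread_init(struct demo_thread *t)
{
    t->state = DEMO_RUNNING;
    t->waiters = 0;
    if (pthread_mutex_init(&t->lock, NULL) != 0)
        return -1;
    /* Assumption: unnamed POSIX semaphores are available. */
    return sem_init(&t->changed, 0, 0);
}

void demo_set_state(struct demo_thread *t, enum demo_state s)
{
    pthread_mutex_lock(&t->lock);
    t->state = s;
    /* One post per registered waiter. The semaphore keeps the count,
     * so a waiter that has registered but not yet reached sem_wait()
     * still gets its wakeup. */
    for (; t->waiters > 0; t->waiters--)
        sem_post(&t->changed);
    pthread_mutex_unlock(&t->lock);
}

/* Block until the state is something other than `from`. */
void demo_wait_for_state_change(struct demo_thread *t, enum demo_state from)
{
    pthread_mutex_lock(&t->lock);
    while (t->state == from) {
        t->waiters++;                      /* register under the lock */
        pthread_mutex_unlock(&t->lock);
        while (sem_wait(&t->changed) != 0)
            ;                              /* retry if interrupted */
        pthread_mutex_lock(&t->lock);      /* re-check the state */
    }
    pthread_mutex_unlock(&t->lock);
}

/* Example thread entry point: wait until the state leaves DEMO_RUNNING. */
void *demo_waiter(void *arg)
{
    demo_wait_for_state_change((struct demo_thread *)arg, DEMO_RUNNING);
    return NULL;
}
```

To try it, start demo_waiter with pthread_create, call demo_set_state(&t, DEMO_STOPPED) from another thread, and join the waiter; it should return whichever of the two gets the lock first.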

On 5 December 2011 18:44, Nikodemus Siivola <nikodemus@...> wrote:
> With that change in place I have so far been unable to make threaded
> Darwin croak -- and I used to be able to do that pretty easily.
I just saw the first failure -- so not a total victory yet. This
happened during the SB-CONCURRENCY tests:
fatal error encountered in SBCL pid 18092(tid 8840704):
fault in heap page 8549 not marked as write-protected
boxed_region.first_page: 0, boxed_region.last_page -1
Now, unfortunately the tests were running under --disable-debugger, so
I didn't actually get much meaningful information.
If you see this, please grab the backtrace both from LDB for the
current thread, and from GDB (see manual for details) and post it
here.
Cheers,
-- nikodemus

On 7 December 2011 14:19, Nikodemus Siivola <nikodemus@...> wrote:
> So I'm coming to conclusion that this is Darwin being bogus. The
> CMPXCHG page in the Intel manuals lists a fairly impressive array of
> exceptions for it, so my assumption is that Darwin is sometimes giving
> us an EXC_BAD_ACCESS for those in circumstances other OS's don't.
Looking at things in more detail, it's not even CMPXCHG. It seems to
be always failing right...
  (when (eq head (sb-ext:compare-and-swap (queue-head queue) head
                                          first-node-prev))
    ;; ***HERE***, after loading the NIL into register, on trying to
    ;; write it into memory.
    (setf (node-next first-node-prev) nil
          (node-prev head) nil
          (node-next head) nil
          (node-value head) nil)
    ...
I just pushed a change that provides the more verbose error message
seen below, and also makes it possible to tell the system to plow on
instead of lose()ing:
  (define-alien-variable continue-after-memoryfault-on-unprotected_pages int)
  (setf continue-after-memoryfault-on-unprotected_pages 1)
for those who dare.
Cheers,
-- nikodemus