Bug Description

% sbcl --load forkthread.lisp
(running SBCL from: /home/sky)
This is SBCL 1.0.31.26, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
; loading system definition from
; /home/sky/projects/lisp/sbcl.git/contrib/sb-grovel/sb-grovel.asd into
; #<PACKAGE "ASDF1">
; registering #<SYSTEM SB-GROVEL {B2AFB41}> as SB-GROVEL
fatal error encountered in SBCL pid 7933(tid 3084954384):
kill_safely: pthread_kill failed with 3

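This error occurs because sb-posix:fork does not update sb-thread::*all-threads*. The child still lists threads that exist only in the parent, so presumably the pthread_kill aimed at one of them fails with error 3 (ESRCH, no such thread).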

The relevant part of POSIX states:

“A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread [...]”

Proposal: have sb-posix:fork take this into account by resetting *all-threads* in the child so that it contains only the one surviving thread.

"A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called."

The "consequently" part basically means that you are extremely limited in what you can do after forking a threaded program. I'm tempted to say that having the wrong value for *all-threads* is the least of your problems.
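Given that constraint, one conservative option is to refuse to fork at all while other Lisp threads exist. The following is only a sketch of that idea, not something proposed in this report: FORK-WHEN-SINGLE-THREADED is a hypothetical name, and the only SBCL calls it relies on are sb-thread:list-all-threads and sb-posix:fork.

```lisp
(require :sb-posix)

(defun fork-when-single-threaded ()
  "Hypothetical helper: call SB-POSIX:FORK only if the calling thread is
the sole live Lisp thread; otherwise signal an error instead of forking."
  (let ((threads (sb-thread:list-all-threads)))
    ;; LIST-ALL-THREADS is only a snapshot, so this check is advisory: a
    ;; thread started between the check and the fork would go unnoticed.
    (if (rest threads)
        (error "Refusing to fork: ~D threads are alive." (length threads))
        (sb-posix:fork))))
```

As with sb-posix:fork itself, the return value is the child's pid in the parent and 0 in the child.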

The entire idea that anything can be done "correctly" to keep the system running is a bit laughable, but I put together a proof-of-concept for at least keeping gc_stop_the_world() from pitching a fit (tested on 1.0.11 x86-64):

(defun spork ()
  "sb-posix:fork doesn't do even minimal fixup of the thread-tracking
that SBCL requires. SPORK does the most minimal fixup to make GC not
completely choke. If we really wanted to be clever, we could stop the
world using the GC primitives, fork, then re-create the threads
in-place over their old stacks and contexts and have start-the-world
pick up their old register states (saved by stop-the-world)."
  ;; This is, of course, a total hack.
  (sb-sys:without-gcing
    ;; If we are the child, we can't allow -anything- to take the
    ;; all-threads lock. That means no interrupts and no gcing.
    (let ((pid (sb-posix:fork))
          (state-dead (sb-vm:fixnumize 3))) ;; Is a C runtime constant.
      (when (zerop pid)
        ;; We're the child, all other threads are dead. Here's
        ;; where we do some serious (brain) damage:
        (loop
          ;; For each thread the runtime knows about other than the
          ;; current thread, set the state to STATE_DEAD so that
          ;; world stopping doesn't choke on it. We don't bother
          ;; with the whole condition broadcast junk as we're the
          ;; only thread running, thus nobody is waiting on the
          ;; condition.
          for thread = (extern-alien "all_threads" system-area-pointer)
            then (sb-sys:sap-ref-sap thread (* sb-vm::thread-next-slot
                                               sb-vm:n-word-bytes))
          until (sb-sys:sap= thread (sb-sys:int-sap 0))
          unless (sb-sys:sap= thread (sb-thread::current-thread-sap))
            do (setf (sb-sys:sap-ref-word thread
                                          (* sb-vm::thread-state-slot
                                             sb-vm:n-word-bytes))
                     state-dead))
        ;; The presumption at this point is that merely setting the
        ;; thread state to STATE_DEAD is sufficient to keep the
        ;; system running. This is almost certainly wrong.
        )
      pid)))

This is still a bad idea. The only defined-correct operation in the forked child is exec, possibly preceded by a ptrace(PTRACE_TRACEME). And if you're going to exec, deport your strings (convert them to foreign memory) before forking. And put the call to fork and all of the child process code in a without-gcing. And... Well, I'm not convinced that even that is sufficient to prevent anything from going wrong.
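If the goal is only to start another program, one way to stay within the exec-only rule without writing any child-side Lisp is sb-ext:run-program, which does its process creation in the runtime's C code rather than in Lisp. A minimal sketch, assuming a POSIX system with /bin/sh; RUN-AND-WAIT is a hypothetical helper name and the commands shown are placeholders:

```lisp
(defun run-and-wait (program args)
  "Hypothetical helper: run PROGRAM (an absolute path) with the list of
string arguments ARGS, wait for it to finish, and return its exit code."
  (sb-ext:process-exit-code
   ;; :WAIT T blocks until the child has exited, so the exit code is
   ;; available as soon as RUN-PROGRAM returns.
   (sb-ext:run-program program args :wait t)))
```

For example, (run-and-wait "/bin/sh" '("-c" "exit 0")) runs a shell that exits immediately and returns its exit code.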