Created attachment 346742[details]
fix do_coredump() vs ptrace_start() deadlock
This is not as simple as I thought...
I suspect the patch from openvz is not exactly right.
PF_SIGNALED is always set when the thread is killed by the fatal
signal, if we check this flag in ptrace_start() I'm afraid we can
have a false positive when the exiting tracee calls
tracehook_report_exit().
Also. There is no guarantee the tracee must have PF_SIGNALED when
we are going to deadlock. Suppose the tracee just exits and sleeps
in TASK_TRACED because of PTRACE_EVENT_EXIT. The tracer calls
ptrace_start(). After that another thread which shares the same
->mm starts the coredump. zap_process() wakes up the tracee, it
calls exit_mm()->wait_for_completion() and sleeps in D state but
without PF_SIGNALED.
We could add the SIGNAL_GROUP_EXIT check:
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -933,7 +933,8 @@ ptrace_start(long pid, long request,
*/
wait_task_inactive(child);
while (child->state != TASK_TRACED && child->state != TASK_STOPPED) {
- if (child->exit_state) {
+ if (child->exit_state ||
+ (current->signal->flags & SIGNAL_GROUP_EXIT)) {
__ptrace_state_free(state);
goto out_tsk;
}
If we race with the coredumping thread which shares the same ->mm,
the tracer should be killed by SIGKILL too. In that case we can just
return, the error code does not matter because we will never return
to the user-space.
(This patch could also help if the rt tracer preempts the tracee, we
can spin forever in this case. At least, with this check we can kill
the tracer. However, without fixing wait_task_inactive() this doesn't
really help).
The patch above should fix this deadlock, but unfortunately it does
not solve all problems. Note that with this test-case the tracer,
tracee, and the coredumping thread share the same ->mm. This is
because it was written originally to exploit another problem fixed
by 5ecfbae093f0c37311e89b29bfc0c9d586eace87.
But what if the tracer does not participate in coredumping? In that
case the tracer is not killed, and we still have problems. The tracer
will spin until the coredump completes. So I'd suggest this patch.
Untested, not even compiled. Just for review/discussion.
Also. Upstream checks mm->core_state in may_ptrace_stop() to
prevent another deadlock (and it is still needed afaics, despite
the fact schedule() now checks signal_pending_state()). I wonder
if RHEL needs something like this check. utrace_quiescent() checks
sigkill_pending(), but if SIGKILL was already dequeued it can
return false. This means that if the tracer, tracee, and the
coredumping thread share the same ->mm we can deadlock. Fortunately,
sigkill_pending() == F also means we can send the private SIGKILL
to the tracee and wake it up, but I guess this won't be obvious
to admin.