pthread_cond_wait doesn't return correctly. - Linux

pthread_cond_wait doesn't return correctly.

Hi, all
I have a problem where one thread (thread1) has been blocked for a long time (2 hours) in sigsuspend inside pthread_cond_wait, with p_signal=0 and p_condvar_avail=1, which suggests that another thread (thread2) has already called pthread_cond_broadcast. I have checked the mutex status, and it is unlocked. I am working with uClibc-0.9.29 on linux-2.6.16, MIPS. It seems the __pthread_sig_restart signal was lost. Is this a known bug, or is there a reasonable explanation for it?
Here are the thread backtrace and the thread descriptor information:
#######################################################################################
[Switching to thread 6 (process 5040)]
#0 0x2ba561b8 in sigsuspend () from rootfs/lib/libc.so.0
(gdb) bt
#0 0x2ba561b8 in sigsuspend () from rootfs/lib/libc.so.0
#1 0x2b6de37c in __pthread_wait_for_restart_signal () from rootfs/lib/libpthread.so.0
#2 0x2b6d3d70 in pthread_cond_wait () from rootfs/lib/libpthread.so.0
thread_addr:(pthread_descr) 0x7c1ffe20
p_pid:5040
p_tid:28701
p_nextlive:(pthread_descr) 0x7c3ffe20
p_nextwaiting:(pthread_descr) 0x0
p_nextlock:(pthread_descr) 0x7b9ffe20 /// Not waiting for any mutex or lock.
p_start_args:{start_routine = 0x2b2d8ec8 , arg = 0x10110280, mask = {...}, schedpolicy = -1, schedparam = {__sched_priority = 0}}
p_signal:0 /// weird, it didn't receive the restart signal
p_sigwaiting:0 '\0'
p_condvar_avail:1 '\001' /// This means pthread_cond_signal/pthread_cond_broadcast has been issued.
##########################################################################################
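
For anyone reading the backtrace above, here is a rough sketch of the suspend/restart pattern that frames #0 to #2 suggest: the waiter sleeps in sigsuspend until a handler records the restart signal. This is only a simplified model and not the uClibc source. DEMO_SIG_RESTART, demo_p_signal and all demo_* functions are made-up names standing in for __pthread_sig_restart and the p_signal field, and a single global is used where the real library would keep per-thread state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Hypothetical stand-in for __pthread_sig_restart. */
#define DEMO_SIG_RESTART SIGUSR1

/* Hypothetical stand-in for p_signal (one global here, per-thread in a real library). */
static volatile sig_atomic_t demo_p_signal = 0;

/* The handler only records which signal arrived. */
static void demo_restart_handler(int sig)
{
    demo_p_signal = sig;
}

/* Install the handler and block the restart signal in the calling thread,
 * so that it can only be delivered while the thread sits in sigsuspend(). */
int demo_init(void)
{
    struct sigaction sa;
    sigset_t block;

    sa.sa_handler = demo_restart_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(DEMO_SIG_RESTART, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, DEMO_SIG_RESTART);
    return pthread_sigmask(SIG_BLOCK, &block, NULL);
}

/* Waiter side: sleep until the handler has recorded the restart signal.
 * If the handler never runs, demo_p_signal stays 0 and the loop never exits. */
void demo_wait_for_restart(void)
{
    sigset_t mask;
    int rc = pthread_sigmask(SIG_SETMASK, NULL, &mask); /* fetch current mask */

    assert(rc == 0);
    (void)rc;
    sigdelset(&mask, DEMO_SIG_RESTART); /* unblock it only inside sigsuspend */
    demo_p_signal = 0;
    do {
        sigsuspend(&mask);
    } while (demo_p_signal != DEMO_SIG_RESTART);
}

/* Waker side: what a signal/broadcast would do for each waiting thread. */
int demo_restart(pthread_t target)
{
    return pthread_kill(target, DEMO_SIG_RESTART);
}

/* Usage: a thread restarts itself. The signal stays pending until
 * sigsuspend() unblocks it, then the wait returns. */
int demo_self_check(void)
{
    if (demo_init() != 0)
        return -1;
    if (demo_restart(pthread_self()) != 0)
        return -1;
    demo_wait_for_restart();
    return demo_p_signal == DEMO_SIG_RESTART ? 0 : -1;
}
```

In this model, a waiter stuck in sigsuspend with the recorded signal still at 0 means the handler for the restart signal never ran in that thread, which matches p_signal:0 in the dump.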

Last edited by jhyang; 12-08-2010 at 12:23 PM.

Re: pthread_cond_wait doesn't return correctly.

I have since discovered that the __pthread_sig_restart signal was lost while a SIGSEGV was pending in that thread. At the same time, an error was observed in the Linux kernel (signal.c: setup_frame). What kind of problem could this be?
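
In case it helps others check for a pending signal the same way: one place to look is the SigPnd line in /proc/<pid>/status, which is a hex bitmask where bit (n-1) stands for signal n. The helper below is a small sketch for decoding such a mask. It assumes the mask string has already been copied out of /proc by hand; demo_mask_has_signal is a made-up name, and the thread above does not say how the pending SIGSEGV was actually found.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if signal `signo` is set in a hex mask string such as the value
 * of the SigPnd line, 0 if it is not set, and -1 on a malformed digit.
 * The rightmost hex digit holds signals 1..4, the next one 5..8, and so on.
 * Working digit by digit avoids overflow on masks wider than 64 bits. */
int demo_mask_has_signal(const char *hex_mask, int signo)
{
    size_t len, bit, digit_from_right;
    char c;
    int value;

    assert(hex_mask != NULL);
    if (signo < 1)
        return 0;

    len = strlen(hex_mask);
    bit = (size_t)(signo - 1);
    digit_from_right = bit / 4;
    if (digit_from_right >= len)
        return 0; /* mask is too short to contain this signal */

    c = hex_mask[len - 1 - digit_from_right];
    if (c >= '0' && c <= '9')
        value = c - '0';
    else if (c >= 'a' && c <= 'f')
        value = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        value = c - 'A' + 10;
    else
        return -1;

    return (value >> (bit % 4)) & 1;
}
```

For example, a mask of 0000000000000400 has bit 10 set, which corresponds to signal 11 (SIGSEGV on Linux, including MIPS). Other signal numbers differ between architectures, so it is safer to pass the constants from signal.h on the target than to hard-code numbers.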