io_service destructor hangs on Mac OS X

Description

Sometimes, when destroying an io_service object on Mac OS X, my application hangs indefinitely. I suspect this happens when the io_service object is destroyed just as something goes wrong with my network connection.

I have attached Instruments, and it indicates that the application is stuck in the destructor of pipe_select_interrupter, in the call to close() (see the attached screenshot).

Several users still report the same problem, even after the code was changed to ensure that nothing is done on the socket from different threads simultaneously. It seems to be a different issue from the one in the Python tracker?

All I can suggest is that you change the select_interrupter's destructor (pipe or socket, whichever is easier to test) to call reset() before it closes the descriptors. Let me know if that makes any difference, thanks.
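To make the suggestion concrete, here is a minimal sketch of a pipe-based interrupter whose destructor calls reset() before closing the descriptors. The class name pipe_interrupter_sketch and its members are hypothetical stand-ins written for illustration; this is not Asio's actual pipe_select_interrupter, and whether draining the pipe first avoids the hang is exactly what would need testing.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for a pipe-based select interrupter (not Asio's class).
class pipe_interrupter_sketch
{
public:
  pipe_interrupter_sketch() : read_fd_(-1), write_fd_(-1)
  {
    int fds[2] = { -1, -1 };
    if (::pipe(fds) == 0)
    {
      read_fd_ = fds[0];
      write_fd_ = fds[1];
      // Non-blocking, so reset() returns as soon as the pipe is empty.
      set_non_blocking(read_fd_);
      set_non_blocking(write_fd_);
    }
  }

  ~pipe_interrupter_sketch()
  {
    // The suggested change: drain any pending bytes before closing.
    reset();
    if (read_fd_ != -1) ::close(read_fd_);
    if (write_fd_ != -1) ::close(write_fd_);
  }

  // Write one byte so that a thread waiting on read_fd_ wakes up.
  void interrupt()
  {
    char byte = 0;
    ssize_t r;
    do { r = ::write(write_fd_, &byte, 1); } while (r < 0 && errno == EINTR);
  }

  // Drain the pipe. Returns true if any data was pending.
  bool reset()
  {
    if (read_fd_ == -1) return false;
    char data[1024];
    bool was_interrupted = false;
    for (;;)
    {
      ssize_t n = ::read(read_fd_, data, sizeof(data));
      if (n > 0) { was_interrupted = true; continue; }
      if (n < 0 && errno == EINTR) continue;
      break; // n == 0 (end of file) or EAGAIN (pipe is empty)
    }
    return was_interrupted;
  }

  int read_descriptor() const { return read_fd_; }

private:
  pipe_interrupter_sketch(const pipe_interrupter_sketch&);            // not copyable
  pipe_interrupter_sketch& operator=(const pipe_interrupter_sketch&); // not copyable

  static void set_non_blocking(int fd)
  {
    int flags = ::fcntl(fd, F_GETFL, 0);
    if (flags >= 0) ::fcntl(fd, F_SETFL, flags | O_NONBLOCK);
  }

  int read_fd_;
  int write_fd_;
};
```

In this sketch the only difference from a plain close-on-destroy is the reset() call at the top of the destructor, so it should be easy to toggle when testing.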

Maybe useful: it seems the program from the Python tracker also gets stuck in close() for the user who is reporting the problem. For the test program, it's not in an uninterruptible sleep though, whereas it is for our application.

I fixed a destruction order problem in our application where a timer was being destructed after the io_service object was destroyed. Making sure the io_service stays alive until the last timer goes away seems to have made the loop disappear.
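For anyone checking their own code for the same ordering problem, here is a minimal sketch of one way to guarantee the order: declare the service before the timer in the owning class, since C++ destroys members in reverse order of declaration. The types service_sketch, timer_sketch and session_sketch are hypothetical stand-ins that only record when they are destroyed; they are not the Asio classes.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for io_service: records its own destruction.
struct service_sketch
{
  explicit service_sketch(std::vector<std::string>& log) : log_(log) {}
  ~service_sketch() { log_.push_back("service destroyed"); }
  std::vector<std::string>& log_;
};

// Hypothetical stand-in for a timer that needs its service while being destroyed.
struct timer_sketch
{
  explicit timer_sketch(service_sketch& service) : service_(service) {}
  ~timer_sketch() { service_.log_.push_back("timer destroyed"); }
  service_sketch& service_;
};

// Members are destroyed in reverse order of declaration, so declaring the
// service first means the timer is destroyed while the service is still alive.
struct session_sketch
{
  explicit session_sketch(std::vector<std::string>& log)
    : service(log), timer(service) {}
  service_sketch service; // declared first, destroyed last
  timer_sketch timer;     // declared last, destroyed first
};

// Creates and destroys a session, returning the order of destruction.
std::vector<std::string> destruction_order()
{
  std::vector<std::string> log;
  {
    session_sketch session(log);
  } // session goes out of scope here
  return log;
}
```

Swapping the two member declarations in session_sketch would reverse the order and leave the timer's destructor touching a destroyed service, which is the situation described above.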

I have a similar issue in libtorrent on Mac OS X 10.6.5, built as 64 bit. I'm not sure what might have made this start to happen, but it appears to have started around the time when I merged uTP support into trunk, which essentially means a lot more traffic (and events) over a single UDP socket. It seems to be related to how busy the process is, as it is more likely to hang after it has been running for a while (an hour or so). It hangs here (I'm on boost 1.44):

I can at least confirm that we have had no further reports of the problem since we stopped calling close() after destroying the io_service object. If you're sure your session shared_ptr isn't accidentally kept alive longer than the io_service (a shared_ptr pitfall we fell into), maybe your problem is slightly different.

Thank you for the test case. I was able to reproduce the issue on several different Mac OS X 10.6 systems. It seems to be an OS bug triggered by the use of EV_ONESHOT. Please try the following diff to see if it fixes the problem for you, and doesn't cause any other problems. (Note that you may need to apply the diff by hand since it is made against the trunk.)

(In [69467]) * Add support for the fork() system call. Programs that use fork must call
io_service.notify_fork() at the appropriate times. Two new examples have been
added showing how to use this feature. Refs #3238, #4162.

Clean up the handling of errors reported by the close() system call. In
particular, assume that most operating systems won't have close() fail with
EWOULDBLOCK, but if it does then set blocking mode and restart the call. If
any other error occurs we assume the descriptor is closed. Refs #3307.
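
The close() handling described above can be sketched roughly as follows. The helper name close_descriptor_sketch is hypothetical, and the use of fcntl() to clear O_NONBLOCK is an assumption made for this illustration; it is not a copy of the code in the changeset.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not Asio's actual function). Returns the result of
// the last close() call: 0 on success, -1 on failure with errno set.
int close_descriptor_sketch(int fd)
{
  if (fd == -1)
    return 0; // nothing to close

  errno = 0;
  int result = ::close(fd);

  if (result != 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
  {
    // Unexpected on most systems: put the descriptor back into blocking
    // mode and restart the call once.
    int flags = ::fcntl(fd, F_GETFL, 0);
    if (flags >= 0)
      ::fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
    errno = 0;
    result = ::close(fd);
  }

  // For any other error the descriptor is assumed to be closed, so the
  // call is not retried; the caller only sees the reported result.
  return result;
}
```

The point of not retrying on other errors is that, once close() has failed, the descriptor number may already have been released, and a second close() could hit an unrelated descriptor opened by another thread.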

EV_ONESHOT seems to cause problems on some versions of Mac OS X, with the
io_service destructor getting stuck inside the close() system call. Use
EV_CLEAR instead. Refs #5021.

Fixed a compile error on some versions of g++ due to anonymous enums.
Fixes #4883.

Fixed a bug in asio::streambuf where the consume() function did not
always update the internal buffer pointers correctly. The problem may
occur when the asio::streambuf is filled with data using the standard
C++ member functions such as sputn(). (Note: the problem does not
manifest when the streambuf is populated by the Asio free functions
read(), async_read(), read_until() or async_read_until().)

EV_ONESHOT seems to cause problems on some versions of Mac OS X, with
the io_service destructor getting stuck inside the close() system
call. Use EV_CLEAR instead. Fixes #5021.

Fixed a bug on kqueue-based platforms, where reactor read operations
that return false from their perform() function are not correctly
re-registered with kqueue.