Conversation

This PR takes another step towards cleaning up the broker shutdown path.

rc3 unloads modules that were loaded in rc1. A reactor timer is set when rc3 is entered so that if it takes too long, the reactor is terminated and we can clean up and exit. Due to #1006, we actually always let this timer run out to completion.

In this post-reactor cleanup path is some code that sends a "shutdown" request to any modules still loaded, and then calls modhash_destroy() which tries to destroy each module starting with a pthread_join(). This is prone to hanging because for whatever reason, modules may not have processed the shutdown message (for example, they may be blocked in an RPC).

A SIGARLM timer was set up to break out of this hang and move things along, however it's a kludge, another source of timing related behavior variability, and it caused some troubles noted in #1014 with atexit handlers being called from signal handler context.

This PR removes the cause of the hangs in pthread_join() by replacing the shutdown message in the post-reactor cleanup path with a pthread_cancel(). The SIGALRM handler is then dropped.

Since cancelling a thread causes it to terminate early, potentially missing some cleanup, we should try to avoid this case by making rc3 successful, if possible. Thus when it happens, we log to stderr (though still exit 0). The default shutdown grace period was increased somewhat (0.5s to 1.0s for <16 ranks).

This comment has been minimized.

I noticed some typos in commit messages, and some more "imbalanced" sharness tests (where modules that were loaded in the test script were not unloaded), so an update is coming. I'll hold off until #1016 is merged and rebase it first though.

Problem: rc1 files for different test environments
should have corresponding rc3 files for proper teardown.
Edit flux-sharness.sh to require a personality to have
both rc1 and rc3 scripts, and add rc3 scripts for the three
test personalities already configured with rc1 scripts.

Problem: broker teardown sometimes gets stuck in
pthread_join() on modules that don't complete
unloading in rc3.
Modules that are still loaded when the shutdown grace
timer expires are unloaded in the post-reactor teardown
code. First a shutdown message is sent to each module,
then the modhash is destroyed. During modhash destruction,
the hash element destructor is called on each remaining
module. The destructor calls pthread_join() on the module
thread before attempting to free its resources. This may
hang, for example if the module thread is blocked on an
RPC with the broker reactor stopped, and the module hasn't
yet processed the shutdown request.
Change the modhash destructor so that, as a prelude to
destroying the hash, it iterates over the remaining modules
in the hash, calling pthread_cancel() on them. This
makes it much more likely that the pthread_join() will
be successful.
Since this means some teardown within the module thread
may have been skipped, log each module that is terminated
with a cancellation so we can run those down.
Drop dead code for stopping/starting all modules asynchronously.

Problem: calling exit() from a signal handler can lead to
segfaults in the libczmq atexit handler, and with changes to
the module unloading code, should no longer be necessary.
Remove the SIGALRM handler and alarm() call that was
active during post-reactor teardown.

Problem: default shutdown grace period of 0.5s is
insufficient time to unload the default set of modules
in small (size < 16) "selfpmi" instances. Warnings
are issued during instance shutdown.
Double all the default grace period times, increasing
the <16 node time to 1s, which seems to be sufficient.
This unfortunately does slow down testing, but the
wasted time letting the shutdown grace period expire
every time should be addressed when #1006 is fixed.

Problem: some broker options don't have test coverage.
Add a few tests of basic options to t0001-basic.t.
While these tests are not comprehensive, at least
the exercise the code paths and ensure no segfaults
are lurking there.

This comment has been minimized.

Hmm, that's a good idea. I wonder if we could add something to sharness test_under_flux to make sure the same modules are loaded before and after the script executes, and fail the test if they are different?

Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.