Just a note: I was able to reliably reproduce this with rados/multimon (no subsets), filtering to run only 21.yaml, mon_recovery.yaml, and the async messenger. There are usually 2-3 failures per run (out of 92 tests), with the symptom that the quorum_status command timed out.