Example code can be found on GitHub. All code in this post is licensed under MIT.

Mayhem Mandrill Recap

The goal of this 5-part series is to build a mock Chaos Monkey-like service called “Mayhem Mandrill”: an event-driven service that consumes messages from a pub/sub and initiates a mock restart of a host. We could get thousands of messages in seconds, so handling one message shouldn’t block the handling of the next one we receive.
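As a rough sketch of that non-blocking design, the consumer below pulls messages off an asyncio.Queue (a stand-in for the real pub/sub, which is an assumption here) and schedules each one as its own task with asyncio.create_task, so a slow handler never holds up the next message. PubSubMessage, publish, consume, and handle_message are hypothetical names for illustration, not the series’ exact code.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PubSubMessage:
    hostname: str  # hypothetical message shape


async def publish(queue, hostnames):
    # Stand-in for a real pub/sub: enqueue one message per host,
    # then a None sentinel so the consumer knows when to stop.
    for hostname in hostnames:
        await queue.put(PubSubMessage(hostname))
    await queue.put(None)


async def handle_message(msg, log):
    await asyncio.sleep(0.1)  # pretend the mock restart takes a while
    log.append(f"handled {msg.hostname}")


async def consume(queue):
    log, tasks = [], []
    while True:
        msg = await queue.get()
        if msg is None:
            break
        log.append(f"received {msg.hostname}")
        # Schedule the handler and go straight back for the next message
        # instead of awaiting the handler here.
        tasks.append(asyncio.create_task(handle_message(msg, log)))
    await asyncio.gather(*tasks)  # demo only: let in-flight handlers finish
    return log


async def main():
    queue = asyncio.Queue()
    await publish(queue, [f"cattle-{i}.example.net" for i in range(5)])
    return await consume(queue)


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Every message is received before the first one has finished being handled. Keeping the tasks in a list also holds strong references to them, which the asyncio documentation recommends for tasks you don’t await right away.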

Exception Handling

You may have noticed that, while we’re catching exceptions at the top level, we’re not paying any mind to exceptions that could be raised from within coroutines like restart_host, save, etc. To show you what I mean, let’s fake an error where we can’t restart a host:
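A minimal sketch of such a faked failure might look like the following. It assumes a hypothetical RestartFailed exception and hardcodes cattle-tx09.example.net as the host that fails; the original may inject the error differently (for example, at random).

```python
import asyncio
from dataclasses import dataclass


class RestartFailed(Exception):
    """Hypothetical error for a host that refuses to restart."""


@dataclass
class PubSubMessage:
    hostname: str  # hypothetical message shape


# Hypothetical: the one host we pretend cannot be restarted.
FAILING_HOST = "cattle-tx09.example.net"


async def restart_host(msg):
    await asyncio.sleep(0.01)  # simulate the restart round trip
    if msg.hostname == FAILING_HOST:
        # Fail deterministically so the problem is easy to reproduce.
        raise RestartFailed(f"Could not restart {msg.hostname}")
    return f"Restarted {msg.hostname}"
```

For any other host, restart_host returns normally; for cattle-tx09.example.net it raises RestartFailed, and nothing in the message handler catches it.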

We see that cattle-tx09.example.net could not be restarted. While the service doesn’t crash and the message was saved to the database, the message will never get cleaned up and acked. The loop extending the message’s deadline will also keep spinning. This is because asyncio.gather raised the exception instead of returning it, so we never hit the event.set() line. We’ve essentially deadlocked ourselves on the message.
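To make the deadlock concrete, here is a hedged reconstruction: handle_message starts a deadline-extending loop and a cleanup task, both keyed off an asyncio.Event, then gathers save and a failing restart_host with gather’s default settings. The function bodies, the acked and extended_cnt fields, and the timings are assumptions, not the series’ exact code.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PubSubMessage:
    hostname: str
    acked: bool = False    # hypothetical: set once cleanup acks the message
    extended_cnt: int = 0  # hypothetical: how often the deadline was extended


async def save(msg):
    await asyncio.sleep(0.01)  # pretend to write to a database


async def restart_host(msg):
    await asyncio.sleep(0.01)
    raise RuntimeError(f"Could not restart {msg.hostname}")  # faked failure


async def extend(msg, event):
    # Keep pushing back the message deadline until the work is done.
    while not event.is_set():
        msg.extended_cnt += 1
        await asyncio.sleep(0.05)


async def cleanup(msg, event):
    await event.wait()  # ack only after handle_message signals completion
    msg.acked = True


async def handle_message(msg):
    event = asyncio.Event()
    asyncio.create_task(extend(msg, event))
    asyncio.create_task(cleanup(msg, event))
    # With the default return_exceptions=False, restart_host's error is
    # raised right here...
    await asyncio.gather(save(msg), restart_host(msg))
    event.set()  # ...so this line is never reached.


async def main():
    msg = PubSubMessage("cattle-tx09.example.net")
    task = asyncio.create_task(handle_message(msg))
    await asyncio.sleep(0.3)  # a healthy run would be done in ~0.01s
    # The error is stored on the task; nothing in the service looks at it.
    return msg, task.exception()


if __name__ == "__main__":
    msg, error = asyncio.run(main())
    print(f"acked={msg.acked} extended_cnt={msg.extended_cnt} error={error!r}")
```

After 0.3 seconds the message is still unacked and the extension counter keeps climbing; asyncio.run only stops the two helper tasks by cancelling them on shutdown.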

The simple fix is to pass return_exceptions=True to asyncio.gather, so that rather than raising the first exception it encounters, gather returns each exception alongside the successful results:
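Here is a sketch of that change applied to a simplified, hypothetical handle_message (the deadline-extension loop is left out, and the acked field stands in for a real ack). Because exceptions come back as ordinary entries in the results list, event.set() is always reached and cleanup can run.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PubSubMessage:
    hostname: str
    acked: bool = False  # hypothetical: set once cleanup acks the message


async def save(msg):
    await asyncio.sleep(0.01)  # pretend to write to a database
    return f"Saved {msg.hostname}"


async def restart_host(msg):
    await asyncio.sleep(0.01)
    raise RuntimeError(f"Could not restart {msg.hostname}")  # faked failure


async def cleanup(msg, event):
    await event.wait()
    msg.acked = True  # stand-in for acking the message


async def handle_message(msg):
    event = asyncio.Event()
    cleaner = asyncio.create_task(cleanup(msg, event))
    # Exceptions now come back as ordinary entries in the results list,
    # in the same order as the awaitables were passed in.
    results = await asyncio.gather(
        save(msg), restart_host(msg), return_exceptions=True
    )
    event.set()  # always reached now
    await cleaner  # demo only: wait so the ack is visible to the caller
    return results


if __name__ == "__main__":
    msg = PubSubMessage("cattle-tx09.example.net")
    print(asyncio.run(handle_message(msg)), msg.acked)
```

Nothing is raised any more, but the RuntimeError just sits in results[1]; unless something inspects it, the failure stays invisible.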

We no longer see any tracebacks in the output, and messages are now being cleaned up and acked; however, this still isn’t that helpful, since we have no insight into whether restart_host raised or not:
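One way to get that insight, sketched here with a hypothetical handle_results helper, is to walk the gathered results and log anything that is an exception instance. Since gather preserves the order of its inputs, each result can be paired with the name of the coroutine that produced it.

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")


async def save(hostname):
    await asyncio.sleep(0.01)  # pretend to write to a database
    return f"Saved {hostname}"


async def restart_host(hostname):
    await asyncio.sleep(0.01)
    raise RuntimeError(f"Could not restart {hostname}")  # faked failure


def handle_results(names, results):
    """Hypothetical helper: log every result and return the failures."""
    failures = {}
    # gather preserves input order, so zip pairs each result with its name.
    for name, result in zip(names, results):
        # BaseException also covers CancelledError, which gather returns
        # as a result when return_exceptions=True.
        if isinstance(result, BaseException):
            logging.error("%s failed: %r", name, result)
            failures[name] = result
        else:
            logging.info("%s succeeded: %s", name, result)
    return failures


async def handle_message(hostname):
    names = ["save", "restart_host"]
    results = await asyncio.gather(
        save(hostname), restart_host(hostname), return_exceptions=True
    )
    return handle_results(names, results)


if __name__ == "__main__":
    asyncio.run(handle_message("cattle-tx09.example.net"))
```

Returning the failures, rather than only logging them, leaves room for follow-up handling such as retries.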

Recap

Unlike in non-asyncio programs, an exception raised inside a task will not crash the system, so it can easily go unnoticed. We need to account for that.
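A small illustration of why such exceptions slip by: when a coroutine wrapped in asyncio.create_task raises, the exception is stored on the Task object rather than propagated to the code that created it, so the program simply carries on. The coroutine names are hypothetical.

```python
import asyncio


async def flaky():
    await asyncio.sleep(0)
    raise ValueError("boom")  # raised inside the task, not in main()


async def main():
    task = asyncio.create_task(flaky())
    await asyncio.sleep(0.05)  # main() keeps running; nothing is raised here
    # The exception waits on the task until someone retrieves it. If nobody
    # ever does, asyncio only logs "Task exception was never retrieved"
    # once the task is garbage collected.
    return task.done(), task.exception()


if __name__ == "__main__":
    print(asyncio.run(main()))
```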

I personally like using asyncio.gather because the order of the returned results is deterministic, but it’s easy to get tripped up by it. By default, it raises the first exception to whatever awaits it, yet happily keeps working on the other tasks it was given; if nothing handles that exception, it is effectively swallowed. And if an exception is never returned, weird behavior can happen, like a loop spinning forever on an event that never gets set.
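The default behavior is worth seeing concretely. Per the asyncio documentation, when return_exceptions is False the first exception propagates to the coroutine awaiting gather, but the other awaitables are not cancelled and keep running. The coroutine names below are hypothetical.

```python
import asyncio


async def fail_fast():
    await asyncio.sleep(0.01)
    raise RuntimeError("failed first")


async def slow_success(log):
    await asyncio.sleep(0.1)
    log.append("slow_success finished")  # still runs after the failure


async def main():
    log = []
    try:
        # Default return_exceptions=False: the first error propagates here.
        await asyncio.gather(fail_fast(), slow_success(log))
    except RuntimeError as exc:
        # Reached after ~0.01s while slow_success is still sleeping;
        # gather does not cancel it.
        log.append(f"caught: {exc}")
    await asyncio.sleep(0.2)  # give the sibling time to complete
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The caller only ever sees the first failure, while the remaining work finishes with nobody looking at its result.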