Thoughts on Continuous Deployment and Failure Management

I have been using Continuous Deployment for over 2 years now. It has completely changed how I write code and manage failures. The systems I design and build are now a lot more stable, thanks to Continuous Deployment.

There are two parts to Continuous Deployment: the actual updates themselves, and the design of the code to manage the temporary outages during the updates. A deployment is just an interruption to the code, a kind of failure. However, designing the code to handle these failures forces you to think about all the other failures too, as they could potentially occur during an update.

Some people think you can’t use Continuous Deployment because the updates will break the user experience, and that you therefore shouldn’t use it for sites that need to remain continuously available, e.g. shopping sites. However, I believe Continuous Deployment forces you to design the code in a way that makes it more stable and available overall.

Continuous Deployment forces you to design code that can cope with a service being temporarily unavailable. There are a number of design patterns and architectural techniques that allow the code to cope with these unavailable services, and thus not be affected by Continuous Deployment updates.
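One such pattern is retrying with a growing delay, so that a short outage during an update is ridden out rather than surfaced to the user. The sketch below is only an illustration under stated assumptions: the name `call_with_retry`, its parameters and the choice of the built-in `ConnectionError` as the "service is down" signal are all mine, not part of any particular framework.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying with exponential backoff while it is unreachable.

    Assumption: the operation raises ConnectionError when the service is down,
    for example while it is being updated.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide what to do
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is only there so the waiting can be swapped out in tests. Retrying like this is only safe for operations that can be repeated without side effects.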

Therefore, Continuous Deployment is not only about rapid updates to your sites or code but also about failure management. Using Continuous Deployment forces you to think about how your code will fail. During an update, services will be temporarily unavailable, so your code needs to remain stable when these issues arise.

For example, what if the user clicks on checkout just when your web server is updating? Uh oh!! Well, if you made the call via ajax, then on failing to reach the primary web server the code could send the transaction to a second server (maybe for processing once everything’s back up) and return the user to the confirmation screen. Interestingly, if this is coded correctly, the user would not even be aware that the primary service was unavailable due to an update!
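The checkout scenario can be sketched roughly as follows. It is written in Python rather than browser JavaScript purely to show the control flow, and every name in it (`submit_checkout`, `send_to_primary`, `queue_on_secondary`, the shape of the returned dictionary) is a hypothetical stand-in rather than a real API.

```python
from typing import Callable, Dict

Order = Dict[str, object]


def submit_checkout(
    order: Order,
    send_to_primary: Callable[[Order], str],
    queue_on_secondary: Callable[[Order], str],
) -> Dict[str, str]:
    """Try the primary server; if it is unreachable, hand the order to the secondary.

    Assumption: both callables return a reference string, and the primary
    raises ConnectionError when it cannot be reached.
    """
    try:
        reference = send_to_primary(order)
        status = "processed"
    except ConnectionError:
        # Primary is down (perhaps mid-update): park the order for later processing.
        reference = queue_on_secondary(order)
        status = "queued"
    # Either way, the user is sent on to the confirmation screen.
    return {"screen": "confirmation", "status": status, "reference": reference}
```

The user sees the same confirmation screen in both cases. Only the `status` field records whether the order went straight through or is waiting for the primary to come back.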

High-quality code should be able to survive a temporary interruption to service and give the user an uninterrupted experience, no matter what updates are happening in the background. The ajax example is one of multiple techniques you could use to create code that can withstand interruption.

For code to be robust it needs to be able to handle all types of failures, since at the end of the day a failure will occur. Setting the code up to handle Continuous Deployment means all these potential failures will have been thought about already.

The only difference is how well your code manages the failure.

Some common failures:

Server Failure

Network Failure

Lost Request

Overload Errors

Software Updates

etc

Continuous Deployment is just another failure, but using it forces you to think about all the other failures too, as they could potentially occur during an update. This not only makes your code more robust, but also increases your uptime for clients, because most failures are now managed. So when the worst-case failures occur, your code can already cope with almost all of them.

Netflix actually takes this to the next level. They have a tool called Chaos Monkey which actively breaks parts of their system by taking down servers, turning off network cards etc., just to make sure their code can handle failures. The clients are unaware and the service is seamless.
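The same idea can be tried on a very small scale by deliberately injecting failures into your own service calls during testing. The sketch below is not Chaos Monkey and does not reflect how Netflix implements it; `with_chaos` and its parameters are hypothetical names used only to illustrate the principle.

```python
import random
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_chaos(
    operation: Callable[[], T],
    failure_rate: float,
    rng: Optional[random.Random] = None,
) -> Callable[[], T]:
    """Wrap `operation` so that a fraction of calls fail as if the service were down."""
    rng = rng or random.Random()

    def chaotic() -> T:
        # rng.random() is in [0.0, 1.0), so a rate of 1.0 always fails
        # and a rate of 0.0 never does.
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return operation()

    return chaotic
```

Wrapping a dependency this way in a test environment shows quickly whether the calling code really copes when that dependency disappears.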

Make the jump. Switch to Continuous Deployment and watch your code’s quality and robustness increase. It’s a win-win.

Thoughts on Continuous Deployment and Failure Management was last modified: January 25th, 2017 by Daniel Pamich