Patterns of resilience

In this slide deck, I first describe what resilience is, why it is important, and how it differs from traditional stability approaches.

After that introduction, the main part presents a "small" pattern language organized around isolation, the typical starting point of resilient software design. I put "small" in quotation marks because even this subset of a complete resilience pattern language still consists of around 20 patterns.

All the patterns are briefly described, and for some of them I added a bit of detail, but as this is a slide deck, the voice track is, as usual, missing. Also, this pattern language is still a work in progress, i.e., it has not yet settled and some details are still missing. Still, I think (or at least hope) that the slides contain a few useful insights for you.

Very good! It seems that I used some of these patterns in BPM - see http://improving-bpm-systems.blogspot.ch/2014/08/bpm-for-digital-age-shifting.html and http://improving-bpm-systems.blogspot.ch/2014/08/bpm-for-software-architects-from.html

11.
reliability
degree to which a system, product or component
performs specified functions
under specified conditions for a specified period of time
ISO/IEC 25010:2011(en)
https://www.iso.org/obp/ui/#iso:std:iso-iec:25010:ed-1:v1:en
Underlying assumption

39.
Stateless
• Supports location transparency (amongst other patterns)
• Service relocation is hard with state
• Service failover is hard with state
• Very fundamental resilience and scalability pattern
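
As a rough illustration of why statelessness eases relocation and failover, here is a minimal sketch. The names `SessionStore` and `CounterService` are hypothetical and not from the slides; the in-memory store merely stands in for whatever external database or cache would hold the state in practice.

```python
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Hypothetical external store (stand-in for a database or cache)."""
    data: dict = field(default_factory=dict)

    def load(self, session_id: str) -> int:
        return self.data.get(session_id, 0)

    def save(self, session_id: str, value: int) -> None:
        self.data[session_id] = value


class CounterService:
    """Service instance that keeps no per-session state between calls."""

    def __init__(self, store: SessionStore) -> None:
        self.store = store  # only a reference to the shared external state

    def increment(self, session_id: str) -> int:
        # All state is read from and written back to the external store.
        value = self.store.load(session_id) + 1
        self.store.save(session_id, value)
        return value


store = SessionStore()
instance_a = CounterService(store)
instance_b = CounterService(store)  # e.g. a replacement after failover

first = instance_a.increment("s1")
second = instance_b.increment("s1")  # a different instance simply continues
```

Because neither instance remembers anything itself, either one can be stopped, moved, or replaced without losing the session.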

41.
Relaxed Temporal Constraints
• Strict consistency requires tight coupling of the involved nodes
• Any single failure immediately compromises availability
• Use a more relaxed consistency model to reduce coupling
• The real world is not ACID, it is BASE (at best)!
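
One possible reading of a relaxed consistency model is asynchronous replication: a write is accepted locally and propagated to other nodes later. The sketch below assumes this reading; `Primary`, `Replica`, and the `available` flag are hypothetical names for illustration, not something the slides prescribe.

```python
class Replica:
    """Hypothetical second node that may be temporarily unreachable."""

    def __init__(self):
        self.data = {}
        self.available = True


class Primary:
    """Accepts writes locally and replicates them when the replica is reachable."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.pending = []  # updates not yet applied to the replica

    def write(self, key, value):
        # The write succeeds even if the replica is down (relaxed consistency).
        self.data[key] = value
        self.pending.append((key, value))
        self.sync()

    def sync(self):
        # Apply pending updates in order; do nothing while the replica is down.
        if not self.replica.available:
            return
        while self.pending:
            key, value = self.pending.pop(0)
            self.replica.data[key] = value


replica = Replica()
primary = Primary(replica)

replica.available = False
primary.write("x", 1)                # still accepted despite the failure
stale = "x" not in replica.data      # replica lags behind for now

replica.available = True
primary.sync()                       # replica catches up eventually
```

With strict consistency, the write would have had to fail or block while the replica was down; here availability is kept at the price of a temporarily stale replica.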

69.
Monitor
• Observe unit behavior and interactions from the outside
• Automatically respond to detected failures
• Part of the system – complex failure handling strategies possible
• Outside the system – more robust against system level failures
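
A minimal sketch of the idea, assuming a simple polling health check and a restart as the automatic response. `Unit`, `Monitor`, `is_healthy`, and `restart` are hypothetical names; a real monitor would typically run as a separate process and use heartbeats or probes.

```python
class Unit:
    """Hypothetical observed unit exposing a health check and a restart hook."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.restarts = 0

    def is_healthy(self):
        return self.healthy

    def restart(self):
        self.healthy = True
        self.restarts += 1


class Monitor:
    """Observes units from the outside and responds to detected failures."""

    def __init__(self, units):
        self.units = units

    def check_once(self):
        # Restart every unit that reports unhealthy; return their names.
        recovered = []
        for unit in self.units:
            if not unit.is_healthy():
                unit.restart()
                recovered.append(unit.name)
        return recovered


a, b = Unit("a"), Unit("b")
monitor = Monitor([a, b])

b.healthy = False                 # simulate a failure in unit b
recovered = monitor.check_once()  # the monitor detects and repairs it
```

The units contain no recovery logic of their own; detection and response live entirely in the monitor.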

71.
Error Handler
• Units often don’t have enough time or information to handle errors
• Separate business logic and error handling
• Business logic just focuses on getting the task done (quickly)
• Error handler has sufficient time and information to handle errors
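
The separation can be sketched as follows, assuming failures are handed over through an in-memory queue. The names `process_order`, `run`, and `handle_errors`, and the reject/retry policy, are hypothetical choices for illustration only.

```python
from collections import deque

# Hand-over point between business logic and error handling (assumed design).
error_queue = deque()


def process_order(order):
    """Business logic: does the task or fails fast, with no recovery code."""
    if order["amount"] <= 0:
        raise ValueError("invalid amount")
    return order["amount"] * 2


def run(order):
    """Thin wrapper that passes any failure on to the error handler."""
    try:
        return process_order(order)
    except Exception as exc:
        error_queue.append((order, exc))
        return None


def handle_errors():
    """Error handler: decides, separately and without time pressure, what to do."""
    decisions = []
    while error_queue:
        order, exc = error_queue.popleft()
        action = "reject" if isinstance(exc, ValueError) else "retry"
        decisions.append((order["id"], action))
    return decisions


ok = run({"id": 1, "amount": 5})
failed = run({"id": 2, "amount": 0})
decisions = handle_errors()
```

The business function stays short and fast, while the handler sees the failed input together with the exception and can apply whatever strategy fits.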