
The Problem Isn't Change. It's Uncontrolled Change.

You deployed an app. Nothing has changed in three days, but it suddenly crashes. Why? Memory leak.

You deployed an app. Nothing has changed in three weeks, but it suddenly stops working. Why? A database query came back empty and the web application freaked out trying to manipulate a null value, deciding instead to just stop in its tracks and return nothing.

You deployed a load balancing service. Nothing has changed in three months, but it suddenly stops load balancing your app. Why? One of the ports on an intermediate switch decided to fry. Literally. It's a black hole and the load balancer can't find your apps anymore.

There are probably a hundred (and then some) other examples we could cite to prove that changing nothing does not guarantee that things will continue working as expected. Hardware - which ultimately, whether we like it or not, is required to provide all the raw resources we need to run and deliver applications - sometimes fails. Bugs in applications sometimes appear only after continued use, or heavy load, or when the user finally enters that one combination that QA forgot to try.

Lack of change is not an indicator of continued success.

Yet that is the modus operandi most enterprise organizations continue to operate under - the less change, the better. Changes are only authorized after lengthy discussions and review, and then scheduled for some obscure time on a weekend when we hope - we HOPE - no one is actually paying attention. Just in case.

Because change is bad, m'kay?

There is, believe it or not, a happy medium between the extreme control operating mode and the chaos (and ensuing entropy) introduced by change and the business' desire to change faster and more frequently. The problem isn't really change, it's uncontrolled change.

It's the tweaks and midnight patches that aren't documented.

It's the manual adding of a route or the deletion of an ACL that isn't tracked.

It's the quick change to /etc/hosts on the app server to fix communications because the network team takes too long.

Undocumented, uncontrolled change is what starts the entire system on an entropic slide toward disaster. The real problem is understanding the current state of the end-to-end app infrastructure architecture and keeping that understanding in sync with reality.

That understanding is what decoupling the infrastructure (the data plane) from its known state (the control plane) can bring.
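To make "in sync with reality" a little more concrete, here is a minimal sketch in Python of comparing the recorded state against what is actually out there. Everything in it is an assumption made for illustration: the `find_drift` helper, the key naming scheme and the sample values are hypothetical, not the format of any particular tool.

```python
# Hypothetical example: the record format and every name here are assumptions.

def find_drift(recorded: dict, observed: dict) -> dict:
    """Return every key whose observed value differs from the recorded one."""
    drift = {}
    for key in sorted(recorded.keys() | observed.keys()):
        want = recorded.get(key)   # None means "not in the record"
        have = observed.get(key)   # None means "not on the device"
        if want != have:
            drift[key] = {"recorded": want, "observed": have}
    return drift


# What the centrally kept record says the environment looks like.
recorded_state = {
    "route:10.1.0.0/16": "via 10.0.0.1",
    "acl:allow-web": "tcp/443 from any",
    "hosts:db.internal": "10.1.2.10",
}

# What is actually deployed after a few quiet "quick fixes".
observed_state = {
    "route:10.1.0.0/16": "via 10.0.0.1",
    "route:10.9.0.0/16": "via 10.0.0.254",  # route added by hand
    "hosts:db.internal": "10.1.2.99",       # /etc/hosts edited by hand
    # acl:allow-web was deleted and never recorded
}

drift = find_drift(recorded_state, observed_state)
for key, diff in drift.items():
    print(f"{key}: recorded={diff['recorded']!r} observed={diff['observed']!r}")
```

The three entries it reports are the three kinds of undocumented change described above: a manually added route, a deleted ACL and an edited hosts entry. An empty result is what "the record matches reality" looks like.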

Operationalization isn't just about improving time to market, though that's a definite benefit that squarely aligns with business priorities. It's also about introducing stability into the environment; stability that comes about as a result of knowing the state of the entire infrastructure at any given time - and being able to change it safely.

If you know "this is the current state" of the infrastructure - from network to compute to storage to systems - then you have a much better chance of introducing a change without causing an issue. Because there aren't any "gotchas" hiding out there that might conflict with your change or interact with your change in a way that breaks things. It's the same reason the pharmacist and doctors want to know what medicines you might be currently taking before they prescribe you something for that nasty flu you've got. Because they need to understand how the meds are going to interact and what possible side effects may occur.
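The "drug interaction" check can be sketched for one narrow case: a proposed route tested against the routes you already know about. This is only an illustration under stated assumptions. The `conflicting_routes` helper and the sample prefixes are hypothetical, and the overlap test comes from Python's standard `ipaddress` module.

```python
import ipaddress


def conflicting_routes(known_routes, proposed):
    """Return the known routes whose address range overlaps the proposed one."""
    new_net = ipaddress.ip_network(proposed)
    return [r for r in known_routes if ipaddress.ip_network(r).overlaps(new_net)]


# Hypothetical "current state": the prefixes already routed in the environment.
known_routes = ["10.1.0.0/16", "10.2.0.0/16", "192.168.10.0/24"]

# A change that falls inside an existing prefix gets flagged for a closer look.
conflicts = conflicting_routes(known_routes, "10.1.4.0/24")

# A change that touches nothing already in place comes back clean.
clear = conflicting_routes(known_routes, "172.16.0.0/24")

print("needs review:", conflicts)
print("no interaction:", clear)
```

An overlap is not automatically a problem, since a more specific route can be exactly what you intended. It is, however, the kind of interaction you want to see before the change goes in rather than after, and the check is only as good as the list of known routes it runs against.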

You're an IT doctor; you need to know what's currently going on out there, in the architecture, so you can understand whether or not introducing a new X or Y or changing Z might interact with it. Doing so means you could introduce change faster and more frequently, because you don't have to have special times and days when those changes can be introduced, just in case. You can introduce change weekly - that's the deploy frequency metric associated with DevOps by which so many nubile, cloud-enveloped startups measure success. While the enterprise will never match it - and certainly shouldn't try - the reality is that faster and more frequent change is still important, particularly given the frequency with which mobile applications are being updated within the enterprise. According to an Oracle-sponsored survey, “35 percent of midsize and large enterprise organizations update their application portfolio monthly, while an additional 34 percent update their applications quarterly. More than four-fifths (82 percent) of respondents expect those rates to increase over the next two years.”

Controlled change through the use of automation, orchestration and a centralized approach to maintaining the state of the infrastructure external to the actual infrastructure is one of the ways in which organizations are going to be able to break out of the fear of change and instead embrace it.
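As a rough sketch of what "controlled" can mean in practice, the Python below routes a change through a central record that lives outside the device and notes who changed what, and why. It is an assumption-laden illustration, not a description of any product: `StateStore`, `apply`, `fake_push` and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StateStore:
    """Hypothetical central record of infrastructure state, kept off the devices."""
    state: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def apply(self, key, value, author, reason, push):
        """Push a change to the infrastructure, then record it.

        `push` is whatever automation actually touches the device. If it
        raises, the record is left untouched, so the record never claims a
        change that did not happen.
        """
        previous = self.state.get(key)
        push(key, value)
        self.state[key] = value
        self.history.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "who": author,
            "why": reason,
            "key": key,
            "from": previous,
            "to": value,
        })


# Stand-in for real automation: it only remembers what it was asked to do.
pushed = []


def fake_push(key, value):
    pushed.append((key, value))


store = StateStore(state={"hosts:db.internal": "10.1.2.10"})
store.apply(
    "hosts:db.internal", "10.1.2.99",
    author="ops-oncall", reason="database moved to a new subnet", push=fake_push,
)

entry = store.history[-1]
print(f"{entry['who']} changed {entry['key']}: {entry['from']} -> {entry['to']} ({entry['why']})")
```

The design choice worth noticing is the order inside `apply`: the device is changed first and the record second, so a failed push leaves the known state exactly as it was. The quick /etc/hosts fix still happens, but it happens through the record instead of around it.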