The evolution of Ops teams follows a path that we regularly observe in our engagements. Through this fable, we will walk through the four stages that mark this pitfall-strewn path. Let’s see how an Ops engineer concretely carries out the “fix_mysql” operation, which consists of changing the MySQL configuration on production servers.

The first age: the server craftsman

Many system administrators have an almost emotional relationship with their machines. By “machines”, we mean physical servers, virtual machines (VMs), storage or network equipment alike.

It is comforting to know your machines one by one, to make the rounds of each of them every morning (ssh, checking the Munin dashboard machine by machine), the way a doctor makes the rounds of a hospital ward. When a machine falls ill, we rush to its bedside, we administer heroic care, we convene a committee to decide on the treatment to prescribe. In short, we fuss over it.

[Image: Violette, queen of the herd]

The phenomenon we observe here has a name: the “Pet” approach. You feel like you are dealing with pets that you call by name, that you pamper and spoil, and on whose head you tie a little bow to keep the fur out of their eyes. You look after your machines with all the professionalism, empathy and humanity you are capable of. That is what makes you feel good: pride in a job well done, attention to detail so that the machine “feels good” and its users get the service they expect. You are a craftsman whose masterpieces are servers and whose tools are a terminal, ssh, vi, tmux or screen, a few scripts, and configuration files tended with the greatest care.

Organisations built around this behaviour are easy to spot: they personify their machines, they schedule meetings to clean up the server “daniela” or to upgrade the DBMS on acme-dc1-lnx-db001-prd, and they block out a weekend to upgrade the kernels of the machines at site B. Quite something.

Behind these cryptic names hides very manual, laborious and repetitive work on each server. Errors can slip in at many steps and lead to broken and/or inconsistent environments.

First age: comfort level
Why it works

My servers don’t need frequent upgrades

A few applications to manage

A few deployments

A small number of machines

Why it does not work

New versions of applications must be deployed more than once a month

I manage more than X machines

I manage more than Y applications

I go on holiday or fall sick

The second age: virtualisation rocks the boat

With the advent of virtualisation, the situation changes. We tend to spin up many VMs. Heaps of them. So many that knowing them all by name becomes almost impossible. How, then, can you keep caring for so many machines with the same love and dedication?

Moreover, VMs are managed like physical servers (created at the start of the project, deleted at the end), and clean-up phases are rare. Let’s admit it: removing VMs is clearly not a priority. We will do it when we have time, or when the hypervisor is full.

The “Pet” approach is under threat. Repeating the same task by hand hundreds of times feels meaningless, on top of being frankly tedious. You cannot sustain such a pace for long without risking complete burnout.

You then try various things: you create snapshots and clones to duplicate machines, and you swear only by your golden image. For machine-specific settings, though, it is still hell.

In our example, you start writing automation code that pushes the change to all the machines. Little control, a high risk of mistakes: you like to live dangerously… But after all, you know your scripts and one-liners by heart, so who cares if nobody else understands them.
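To make this second-age automation concrete, here is a minimal sketch, in Python, of the kind of ad-hoc fan-out script described above for fix_mysql. Everything specific in it is a hypothetical assumption for illustration: the host names, the /etc/mysql/my.cnf path, the max_connections value and the mysql service name (it is mysqld on some distributions). It defaults to a dry run that only prints the ssh commands, and it deliberately keeps the weaknesses the paragraph mentions: no check of the current state, no rollback, and no stop on the first failure.

```python
import shlex
import subprocess

# Hypothetical inventory: in the second age it often lives in a text file or in someone's head.
HOSTS = ["db001", "db002", "db003"]

# Hypothetical change for "fix_mysql": path, setting, value and service name are illustrative only.
MYSQL_CNF = "/etc/mysql/my.cnf"
REMOTE_CMD = (
    f"sudo sed -i 's/^max_connections.*/max_connections = 500/' {MYSQL_CNF}"
    " && sudo systemctl restart mysql"
)


def build_commands(hosts, remote_cmd):
    """Return one ssh command line (as an argument list) per host."""
    return [["ssh", host, remote_cmd] for host in hosts]


def fix_mysql(hosts, remote_cmd, dry_run=True):
    """Fan the same command out to every host, one after the other.

    What is missing is the point: no check of the current state, no rollback,
    and a failure on one host does not stop the loop. If the max_connections
    line is absent from the file, sed silently changes nothing.
    """
    results = {}
    for cmd in build_commands(hosts, remote_cmd):
        host = cmd[1]
        if dry_run:
            # Only show what would be run.
            print(" ".join(shlex.quote(part) for part in cmd))
            results[host] = None
            continue
        completed = subprocess.run(cmd, capture_output=True, text=True)
        results[host] = completed.returncode  # the only "control" we get
    return results


if __name__ == "__main__":
    fix_mysql(HOSTS, REMOTE_CMD, dry_run=True)
```

Passing dry_run=False runs the commands for real; the returned dictionary of exit codes is about all the feedback such a script gives, and it says nothing about whether each server actually ended up in the intended state.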

Second age: comfort level
Why it works

My images ensure that what worked yesterday will continue to work tomorrow

Why it does not work

Sometimes I forget to backport patch X from prod to an image.

There are so many images that we always end up taking the latest one and hoping it is the most up to date.

Deploying without downtime is hard: my images take a long time to boot, I have to keep a close eye on them, and I have to update the load balancer configuration by hand.

My shell scripts are sometimes a bit unstable and hard to maintain

The third age: infrastructure as code to the rescue
Automation is now a must. Following the