This blog comments on a variety of technology news, trends, and products and how they connect. I'm in Red Hat's cloud product strategy group in my day job although I cover a broader set of topics here. This is a personal blog; the opinions are mine alone.

Friday, February 20, 2015

We take things up a level from the prior podcasts about container management in this series to discuss the goals of configuration management and how things change (and don't) with containers, what's the meaning of state, promise theory, and containerized operating systems such as Project Atomic.

Gordon Haff: Hi, everyone. This is Gordon Haff with another edition of the
Cloudy Chat podcast, here once again with my colleague, Mark Lamourine.

For today, we're going to take things up a level and talk about what
configuration management is at a conceptual level and some of the ways that
configuration management is changing in a cloud and containerized world.

One of the reasons this is an interesting topic today, and it really is an
interesting topic--I was at Configuration Management Camp in Ghent, Belgium a
couple of weeks ago. The event sold out. It was absolutely packed.

There's an enormous amount of interest in all the different configuration
management systems out there. The reason there is so much interest is that
what has been classic configuration management is changing in pretty
fundamental ways.

Mark, you've got a lot of experience as a system admin. You're very familiar
with classic or conventional configuration management systems. Maybe a good way
to start would be to talk, from your perspective as a system admin, about what
configuration management is or was classically.

Mark Lamourine: It's going to stay. I don't think that's changing. It's
finding new places to be applicable. Most people, when they talk about
configuration management, talk about managing the configuration of
individual hosts as a bigger system.

It allows you to create either a partial or a complete enterprise
specification for how all of your machines should be configured, and then use
the configuration management system to realize that specification.

You
make it so that each machine, as it comes up, joins your configuration
management system. Then the processes run on the box to make it fit, to make it
configured like your definition, your specification of what that machine should
be.

One of the elements of this, usually a controversial one, is whether there's an
agent running on each host that listens for changes. There are discussions
about whether this is a good or a bad thing and what to do about it.

But
the other big thing is that there is some global state definition for what the
larger system, the group of hosts should look like and how it should behave.
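
As a rough illustration of the idea described above (a specification plus a
process on each machine that makes it fit), here is a minimal Python sketch.
It is not how any particular tool works; the setting names, values, and the
`converge` function are all assumptions made up for this example.

```python
# Hypothetical desired-state specification; keys and values are illustrative.
DESIRED = {
    "hostname": "web01",
    "ntp_running": True,
    "resolv_conf": "nameserver 10.0.0.1",
}


def converge(actual, desired):
    """Make `actual` match `desired` and return the settings that were changed."""
    changed = []
    for key, want in desired.items():
        if actual.get(key) != want:
            actual[key] = want  # a real system would edit files or restart services
            changed.append(key)
    return changed


# A freshly installed host that does not yet fit the specification.
host = {"hostname": "localhost", "ntp_running": False}

first_run = converge(host, DESIRED)   # brings the host into line
second_run = converge(host, DESIRED)  # nothing left to change

print(first_run)
print(second_run)
```

Running the same function a second time changes nothing, which is the property
that lets such a process run repeatedly on every host.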

Gordon:
This gets into a lot of the, again, classic thinking about systems in
general, certainly in a pre‑cloud world.

This
really applied, whether we're talking physical servers or virtualized servers,
that there is some correct state that everything is not only driven toward to
start with, but is constantly monitored to keep in tune with that correct
truth, if you would.

Mark: It started out when I was the young cub sysadmin. We had a set of manual
procedures that started out as things in our heads: set the network, set
resolv.conf, set the host name, make sure time services were running.

When you only had a short list of these things to do before handing the
machine over, it wasn't really a big deal. You'd go to each one, you'd install
the machine, you'd spend 15 minutes making it fit into your network, and then
you'd hand it off to some developer or user.

Over time, we realized that we were doing an awful lot of this and we were
hiring lots of people to do this, so we needed to write scripts to do it.
Eventually, people started writing configuration management systems, starting
with Mark Burgess and CFEngine. That was the origin of that.

There were a number of those during that time. Then CFEngine and Puppet became
the de facto ones for a while. As everyone knows, that's changing a lot now.
The idea was that we were doing these tasks manually, and then when we stopped
and started automating them, we were automating them in a custom way.

These people recognized the patterns and said, "We can do this. There's a
pattern here that we can automate, that we can take one step higher." That led to
these various systems which would make your machines work a certain way. The
specifications we had, the settings we had were fairly static. That made a lot
of sense.

Gordon: One of the things that's probably worth mentioning, and this gets into
the "pets" versus "cattle" models of system state, is that these systems were
pets, i.e., you didn't shut one down and start it up with a clean version of
the same thing.

Really, you tried to keep the running system running properly. One of the
traditional jobs of configuration management was to take care of things like
drift. As these systems changed, again, you brought them back to the correct
state of truth.

Mark: In some sense, the pets versus cattle model, that way of thinking, was
enabled by the invention of configuration management systems. People look at it
the other way now. When things were pets, it was because they had to be.

The
rate of change was slow enough that drift was less of an important thing than
just not having to send someone to spend an hour to bring a new machine online.

The
fact that you could use these things to prevent drift or to drive change over
large groups of systems, I think that was a side effect and something that
people realized after they started using the tools to stop doing manual labor.

The cattle versus pets distinction is one that was enabled when, all of a
sudden, you realize...We used to measure the difficulty of working in an
environment by the number of machines per administrator.

When I was first starting, 10 to 1 or 15 to 1 was a good ratio because of the
amount of manual labor that went into it.

Then with the start of CM systems, 100 to 1 or 200 to 1 in data center
environments was a good ratio. Now, you don't even look at that anymore. Why
would you? You've got thousands of VMs.

You
get a system like OpenStack or Amazon, you don't even look at the ratio of
hosts to sysadmins anymore. It's become irrelevant. It's become irrelevant
because these systems made cattle versus pets possible.

Gordon: You mentioned Mark Burgess. You mentioned this idea of state. Let's
talk a little bit more about this. How do you think about state as we move to
these containerized cloud‑type systems?

Mark: I'm confused a little bit. We're trying to figure out how this older
idea, which made a lot of sense when the machines changed very slowly or
relatively slowly, fits when the machines are changing quickly.

In the case of a small enterprise, it might be tens of machines started and
stopped per day, or hundreds. For something like Amazon or OpenStack, it's
thousands, maybe even thousands per minute. I don't know.

I've seen numbers from Google where they have thousands of machine starts and
stops per minute over the entire world. Maybe even that's the wrong scale. The
original idea was something where you had something that was essentially
stable.

Your
machines didn't change. When they changed, it was because you changed them.
Again, you had users, who are these other people.

The
idea of state made a lot of sense in that context. The idea of a state is
static. That's the root of the word. Life has become much more dynamic. We
expect change. We expect drift. We expect that our definition of what is
correct changes. It changes faster than we can apply it to the machines we
have.

We've gone from this idea where I could define a state and the machines would
settle on that state using the configuration management system, and then we
would come along later and tweak the state: we'd update some packages, or we'd
change some specification, or we'd add or remove a user.

We've gone from that to a point where you almost never expect it to settle.
You never expect to reach the state that you've defined as your correct state.

You change things gradually, and the term for that is eventual consistency.
Things will eventually get there, but we're changing the state now so fast
that, in some sense, if you have this single central state, you're never going
to achieve consistency across the entire system before you change the state
again. In that sense, I start wondering whether this state really makes sense.
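
The point about never reaching consistency can be shown with a toy simulation.
This is only a sketch under made-up assumptions: the fleet size, the rollout
rate, and the function name `fully_consistent_ticks` are all hypothetical, and
real systems roll out changes in far messier ways.

```python
def fully_consistent_ticks(hosts, per_tick, change_every, ticks):
    """Count the ticks on which every host matches the central specification.

    hosts:        number of machines in the fleet
    per_tick:     how many stale machines can be updated per tick
    change_every: the specification gets a new version every this many ticks
    ticks:        how long to run the simulation
    """
    versions = [0] * hosts  # the spec version each host currently has
    spec = 0                # the current version of the central spec
    consistent = 0
    for tick in range(1, ticks + 1):
        if tick % change_every == 0:
            spec += 1  # someone changes the definition of "correct"
        updated = 0
        for i in range(hosts):
            if updated == per_tick:
                break
            if versions[i] != spec:
                versions[i] = spec
                updated += 1
        if all(v == spec for v in versions):
            consistent += 1
    return consistent


# Same fleet and rollout speed; only the rate of change of the spec differs.
slow_change = fully_consistent_ticks(hosts=10, per_tick=5, change_every=10, ticks=100)
fast_change = fully_consistent_ticks(hosts=10, per_tick=5, change_every=1, ticks=100)

print(slow_change, fast_change)
```

When the specification changes slowly, the fleet spends most of its time
matching it. When it changes on every tick, faster than the rollout can cover
the fleet, the simulated fleet never matches it at all.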

Gordon:
What replaces it?

Mark:
This is where Mark Burgess, some of his work over the last couple of
decades, is starting to come into its own. He's a proponent of something called
promise theory.

Whether
or not the theory holds, there is a kernel of an idea that's really, really
important there, which is that...He says this is impossible. He's thinking,
this becomes so complex at so many different scales that reaching that state,
or sometimes even defining that state, doesn't make sense.

He
wants to flip the state definition on its side or upside down. He wants to say,
"Let's treat all these things locally. Let's figure out what the little
tiny piece is."

The old way would be to say, "I eventually reach some state." What he's
saying is that for each new piece, you teach it some promises: I promise I
will be on the net. I promise that I will serve web pages. I promise that I
will take files from a certain location.

You
define the promises well down in the scaling. You try and define a system based
on, "If all these things fulfill their promises, then some desired
behavior will come about at a much higher scale." I'm not yet convinced
that this is an engineering model.

This is one of the things that I've talked to you about, and I've talked to a
couple of other people about it: this is a great idea, I like this. What I
don't know is how to do engineering with it yet.
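
To make the idea of local promises a little more concrete, here is a small
Python sketch. It is an illustration only and is not taken from Mark Burgess's
work or from CFEngine; the `Agent` class, the `site_behaves` function, and the
example state are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """A small piece of the system that makes promises about its own behavior."""
    name: str
    promises: Dict[str, Callable[[], bool]] = field(default_factory=dict)

    def promise(self, description: str, check: Callable[[], bool]) -> None:
        # `check` returns True while the promise is currently being kept.
        self.promises[description] = check

    def broken(self) -> List[str]:
        return [d for d, check in self.promises.items() if not check()]


def site_behaves(agents: List[Agent]) -> bool:
    """Higher-level behavior emerges only if every local promise is kept."""
    return all(not agent.broken() for agent in agents)


# Hypothetical local facts about one web server.
web_state = {"network_up": True, "httpd_running": False}

web = Agent("web")
web.promise("I will be on the net", lambda: web_state["network_up"])
web.promise("I will serve web pages", lambda: web_state["httpd_running"])

before = web.broken()
web_state["httpd_running"] = True  # the agent repairs itself locally
after = web.broken()

print(before, after, site_behaves([web]))
```

Nothing here defines a global state. Each agent only knows and checks its own
promises, and the larger behavior is judged from whether they are kept.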

We'll
see whether or not there are people who are ignoring the state using...Some of
the newer configuration management systems, some of them have state built in
like Salt does. Ansible really doesn't. Ansible really is more about applying
changes to something than reaching a certain state.

There is fuzziness in all of this, whether or not it's true. People are
starting to recognize that this is a problem, and people are starting to find
ways to define the behaviors of the system without necessarily defining the
low-level states one piece at a time.

Gordon: That's probably a pretty good segue to bring this particular podcast
home. As I mentioned, I was at Config Management Camp a couple of weeks ago.
There was a huge amount of interest in Chef, in Puppet, in Salt, in Ansible, in
Foreman, in CFEngine.

Maybe we could close this out with some comments about some of the different
approaches being taken here and some of your thoughts on some of these
different tools.

Mark:
The first thing I want to say with respect to that is that while I
describe this fast‑moving dynamic environment, there are lots of companies that
are still and will continue to run in a more conventional environment for a
long time.

I'm not saying that these configuration management systems are, in any sense,
obsolete. They still have a place, because the environments that they are
designed for still exist.

That said, there are several different things that seem to happen. One is the
push versus pull model. You get systems like Puppet, which use a strong push
model. You get something like CFEngine, which uses a strong pull model.

In
both cases, they have had to create feedback mechanisms, which really are the
other one, which leads me to believe that push versus pull is probably a straw
man, that there probably have to be feedback loops in both directions
regardless of which emphasis you take.

Then you get the agent versus agentless discussion. There are people who
would say, "Adding this new thing that runs on each host that listens for
changes is an overhead, which isn't really necessary." The strongest
proponent of an agentless system that I've heard of is Ansible.

Ansible uses SSH, which is, in some senses, its agent. Then the SSH login
triggers some Ansible behavior on the host. Again, I think this is a muddy
distinction.

But it's fair to say that this additional agent doesn't run in Ansible's case.
Also, Ansible, it seems to me, defines more the means of creating the state
while ignoring the state engine itself. I'm probably going to get hate mail and
corrections for that. Corrections are welcome, hate mail, not so much.

These are the distinctions that are there now. There are still people now who
are looking at the cloud environment, and they're looking at these
configuration management systems and trying to figure out how to use them.
They're still trying to apply them in the same way. I'm a little suspicious of
that as well.

I'm
interested in seeing how configuration management systems get used in an Atomic
environment [Red Hat Enterprise Linux Atomic Host] or in a CoreOS environment
or a minimalized operating system environment, where the whole point of that is
to eliminate the need for this configuration management and where they move the
configuration out to the containers.

Put a container here, put another container here, make the containers work
together, that's what the configuration management system would have done. Now
we've got orchestration systems doing that for the most part.

I'm interested in seeing how this evolves: how these conventional system
administration systems fit, how people end up using them, and whether or not
they turn out to be more or less useful than they would be in a conventional
environment.

Gordon:
If someone wants to learn some more about this stuff, what do you
recommend?

Mark: First is to look at the various configuration management systems, and
largely avoid the hype. There are people who are advocates who are not so much
pundits. I'm skeptical of people who will say, "This is the right way. This is
the best way." If you want to learn about promise theory, certainly Mark
Burgess's books cover that. Mark's the only person I know who is publishing in
an academic sense.

This is one of the things I'm personally interested in: system administration
as something worthy of academic study. Mark is the only person I know who's
doing that and publishing.

About Me

I'm a technology evangelist for Red Hat, the leading provider of commercial open source software. I'm a frequent speaker at customer and industry events. I also write extensively on and develop strategy for Red Hat’s hybrid cloud portfolio.

Prior to Red Hat, as an IT industry analyst, I wrote hundreds of research notes, was frequently quoted in publications such as The New York Times on a wide range of IT topics, and advised clients on product and marketing strategies. Among other hobbies, I do a lot of photography and enjoy the outdoors.