This content is part of the series: Agile DevOps

Infrastructure automation is the process of scripting environments
— from installing an operating system, to installing and
configuring servers on instances, to configuring how the instances and
software communicate with one another, and much more. By scripting
environments, you can apply the same configuration to a single node or to
thousands.

Infrastructure automation also goes by other names: configuration
management, IT management, provisioning, scripted infrastructures, system
configuration management, and many other overlapping terms. The point is
the same: you are describing your infrastructure and its configuration as
a script or set of scripts so that environments can be replicated in a
much less error-prone manner. Infrastructure automation brings agility to
both development and operations because any authorized team member can
modify the scripts while applying good development practices
— such as automated testing and versioning — to your
infrastructure.

About this series

Developers can learn a lot from operations, and operations can learn a
lot from developers. This series of articles is dedicated to exploring
the practical uses of applying an operations mindset to development,
and vice versa — and of considering software products as
holistic entities that can be delivered with more agility and
frequency than ever before.

In the past decade, several open source and commercial tools have emerged
to support infrastructure automation. The open source tools include Bcfg2,
CFEngine, Chef, and Puppet. They can be used in the cloud and in virtual
and physical environments. In this article, I'll focus on the most popular
open source infrastructure automation tools: Chef and Puppet. Although you
won't learn the intricacies of either tool, you'll get an understanding of
the similarities and differences between them, along with some
representative examples. For a more detailed example of setting up and
using an infrastructure automation tool, this article provides a companion video that shows how to run Puppet in a cloud
environment.

Traditional approaches

Not all teams apply infrastructure automation tools, along with their
practices and patterns, so what are they doing instead? Traditional
approaches, which do not scale, include configuring environments
manually or writing and running combinations of scripts that a human
must execute. This leads to error-prone processes that increase cycle
times and prevent teams from releasing software regularly.

Chef and Puppet both provide a Ruby-based domain-specific language (DSL)
for scripting environments. Chef is expressed as an internal Ruby DSL, and
Puppet users primarily use its external DSL, which is implemented in Ruby.
These tools tend to be used more often in Linux® system automation, but
they support Windows as well. Puppet has a larger user base than Chef, and
it offers more support for older operating systems. With Puppet, you can
set explicit dependencies between tasks. Both tools are idempotent, meaning
that you get the same result with the same configuration no matter how many
times you run it.
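
Idempotence can be illustrated outside either tool. The following
plain-Ruby sketch is not Chef or Puppet code; the helper name ensure_line
and the sample settings are hypothetical and chosen only for illustration.
It shows an operation that changes state only when the desired state is
not already present, so repeated runs converge on the same result.

```ruby
# Illustrative only: a plain-Ruby stand-in for an idempotent resource.
# `ensure_line` is a hypothetical helper, not part of Chef or Puppet.

# Returns an array of config lines that is guaranteed to contain `line`.
# If the line is already present, the input is returned unchanged.
def ensure_line(lines, line)
  return lines if lines.include?(line)

  lines + [line]
end

config = ["port=8080"]

first_run  = ensure_line(config, "max_threads=200")
second_run = ensure_line(first_run, "max_threads=200")

# The second run finds the desired state already in place and changes nothing.
puts first_run == second_run
```

The key design choice is that the helper describes the desired end state
("this line is present") rather than an action ("append this line"), which
is what makes repeated runs safe.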

Chef

Chef has been around since 2009. It was influenced by Puppet and CFEngine.
Chef supports multiple platforms including Ubuntu, Debian, RHEL/CentOS,
Fedora, Mac OS X, Windows 7, and Windows Server. It is often described as
easier to use, particularly for Ruby developers, because everything in
Chef is defined as a Ruby script and follows a model that developers are
used to working in. Chef has a passionate user base, and its rapidly
growing community develops cookbooks for others to use.

How it works

In Chef, three core components interact with one another — Chef
server, nodes, and Chef workstation. Chef runs
cookbooks, which consist of recipes that perform
automated steps — called actions — on nodes, such
as installing and configuring software or adding files. The Chef server
contains configuration data for managing multiple nodes. The configuration
files and resources stored on the Chef server are pulled down by nodes
when requested. Examples of resources include file, package,
cron, and execute.

Users interact with the Chef server using Chef's command-line interface,
called Knife. Nodes can have one or more roles. A role
defines attributes (node-specific settings) and recipes for a
node and can apply them across multiple nodes. Recipes can run other
recipes. The recipes assigned to a node, called its run list, are executed
in the order in which they are listed. A Chef workstation is an instance
with a local Chef repository and Knife installed on it.
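
To make the ordering concrete, here is a minimal plain-Ruby sketch of how a
run list that mixes roles and recipes might be expanded into an ordered
list of recipes. It is a model for illustration, not Chef's implementation.
The role names, recipe names, and the expand_run_list helper are
hypothetical, and the choice to skip a recipe that has already appeared is
an assumption of this sketch.

```ruby
# Illustrative only: models run-list expansion in plain Ruby.
# Role and recipe names are hypothetical.

ROLES = {
  "base"      => ["recipe[ntp]"],
  "webserver" => ["recipe[java]", "recipe[tomcat]"]
}.freeze

# Expands run-list items in the order given. A "role[...]" item is replaced
# by that role's own items; a recipe that already appeared is not repeated.
def expand_run_list(items, roles = ROLES)
  expanded = items.flat_map do |item|
    match = item.match(/\Arole\[(.+)\]\z/)
    match ? expand_run_list(roles.fetch(match[1]), roles) : [item]
  end
  expanded.uniq
end

puts expand_run_list(["role[base]", "role[webserver]", "recipe[jenkins]"])
```

Because the expansion preserves list order, placing role[base] first means
its recipes come before anything the web server role contributes.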

Table 1 describes the core components of Chef:

Table 1. Chef components

| Component | Description |
|---|---|
| Attributes | Describe node data, such as the IP address and hostname. |
| Chef client | Does work on behalf of a node. A single Chef client can run recipes for multiple nodes. |
| Chef Solo | Allows you to run Chef cookbooks in the absence of a Chef server. |
| Cookbooks | Contain all the resources you need to automate your infrastructure and can be shared with other Chef users. Cookbooks typically consist of multiple recipes. |
| Data bags | Contain globally available data used by nodes and roles. |
| Knife | Used by system administrators to upload configuration changes to the Chef server. Knife is used for communication between nodes via SSH. |
| Node | Hosts that run the Chef client. The primary features of a node, from Chef's point of view, are its attributes and its run list. Nodes are the component to which recipes and roles are applied. |
| Ohai | Detects data about your operating system. It can be used stand-alone, but its primary purpose is to provide node data to Chef. |
| Recipe | The fundamental configuration in Chef. Recipes encapsulate collections of resources that are executed in the order defined to configure the nodes. |
| Repository (Chef repository) | The place where cookbooks, roles, configuration files, and other artifacts for managing systems with Chef are hosted. |
| Resource | A cross-platform abstraction of something you're configuring on a node. For example, users and packages can be configured differently on different OS platforms; Chef abstracts the complexity in doing this away from the user. |
| Role | A mechanism for grouping similar features of similar nodes. |
| Server (Chef server) | Centralized repository of your server's configuration. |

Examples

Listing 1 demonstrates the use of the service resource within
a recipe that's part of a Tomcat cookbook. It shows that tools like Chef
can handle both platform-specific configuration and server configuration
management.

Listing 2 defines the attributes for the Tomcat cookbook. In this example,
I'm defining some external ports for the Tomcat server to make available.
Other types of attributes you might see include values for directories,
options, users, and other configurations.

Listing 2. Chef attributes

Chef extends the Ruby language as an internal DSL, rather than defining an
external DSL, to provide a model for applying configuration to many nodes
at once. Chef uses an imperative model without explicit dependency
management, so people with more of a development background tend to
gravitate toward Chef when they are scripting environments.
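
The idea of an internal DSL can be sketched in a few lines of plain Ruby.
The following toy example is not Chef's implementation: the ToyRecipe class
and its package and service methods are hypothetical stand-ins that only
record what was declared. It illustrates that the DSL words are ordinary
Ruby methods, that normal Ruby constructs such as loops remain available,
and that steps are recorded in the order the code runs.

```ruby
# Illustrative only: a toy internal DSL in plain Ruby. `ToyRecipe`,
# `package`, and `service` are hypothetical and merely record steps.

class ToyRecipe
  attr_reader :steps

  def initialize
    @steps = []
  end

  # Each DSL word is an ordinary Ruby method that records a step.
  def package(name)
    @steps << [:package, name]
  end

  def service(name, action:)
    @steps << [:service, name, action]
  end

  # Evaluates the block with `self` set to a new recipe, so the block can
  # call the DSL methods directly and still use normal Ruby such as loops.
  def self.evaluate(&block)
    recipe = new
    recipe.instance_eval(&block)
    recipe
  end
end

recipe = ToyRecipe.evaluate do
  %w[java tomcat].each { |name| package name }
  service "tomcat", action: :start
end

# Steps appear in the order the Ruby code ran (imperative ordering).
recipe.steps.each { |step| p step }
```

An external DSL, by contrast, would need its own parser, and constructs
like the loop above would be available only if the language defined them.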

Puppet

Puppet has been in use since 2005. Many organizations, including Google,
Twitter, Oracle, and Rackspace, use it to manage their infrastructure.
Puppet, which tends to have a steeper learning curve than Chef,
supports a variety of Windows and *nix environments. Puppet has a large
and active user community. It has been used in thousands of organizations
with installations running tens of thousands of instances.

How it works

Puppet uses the concept of a master server, called the Puppet
master, which centralizes configuration among nodes and groups them
together based on type. For example, if you had a set of web servers that
were all running Tomcat with a Jenkins WAR, you'd group them together on
the Puppet master. The Puppet agent runs as a daemon on each managed
system. This enables you to deploy infrastructure changes to multiple
nodes simultaneously. Puppet functions the same way as a deployment
manager, but instead of deploying applications, it deploys infrastructure
changes.

Puppet includes a tool called facter. Facter holds metadata about
the system and can be used to filter among servers. For example, you can
use facter to determine a node's hostname. MCollective is a
deployment tool that integrates with Puppet. You can use MCollective to
deploy infrastructure changes across nodes.
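
Filtering servers by their facts amounts to matching key-value metadata. As
a rough illustration, the plain-Ruby sketch below selects nodes whose facts
match a set of criteria. It does not call facter or MCollective; the node
data and the select_nodes helper are made up for this example.

```ruby
# Illustrative only: filtering nodes by fact values in plain Ruby.
# The node data is invented; real facts would be gathered by facter.

NODES = [
  { "hostname" => "web01", "operatingsystem" => "CentOS" },
  { "hostname" => "web02", "operatingsystem" => "Ubuntu" },
  { "hostname" => "db01",  "operatingsystem" => "CentOS" }
].freeze

# Returns the nodes whose facts match every key/value pair in `criteria`.
def select_nodes(nodes, criteria)
  nodes.select do |facts|
    criteria.all? { |name, value| facts[name] == value }
  end
end

centos = select_nodes(NODES, { "operatingsystem" => "CentOS" })
puts centos.map { |facts| facts["hostname"] }
```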

Table 2 lists the key components of Puppet:

Table 2. Key Puppet components

| Component | Description |
|---|---|
| Agent | A daemon process running on a node that collects information about the node and sends it to the Puppet master. |
| Catalog | Compilation of facts that specifies how to configure the node. |
| Facts | Data about a node, sent by the node to the Puppet master. |
| Manifest | Describes resources and the dependencies among them. |
| Module | Groups related manifests (in a directory). For example, a module might define how a database like MySQL gets installed, configured, and run. |
| Node | A host that is managed by the Puppet master. Nodes are defined like classes but contain the host name or fully qualified domain name. |
| Puppet master | The server that manages all the Puppet nodes. |
| Resource | For example, a package, file, or service. |

Examples

In the example in Listing 3, a Puppet manifest describes the packages to
install on a node. Puppet determines the best approach and order of
execution for installing these packages.

Listing 4. Puppet manifest for httpd

Puppet employs a declarative model with explicit dependency management.
Because of this, it tends to be one of the first tools considered by
engineers who have more of a systems administration background and are
looking to script their environments.
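
Explicit dependency management means that the order of application is
derived from declared relationships rather than from the order in which
resources are written. The plain-Ruby sketch below illustrates the idea
with a topological sort from Ruby's standard tsort library. It is not
Puppet's implementation or syntax: the ToyCatalog class and the resource
names are hypothetical, and the sketch assumes every required resource is
also declared.

```ruby
require "tsort"

# Illustrative only: orders declared resources by their dependencies.
# `ToyCatalog` and the resource names are hypothetical.
class ToyCatalog
  include TSort

  def initialize
    @requires = {}
  end

  # Declares a resource and the resources it requires.
  def resource(name, requires: [])
    @requires[name] = requires
    self
  end

  # TSort hooks: enumerate all resources, and each resource's requirements.
  def tsort_each_node(&block)
    @requires.each_key(&block)
  end

  def tsort_each_child(name, &block)
    @requires.fetch(name).each(&block)
  end

  # Requirements come before the resources that depend on them.
  def apply_order
    tsort
  end
end

catalog = ToyCatalog.new
catalog.resource("service httpd", requires: ["file httpd.conf"])
catalog.resource("file httpd.conf", requires: ["package httpd"])
catalog.resource("package httpd")

puts catalog.apply_order
```

Although the service is declared first, the package is ordered first
because the declared requirements, not the declaration order, determine
the result. A circular requirement would cause tsort to raise
TSort::Cyclic.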

Infrastructure as code

In this article, you learned through examples that configuring your
infrastructure no longer needs to be a manual effort applied uniquely to
individual nodes. By automating your infrastructure, you can scale it up
and down with little additional effort. Because your infrastructure is
modeled in scripts, you can version and test those scripts just like
application code.

In the next article, you'll learn patterns and techniques for creating
ephemeral (or transient) environments —
environments that are created and destroyed in 24 hours and embrace the
abundance mindset (that is, lack of scarcity) that comes with
Agile DevOps.
