Today I am very pleased to release something I’ve been thinking about for years and actively working on since August.

After many POCs and thrown-away attempts over the years, I am finally releasing a Playbook system that lets you run workflows on your MCollective network. It can integrate with a near-endless set of remote services in addition to MCollective, giving you a multi-service playbook system.

This is an early release with only a few integrations, but I think it’s already useful, and I’m looking for feedback and integrations to build this into something really powerful for the Puppet ecosystem.

The full docs can be found on the Choria Website, but below you can get some details.

Overview

Today playbooks are basic YAML files. Eventually I envision a Service to execute playbooks on your behalf, but today you just run them in your shell, so they are pure data.

Playbooks have a basic flow that is more or less like this:

Discover named Node Sets

Validate the named Node Sets meet expectations such as reachability and versions of software available on them

Run a pre_book task list that lets you do prep work

Run the main tasks task list where you do your work; certain hook lists can be run around every task

Run either the on_success or on_fail task list, for example to notify Slack

Run the post_book task list for cleanups etc
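
The flow above can be sketched in a few lines of Python. This is only an illustration of the ordering, not how the playbook runner is implemented. The hook list names pre_task and post_task, the rule that a failing task stops the main list, and the choice to leave out Node Set discovery and validation are all assumptions made for the sketch.

```python
from typing import Callable, Dict, List

Task = Callable[[], bool]  # a task returns True on success


def run_list(tasks: List[Task]) -> bool:
    """Run tasks in order, stopping at the first failure."""
    for task in tasks:
        if not task():
            return False
    return True


def run_playbook(book: Dict[str, List[Task]]) -> bool:
    """Run the task lists of a playbook in the order described above."""
    # Node Set discovery and validation would happen here (omitted).
    ok = run_list(book.get("pre_book", []))
    if ok:
        for task in book.get("tasks", []):
            run_list(book.get("pre_task", []))   # hypothetical hook list name
            ok = task()
            run_list(book.get("post_task", []))  # hypothetical hook list name
            if not ok:
                break  # assumption: the first failing task ends the main list
    # Assumption: a failure in pre_book also counts as a failed playbook.
    run_list(book.get("on_success" if ok else "on_fail", []))
    run_list(book.get("post_book", []))
    return ok


log: List[str] = []


def step(name: str, result: bool = True) -> Task:
    """Build a fake task that records its name and returns a fixed result."""
    def task() -> bool:
        log.append(name)
        return result
    return task


book = {
    "pre_book": [step("prep")],
    "tasks": [step("disable"), step("upgrade", result=False), step("enable")],
    "pre_task": [step("before")],
    "post_task": [step("after")],
    "on_success": [step("notify ok")],
    "on_fail": [step("notify failure")],
    "post_book": [step("cleanup")],
}
result = run_playbook(book)
print(result, log)
```

In this run the upgrade task fails, so the enable task never runs, the on_fail list is used instead of on_success, and the post_book cleanup still happens.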

Today a task can be an MCollective request, a shell script or a Slack notification. I imagine this list will grow huge: I am thinking you will want to ping webhooks, interact with Razor to provision machines and wait for them to finish building, run Terraform or make EC2 API requests. The list of potential integrations is endless, and you can use any task in any of the above task lists.

A Node Set is simply a named set of nodes. In MCollective that would be the certnames of nodes, but the playbook system itself is not limited to that. Today Node Sets can be resolved from MCollective Discovery, PQL Queries (PuppetDB), YAML files with groups of nodes in them or a shell command. Again, the list of integrations that make sense here is huge: I imagine querying PE or Foreman for node groups, querying etcd or Consul for service members, talking to REST services that return node lists, or running DB queries. Imagine using Terraform outputs or EC2 API queries as Node Set sources.
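
To make the idea of pluggable Node Set sources concrete, here is a minimal Python sketch. It assumes a source is just a function that returns node names. The helper names from_groups, from_command and discover are invented for this example and are not part of the playbook system, and the in-memory dictionary stands in for a YAML file of groups.

```python
import subprocess
import sys
from typing import Callable, Dict, List

Source = Callable[[], List[str]]


def from_groups(groups: Dict[str, List[str]], group: str) -> Source:
    """Source backed by named groups, as a YAML file of groups might provide."""
    return lambda: list(groups[group])


def from_command(argv: List[str]) -> Source:
    """Source that runs a command and reads one node name per output line."""
    def resolve() -> List[str]:
        out = subprocess.run(argv, capture_output=True, text=True, check=True).stdout
        return [line.strip() for line in out.splitlines() if line.strip()]
    return resolve


def discover(sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Resolve every named Node Set into a sorted, de-duplicated list of names."""
    return {name: sorted(set(resolve())) for name, resolve in sources.items()}


node_sets = discover({
    "haproxy": from_groups({"lb": ["lb1.example.net"]}, "lb"),
    # Stand-in for a real shell command that prints node names.
    "webservers": from_command(
        [sys.executable, "-c", "print('web2.example.net'); print('web1.example.net')"]
    ),
})
print(node_sets)
```

Because every source has the same shape, adding a new integration in this sketch only means writing another function that returns a list of names.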

In cases where you wish to manage nodes via MCollective but are using a cached discovery source, you can ask for node sets to be tested for reachability over MCollective. Node sets that need certain MCollective agents can express this requirement as SemVer version ranges, and the valid network state will be asserted before any playbook is run.
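
A rough Python sketch of that validation step follows. It is not the real implementation: the range syntax (space-separated comparisons such as ">=1.11.0 <2.0.0"), the function names and the sample data are assumptions, and only plain MAJOR.MINOR.PATCH versions are handled rather than full SemVer with pre-release tags.

```python
from typing import Dict, List, Set, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '1.11.0' into (1, 11, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, version_range: str) -> bool:
    """Check a version against comparisons such as '>=1.11.0 <2.0.0'."""
    ops = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
        "=": lambda a, b: a == b,
    }
    have = parse(version)
    for clause in version_range.split():
        # Two-character operators are tried first so '>=' is not read as '>'.
        for symbol in (">=", "<=", ">", "<", "="):
            if clause.startswith(symbol):
                if not ops[symbol](have, parse(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported comparison: {clause}")
    return True


def validate(
    nodes: List[str],
    reachable: Set[str],
    agents: Dict[str, Dict[str, str]],
    required: Dict[str, str],
) -> List[str]:
    """Return a list of problems; an empty list means the Node Set is valid."""
    problems = []
    for node in nodes:
        if node not in reachable:
            problems.append(f"{node}: unreachable")
            continue
        for agent, wanted in required.items():
            have = agents.get(node, {}).get(agent)
            if have is None or not satisfies(have, wanted):
                problems.append(f"{node}: {agent} {have} does not satisfy {wanted}")
    return problems


problems = validate(
    nodes=["web1", "web2", "web3"],
    reachable={"web1", "web2"},  # hypothetical result of a reachability test
    agents={"web1": {"puppet": "1.11.1"}, "web2": {"puppet": "1.9.0"}},
    required={"puppet": ">=1.11.0 <2.0.0"},
)
print(problems)
```

Comparing versions as integer tuples matters here: a plain string comparison would rank 1.9.0 above 1.11.0.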

Playbooks do not have a pseudo programming language in them though I am not against the idea. I do not anticipate YAML to be the end format of playbooks but it’s good enough for today.

Example

I’ll show an example here of what I think you will be able to achieve using these Playbooks.

Here we have a web stack and we want to do Blue/Green deploys against it. Each sub cluster is identified by a fact called cluster. The deploy process for a cluster is:

Gather input from the user such as cluster to deploy and revision of the app to deploy

Discover the Haproxy node using Node Set discovery from PQL queries

Discover the Web Servers in a particular cluster using Node Set discovery from PQL queries

Verify the Haproxy nodes and Web Servers are reachable and running the versions of agents we need

Upgrade the specific web tier using:

Tell the ops room on Slack we are about to upgrade the cluster

Disable Puppet on the webservers

Wait for any running Puppet runs to stop

Disable the nodes on a particular haproxy backend

Upgrade the apps on the servers using appmgr#upgrade to the input revision

Do up to 10 NRPE checks post upgrade, with 30 seconds between checks, to ensure the load average is GREEN. You’d use a better, app-specific check here

Enable the nodes in haproxy once NRPE checks pass

Fetch and display the status of the deployed app, such as which version is there now

Enable Puppet

Should the task list FAIL, we run these tasks:

Call a webhook on AWS Lambda

Tell the ops room on Slack

Run a whole other playbook called deploy_failure_handler with the same parameters

Should the task list PASS, we run these tasks:

Call a webhook on AWS Lambda

Tell the ops room on Slack

This example and sample playbooks etc can be found on the Choria Site.
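
The NRPE step in the deploy above, up to 10 tries with 30 seconds between them, is a plain retry loop. A minimal Python sketch might look like this. The function name run_with_retries and its parameters are made up for illustration, and the check is a stand-in rather than a real NRPE call.

```python
import time
from typing import Callable


def run_with_retries(
    check: Callable[[], bool],
    tries: int = 10,
    pause: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run check up to `tries` times, pausing between failed attempts."""
    for attempt in range(1, tries + 1):
        if check():
            return True
        if attempt < tries:
            sleep(pause)  # no pause after the final failed attempt
    return False


# Fake check that only reports GREEN on the third attempt. The sleep function
# is replaced so the example records the pauses instead of really waiting.
results = iter([False, False, True])
pauses = []
passed = run_with_retries(lambda: next(results), sleep=pauses.append)
print(passed, pauses)
```

Only once a call like this returns True would the following task re-enable the nodes in haproxy.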

Status

Above is the eventual goal. Today the major missing piece is that, I think, MCollective needs to be extended with the ability for Agent plugins to deliver a Macro plugin. A macro might be something like Puppet.wait_till_idle(:timeout => 600). This is something you would call after disabling the nodes, when you want to be sure Puppet is making no more changes; the workflow above needs this.

There are no such Macros today. I will add a stopgap solution as a task that waits for a certain condition, but adding Macros to MCollective is high on my todo list.
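
To show what a wait-for-condition task amounts to, here is a small Python sketch of polling until a condition holds or a timeout expires. It is neither the planned Macro API nor the stopgap task itself. The name wait_until, its parameters and the fake idle check are all assumptions for the example.

```python
import time
from typing import Callable, Dict


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 600.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll condition until it is true; give up once timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Fake cluster state: web1 stays busy for two more polls, web2 is already idle.
polls_left: Dict[str, int] = {"web1": 2, "web2": 0}


def all_idle() -> bool:
    """Stand-in for asking every node whether Puppet is still applying."""
    busy = [node for node, left in polls_left.items() if left > 0]
    for node in busy:
        polls_left[node] -= 1
    return not busy


idle = wait_until(all_idle, timeout=600, interval=0)
print(idle)
```

The condition is always checked at least once, so a timeout of zero still succeeds when the nodes are already idle.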

Other than that it works. There is no web service yet, so you run playbooks from the CLI, and the integrations listed above are all that exist. They are quite easy to write, so I am hoping some early adopters will either give me ideas or send PRs!