Provisioning a cluster of VMs with Ansible

Seeing how easy it was to provision one VM with Ansible, I can’t help but wonder: would it be as easy to deal with a whole cluster? After all, the original example I was trying to move to Ansible had three VMs: one Consul server and two worker machines. The server is ready, so adding two more machines sounds like an interesting exercise. So… let’s begin?

Hopefully, you just scrolled past that. As usual, vagrant up will create and provision a new VM, which in our case is the Consul server. However, don’t do that now – we have other VMs to make.

Step 0. Add more VMs

Oh, Vagrant. Without you I’d have to create those VMs with a mouse, clicks, and little understanding of what I’m doing. Yet here you are.

The code to create the consul-server VM already looked like a function, so making it a true function will allow me to reuse it for other cluster members. I also think it’s worth removing the Ansible provisioner from the Vagrantfile just for now and applying the playbook manually with ansible-playbook. As a downside, we also need to add the ubuntu user configuration back; otherwise the playbook won’t be able to connect to the VMs.

create_consul_host is a function that creates a VM ready to be ansibilized (is that a word?), and I just call it three times to create three identical VMs: consul-server, consul-host-1 and consul-host-2. vagrant up will bring them to life, and I don’t even need to check if they are OK. Of course they are.
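A minimal sketch of how such a Vagrantfile might look; the box name and the exact option set are my assumptions, only the function name and the IP scheme come from this article:

```ruby
# Vagrantfile – a sketch; box name is an assumption
def create_consul_host(config, hostname, ip)
  config.vm.define hostname do |host|
    host.vm.box = "ubuntu/xenial64"   # assumed box
    host.vm.hostname = hostname
    host.vm.network "private_network", ip: ip
  end
end

Vagrant.configure("2") do |config|
  create_consul_host(config, "consul-server", "192.168.99.100")
  create_consul_host(config, "consul-host-1", "192.168.99.101")
  create_consul_host(config, "consul-host-2", "192.168.99.102")
end
```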

Step 1. Teach Ansible to trust

If you try to send any Ansible ad-hoc command to the newly created hosts (e.g. ansible all -i hosts -m ping), it will cowardly refuse to execute it, as it has never seen those hosts before. I used to confirm manually that it’s OK to talk to other hosts, but as their number grows, we need something more effective. For instance, a configuration file with the appropriate option in it.

Apparently, putting an ansible.cfg into the current directory can solve all trust issues – especially if it has the following lines:

ansible.cfg:

```ini
[defaults]
host_key_checking=False
```

In my case I also had to delete a few entries from ~/.ssh/known_hosts, as I had already used some of the IP addresses before, and seeing them again would make Ansible paranoid.
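If you need to do the same, ssh-keygen can remove the stale entries without editing the file by hand (the IPs here are the ones from this article’s inventory):

```shell
# Remove stale host keys for the cluster IPs from ~/.ssh/known_hosts
ssh-keygen -R 192.168.99.100
ssh-keygen -R 192.168.99.101
ssh-keygen -R 192.168.99.102
```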

With the config file in place and three hosts running, we can finally execute something like ping and see how all of them proudly respond with pong:

Ping:

```shell
ansible all -i hosts -m ping
# consul-server | SUCCESS => {
#     "changed": false,
#     "ping": "pong"
# }
```

And they don’t. At least the two worker hosts ignored the command, which is understandable: I didn’t update the inventory file, and therefore Ansible has no idea they exist. Oh well.

Step 2. Add new hosts to inventory file

The initial hosts file was quite trivial, and copy-pasting its single line two more times (changing the IP addresses, of course) would definitely have done the trick.

However, the hosts will have different roles, so it makes sense to somehow reflect that in the file. Moreover, copy-pasting the same login and password three times is just silly.

Let’s organize those hosts into groups. For instance, consul-server can be the sole member of a servers group, consul-host-1 and -2 will be the nodes, and both of these groups will be members of a cluster. In addition, we can move the SSH login and password variables into a group variables section, so we don’t have to copy-paste them.

hosts:

```ini
consul-server ansible_host=192.168.99.100
consul-host-1 ansible_host=192.168.99.101
consul-host-2 ansible_host=192.168.99.102

[servers]
consul-server

[nodes]
consul-host-[1:2]

[cluster:children]
servers
nodes

[cluster:vars]
ansible_user=ubuntu
ansible_ssh_pass=ubuntu
```

Looks serious. I especially like the range pattern in the middle – [1:2] – which saves me a line of text.

This time pinging all hosts works without a glitch:

Ping 2:

```shell
ansible all -i hosts -m ping
# consul-host-1 | SUCCESS => {
#     "changed": false,
#     "ping": "pong"
# }
# consul-server | SUCCESS => {
#     "changed": false,
#     "ping": "pong"
# }
# consul-host-2 | SUCCESS => {
#     "changed": false,
#     "ping": "pong"
# }
```

Instead of all I could use group names, so that only a subset of the hosts would receive the command.
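For instance, with the inventory above, pinging only the worker machines would look like this:

```shell
# Target only the `nodes` group from the inventory
ansible nodes -i hosts -m ping
```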

With all the configuration in place, we can finally get to the playbook file.

Step 3. Adapt playbook for multiple roles

We had six tasks for provisioning consul-server.

1. Install unzip
2. Install Consul
3. Make Consul a service
4. Ensure config directory exists
5. Deploy consul config
6. Ensure consul’s running

The fifth one is going to be different for the server and its nodes, as it deploys a role-specific configuration file, but the rest will be the same for all Consul roles. As we’re allowed to put multiple plays in a playbook, we can organize cluster provisioning into four parts:

1. Install Consul services on all VMs (tasks 1-4)
2. Deploy Consul server configuration (task 5)
3. Deploy Consul nodes configuration (task 5)
4. Start all Consul agents (task 6)

Step 3.1 Install Consul services

For this step we’ll need to do some copy-pasting. In fact, lots of it. The first play is basically the whole consul.yml we had before, minus a few things:

“Deploy consul config” and “Ensure consul’s running” steps are gone.

Instead of the specific consul-server VM, the hosts section now targets the group called cluster (the one we defined in the inventory file, remember?).

consul_server_ip is also gone, as we don’t need it at the moment.

Consul itself got an update a week ago, so I changed consul_version to 0.9.3.

This leaves us with something like this:

consul.yml:

```yaml
- hosts: cluster
  vars:
    consul_version: 0.9.3
    consul_config_dir: /etc/systemd/system/consul.d
  tasks:
    - name: Install unzip
      apt: name=unzip state=present
      become: true
    # ...
    - name: Ensure config directory exists
      become: true
      file:
        path: "{{consul_config_dir}}"
        state: directory
```

Assuming that consul-server, consul-host-1 and -2 are still running, we can install Consul on all three of them with a single command:
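Assuming the playbook and inventory file names we’ve been using, that command would be:

```shell
ansible-playbook -i hosts consul.yml
```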

You might be surprised how fast this works. The secret is that Ansible provisions the hosts in parallel.

Step 3.2 Configuring consul-server

It could’ve been another pure copy-paste exercise, but I think we have some improvements to make along the way.

Firstly, let’s have a look at the only provisioning task that our second play will have:

consul-server configuration task:

```yaml
- name: Deploy consul config
  become: true
  template:
    src: init.json.j2
    dest: "{{consul_config_dir}}/init.json"
```

The init.json.j2 file name, which made perfect sense for single-host provisioning, becomes unclear in a multi-host configuration. Is it a server configuration or a client’s? server.init.json.j2 sounds like a better choice.

Then, the “Deploy consul config” task uses the consul_config_dir variable, which was declared in the first play and therefore has limited scope. Should I also copy it into the second one? Nah, I don’t think so. Instead, we can make it global by moving it to the inventory file.

hosts:

```ini
;...
[cluster:vars]
ansible_user=ubuntu
ansible_ssh_pass=ubuntu
consul_config_dir=/etc/systemd/system/consul.d
```

Another thing is that the template file itself relied on the consul_server_ip variable. I never liked that one, as it basically redeclared something already stored in the inventory file. Seeing how we put the consul_config_dir variable into the inventory file, can we do the opposite and use something that’s already there, like ansible_host? Apparently we can, and putting ansible_host into server.init.json.j2 instead of consul_server_ip is a perfect replacement for the hardcoded IP address.

server.init.json.j2:

```json
{
    "server": true,
    "ui": true,
    "advertise_addr": "{{ ansible_host }}",
    "client_addr": "{{ ansible_host }}",
    "data_dir": "/tmp/consul",
    "bootstrap_expect": 1
}
```

So this is how the second play is going to look in consul.yml:

consul.yml:

```yaml
- hosts: cluster
  # ...

- hosts: servers
  tasks:
    - name: Deploy consul server config
      become: true
      template:
        src: server.init.json.j2
        dest: "{{consul_config_dir}}/init.json"
```

In case you’ve forgotten, servers is also one of the groups we declared in the inventory file.

Step 3.3 Configuring Consul agents

This is going to be interesting. Configuration for Consul agents was simple: just copy one more JSON template into, let’s say, client.init.json.j2, and we’re probably done.

client.init.json.j2:

```json
{
    "advertise_addr": "{{ ansible_host }}",
    "retry_join": ["{{ consul_server_ip }}"],
    "data_dir": "/tmp/consul"
}
```

We already know how ansible_host works, so that takes care of advertise_addr, but we also need to find consul_server_ip, which we’ve just got rid of. So what should we do? Redeclare it again?

In fact, we don’t have to. Ever wondered what “TASK [Gathering Facts]” means in Ansible output? Apparently, it’s an implicit task that collects tons of useful information about the hosts we’re going to provision: environment variables, OS details, network interfaces, etc. What’s more, that data is grouped by the same groups we declared in the inventory file, so as long as the nodes machines know about the existence of the servers group, we can simply look up the IP in that collection.

The variable with automatically collected data is called hostvars and this is how we can use it:
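A sketch of how the third play could do that lookup with two set_fact tasks; the task names are my own, and I use ansible_host from the inventory as the server IP:

```yaml
- hosts: nodes
  tasks:
    # The name of the first host in the `servers` group...
    - set_fact:
        consul_server: "{{ groups['servers'][0] }}"
    # ...and that host's IP address, looked up in hostvars
    - set_fact:
        consul_server_ip: "{{ hostvars[consul_server]['ansible_host'] }}"
    - name: Deploy consul client config
      become: true
      template:
        src: client.init.json.j2
        dest: "{{consul_config_dir}}/init.json"
```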

All the magic happens in the two set_fact tasks. What we do there is declare two variables (facts): consul_server, storing the name of the first host in the servers group, and consul_server_ip, which will store the IP of that host. It looks a little bit complicated, but if you dump the contents of hostvars via e.g. a - debug: var=hostvars task, it all starts to make perfect sense.

Step 3.4 Starting all consul services in all VMs

This one is absolutely trivial:

consul.yml:

```yaml
- hosts: cluster
  # ...

- hosts: servers
  # ...

- hosts: nodes
  # ...

- hosts: cluster
  tasks:
    - name: Ensure consul's running
      become: true
      service: name=consul state=started
```

Running the playbook one more time will light up the whole cluster, and just like last time, a few moments later we can see the Consul server UI at 192.168.99.100:8500. This time with two more nodes:

Step 3.5 Connecting the playbook to Vagrantfile

This is going to be a little bit tricky. As we saw in the single-host provisioning scenario, Vagrant will create its own inventory file by default. That would’ve been convenient if we didn’t have bits of useful information like groups and variables in our own inventory. Luckily, that behavior is configurable, and by using the provisioner’s inventory_path we can still stick to the existing inventory file.

Another issue also lies in the default settings. Unlike ansible-playbook, which provisions hosts in parallel, Vagrant’s ansible provisioner will do that in series. Not only is it slower than it could be, but our lookup of consul_server_ip actually depends on all hosts being provisioned together.

Again, luckily for us, we can tell how many hosts should be provisioned concurrently by setting the provisioner’s limit setting to "all". We’ll also need to start the provisioning only when all hosts are ready. This is how I made it work:
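A sketch of that trick, assuming the create_consul_host helper from earlier; the common pattern is to attach the provisioner only to the last defined machine, so it runs once every host is up:

```ruby
# Vagrantfile fragment – a sketch; machine names and IPs as before
HOSTS = {
  "consul-server" => "192.168.99.100",
  "consul-host-1" => "192.168.99.101",
  "consul-host-2" => "192.168.99.102"
}

Vagrant.configure("2") do |config|
  HOSTS.each_with_index do |(name, ip), index|
    create_consul_host(config, name, ip)
    next unless index == HOSTS.size - 1
    # Provisioner on the last machine only: by now all hosts exist
    config.vm.define name do |host|
      host.vm.provision :ansible do |ansible|
        ansible.playbook = "consul.yml"
        ansible.inventory_path = "hosts"  # keep our own inventory
        ansible.limit = "all"             # provision all hosts in parallel
      end
    end
  end
end
```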

After this change, a single vagrant up on a clean machine will bring up a fully functional Consul cluster without the need to provision it with ansible-playbook.

Conclusion

Provisioning more than one VM with Ansible is not much harder than provisioning a single one. In fact, it feels exactly the same. Yes, there’s more text in the inventory file, and the playbook’s got a little bit bigger, but essentially nothing’s changed. I’m especially happy about finding out how to use the hostvars variable. Hardcoding the IP address has bothered me since last time, and I’m glad I found a way to avoid it. Of course, it would be better if the IPs went away from the inventory file as well and Vagrant itself took care of them, but let’s take one step at a time.