Tuesday, 26 January 2016

Use Vagrant to set up a CentOS 7 VM in AWS EC2

Introduction

Everyone tells us that Infrastructure as Code is the way to go, right? So when I was recently
asked to set up a continuous integration service for a development project, the obvious option was an approach that
allows us to script the server setup and deployment, and put the whole lot under version control, particularly since a
colleague had already assembled a Vagrantfile that allowed him
to deploy a Jenkins server in a VM on his workstation.

However, there were a number of gotchas, which caused me a couple of days' unexpected work. I'm trying to record these
here in case anyone else benefits from my experience.

One of the requirements was to run the CI server under CentOS 7, the same as the target environment, so that all tests
would run under conditions nearly identical to production and deployment would be very simple, using a tar file
containing all the dependencies. But CentOS comes with enterprise-grade security features, which sometimes get in the
way of what you want to do. Read on...

Set up Vagrant

To use Vagrant, set up a folder in your project (e.g. called "CI") and in it, create a Vagrantfile. If you have not used
Vagrant before, please work through the quick Getting
Started exercise to familiarise yourself with the concepts.

The Vagrantfile is basically a Ruby script, so it is common practice to prefix it with:

# -*- mode: ruby -*-

# vi: set ft=ruby :

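To use Vagrant with AWS, first you have to
locally install the provider plugin and the AWS dummy box:

vagrant plugin install vagrant-aws

vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box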

The README for the Vagrant AWS provider helpfully provides a starter Vagrantfile
for you to copy and extend. However, we need to add a number of parameters to the AWS provider configuration and change
all the ones supplied.

The private key file for your instance (or cluster) will be generated for you by AWS EC2 when you launch an instance
through the EC2 management console. The recommendation is to put this in ~/.ec2 along with any other EC2 private keys.
Do not share it via version control, or it soon won't be secret any more.

The aws.tags hash allows you to set any tag values. Name is a typical example: the EC2 management console displays its
value as the instance name, which allows you to identify the instance there.

The aws.subnet_id should be the ID of the subnet you want the instance to join, i.e. the same value you would choose
when launching an EC2 instance manually in that subnet.

aws.block_device_mapping is only needed if you want to allocate a non-default volume size. For CI purposes, the default
volume size is probably too small if any significant amount of build history is to be kept.

The aws.security_groups parameter should list one or more security groups that you have created via the EC2 management
console. Make sure that at least the SSH port and HTTP(S) are permitted inbound. I set up an nginx server as a reverse
proxy on port 80 (see below) to allow client browsers to access multiple back-end services through the standard HTTP port.

The aws.instance_type allows you to choose the size of virtual machine. You may find that t2.small is sufficient for
your needs, but if it runs out of CPU credits, it will be throttled severely (which can actually cause builds to fail
due to timeouts). While trying to perfect the Vagrantfile, however, you may wish to specify t2.micro to minimise AWS
usage charges.
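
As a rough guide, the parameters discussed above might sit together in the provider block as follows. This is a sketch rather than the original Vagrantfile: the parameter names are those of the vagrant-aws plugin, but every value (AMI, subnet, security group, tag value, device name and volume size) is a placeholder or an assumption, and reading boothook.sh into aws.user_data is an assumption about how such a boot hook would be wired in. The settings are held in a proc only so that the fragment also loads outside Vagrant.

```ruby
# Sketch only: all values are placeholders, not taken from the original Vagrantfile.
ConfigureAws = proc do |aws, override|
  aws.keypair_name    = "*****"          # name of the EC2 key pair
  aws.ami             = "ami-*****"      # a CentOS 7 AMI available in your region
  aws.region          = "eu-central-1"
  aws.instance_type   = "t2.micro"       # t2.small or larger once builds run for real
  aws.subnet_id       = "subnet-*****"
  aws.security_groups = ["sg-*****"]     # must allow SSH and HTTP(S) inbound
  aws.tags            = { "Name" => "ci-server" }

  # Non-default root volume; device name and size (in GB) are assumptions.
  aws.block_device_mapping = [
    { "DeviceName" => "/dev/sda1", "Ebs.VolumeSize" => 50 }
  ]

  # Pass the boot hook as EC2 user data if the file sits next to the Vagrantfile.
  aws.user_data = File.read("boothook.sh") if File.exist?("boothook.sh")

  override.ssh.username         = "centos"
  override.ssh.private_key_path = "~/.ec2/*****.pem"
end

# Only runs when the file is evaluated by Vagrant itself.
if defined?(Vagrant)
  Vagrant.configure(2) do |config|
    config.vm.box = "dummy"
    config.vm.provider :aws, &ConfigureAws
  end
end
```

When the instance is launched into a VPC subnet, the vagrant-aws README asks for security group IDs rather than names, which is why the placeholder above has the sg- form.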

Notice the "boothook.sh" reference. This is based on the answer to a commonly encountered issue with Vagrant and certain AWS AMIs. The contents of the file
are:

The Vagrantfile has a number of AWS access parameters that need to be configured (shown by asterisks above). Insert the
name of your key pair, which must correspond to the private key file given in the override.ssh.private_key_path
parameter. Navigate to the Identity and Access Management section of the AWS console. Create an IAM user for your
Vagrant execution and generate an access key for it. Then grant that user the AmazonEC2FullAccess permission. This
allows Vagrant to provision the virtual machine.

Confusingly, this user's keys are not inserted into the Vagrantfile at all. Instead, a session token is required.
This is how to obtain it:

Download and install the Amazon Command Line Interface. On Mac OS X with Python and pip
already installed, this turned out to consist simply of a one-line command:

sudo pip install awscli

Configure the command line interface:

aws configure

(Leave the default region name and default output format as "none".)

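Request temporary credentials through the command line interface (aws sts get-session-token returns a session token
together with a new access key and secret key). Update the Vagrantfile using the session token as well as the new key
and secret key returned. After 36 hours, if you want to deploy using this Vagrantfile again, you will have to repeat
the procedure.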
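
Because the credentials change every time the procedure is repeated, one option is to keep them out of the Vagrantfile altogether and read them from environment variables. This is a variation, not what the procedure above does, and it is only a sketch: the helper name aws_session_credentials is hypothetical, and it assumes you have exported AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN in the shell that runs Vagrant.

```ruby
# Hypothetical helper: collects the temporary credentials from the environment
# and fails early, with a readable message, if any of them is missing.
def aws_session_credentials(env = ENV)
  names = {
    access_key_id:     "AWS_ACCESS_KEY_ID",
    secret_access_key: "AWS_SECRET_ACCESS_KEY",
    session_token:     "AWS_SESSION_TOKEN"
  }
  missing = names.values.select { |var| env[var].to_s.empty? }
  unless missing.empty?
    raise ArgumentError, "Missing environment variables: #{missing.join(', ')}"
  end
  names.transform_values { |var| env[var] }
end

# Only runs when the file is evaluated by Vagrant itself.
if defined?(Vagrant)
  Vagrant.configure(2) do |config|
    config.vm.provider :aws do |aws, override|
      creds = aws_session_credentials
      aws.access_key_id     = creds[:access_key_id]
      aws.secret_access_key = creds[:secret_access_key]
      aws.session_token     = creds[:session_token]
    end
  end
end
```

This also keeps the secrets out of version control, at the cost that Vagrant commands needing the AWS provider settings will fail until the variables are set.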

Following the AWS provider configuration in the Vagrantfile (just above the final "end" statement in the file), specify
any further configuration steps required, e.g. synchronized folders and custom software installations (see next
section).

After this, deploying the box should be straightforward (make sure your current working directory is the one that
contains the Vagrantfile):

vagrant up --provider=aws

Note the public IP and host FQDN shown for the virtual machine in the AWS console. This is the address you will need to
access your CI application from a browser. For example, if your machine FQDN is ec2-54-93-105-248.eu-central-1.compute.amazonaws.com, your Jenkins dashboard (assuming you went on to install Jenkins, as shown below) will be at http://ec2-54-93-105-248.eu-central-1.compute.amazonaws.com/jenkins/.

Continue configuring your CI server manually and add these configuration
steps to the Vagrantfile if possible.

To terminate the machine, use

vagrant destroy

Synchronize Folders

Between the end of the config.vm.provider configuration and the end of the Vagrantfile, you can insert further
configuration instructions. Configure SSH for folder synchronization using the same parameters as above:

config.ssh.username = "centos"

config.ssh.pty = true

config.ssh.private_key_path = "~/.ec2/*****.pem"

Then specify which folders you want to synchronize to the server. Because folder synchronization precedes any shell
scripts run on the target VM, I find it best to synchronize mostly to /tmp subfolders on the target VM and then copy
or move the contents from there during the subsequent software installation. For example:

config.vm.synced_folder "./nginx", "/tmp/nginx",
  type: "rsync", create: true, owner: 'root', group: 'root'

where the local folder "nginx" contains a subfolder "default.d", which in turn contains "jenkins.conf" to specify
the reverse proxy configuration for Nginx to access the Jenkins server on port 8080 (see below). The contents of
"default.d" are subsequently copied from /tmp/nginx to /etc/nginx/default.d once the Nginx software has been installed.

Install Software

Introduction

So-called "here documents" are a neat way to separate bits of installation script into identifiable blocks that can
be invoked from the configuration section. However, there is a "gotcha" here too - any backslashes ("\") must be
escaped ("\\"). This caught me out several times when developing sed or awk scripts in a shell and pasting them into
the Vagrantfile. (In the pieces of Vagrantfile shown in this blog post, please interpret a single backslash at the end of a line to mean a soft line wrap. Join with the following line and delete the backslash after copying! And don't insert any spaces, particularly in the middle of sed or awk scripts!)

Place your here documents one after the other directly above the Vagrant.configure(2) block.
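
The backslash gotcha comes from Ruby rather than from the shell: a here document with a bare delimiter is processed like a double-quoted string, so escapes and #{...} are interpreted before the script ever reaches the VM. The following sketch shows the effect, along with an alternative that this post does not use, namely quoting the delimiter so that the body is taken literally. The sed command is just an illustrative example.

```ruby
# Bare delimiter: the body behaves like a double-quoted string, so each
# backslash that the shell should see has to be written twice.
Escaped = <<EOF
sed -i 's/^\\(PermitRootLogin\\) .*/\\1 no/' /etc/ssh/sshd_config
EOF

# Single-quoted delimiter: the body is literal, so a script can be pasted
# in unchanged (but Ruby variables can no longer be interpolated).
Literal = <<'EOF'
sed -i 's/^\(PermitRootLogin\) .*/\1 no/' /etc/ssh/sshd_config
EOF

puts Escaped == Literal   # prints true
```

With a bare delimiter and single backslashes, Ruby would silently turn \( into ( and \1 into a control character, which is why the resulting sed script fails in ways that are hard to spot.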

Set up tools

The first thing is to install some tools that will be used by the subsequent installations. The time and date should
of course be set to whatever is appropriate for you.
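
A here document for this step might look like the sketch below. The selection of tools and the time zone are assumptions made for illustration, not the original list; substitute whatever your later installation steps need.

```ruby
# Illustrative only: package list and time zone are assumptions.
Tools=<<EOF
sudo yum -y install wget unzip git
sudo timedatectl set-timezone Europe/London
EOF

# Invoked from inside the Vagrant.configure(2) block with:
#   config.vm.provision "shell", inline: Tools
```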

Invocation

Set up Nginx

Note the copy command, which makes use of the folder synchronisation shown earlier.

The setsebool command is required because SELinux otherwise prevents Nginx from opening network connections, which it
needs in order to proxy HTTP or HTTPS requests to local TCP sockets.

Here Document

Nginx=<<EOF

sudo yum -y install epel-release

sudo yum -y install nginx

sudo cp -r /tmp/nginx/default.d/*.conf /etc/nginx/default.d/

sudo setsebool httpd_can_network_connect 1 -P

sudo systemctl enable nginx

sudo systemctl restart nginx

EOF

Invocation

config.vm.provision "shell", inline: Nginx

Set up Jenkins

This is slightly complicated by the fact that we need Jenkins to have a URL prefix - otherwise reverse-proxying
becomes next to impossible (see the sed script below). The jobs folder is relocated to /home/jenkins and
symbolically linked, which should make it easier to upgrade Jenkins later without losing build configurations and
histories. The initial set of build configurations is stored in version control and synchronised to
/tmp/jenkins/jobs by Vagrant before this installation occurs.
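
The URL prefix and the relocation of the jobs folder could be scripted roughly as follows. This is a sketch under several assumptions: that the Jenkins RPM reads its start-up arguments from JENKINS_ARGS in /etc/sysconfig/jenkins, that the Jenkins home is /var/lib/jenkins, and that the machine is freshly provisioned so nothing of value is lost when the empty jobs folder is replaced. The installation of Java and of the Jenkins package itself is left out.

```ruby
# Sketch only: file locations are assumptions about the Jenkins RPM layout.
Jenkins=<<EOF
# (Install Java and the Jenkins package before this point.)
# Serve Jenkins under /jenkins so that Nginx can proxy it by URL prefix.
sudo sed -i 's|^JENKINS_ARGS=.*|JENKINS_ARGS="--prefix=/jenkins"|' /etc/sysconfig/jenkins
# Keep the jobs outside the Jenkins home and link them back in.
sudo mkdir -p /home/jenkins
sudo cp -r /tmp/jenkins/jobs /home/jenkins/
sudo chown -R jenkins:jenkins /home/jenkins
sudo rm -rf /var/lib/jenkins/jobs
sudo ln -s /home/jenkins/jobs /var/lib/jenkins/jobs
sudo systemctl enable jenkins
sudo systemctl restart jenkins
EOF
```

Using | as the sed delimiter avoids having to escape the slashes in the prefix, so this particular script needs no backslashes at all.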

The installation of plugins is separated into a second block. You may of course require a different selection. The
best way I have found to determine the name of the plugins to install is to install them manually once, while using
the "list-plugins" command before and afterwards to find which new plugin names have appeared. You can do this after
the Vagrant machine has been deployed by means of the following commands:

NB if you have already enabled security on the Jenkins instance, you must log in to the Jenkins CLI before you can list
the plugins:

java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ login --username **** --password ****
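
Comparing the two listings by eye gets tedious once a plugin pulls in dependencies, so the before-and-after comparison can be scripted. The sketch below assumes that "list-plugins" prints one plugin per line with the short name in the first column; the helper names and the sample listings (including the version numbers) are made up for illustration.

```ruby
# Returns the plugin short names found in `list-plugins` output.
# Assumes one plugin per line, short name in the first column.
def plugin_names(listing)
  listing.lines.map { |line| line.split.first }.compact
end

# Names present in the second listing but not in the first, sorted.
def new_plugins(before, after)
  (plugin_names(after) - plugin_names(before)).uniq.sort
end

# Made-up sample listings, for illustration only.
before = <<EOF
credentials   Credentials Plugin   1.24
mailer        Mailer Plugin        1.16
EOF

after = <<EOF
credentials   Credentials Plugin   1.24
git           Git plugin           2.4.1
mailer        Mailer Plugin        1.16
nodejs        NodeJS Plugin        0.2.1
EOF

puts new_plugins(before, after)
```

In practice you would redirect the output of each "list-plugins" run to a file and pass the file contents to new_plugins; the resulting names are the ones to install in the second here document.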

Invocation

Insert some other software installations between these two in order to allow Jenkins to initialise itself before
calling it via the CLI.

config.vm.provision "shell", inline: Jenkins

...

config.vm.provision "shell", inline: JenkinsPlugins

Set up NodeJS

The installation of NodeJS (or node.js) under CentOS 7 is fairly straightforward, but I needed to specify the exact
versions of node, grunt and bower in order to comply with the technical policy of the project. If you don't need to
do that, just omit the version details (e.g. sudo yum install -y nodejs).

Here Document

Node423=<<EOF

curl -sL https://rpm.nodesource.com/setup_4.x | sudo -E bash -

sudo yum install -y nodejs-4.2.3-1nodesource.el7.centos.x86_64

sudo npm install -g grunt-cli@0.1.13 bower@1.7.0

EOF

Invocation

config.vm.provision "shell", inline: Node423

Set up PostgreSQL

Here again an exact version was needed, otherwise installation could have been much more straightforward. Note the
use of double-backslashes in the awk and sed scripts.
