Friday, September 09, 2016

For the last month or so I've been experimenting with Rancher as the orchestration layer for Docker-based deployments, and I've been pretty happy with it so far. Here are some of my notes and a few tips and tricks. I also recommend reading through the very good Rancher documentation. In what follows I'll assume that Rancher is running its own cluster management engine, called Cattle. Rancher also supports Kubernetes, Mesos and Docker Swarm.

Running the Rancher server

I provisioned an EC2 instance, installed Docker on it, then ran this command to launch the Rancher server as a Docker container (it will also get launched automatically if you reboot the EC2 instance):

# docker run -d --restart=always -p 8080:8080 rancher/server

Creating Rancher environments

It's important to think upfront about the various environments you want to manage in Rancher. If you have multiple projects, as well as multiple infrastructure environments such as development, staging and production, I recommend creating one Rancher environment per project/infrastructure-environment combination: for example, a Rancher environment called proj1dev, another called proj1stage, another called proj1prod, and similarly for other projects (proj2dev, proj2stage, proj2prod, etc.)

Tip: Since all containers in the same Rancher environment can by default connect to all other containers in that Rancher environment, having a project/infrastructure-environment combination as detailed above will provide good isolation and security from one project to another, and from one infrastructure environment to another within the same project. I recommend you become familiar with Rancher environments by reading more about them in the documentation.

In what follows I'll assume the current environment is proj1dev.

Creating Rancher API key pairs

Within each environment, create an API key pair. Copy and paste the two keys (one access key and one secret access key) somewhere safe.

Adding Rancher hosts

Within each environment, you need to add Rancher hosts. They are the compute nodes that will run the various Docker containers that you will orchestrate with Rancher. In my case, I provisioned two hosts per environment as EC2 instances running Docker.

In the Rancher UI, when you go to Infrastructure -> Hosts then click the Add Host button, you should see a docker run command that you can run on each host in order to launch the Rancher Agent on that host. Something like this:
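The exact command is generated per environment and embeds a registration token unique to your Rancher installation, so copy it from the Add Host screen rather than from here. As a rough sketch (the agent version, server URL and token below are placeholders):

```shell
sudo docker run -d --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  rancher/agent:v1.0.2 \
  http://my.rancher.server:8080/v1/scripts/<registration_token>
```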

Note that you need to allow UDP ports 500 and 4500 between each Rancher host and every other Rancher host, as well as between the hosts and the Rancher server. This is because Rancher uses IPsec tunnels for inter-host communication. The Rancher hosts also need to reach the Rancher server over port 8080 (or whatever port you have exposed for the Rancher server container).

Adding ECR registries

We use ECR as our Docker registry. Within each environment, I had to add our ECR registry. In the Rancher UI, I went to Infrastructure -> Registries, then clicked Add Registry and chose Custom as the registry type. In the attribute fields, I specified:

Address: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com

Email: none

Username: AWS

Password: the result of running these commands (you need to install and configure the awscli for this to work):

apt-get install python-pip; pip install awscli

aws configure (specify the keys for an IAM user allowed to access the ECR registry)

aws ecr get-login | cut -d ' ' -f 6
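The cut works because aws ecr get-login prints a full docker login command in which the temporary password is the 6th whitespace-separated field. A quick sanity check with a made-up login line (the password value here is obviously fake):

```shell
# Simulated output of `aws ecr get-login`; the real command prints a
# line of the form: docker login -u AWS -p <password> -e none <registry_url>
login_cmd='docker login -u AWS -p my_temp_password -e none https://my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com'

# Field 6 is the temporary password
password=$(echo "$login_cmd" | cut -d ' ' -f 6)
echo "$password"   # prints my_temp_password
```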

Application architecture

For this example I will consider an application composed of a Web application based on Apache/PHP running in 2 or more containers and mounting its shared files (configuration, media) over NFS. The Web app talks to a MySQL database server mounting its data files over NFS. The Web app containers are behind one or more instances of a Rancher load balancer, and the Rancher LB instances are fronted by an Amazon Elastic Load Balancer.

Rancher stacks

A 'stack' in Rancher corresponds to a set of services defined in a docker-compose YAML file. These services can also have Rancher-specific attributes (such as the desired number of containers, aka 'scale', health checks, etc.) defined in a special rancher-compose YAML file. I'll show plenty of examples of these files in what follows. My stack naming convention will be projname-environment-stacktype, for example proj1-development-nfs, proj1-development-database etc.

Tip: Try to experiment with creating stacks in the Rancher UI, then either view or export their configurations via the stack settings button in the UI.

This was a lifesaver for me, especially for lower-level stacks such as NFS or Rancher load balancers. Exporting the configuration downloads a zip file containing two files, docker-compose.yml and rancher-compose.yml, and saves you from figuring out on your own the exact syntax you need to use in these files.

Creating an NFS stack

One of the advantages of using Rancher is that it offers an extensive catalog of services ready to be used within your infrastructure. One such service is Convoy NFS. To use it, I started out by going to the Catalog menu option in the Rancher UI, then selecting Convoy NFS. In the following screen I specified proj1-development-nfs as the stack name, as well as the NFS server's IP address and mount point.

Note that I had already set up an EC2 instance to act as an NFS server. I attached an EBS volume per project/environment. So in the example above, I exported a directory called /nfs/development/proj1.

After launching the NFS stack, you should see it in the Stacks screen in the Rancher UI. The stack will consist of 2 services, one called convoy-nfs and the other called convoy-nfs-storagepool.

Once the NFS stack is up and running, you can export its configuration as explained above.

To create or update a stack programmatically, I used the rancher-compose utility and wrapped it inside shell scripts. Here is an example of a shell script that calls rancher-compose to create an NFS stack:
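A sketch of such a wrapper script follows; the compose file name is my assumption, and the credentials are expected to come from the environment rather than being hardcoded:

```shell
#!/bin/bash
# rancher-nfssetup.sh -- sketch of a wrapper around rancher-compose.
# RANCHER_URL, RANCHER_ACCESS_KEY and RANCHER_SECRET_KEY must be set
# in the environment so that no secrets live in version control.
rancher-compose --url "$RANCHER_URL" \
  --access-key "$RANCHER_ACCESS_KEY" \
  --secret-key "$RANCHER_SECRET_KEY" \
  --env-file .envvars \
  --file docker-compose-nfssetup.yml \
  --rancher-file rancher-compose.yml \
  --project-name proj1-development-nfs \
  up -d
```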

Note that I passed various arguments to the rancher-compose utility. Most of them are specified as environment variables. This allows me to add the bash script to version control without worrying about credentials, secrets etc. I also use the --env-file .envvars option, which allows me to define environment variables in the .envvars file and have them interpolated by rancher-compose in the various yml files it uses.

Creating volumes using the NFS stack

One of my goals was to attach NFS-based volumes to Docker containers in my infrastructure. To do this, I needed to create volumes in Rancher. One way to do it is to go to Infrastructure -> Storage in the Rancher UI, then go to the area corresponding to the NFS stack you want and click Add Volume, giving the volume a name and a description. Doing it manually is well and good, but I wanted to do it automatically, so I used another bash script around rancher-compose together with another docker-compose file:
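Based on the description that follows, the docker-compose file for this helper stack could be sketched like this (the command is my assumption; any short-lived command works, since the point is only to register the volumes):

```yaml
# docker-compose-volsetup.yml -- sketch; creates the two NFS-backed
# volumes as Rancher resources by mounting them once from a container.
volsetup:
  image: ubuntu:14.04
  command: /bin/true
  labels:
    io.rancher.container.start_once: 'true'
  volumes:
  - volMysqlData:/var/lib/mysql
  - volAppSharedData:/var/www/shared
  volume_driver: proj1-development-nfs
```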

I used the ubuntu:14.04 Docker image and I attached two volumes, one called volMysqlData and one called volAppSharedData. The first one will be mounted on the Docker container as /var/lib/mysql and the second one will be mounted as /var/www/shared. These are arbitrary paths, since my goal was just to create the volumes as Rancher resources.

I wanted the volsetup service to run once so that the volumes get created, then stop. For that, I used the special Rancher label io.rancher.container.start_once: true

I used as the volume_driver the NFS stack proj1-development-nfs I created above. This is important, because I want these volumes to be created within this NFS stack.

I used the following commands to create and start the proj1-development-volsetup stack, then to show its logs, and finally to shut it down and remove its containers, which are not needed anymore once the volumes get created:
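The sequence looked roughly like this (the wrapper script name is my assumption, following the same pattern as the other wrappers around rancher-compose):

```shell
# Create and start the stack, watch its logs, then stop and remove
# its containers once the volumes have been created
./rancher-volsetup.sh up -d
./rancher-volsetup.sh logs
./rancher-volsetup.sh stop
./rancher-volsetup.sh rm --force
```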

I haven't figured out yet how to remove a Rancher stack programmatically, so for these 'helper' type stacks I had to use the Rancher UI to delete them.

At this point, if you look in the /nfs/development/proj1 directory on the NFS server, you should see 2 directories with the same names as the volumes we created.

Creating a database stack

So far I haven't used any custom Docker images. For the database layer of my application, I will want to use a custom image which I will push to the Amazon ECR registry. I will use this image in a docker-compose file in order to set up and start the database in Rancher.

I have a customized MySQL configuration file my.cnf (in my local directory db/etc/mysql) which gets copied to the Docker image as /etc/mysql/my.cnf. I also have a db_setup.sh bash script in my local directory db/scripts which gets copied to /usr/local/bin in the Docker image. In this script I grant rights to a MySQL user used by the Web app, and I also load a MySQL dump file if it exists:
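The grant-and-load logic could be sketched as follows. MYSQL_ROOT_PASSWORD and MYSQL_DUMP_FILE are environment variables mentioned elsewhere in the text; the other variable names here are my assumptions:

```shell
#!/bin/bash
# db_setup.sh -- sketch. Runs inside the dbsetup container, which
# links to the db container under the name 'db'.

# Create the application database and grant rights to the Web app user
mysql -h db -uroot -p"$MYSQL_ROOT_PASSWORD" <<EOF
CREATE DATABASE IF NOT EXISTS $MYSQL_DATABASE;
GRANT ALL ON $MYSQL_DATABASE.* TO '$MYSQL_USER'@'%' IDENTIFIED BY '$MYSQL_PASSWORD';
FLUSH PRIVILEGES;
EOF

# Load the SQL dump if one was dropped into /dbdump on the host
if [ -f "/dbdump/$MYSQL_DUMP_FILE" ]; then
    mysql -h db -uroot -p"$MYSQL_ROOT_PASSWORD" "$MYSQL_DATABASE" \
        < "/dbdump/$MYSQL_DUMP_FILE"
fi
```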

an ECRCredentials service which connects to Amazon ECR and allows the ECR image db:proj1-development to be used by the other 2 services

a db service which runs a Docker container based on the db:proj1-development ECR image, and which launches a MySQL database with the root password set to the value of the MYSQL_ROOT_PASSWORD environment variable

a dbsetup service that also runs a Docker container based on the db:proj1-development ECR image, but instead of the default command, which would run MySQL, it runs the db_setup.sh script (specified in the command directive); this service also uses environment variables specifying the database to be loaded from the SQL dump file, as well as the user and password that will get grants to that database

the dbsetup service links to the db service via the links directive

the dbsetup service is a 'run once then stop' type of service, which is why it has the label io.rancher.container.start_once: true attached

both the db and the dbsetup service will run on a Rancher host with the label 'dbsetup=proj1'; this is because we want to load the SQL dump from a file that the dbsetup service can find

we will put this file on a specific Rancher host in a directory called /dbdump/proj1, which will then be mounted by the dbsetup container as /dbdump

the db_setup.sh script will then load the SQL file called MYSQL_DUMP_FILE from the /dbdump directory

this can also work if we'd just put the SQL file in the same NFS volume as the MySQL data files, but I wanted to experiment with host labels in this case

wherever NFS volumes are used, for example for volMysqlData, the volume_driver needs to be set to the proper NFS stack, proj1-development-nfs in this case
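Put together, the points above map to a docker-compose-dbsetup.yml roughly like the following sketch. The ECR credentials image shown is one commonly used community image for this purpose, and the environment variable names are assumptions, not the author's exact file:

```yaml
# docker-compose-dbsetup.yml -- sketch assembled from the points above
ECRCredentials:
  image: objectpartners/rancher-ecr-credentials
  environment:
    AWS_REGION: my_region
    AWS_ACCESS_KEY_ID: ${ECR_ACCESS_KEY}
    AWS_SECRET_ACCESS_KEY: ${ECR_SECRET_KEY}
db:
  image: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/db:proj1-development
  environment:
    MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
  labels:
    io.rancher.scheduler.affinity:host_label: dbsetup=proj1
  volumes:
  - volMysqlData:/var/lib/mysql
  volume_driver: proj1-development-nfs
dbsetup:
  image: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/db:proj1-development
  command: /usr/local/bin/db_setup.sh
  links:
  - db:db
  labels:
    io.rancher.container.start_once: 'true'
    io.rancher.scheduler.affinity:host_label: dbsetup=proj1
  environment:
    MYSQL_DATABASE: ${MYSQL_DATABASE}
    MYSQL_USER: ${MYSQL_USER}
    MYSQL_PASSWORD: ${MYSQL_PASSWORD}
    MYSQL_DUMP_FILE: ${MYSQL_DUMP_FILE}
  volumes:
  - /dbdump/proj1:/dbdump
```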

It goes without saying that mounting the MySQL data files from NFS is a potential performance bottleneck, so you probably wouldn't do this in production. I wanted to experiment with NFS in Rancher, and the performance I've seen in development and staging for some of our projects doesn't seem too bad.

To run a Rancher stack based on this docker-compose-dbsetup.yml file, I used this bash script:
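A sketch of that script (the stack name here is my assumption, following the naming convention from earlier):

```shell
#!/bin/bash
# rancher-dbsetup.sh -- sketch; credentials come from the environment,
# variables used in the yml files come from .envvars.
rancher-compose --url "$RANCHER_URL" \
  --access-key "$RANCHER_ACCESS_KEY" \
  --secret-key "$RANCHER_SECRET_KEY" \
  --env-file .envvars \
  --file docker-compose-dbsetup.yml \
  --project-name proj1-development-dbsetup \
  up -d
```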

The db service is similar to the one in the docker-compose-dbsetup.yml file. In this case the database is all set up, so we don't need anything except the NFS volume to mount the MySQL data files from.
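That compose file could look roughly like this (a sketch; the file name is my assumption):

```yaml
# docker-compose-dblaunch.yml -- sketch; just the db service, with the
# NFS volume for the MySQL data files.
db:
  image: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/db:proj1-development
  environment:
    MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
  volumes:
  - volMysqlData:/var/lib/mysql
  volume_driver: proj1-development-nfs
```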

As usual, I have a bash script that calls rancher-compose in order to create a stack called proj1-development-database:
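A sketch of that script (the compose file name is my assumption):

```shell
#!/bin/bash
# rancher-dblaunch.sh -- sketch; same wrapper pattern as before.
rancher-compose --url "$RANCHER_URL" \
  --access-key "$RANCHER_ACCESS_KEY" \
  --secret-key "$RANCHER_SECRET_KEY" \
  --env-file .envvars \
  --file docker-compose-dblaunch.yml \
  --project-name proj1-development-database \
  up -d
```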

At this point, the proj1-development-database stack is up and running and contains the db service running as a container on one of the Rancher hosts in the Rancher 'proj1dev' environment.

Creating a Web application stack

So far, I've been using either off-the-shelf or slightly customized Docker images. For the Web application stack I will be using more heavily customized images. The building block is a 'base' image whose Dockerfile contains directives for installing commonly used packages and for adding users.

When I built this image, I tagged it as my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/base:proj1-development. Here is the Dockerfile for an image (based on the base image above) that installs Apache, PHP 5.6 (using a custom apt repository), RVM, Ruby and the compass gem:
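A sketch of what that Dockerfile could contain; the apt repository, package names and RVM install steps are assumptions based on the description, not the author's exact file:

```dockerfile
FROM my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/base:proj1-development

# Apache and PHP 5.6 from a custom apt repository
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository -y ppa:ondrej/php5-5.6 && \
    apt-get update && \
    apt-get install -y apache2 php5 php5-mysql libapache2-mod-php5

# RVM, Ruby and the compass gem
RUN curl -sSL https://get.rvm.io | bash -s stable --ruby && \
    /bin/bash -l -c "gem install compass"
```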

The heavy lifting takes place in the app_setup.sh script. That's where you would do things such as pull a specified git branch from the application repo on GitHub, then run composer (if it's a PHP app) or other build tools in order to generate the artifacts necessary for running the application. At the end of this script, I generate a tar.gz of the code plus any artifacts and upload it to S3 so I can use it when I generate the Docker image for the Web app.

When I built this image, I tagged it as my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/appsetup:proj1-development

To actually run a Docker container based on the appsetup image, I used this docker-compose file:
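Based on the points that follow, the file could be sketched like this; the environment variable names and the run-once label are my assumptions:

```yaml
# docker-compose-appsetup.yml -- sketch
appsetup:
  image: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/appsetup:proj1-development
  command: /usr/local/bin/app_setup.sh
  external_links:
  - proj1-development-database/db:db
  volumes:
  - volAppShared:/var/www/shared
  volume_driver: proj1-development-nfs
  labels:
    io.rancher.container.start_once: 'true'
  environment:
    GIT_BRANCH: ${GIT_BRANCH}
    S3_BUCKET: ${S3_BUCKET}
    AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY}
    AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY}
```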

the command executed when a Docker container based on the appsetup service is launched is /usr/local/bin/app_setup.sh, as specified in the command directive

the app_setup.sh script runs commands that connect to the database, hence the need for the appsetup service to link to the MySQL database running in the proj1-development-database stack launched above; for that, I used the external_links directive

the appsetup service mounts an NFS volume (volAppShared) as /var/www/shared

the volume_driver needs to be proj1-development-nfs

before running the service, I created proper application configuration files under /nfs/development/proj1/volAppShared on the NFS server, specifying things such as the database server name (which needs to be 'db', since that is the alias under which the database container is linked), the database name, the user name and password, etc.

the appsetup service uses various environment variables referenced in the environment directive; it will pass these variables to the app_setup.sh script

To run the appsetup service, I used another bash script around the rancher-compose command:
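A sketch of that wrapper (file and stack names are assumptions):

```shell
#!/bin/bash
# rancher-appsetup.sh -- sketch; credentials come from the environment,
# variables used in the yml files come from .envvars.
rancher-compose --url "$RANCHER_URL" \
  --access-key "$RANCHER_ACCESS_KEY" \
  --secret-key "$RANCHER_SECRET_KEY" \
  --env-file .envvars \
  --file docker-compose-appsetup.yml \
  --project-name proj1-development-appsetup \
  up -d
```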

Tip: When using its Cattle cluster management engine, Rancher does not add services linked to each other as static entries in /etc/hosts on the containers. Instead, it provides an internal DNS service so that containers in the same environment can reach each other by DNS names as long as they link to each other in docker-compose files. If you go to a shell prompt inside a container, you can ping other containers by name even from one Rancher stack to another. For example, from a web container in the proj1-development-app stack you can ping a database container in the proj1-development-database stack linked in the docker-compose file as db and you would get back a name of the type db.proj1-development-app.rancher.internal.

Tip: There is no need to expose ports from containers within the same Rancher environment. I spent many hours troubleshooting issues related to ports and making sure ports are unique across stacks, only to realize that the internal ports that the services listen on (3306 for MySQL, 80 and 443 for Apache) are reachable from the other containers in the same Rancher environment. The only ports you need exposed to the external world in the architecture I am describing are the load balancer ports, as I'll describe below.

Here is the Dockerfile for an image that runs the Web application:

FROM my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/apache-php:proj1-development

This image is based on the apache-php image but adds Apache customizations, as well as the release directory obtained from the tar.gz file uploaded to S3 by the appsetup service.

When I built this image, I tagged it as my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/app:proj1-development

Code deployment

My code deployment process is a bash script (which can be used standalone, or as part of a Jenkins job, or can be turned into a Jenkins pipeline) that first runs the appsetup service in order to generate a tar.gz of the code and artifacts, then downloads it from S3 and uses it as the local release directory to be copied into the app image. The script then pushes the app Docker image to Amazon ECR. The environment variables are either defined in an .envvars file or passed via Jenkins parameters. The script assumes that the Dockerfile for the app image is in the current directory, and that the etc directory structure used for the Apache files in the app image is also in the current directory (they are all checked into the project repository, so Jenkins will find them).
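The steps above could be sketched as follows; the archive name, bucket variable and wrapper script name are assumptions, and the wait for the appsetup service to finish is omitted for brevity:

```shell
#!/bin/bash
# deploy.sh -- sketch of the code deployment steps

# 1) Run the appsetup service, which builds the code and uploads a
#    tar.gz of the release to S3
./rancher-appsetup.sh

# 2) Download the archive and unpack it as the local release directory
aws s3 cp "s3://$S3_BUCKET/release.tar.gz" .
rm -rf release && mkdir release
tar xzf release.tar.gz -C release

# 3) Build the app image (Dockerfile in the current directory) and
#    push it to Amazon ECR
eval "$(aws ecr get-login)"
docker build -t my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/app:proj1-development .
docker push my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/app:proj1-development
```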

Nothing very different about this file compared to the files I've shown so far. The app service mounts the volAppShared NFS volume as /var/www/shared, and links to the MySQL database service db already running in the proj1-development-database Rancher stack, giving it the name 'db'.
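A sketch of that compose file, based on the description above:

```yaml
# docker-compose-app.yml -- sketch
app:
  image: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/app:proj1-development
  external_links:
  - proj1-development-database/db:db
  volumes:
  - volAppShared:/var/www/shared
  volume_driver: proj1-development-nfs
```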

To run the app service, I use this bash script wrapping rancher-compose:
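A sketch of rancher-app.sh; it passes its arguments straight through to rancher-compose, so the same script can perform a plain start or a rolling upgrade depending on how it is invoked:

```shell
#!/bin/bash
# rancher-app.sh -- sketch; arguments such as `up -d --force-upgrade`
# are forwarded to rancher-compose via "$@".
rancher-compose --url "$RANCHER_URL" \
  --access-key "$RANCHER_ACCESS_KEY" \
  --secret-key "$RANCHER_SECRET_KEY" \
  --env-file .envvars \
  --file docker-compose-app.yml \
  --rancher-file rancher-compose.yml \
  --project-name proj1-development-app \
  "$@"
```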

Since the proj1-development-app stack may already be running with an old version of the app Docker image, I will invoke rancher-app.sh with the force-upgrade option of the rancher-compose command:

./rancher-app.sh up -d --force-upgrade --confirm-upgrade --pull --batch-size "1"

This will perform a rolling upgrade of the app service, by stopping the containers for the app service one at a time (as indicated by the batch-size parameter), then pulling the latest Docker image for the app service, and finally starting each container again. Speaking of 'containers' plural, you can indicate how many containers should run at all times for the app service by adding these lines to rancher-compose.yml:

app:
  scale: 2

In my case, I want 2 containers to run at all times. If you stop one container from the Rancher UI, you will see another one restarted automatically by Rancher in order to preserve the value specified for the 'scale' parameter.

Creating a load balancer stack

When I started to run load balancers in Rancher, I created them via the Rancher UI. I created a new stack, then added a load balancer service to it. It took me a while to figure out that I can then export the stack configuration and generate a docker-compose file and a rancher-compose snippet I can add to my main rancher-compose.yml file.
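What the exported files looked like can be sketched roughly as follows. The label syntax is from my recollection of Rancher 1.x cattle load balancers and the rancher-compose keys may differ between versions, so treat this as a sketch and check your own exported configuration:

```yaml
# docker-compose.yml (lb stack) -- sketch
lb:
  image: rancher/load-balancer-service
  ports:
  - 8000:80
  - 8001:443
  external_links:
  - proj1-development-app/app:app
  labels:
    io.rancher.loadbalancer.target.app: proj1.dev.mydomain.com:8000=80,proj1.dev.mydomain.com:8001=443

# rancher-compose.yml (lb stack) -- sketch
lb:
  scale: 2
  default_cert: proj1.dev.mydomain.com
  load_balancer_config:
    haproxy_config: {}
```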

The ports directive tells the load balancer which ports to expose externally and what ports to map them to. This example shows that port 8000 will be exposed externally and mapped to port 80 on the target service, and port 8001 will be exposed externally and mapped to port 443 on the target service.

The external_links directive tells the load balancer which service to load balance. In this example, it is the app service in the proj1-development-app stack.

The labels directive does layer 7 load balancing by allowing you to specify a domain name that you want to send to a specific port. In this example, I want to send HTTP requests coming on port 8000 for proj1.dev.mydomain.com to port 80 on the target containers for the app service, and HTTPS requests coming on port 8001 for the same proj1.dev.mydomain.com name to port 443 on the target containers.

I could have also added a new line under labels, specifying that I want requests for proj1-admin.dev.mydomain.com coming on port 8000 to be sent to a different port on the target containers, assuming that I had Apache configured to listen on that port. You can read more about the load balancing features available in Rancher in the documentation.

Note that there is a mention of a default_cert. This is an SSL key + cert that I uploaded to Rancher via the UI by going to Infrastructure -> Certificates and that I named proj1.dev.mydomain.com. The Rancher Catalog does contain an integration for Let's Encrypt but I haven't had a chance to test it yet (from the Rancher Catalog: "The Let's Encrypt Certificate Manager obtains a free (SAN) SSL Certificate from the Let's Encrypt CA and adds it to Rancher's certificate store. Once the certificate is created it is scheduled for auto-renewal 14-days before expiration. The renewed certificate is propagated to all applicable load balancer services.")

Note also that the scale value is 2, which means that there will be 2 containers for the lb service.

Tip: In the Rancher UI, you can open a shell into any container, or view the logs for any container, by going to the Settings icon of that container and choosing Execute Shell or View Logs.

Tip: Rancher load balancers are based on haproxy. You can open a shell into a container running for the lb service, then look at the haproxy configuration file in /etc/haproxy/haproxy.cfg. To troubleshoot haproxy issues, you can enable UDP logging in /etc/rsyslog.conf by removing the comments before the following 2 lines:

#$ModLoad imudp
#$UDPServerRun 514

then restarting the rsyslog service. You can then restart the haproxy service and inspect its log file, /var/log/haproxy.log.

To tie all of this together, I keep everything needed for a project in a GitHub repository containing:

A directory for the base Docker image (containing its Dockerfile and any other files that need to go into that image)

A directory for the apache-php Docker image

A directory for the db Docker image

A directory for the appsetup Docker image

A Dockerfile in the current directory for the app Docker image

An etc directory in the current directory used by the Dockerfile for the app image

Each project/environment combination has a branch created in this GitHub repository. For example, for the proj1 development environment I would create a proj1dev branch which would then contain any customizations I need for this project -- usually stack names, Docker tags, Apache configuration files under the etc directory.

My end goal was to use Jenkins to drive the launching of the Rancher services and the deployment of the code. Eventually I will use a Jenkins Pipeline to string together the various steps of the workflow, but for now I have 5 individual Jenkins jobs which all check out the proj1dev branch of the GitHub repo above. The jobs contain shell-type build steps where I actually call the various rancher bash scripts around rancher-compose. The Jenkins jobs also take parameters corresponding to the environment variables used in the docker-compose files and in the rancher bash scripts. I also use the Credentials section in Jenkins to store any secrets such as the Rancher API keys, AWS keys, S3 keys, ECR keys etc. On the Jenkins master and executor nodes I installed the rancher and rancher-compose CLI utilities (I downloaded the rancher CLI from the footer of the Rancher UI).

Job #2 runs rancher-nfssetup.sh and rancher-volsetup.sh in order to set up the NFS stack and the volumes used by the dbsetup, appsetup, db and app services.

Job #3 runs rancher-dbsetup.sh and rancher-dblaunch.sh in order to set up the database via the dbsetup service, then launch the db service.

At this point, everything is ready for deployment of the application.

Job #4 is the code deployment job. It runs the sequence of steps detailed in the Code Deployment section above.

Job #5 is the rolling upgrade job for the app service and the lb service. If those services have never been started before, they will get started. If they are already running, they will be upgraded in a rolling fashion, batch-size containers at a time as I detailed above.

When a new code release needs to be pushed to the proj1dev Rancher environment, I would just run job #4 followed by job #5. Obviously you can string these jobs together in a Jenkins Pipeline, which I intend to do next.

Some more Rancher tips and tricks

To troubleshoot Rancher infrastructure-related issues, it helps to inspect the cattle-debug and cattle-error log files on the Rancher server:
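For example, assuming the Cattle logs live under /var/lib/cattle/logs inside the server container (the path is an assumption and may differ between Rancher versions):

```shell
# Find the rancher/server container, then tail the Cattle logs inside it
docker ps | grep rancher/server
docker exec <rancher_server_container_id> \
  tail -n 100 /var/lib/cattle/logs/cattle-debug.log
docker exec <rancher_server_container_id> \
  tail -n 100 /var/lib/cattle/logs/cattle-error.log
```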