Tuesday, November 29, 2016

I know that the Cloud Native Computing Foundation chose Prometheus as the monitoring platform of choice for Kubernetes, but in this post I'll show you how to quickly get started with graphing CPU, memory, disk and network in a Kubernetes cluster using Heapster, InfluxDB and Grafana.

The documentation in the kubernetes/heapster GitHub repo is actually pretty good. Here's what I did:

Then you can run 'kubectl cluster-info' and look for the monitoring-grafana endpoint. Since the monitoring-grafana service is of type LoadBalancer, if you run your Kubernetes cluster in AWS, the service creation will also involve the creation of an ELB. By default the ELB security group allows 80 from all, so I edited that to restrict it to some known IPs.

After a few minutes, you should see CPU and memory graphs shown in the Kubernetes dashboard. Here is an example showing pods running in the kube-system namespace:

You can also hit the Grafana endpoint and choose the Cluster or Pods dashboards. Note that if you have a namespace different from default and kube-system, you have to enter its name manually in the namespace field of the Grafana Pods dashboard. Only then you'll be able to see data corresponding to pods running in that namespace (or at least I had to jump through that hoop.)

Here is an example of graphs for the kubernetes-dashboard pod running in the kube-system namespace:

For info on how to customize the Grafana graphs, here's a good post from Deis.

Tuesday, November 22, 2016

I've been knee-deep in Kubernetes for the past few weeks and to say that I like it is an understatement. It's exhilarating to have at your fingertips a distributed platfom created by Google's massive brain power.

I'll jump right in and talk about how I installed Kubernetes in AWS and how I created various resources in Kubernetes in order to run a database-backed PHP-based web application.

Installing Kubernetes

I used the tack tool from my laptop running OSX to spin up a Kubernetes cluster in AWS. Tack uses terraform under the hood, which I liked a lot because it makes it very easy to delete all AWS resources and start from scratch while you are experimenting with it. I went with the tack defaults and spun up 3 m3.medium EC2 instances for running etcd and the Kubernetes API, the scheduler and the controller manager in an HA configuration. Tack also provisioned 3 m3.medium EC2 instances as Kubernetes workers/minions, in an EC2 auto-scaling group. Finally, tack spun up a t2.nano EC2 instance to server as a bastion host for getting access into the Kubernetes cluster. All 7 EC2 instances launched by tack run CoreOS.

Using kubectl

Tack also installs kubectl, which is the Kubernetes command-line management tool. I used kubectl to create the various Kubernetes resources needed to run my application: deployments, services, secrets, config maps, persistent volumes etc. It pays to become familiar with the syntax and arguments of kubectl.

Creating namespaces

One thing I needed to do right off the bat was to think about ways to achieve multi-tenancy in my Kubernetes cluster. This is done with namespaces. Here's my namespace.yaml file:

I'll show how you can create two types of Kubernetes persistent volumes in AWS: one based on EFS, and one based on EBS. I chose the EFS one for my web application layer, for things such as shared configuration and media files. I chose the EBS one for my database layer, to be mounted as the data volume.

First, I created an EFS share using the AWS console (although I recommend using terraform to do it automatically, but I am not there yet). I allowed the Kubernetes worker security group to access this share. I noted one of the DNS names available for it, e.g. us-west-2a.fs-c830ab1c.efs.us-west-2.amazonaws.com. I used this Kubernetes manifest to define a persistent volume (PV) based on this EFS share:

$ kubectl create -f web-pvc.yaml--namespace tenant1
Note that a PVC does not refer directly to a PV. The storage specified in the PVC is provisioned from available persistent volumes.

Instead of defining a persistent volume for the EBS volume I wanted to use for the database, I created a storage class:

$ cat db-storageclass-ebs.yamlkind: StorageClassapiVersion: storage.k8s.io/v1beta1metadata: name: db-ebsprovisioner: kubernetes.io/aws-ebsparameters: type: gp2$ kubectl create -f db-storageclass-ebs.yaml--namespace tenant1
I also created a PVC which does refer directly to the storage class name db-ebs. When the PVC is used in a pod, the underlying resource (i.e. the EBS volume in this case) will be automatically provisioned by Kubernetes.

Kubernetes also has the handy notion of ConfigMap, a resource where you can store either entire configuration files, or key/value properties that you can then use in other Kubernetes resource definitions. For example, I save the GitHub branch and commit environment variables for the code I deploy in a ConfigMap:

$ kubectl create configmap git-config --namespace tenant1 \

--from-literal=GIT_BRANCH=$GIT_BRANCH \

--from-literal=GIT_COMMIT=$GIT_COMMIT

I'll show how to use secrets and ConfigMaps in pod definitions a bit later on.

Creating an ECR image pull secret and a service account

We use AWS ECR to store our Docker images. Kubernetes can access images stored in ECR, but you need to jump through a couple of hoops to make that happen. First, you need to create a Kubernetes secret of type dockerconfigjson which encapsulates the ECR credentials in base64 format. Here's a shell script that generates a file called ecr-pull-secret.yaml:

The atomic unit of scheduling in Kubernetes is a pod. You don't usually create a pod directly (though you can, and I'll show you a case where it makes sense.) Instead, you create a deployment, which keeps track of how many pod replicas you need, and spins up the exact number of pods to fulfill your requirement. A deployment actually creates a replica set under the covers, but in general you don't deal with replica sets directly. Note that deployments are the new recommended way to create multiple pods. The old way, which is still predominant in the documentation, was to use replication controllers.

The template section specifies the elements necessary for spinning up new pods. Of particular importance are the labels, which, as we will see, are used by services to select pods that are included in a given service. The image property specifies the ECR Docker image used to spin up new containers. In my case, the image is called myapp-db and it is tagged with the tenant name tenant1. Here is the Dockerfile from which this image was generated:

Nothing out of the ordinary here. The image is based on the mysql DockerHub image, specifically version 5.6. The my.cnf is getting added in as a customization, and a db_setup.sh script is copied over so it can be run at a later time.

Some other things to note about the deployment manifest:

I made pretty heavy use of secrets and ConfigMap key/values

I also used the db-pvc-ebs Persistent Volume Claim and mounted the underlying physical resource (an EBS volume in this case) as /var/lib/mysql

I used the tenant1-dev service account, which allows the deployment to pull down the container image from ECR

I didn't specify the number of replicas I wanted, which means that 1 pod will be created (the default)

To create the deployment, I ran kubectl:

$ kubectl create -f db-deployment.yaml --record --namespace tenant1

Note that I used the --record flag, which tells Kubernetes to keep a history of the commands used to create or update that deployment. You can show this history with the kubectl rollout history command:

One thing that is different from the db deployment is the way a secret (REDIS_PASSWORD) is used as a command-line parameter for the container command. Make sure you use in this case the syntax $(VARIABLE_NAME) because that's what Kubernetes expects.

Also note the labels, which have app: myapp in common with the db deployment, but a different value for tier, redis instead of db.

My last deployment example for now is the one for the web application pods:

Note that replicas is set to 2, so that 2 pods will be launched and kept running at all times. The labels have the same common part app: myapp, but the tier is different, set to frontend. The persistent volume claim web-pvc for the underlying physical EFS volume is used to mount /var/www/html/shared over EFS.

The image used for the container is derived from a stock ubuntu:14.04 DockerHub image, with apache and php 5.6 installed on top. Something along these lines:

In Kubernetes, you are not supposed to refer to individual pods when you want to target the containers running inside them. Instead, you need to use services, which provide endpoints for accessing a set of pods based on a set of labels.

Note the selector property, which is set to app: myapp and tier: db. By specifying these labels, we make sure that only the deployments tagged with those labels will be included in this service. There is only one deployment with those 2 labels, and that is db-deployment.

The selector properties for each service are set so that the proper deployment is included in each service.

One important thing to note in the definition of the web service: its type is set to LoadBalancer. Since Kubernetes is AWS-aware, the service creation will create an actual ELB in AWS, so that the application can be accessible from the outside world. It turns out that this is not the best way to expose applications externally, since this LoadBalancer resource operates only at the TCP layer. What we need is a proper layer 7 load balancer, and in a future post I'll show how to use a Kubernetes ingress controller in conjunction with the traefik proxy to achieve that. In the mean time, here is a KubeCon presentation from Gerred Dillon on "Kubernetes Ingress: Your Router, Your Rules".

To create the services defined above, I used kubectl:

$ kubectl create -f db-service.yaml --namespace tenant1

$ kubectl create -f redis-service.yaml --namespace tenant1

$ kubectl create -f web-service.yaml --namespace tenant1

At this point, the web application can refer to the database 'host' in its configuration files by simply using the name of the database service, which is db in our example. Similarly, the web application can refer to the redis 'host' by using the name of the redis service, which is redis. The Kubernetes magic will make sure calls to db and redis are properly routed to their end destinations, which are the actual containers running those services.

Running commands inside pods with kubectl exec

Although you are not really supposed to do this in a container world, I found it useful to run a command such as loading a database from a MySQL dump file on a newly created pod. Kubernetes makes this relatively easy via the kubectl exec functionality. Here's how I did it:

where db_setup.sh downloads a sql.tar.gz file from S3 and loads it into MySQL.

A handy troubleshooting tool is to get a shell prompt inside a pod. First you get the pod name (via kubectl get pods --show-all), then you run:

$ kubectl --namespace tenant1 exec -it $POD -- bash -il

Sharing volumes across containers
One of the patterns I found useful in docker-compose files is to mount a container volume into another container, for example to check out the source code in a container volume, then mount it as /var/www/html in another container running the web application. This pattern is not extremely well supported in Kubernetes, but you can find your way around it by using init-containers.

Here's an example of creating an individual pod for the sole purpose of running a Capistrano task against the web application source code. Simply running two regular containers inside the same pod would not achieve this goal, because the order of creation for those containers is random. What we need is to force one container to start before any regular containers by declaring it to be an 'init-container'.

The logic is here is a bit convoluted. Hopefully some readers of this post will know a better way to achieve the same thing. What I am doing here is launching a container based on the myapp-web:tenant1 Docker image, which already contains the source code checked out from GitHub. This container is declared as an init-container, so it's guaranteed to run first. What it does is it mounts a special Kubernetes volume declared at the bottom of the pod manifest as an emptyDir. This means that Kubernetes will allocate some storage on the node where this pod will run. The data4capistrano container runs a command which copies the contents of the /var/www/html/current directory from the myapp-web image into this storage space mounted as /tmpfsvol inside data4capistrano. One other thing to note is that init-containers are a beta feature currently, so their declaration needs to be embedded into an annotation.

When the regular capistrano container is created inside the pod, it also mounts the same emptyDir container (which is not empty at this point, because it was populated by the init-container), this time as /var/www/html. It also mounts the shared EFS file system as /var/www/html/shared. With these volumes in place, it has all it needs in order to run Capistrano locally via the cap command. The stage, task, and target directory for Capistrano are passed via ConfigMaps values.

One thing to note is that the RestartPolicy is set to Never for this pod, because we only want to run it once and be done with it.

To run the pod, I used kubectl again:

$ kubectl create -f capistrano-pod.yaml --namespace tenant1

Creating jobs

Kubernetes also has the concept of jobs, which differ from deployments in that they run one instance of a pod and make sure it completes. Jobs are useful for one-off tasks that you want to run, or for periodic tasks such as cron commands. Here is an example of a job manifest which runs a script that uses the twig template engine under the covers in order to generate a configuration file for the web application:

The templatize.php script substitutes DBNAME, DBUSER, DBPASSWORD and REDIS_PASSWORD with the values passed in the job manifest, obtained from either Kubernetes secrets or ConfigMaps.

To create the job, I used kubectl:

$ kubectl create -f template-job.yaml --namespace tenant1

Performing rolling updates and rollbacks for Kubernetes deployments

Once your application pods are running, you'll need to update the application to a new version. Kubernetes allows you to do a rolling update of your deployments. One advantage of using deployments as opposed to the older method of using replication controllers is that the update process for deployment happens on the Kubernetes server side, and can be paused and restarted. There are a few ways of doing a rolling update for a deployment (and a recent linux.com article has a good overview as well).

a) You can modify the deployment's yaml file and change a label such as a version or a git commit, then run kubectl apply:

$ kubectl --namespace tenant1 apply -f deployment.yaml

Note from the Kubernetes documentation on updating deployments:a Deployment’s rollout is triggered if and only if the Deployment’s pod template (i.e. .spec.template) is changed, e.g. updating labels or container images of the template. Other updates, such as scaling the Deployment, will not trigger a rollout.

b) You can use kubectl set to specify a new image for the deployment containers. Example from the documentation:

$ kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1

deployment "nginx-deployment" image update

c) You can use kubectl patch to add a unique label to the deployment spec template on the fly. This is the method I've been using, with the label being set to a timestamp:

When updating a deployment, a new replica set will be created for that deployment, and the specified number of pods will be launched by that replica set, while the pods from the old replica set will be shut down. However, the old replica set itself will be preserved, allowing you to perform a rollback if needed.

If you want to roll back to a previous version, you can use kubectl rollout history to show the revisions of your deployment updates:

I should note that all these kubectl commands can be easily executed out of Jenkins pipeline scripts or shell steps. I use a Docker image to wrap kubectl and its keys so that they I don't have to install it on the Jenkins worker nodes.

And there you have it. I hope the examples I provided will shed some light on some aspects of Kubernetes that go past the 'Kubernetes 101' stage. Before I forget, here's a good overview from the official documentation on using Kubernetes in production.

I have a lot more Kubernetes things on my plate, and I hope to write blog posts on all of them. Some of these: