
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. It is scalable, durable and distributed by design, which is why it is currently one of the most popular choices of messaging broker for high-throughput architectures.

One of the major differences with Kafka is the way it manages consumer state: this is itself distributed, with each client responsible for keeping track of the messages it has consumed (this is abstracted away by the high-level consumer in later versions of Kafka, with offsets stored in ZooKeeper). In contrast to more traditional MQ messaging technologies, this inversion of control takes considerable load off the server.

The scalability, speed and resiliency properties of Kafka are why it was chosen for a project I worked on for my most recent client, Sky. Our use case was processing real-time user actions in order to provide personalised recommendations for end users of NowTV, a popular streaming service available on multiple platforms. We needed a reliable way to monitor our Kafka cluster to help inform key performance indicators during NFT testing.

Prometheus JMX Exporter

Prometheus is our monitoring tool of choice, and Apache Kafka metrics are exposed by each broker in the cluster via JMX, so we need a way to extract these metrics and expose them in a format suitable for Prometheus. Fortunately prometheus.io provides a custom exporter for exactly this. The Prometheus JMX Exporter is a lightweight web service which exposes Prometheus metrics via an HTTP GET endpoint. On each request it scrapes the configured JMX server and transforms the JMX MBean query results into Prometheus-compatible time series data, which is then returned to the caller over HTTP.

The MBeans to scrape are controlled by a YAML configuration in which you can provide a whitelist/blacklist of the metrics to extract and specify how to represent them in Prometheus, for example as a GAUGE or a COUNTER. The configuration can be tuned to your specific requirements; a list of all available metrics can be found in the Kafka Operations documentation. Here is the general shape our configuration took:
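
(A sketch only: the JMX port, whitelist and rule patterns below are illustrative placeholders rather than our exact production rules.)

---
hostPort: localhost:9999              # the broker's JMX port (placeholder)
lowercaseOutputName: true
whitelistObjectNames:
  - "kafka.server:type=BrokerTopicMetrics,*"
rules:
  # rate MBeans expose a monotonically increasing Count, so map them to a COUNTER
  - pattern: kafka.server<type=BrokerTopicMetrics, name=(.+)PerSec><>Count
    name: kafka_server_$1_total
    type: COUNTER
  # any other whitelisted attribute is represented as a GAUGE
  - pattern: kafka.server<type=BrokerTopicMetrics, name=(.+)><>Value
    name: kafka_server_$1
    type: GAUGE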

Viewing Kafka Metrics

Once metrics have been scraped into Prometheus they can be browsed in the Prometheus UI; alternatively, richer dashboards can be built using Grafana.

Screenshots: the Prometheus graph builder and a Grafana dashboard.

In order to try this out locally, a fully dockerised example has been provided on GitHub – kafka-prometheus-monitoring. This project is for demonstration purposes only and is not intended to be run in a production environment. It only scratches the surface of monitoring and fine-tuning the Kafka brokers, but it is a good place to start in order to enable performance analysis of the cluster.

A note on monitoring a cluster of brokers: the Prometheus metrics will include a label denoting each broker's IP address, which allows you to distinguish metrics per broker. A JMX exporter therefore needs to be run for each broker, and Prometheus should be configured to poll every deployed JMX exporter.
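
On the Prometheus side this simply means one scrape target per exporter. As a sketch (the hostnames and port here are placeholders):

scrape_configs:
  - job_name: 'kafka'
    static_configs:
      - targets:
          - 'kafka-broker-1:9101'
          - 'kafka-broker-2:9101'
          - 'kafka-broker-3:9101'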

UPDATE: Since the time of writing this post, Docker has become much more mainstream. Some of the APIs have evolved and there are now native installation options for Mac (instead of boot2docker). The Docker tutorials are the best way to get started: https://docs.docker.com/get-started/. In addition, docker-compose is useful for spinning up groups of containers such as those demonstrated in this post.
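
For example, the database and application containers built up step by step below could now be declared together in a single docker-compose.yml. This is only a sketch; the image tags, port mapping and password are placeholders:

version: "3"
services:
  wordpress-db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: wordpressdocker      # placeholder password
  wordpress-app:
    image: wordpress
    environment:
      WORDPRESS_DB_HOST: wordpress-db           # compose networking replaces --link
      WORDPRESS_DB_PASSWORD: wordpressdocker
    ports:
      - "8080:80"

A single docker-compose up -d will then start the whole group.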

With the rise of new development methodologies such as Continuous Delivery, long gone are the days where a software engineer pushes code into the abyss and hopes it comes out unscathed on the other side. We are seeing a shift in the industry where the traditional walls between Development, Quality Assurance and Operations are slowly being broken down; these roles are merging, and a new breed of engineer is emerging. The buzzword “DevOps” has become prominent in the industry, and as a result we are seeing project development teams that are more agile, more efficient and able to respond more quickly to change. This shift has led to a rise of new tools and frameworks to help us automate deployment, automate testing and standardise infrastructure.

One of the tools at the forefront of this transformation is Docker, an open platform for developers and sysadmins to build, ship and run distributed applications. Before diving further into this practical exercise I would suggest having a read over What is Docker?

Before beginning the exercise you will need to install Docker; I use boot2docker on MacOS. For further details on installation for your platform, visit Docker Installation. Another option is to use a cloud provider to run your Docker host: Digital Ocean provides Docker-ready servers running in the cloud for as little as $0.007/hour, which is an especially attractive option if you are limited by bandwidth or resources.

A few basics

Docker Image

A docker image is a read-only blueprint for a container; an example blueprint might be the Ubuntu operating system, or a CentOS one. Every container that you run in Docker is based on a docker image.

Dockerfile

A Dockerfile contains the instructions that tell Docker how to build a Docker image. Docker images are layered and can therefore be extended, which allows you to stack extra functionality on top of existing base images. A commonly used base image is ubuntu:latest, a blueprint of the base installation of an Ubuntu distribution.
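
As a trivial illustration (the package choice is arbitrary), a Dockerfile that stacks one extra layer on top of the Ubuntu base image could look like this:

# build on top of the public Ubuntu base image
FROM ubuntu:latest
# add a new layer containing the curl package
RUN apt-get update && apt-get install -y curl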

Docker Container

A docker container can be thought of as a lightweight, self-contained instance of a virtual machine running a Linux distribution (usually with modifications); containers are extremely cheap to start and stop. Docker containers are spawned from a docker image and should be treated as stateless, ephemeral resources.

Docker Hub

Docker Hub brings software engineering DRY principles to the system infrastructure world: it is a global repository platform that holds Dockerfiles and images. There are already images available that run ubuntu, redhat, mysql, rabbitmq, mongodb and nginx, to name just a few.

Diving into Docker

Let’s dive straight into Docker. We are going to build a simple infrastructure that hosts a self-contained instance of WordPress, a popular blogging tool used by many organisations and writers across the world. The infrastructure will include an nginx server to route/proxy requests, a WordPress application server to host the user interface, and a MySQL database to provide storage. Once complete, our infrastructure will look something like this:
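
Sketched as a request flow, using the container names we will create below:

browser --> nginx container (port 80) --> wordpress-app container --> wordpress-db container (MySQL)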

The database container

Let’s start by creating our MySQL database container. Luckily for us, MySQL has already been “dockerised” and is available to pull from Docker Hub; the defaults are fine, so there is no need to write our own Dockerfile or build any new images. A new container can be started using the docker run command.
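
For example, something along these lines (the password is a placeholder; recent Docker releases spell the flag --name, while very early versions accepted -name):

docker run --name wordpress-db -e MYSQL_ROOT_PASSWORD=wordpressdocker -d mysql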

The first run may take some time while images are downloaded; they will be cached for subsequent runs.

So what just happened here? We asked Docker to run a new container using the MySQL base image:

--name

the name to assign to the new container

-e

this sets environment variables for the container, in this case the root password for the MySQL instance; documentation for the available configuration options can be found in the MySQL Docker Hub documentation

-d

this tells Docker to run the container in the background as a detached process

mysql

the name of the docker image to use; this is pulled from Docker Hub

Edit: Please note that in order to maintain data across containers, a VOLUME should be configured to ensure the data persists. For the sake of simplicity we will omit this flag, but be aware that deployments involving state should carefully consider the durability of data across the lifecycle of containers.
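
For reference, persisting the data might look something like the following; the host path is an arbitrary example, and /var/lib/mysql is where the MySQL image keeps its data:

docker run --name wordpress-db -e MYSQL_ROOT_PASSWORD=wordpressdocker -v /srv/mysql-data:/var/lib/mysql -d mysql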

The application container

Now let’s move on to running the WordPress application container; again, this has already been “dockerised” and resides in the Docker Hub WordPress repository.
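
A minimal sketch of the command is below; depending on the image version you may also need to pass database credentials explicitly (e.g. via WORDPRESS_DB_PASSWORD):

docker run --name wordpress-app --link wordpress-db:mysql -d wordpress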

The --link wordpress-db:mysql argument tells Docker to create a network link to the wordpress-db container (which we created earlier), making network communication between the two containers possible. The value has two parts: the left-hand side names the container to connect to (wordpress-db), and the right-hand side is the hostname alias by which it will be reachable inside the new container (mysql).
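
One way to verify the link is to open a shell inside the application container (this assumes a Docker version with docker exec, and that bash exists in the image):

docker exec -it wordpress-app bash
grep mysql /etc/hosts    # the link adds a hosts entry for the 'mysql' alias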

Excellent: the wordpress-app container can talk to the wordpress-db container. Exit the bash session; if desired, you can then check the logs of your running containers:

docker logs wordpress-app

Great, everything is looking good so far.

The nginx container

It is fairly common for web applications to be fronted by an HTTP web proxy, which provides advantages such as control over request routing, auditing, security, logging, caching, load balancing, hosting of static content and more. Nginx is a commonly used HTTP web proxy server. As we are creating a custom nginx setup, we will need a new Dockerfile to define a new image that contains our custom nginx configuration:
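
A sketch of the pieces involved follows; the file names, image tag and upstream alias are placeholders, and the proxy configuration is deliberately minimal.

The Dockerfile extends the official nginx image with our own configuration:

# build on the official nginx base image
FROM nginx
# replace the default site configuration with our proxy config
COPY nginx.conf /etc/nginx/conf.d/default.conf

nginx.conf proxies all requests through to the application container:

server {
    listen 80;
    location / {
        # 'wordpress' is the hostname alias we give the app container via --link
        proxy_pass http://wordpress;
        proxy_set_header Host $host;
    }
}

Build the image, then start a container from it, linking it to the application:

docker build -t wordpress-nginx .
docker run --name wordpress-proxy --link wordpress-app:wordpress -p 80:80 -d wordpress-nginx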

You may notice we gave the argument -p 80:80; this tells Docker to expose port 80 on the container so it can be accessed externally via the Docker host machine.

Hey Presto

Now browse to http://DOCKER_HOST_IP/ in your browser and voilà: WordPress is ready to go. Follow the WordPress setup prompts to configure your instance, and you should soon be greeted by the WordPress welcome page.

So, to recap: we have learnt some of the fundamental concepts of Docker by making practical use of the resources available on Docker Hub to build a self-contained running instance of WordPress, all with just a few Docker commands. I hope this post serves as a good introduction to Dockerising your own application infrastructure and to reaping the many benefits that Docker brings.

If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend, or sharing it on Twitter or LinkedIn. Thank you for reading!

Edit: This post is also available in Chinese, thank you to dockerone.com for the translation – 深入浅出Docker (translated by 崔婧雯, proofread by 李颖杰).


About the author

I'm Rama Nallamilli, a software engineer and Brazilian jujitsu practitioner based in London, UK. I have a passion for technology and modern engineering practices. My blog discusses topics across the whole spectrum of development, but Scala, DevOps and distributed systems are subjects I am particularly passionate about.

My previous experience includes working in engineering teams at the BBC, HMRC, Sky and Expedia. I am currently working as a Data/Software Engineer at Babylon Health, a leading UK startup in the AI healthcare space.

I spend my spare time training in Brazilian Jujitsu and currently hold a purple belt.