Dockerizing an application is the process of converting an application to run within a Docker
container. While dockerizing most applications is straightforward, there are a few problems that
need to be worked around each time.

Two common problems that occur during dockerization are:

Making an application use environment variables when it relies on configuration files

Sending application logs to STDOUT/STDERR when it defaults to files in the container’s file system

This post introduces a new tool, dockerize, that simplifies these two common dockerization issues.
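As a rough sketch of how a tool like dockerize could be used as a container's entrypoint, the invocation below renders a config file from environment variables and relays a log file to STDOUT. The flag names follow the dockerize README; the file paths here are hypothetical examples:

```shell
# Render nginx.conf from a template populated with environment
# variables, then start nginx while tailing its access log to STDOUT
# so `docker logs` can see it.
dockerize -template /etc/nginx/nginx.tmpl:/etc/nginx/nginx.conf \
          -stdout /var/log/nginx/access.log \
          nginx
```

This would typically be the CMD of the image, so the wrapped process stays PID 1's child and the template is re-rendered on every container start.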

A common problem when building docker images is that they can get big quickly.
A base image can be tens to hundreds of MB in size. Installing a
few packages and running a build can easily create an image that is 1GB or larger. If you
build an application in your container, build artifacts can stick around and end up getting deployed.

Large images are problematic when you start publishing images to a registry. More layers mean
more requests, and larger layers take longer to transfer. Unfortunately, deleting things
in later layers does not actually remove them from the image due to the way AUFS layers work.

There are a few options to address this problem but this post will show you
how you can squash your images to make them smaller without requiring big changes to your
development and deployment workflow.
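One possible squashing workflow, assuming a squash tool such as jwilder/docker-squash (the pipeline below follows its README; the image tags are made up), is to export the image's layers, collapse them into one, and load the result back:

```shell
# Export all layers of the image, squash them into a single layer
# tagged myapp:squashed, and load the squashed image back into docker.
docker save myapp:latest | docker-squash -t myapp:squashed | docker load

# Compare the sizes of the original and squashed images.
docker images myapp
```

Because squashing happens on a saved tarball, the Dockerfile and build process stay unchanged; only the publish step gains an extra stage.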

In a previous post, I showed a way to create an automated nginx reverse proxy for docker containers running on the same host. That setup works fine for front-end web apps, but is not ideal for backend services since they are typically spread across multiple hosts.

This post describes a solution to the backend service problem using service discovery for docker containers.

Docker is an open-source project to easily create lightweight, portable and self-sufficient containers
for applications. Docker allows you to run many isolated applications on a single host without
the weight of running virtual machines.

One of the problems with the current versions of docker is managing logs. Each container runs
a single process and the output of that process is saved by docker to a location on the host.

There are a few operational issues with this currently:

This log file grows indefinitely. Docker logs each line as a JSON message, which can cause
the file to grow quickly and exhaust the disk space on the host since it’s not rotated automatically.

The docker logs command returns all recorded logs each time it’s run. Any long-running process
that is a little verbose can be difficult to examine.

Logs under the container’s /var/log or other locations are not easily visible or accessible.
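To see the first problem concretely, you can locate a container's JSON log on the host and check how large it has grown. This sketch assumes a docker version that exposes the LogPath field via docker inspect; the running container is whichever one docker ps lists first:

```shell
# Grab the ID of a running container (placeholder: first in the list).
CONTAINER_ID=$(docker ps -q | head -n1)

# Ask docker where it stores this container's JSON-formatted log file.
LOG=$(docker inspect --format '{{.LogPath}}' "$CONTAINER_ID")

# Show how much disk the log is consuming; nothing rotates this file.
du -h "$LOG"
```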

Service discovery is a key component of most distributed systems and service oriented architectures.
The problem seems simple at first: how do clients determine the IP and port for a service that
exists on multiple hosts?

Usually, you start off with some static configuration which gets you pretty far.
Things get more complicated as you start deploying more services. With a live
system, service locations can change quite frequently due to auto or manual scaling,
new deployments of services, as well as hosts failing or being replaced.
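The static configuration starting point can be as simple as a fixed list of host:port pairs handed to clients, say through an environment variable. The variable name and addresses below are made-up examples:

```shell
# Static "service discovery": clients read a fixed, comma-separated
# list of backend addresses from the environment.
export BACKEND_HOSTS="10.0.1.5:8080,10.0.1.6:8080"

# A naive client just picks the first entry from the list.
FIRST=$(echo "$BACKEND_HOSTS" | cut -d, -f1)
echo "$FIRST"
```

The brittleness is easy to see: every scaling event, redeploy, or host failure means editing that list and restarting every client, which is exactly what dynamic registration and discovery avoid.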

Dynamic service registration and discovery becomes much more important in these scenarios in
order to avoid service interruption.

This problem has been addressed in many different ways and is continuing to evolve. We’re going to look at some open-source or openly-discussed solutions to this problem to understand how they work. Specifically,
we’ll look at how each solution uses strongly or weakly consistent storage, runtime dependencies, client
integration options and what the tradeoffs of those features might be.

We’ll start with some strongly consistent projects such as Zookeeper,
Doozer and Etcd which are typically
used as coordination services but often serve as service registries as well.
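As a flavor of what registration against one of these stores looks like, here is a minimal sketch using etcd's v2 HTTP keys API. The key layout under /services and the addresses are assumptions, not a prescribed convention; the TTL makes the entry expire unless the service keeps refreshing it:

```shell
# Register this instance under a per-service key with a 30s TTL.
# If the service dies and stops refreshing, the key expires on its own.
curl -s -X PUT http://127.0.0.1:4001/v2/keys/services/web/host-1 \
     -d value='10.0.1.5:8080' -d ttl=30

# Clients discover live instances by listing the service's directory.
curl -s http://127.0.0.1:4001/v2/keys/services/web
```

The same pattern, an ephemeral entry plus a heartbeat, is how Zookeeper's ephemeral znodes and Doozer's registrations work conceptually, even though the APIs differ.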

Fluentd and Logstash are two open-source projects that
focus on the problem of centralized logging. Both projects address the collection and transport
aspect of centralized logging using different approaches.

This post will walk through a sample deployment to see how each differs from the other. We’ll look
at the dependencies, features, deployment architecture and potential issues. The point is not to figure out
which one is the best, but rather to see which one would be a better fit for your environment.

In Centralized Logging, I covered a few tools that help
with the problem of centralized logging. Many of these tools address only a portion of the problem
which means you need to use several of them together to build a robust solution.

The main aspects you will need to address are collection, transport, storage, and analysis. In some special cases, you may
also want an alerting capability.

Good indexes are an important part of running a well-performing application on MongoDB. MongoDB performs best
when it can keep your indexes in RAM. Reducing the size of your indexes also leads to faster queries and the
ability to manage more data with less RAM.
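A quick way to see whether your indexes fit in RAM is to check their sizes from the mongo shell. The database and collection names below are hypothetical; totalIndexSize() and the stats() indexSizes map are standard collection helpers:

```shell
# Report the total bytes needed to hold all of the collection's
# indexes, then break that down per index.
mongo mydb --eval '
  print(db.users.totalIndexSize());
  printjson(db.users.stats().indexSizes);
'
```

Comparing the total against the RAM available to mongod tells you how much headroom you have before index lookups start touching disk.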