A Practical Introduction to Docker Container Terminology

Background

When discussing an architecture for containerization, it’s important to have a solid grasp of the related vocabulary. One of the challenges people face is that many of the following terms are used interchangeably, which often causes quite a bit of confusion for newcomers.

- Container
- Image
- Container Image
- Image Layer
- Index
- Registry
- Repository
- Tag
- Base Image
- Platform Image
- Layer

The goal of this article is to clarify these terms, so that we can speak the same language and develop solutions and architectures leveraging the value of containers. Note that I am going to assume that you know how to run basic docker commands, but if you need a primer, I recommend starting with: A Practical Introduction to Docker Containers.

Vocabulary

Repository

When using the Docker command, a repository is what is specified on the command line, not an image. In the following command, “rhel7” is the repository.

docker pull rhel7

This is actually expanded automatically to:

docker pull registry.access.redhat.com/rhel7:latest

This can be confusing, and many people refer to this as an image or a container image. In fact, the docker images sub-command is what is used to list the locally available repositories. Conceptually, these repositories can be thought about as container images, but it’s important to realize that these repositories are actually made up of layers.

When you specify the repository on the command line, the Docker daemon does some extra work for you. The Docker daemon (not the client tool) is configured with a list of servers to search. In our example above, the daemon will search for the “rhel7” repository on each of the configured servers.

In the above command, only the repository name was specified, but it’s also possible to specify a full URL with the Docker client. To highlight this, let’s start by dissecting a full URL, which you will often see written in this general form:

REGISTRY/NAMESPACE/REPOSITORY[:TAG]

The full URL is made up of a standard server name, a namespace, a repository name, and optionally a tag. There are actually many permutations of how to specify a URL, and as you explore the Docker ecosystem, you will find that many pieces are optional. The following commands are all valid and all pull some permutation of the same repository:

docker pull rhel7
docker pull rhel7:latest
docker pull registry.access.redhat.com/rhel7
docker pull registry.access.redhat.com/rhel7:latest
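To make the anatomy of a reference concrete, here is a minimal Python sketch that splits a reference string into the REGISTRY/NAMESPACE/REPOSITORY[:TAG] fields. It is a simplification, not Docker’s own parser: it assumes the common convention that the first component names a registry when it contains a dot or a colon or is “localhost”, it does not handle image digests, and it leaves the registry empty when none is given, because which registry gets searched depends on the daemon’s configuration. The names registry.example.com, myteam and myapp are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageReference:
    registry: Optional[str]   # None: the daemon searches its configured registries
    namespace: Optional[str]
    repository: str
    tag: str


def parse_reference(ref: str, default_tag: str = "latest") -> ImageReference:
    """Split REGISTRY/NAMESPACE/REPOSITORY[:TAG] into its parts (simplified sketch)."""
    # A ':' is a tag separator only if it appears after the last '/';
    # otherwise it belongs to a registry host:port such as localhost:5000.
    if ref.rfind(":") > ref.rfind("/"):
        name, tag = ref.rsplit(":", 1)
    else:
        name, tag = ref, default_tag

    parts = name.split("/")
    registry = None
    first = parts[0]
    # Assumed convention: the first component is a registry if it looks like a host.
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        registry = parts.pop(0)

    repository = parts[-1]
    namespace = "/".join(parts[:-1]) or None
    return ImageReference(registry, namespace, repository, tag)


if __name__ == "__main__":
    for ref in ("rhel7",
                "registry.access.redhat.com/rhel7:latest",
                "registry.example.com/myteam/myapp:1.0",   # hypothetical names
                "localhost:5000/myteam/myapp"):
        print(ref, "->", parse_reference(ref))
```

Note how the tag falls back to “latest” when omitted, which matches the expansion of docker pull rhel7 shown earlier.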

Namespace

A namespace is a tool for separating groups of repositories. On the public Docker Hub, the namespace is typically the username of the person sharing the image, but it can also be a group name or a logical name.
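As a rough illustration of a namespace acting as a grouping tool, the following sketch files repository names under their namespace. The user and team names are hypothetical. Notice that the same repository name, webapp, can exist independently in two different namespaces.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_namespace(names: List[str]) -> Dict[Optional[str], List[str]]:
    """Group 'NAMESPACE/REPOSITORY' names by namespace.

    Names without a '/' have no explicit namespace and are filed under None.
    """
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for name in names:
        namespace, sep, repo = name.rpartition("/")
        groups[namespace if sep else None].append(repo)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical repository names, for illustration only.
    names = ["alice/webapp", "alice/db", "qa-team/webapp", "fedora"]
    for ns, repos in group_by_namespace(names).items():
        print(ns, repos)
```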

Red Hat uses the namespace to separate groups of repositories based on products listed on the Red Hat Federated Registry server. Here are some example results returned by registry.access.redhat.com. Notice that the last result is actually listed on another registry server. This is because Red Hat also works to list repositories on our partners’ registry servers:

Notice that sometimes the full URL does not need to be specified. In this case, there is a default repository for a given namespace. If a user only specifies the fedora namespace, the latest tag from the default repository will be pulled to the local server.

docker pull fedora

Image Layer

Repositories are often referred to as images or container images, but actually they are made up of one or more layers. Image layers in a repository are connected together in a parent-child relationship. Each image layer represents changes between itself and the parent layer.
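The parent-child relationship between layers can be modeled in a few lines of Python. This is a toy model, not how Docker stores layers: the layer IDs are hypothetical short names, file contents are plain strings, and a deleted file is marked with None. Reconstructing what any layer looks like means walking back to the base layer and replaying each layer’s changes in order.

```python
from typing import Dict, List, Optional

# Hypothetical layers: each records its parent and its changes relative to that parent.
# A value of None marks a file deleted in that layer.
LAYERS: Dict[str, dict] = {
    "base": {"parent": None, "changes": {"/etc/os-release": "RHEL 7", "/usr/bin/yum": "yum"}},
    "app": {"parent": "base", "changes": {"/opt/app/run.sh": "v1"}},
    "patch": {"parent": "app", "changes": {"/opt/app/run.sh": "v2", "/usr/bin/yum": None}},
}


def lineage(layer_id: str) -> List[str]:
    """Return the chain of layer IDs from the given layer back to its base."""
    chain: List[str] = []
    current: Optional[str] = layer_id
    while current is not None:
        chain.append(current)
        current = LAYERS[current]["parent"]
    return chain


def flatten(layer_id: str) -> Dict[str, str]:
    """Replay changes from the base layer upward to get the filesystem view."""
    files: Dict[str, str] = {}
    for lid in reversed(lineage(layer_id)):
        for path, content in LAYERS[lid]["changes"].items():
            if content is None:
                files.pop(path, None)   # file deleted in this layer
            else:
                files[path] = content   # file added or modified in this layer
    return files


if __name__ == "__main__":
    print(lineage("patch"))
    print(flatten("app"))
    print(flatten("patch"))
```

Because flatten works for any layer ID, the model also shows why an intermediate layer can be treated as an image in its own right.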

Below, we are going to inspect the layers of a repository on the local container host. First, let’s check out what image layers are available in the Red Hat Enterprise Linux 7 repository. Notice that each layer has a tag and a Universally Unique Identifier (UUID).

Since Docker 1.7, there has been no native tooling to display the tree of image layers in a local cache, but with the help of a tool called dockviz, you can quickly inspect all of the layers in a local repository. The following command will return shortened versions of the UUID that are typically unique enough to work with on a single machine. If you need the full UUID, use the --no-trunc option.
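The shortened identifiers are simply prefixes of the full ID. The sketch below, using hypothetical IDs, shows the idea: a 12-character prefix (the length docker commands display by default) is used for display, and a prefix can be expanded back to the full ID as long as it matches exactly one ID on the host. This is roughly why a shortened ID such as the one in the docker run example below is usable; the helper names are illustrative, not part of any Docker tool.

```python
import hashlib
from typing import List


def short_id(full_id: str, length: int = 12) -> str:
    """Shorten a hex ID to a display prefix (docker shows 12 characters by default)."""
    return full_id[:length]


def resolve_prefix(prefix: str, full_ids: List[str]) -> str:
    """Expand a prefix to the single full ID it matches; fail if none or several match."""
    matches = [i for i in full_ids if i.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{prefix!r} matched {len(matches)} IDs")
    return matches[0]


if __name__ == "__main__":
    # Hypothetical full IDs: SHA-256 digests of made-up strings.
    ids = [hashlib.sha256(f"layer-{n}".encode()).hexdigest() for n in range(3)]
    for full in ids:
        short = short_id(full)
        print(short, resolve_prefix(short, ids) == full)
```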

Notice that the “docker.io/registry” repository is actually made up of many image layers. More importantly, notice that a user could potentially “run” a container based on any one of these layers. The following command is perfectly valid, though it is not guaranteed to have been tested or to work:

docker run -it 45b3c59b9130 bash

This is because when the image builder creates a new image, a new layer is created under certain conditions. First, if the image builder is building the image manually, each “commit” creates a new layer. Second, if the image builder is building an image with a Dockerfile, each directive in the file creates a new layer. It is useful to have visibility into what has changed in a container repository between each layer.
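A toy version of a Dockerfile build makes the layering rule easier to see: every instruction produces a new layer whose parent is the layer created just before it. This sketch follows the description above that each directive creates a layer (in newer Docker releases, only filesystem-changing instructions such as RUN, COPY and ADD add filesystem layers). The ID scheme, a hash of the parent ID and the instruction text, is purely illustrative and is not how Docker computes IDs; the Dockerfile steps are hypothetical.

```python
import hashlib
from typing import List, Tuple


def build(instructions: List[str], base_id: str) -> List[Tuple[str, str]]:
    """Toy build: each instruction adds one layer on top of the previous layer.

    Layer IDs are a hash of (parent ID, instruction text). This scheme is
    illustrative only; it is not Docker's real ID algorithm.
    """
    layers: List[Tuple[str, str]] = []
    parent = base_id                      # FROM only selects the starting parent
    for instruction in instructions:
        layer_id = hashlib.sha256(f"{parent}|{instruction}".encode()).hexdigest()[:12]
        layers.append((layer_id, instruction))
        parent = layer_id                 # the next layer stacks on this one
    return layers


if __name__ == "__main__":
    # Hypothetical Dockerfile body following "FROM rhel7".
    steps = [
        "RUN yum -y install httpd",
        "COPY index.html /var/www/html/",
        'CMD ["/usr/sbin/httpd", "-DFOREGROUND"]',
    ]
    for layer_id, instruction in build(steps, base_id="rhel7"):
        print(layer_id, instruction)
```

Each intermediate entry in the output corresponds to one of the layers a user could, in principle, run from.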

Base Image

Simply put, a base image is an image that has no parent layer. Typically, a base image contains a fresh copy of an operating system. Base images normally include the tools (yum, rpm, apt-get) needed to install packages or to update the software included in the image.

You can create these special base images yourself, but they are typically produced and published by open source projects and vendors like Red Hat. The provenance and trustworthiness of these base images are critical.

The sole purpose of a base image is to provide a starting place for creating your derivative images. When using a Dockerfile, the choice of which base image you are using is explicit:

FROM rhel7

Tag

Even though a user can run a container from any of the image layers, they shouldn’t necessarily do that. When an image builder creates a new repository, they will typically label the best image layers to use. These are called tags and typically map to versions of software contained in the repository.

To remotely view the tags available in a repository, run the following command (the jq utility makes the output a lot more readable):

To pull all of the available tags to the local container host and then inspect them, run the following commands. Notice that each of the tags maps to a version of RHEL embedded in the particular layer. Understanding this can help you pull the desired layer, for example to meet an OS requirement.
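Because a tag is only a name pointing at a particular layer, resolving a tag amounts to a lookup, and several tags can point at the same layer. The sketch below uses hypothetical tag names and layer IDs and assumes tags follow a simple dotted numeric scheme, which real repositories are not required to use. It picks the newest tag for a required major version, the kind of selection you might make to meet an OS requirement.

```python
from typing import Dict, Tuple

# Hypothetical tags and layer IDs, for illustration only.
TAGS: Dict[str, str] = {
    "7.1": "a1b2c3d4e5f6",
    "7.2": "0f9e8d7c6b5a",
    "latest": "0f9e8d7c6b5a",   # 'latest' is an ordinary tag; here it matches 7.2
}


def is_numeric(tag: str) -> bool:
    """True for dotted numeric tags such as '7' or '7.2'."""
    return all(part.isdigit() for part in tag.split("."))


def version_key(tag: str) -> Tuple[int, ...]:
    """Turn '7.2' into (7, 2) so that '7.10' sorts after '7.2'."""
    return tuple(int(part) for part in tag.split("."))


def pick_tag(tags: Dict[str, str], major: int) -> str:
    """Return the newest numeric tag whose major version matches."""
    candidates = [t for t in tags if is_numeric(t) and version_key(t)[0] == major]
    if not candidates:
        raise LookupError(f"no tag with major version {major}")
    return max(candidates, key=version_key)


if __name__ == "__main__":
    tag = pick_tag(TAGS, 7)
    print(tag, "->", TAGS[tag])   # 7.2 -> 0f9e8d7c6b5a
```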

Registry Server

A registry server is essentially a fancy file server that is used to store Docker repositories. Typically, the registry server is specified as a normal DNS name and, optionally, a port number to connect to. Much of the value in the Docker ecosystem comes from the ability to push and pull repositories from registry servers.

When a Docker daemon does not have a locally cached copy of a repository, it will automatically pull it from a registry server. By default, Red Hat Enterprise Linux is configured to pull repositories from registry.access.redhat.com first, and then it will try docker.io (Docker Hub).
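The lookup order described above can be modeled as follows: check the local cache first, then try each configured registry in order, and cache the first hit. This is a model of the behavior, not the daemon’s code; the has_repo callback is a hypothetical stand-in for asking a registry whether it holds a repository, and the catalog used in the example is made up.

```python
from typing import Callable, Dict, List, Optional


def locate(repository: str,
           local_cache: Dict[str, str],
           registries: List[str],
           has_repo: Callable[[str, str], bool]) -> Optional[str]:
    """Use the local cache if possible, otherwise try each registry in order."""
    if repository in local_cache:
        return "local cache"
    for registry in registries:                   # order matters: first match wins
        if has_repo(registry, repository):
            local_cache[repository] = registry    # remember where it came from
            return registry
    return None                                   # not found anywhere


if __name__ == "__main__":
    search_order = ["registry.access.redhat.com", "docker.io"]   # RHEL default order
    # Hypothetical catalog standing in for real registry queries.
    catalog = {"registry.access.redhat.com": {"rhel7"},
               "docker.io": {"rhel7", "fedora"}}

    def lookup(registry: str, repo: str) -> bool:
        return repo in catalog[registry]

    cache: Dict[str, str] = {}
    print(locate("rhel7", cache, search_order, lookup))    # registry.access.redhat.com
    print(locate("fedora", cache, search_order, lookup))   # docker.io
    print(locate("rhel7", cache, search_order, lookup))    # local cache
```

The model makes one consequence of the search order visible: a repository name that exists on both registries is always taken from the one listed first.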

It is important to stress that there is implicit trust in the registry server. You must determine how much you trust the content provided by the registry, and you may want to allow or block certain registries. In addition to security, there are other concerns such as users having access to licensed software and compliance issues. The simplicity with which Docker allows users to pull software makes it critical that you trust upstream content.

In Red Hat Enterprise Linux, the default Docker registry is configurable. Specific registry servers can be added or blocked in RHEL 7 and RHEL 7 Atomic by modifying the configuration file:

vi /etc/sysconfig/docker

In RHEL 7 and RHEL 7 Atomic, Red Hat’s registry server is configured out of the box:

ADD_REGISTRY='--add-registry registry.access.redhat.com'

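As a matter of security, it may be useful to block public registries such as Docker Hub. The configuration file ships with the BLOCK_REGISTRY line commented out; uncomment it and name the registry to block, for example:

BLOCK_REGISTRY='--block-registry docker.io'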
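To check what a given /etc/sysconfig/docker file actually configures, a small parser can help. This is a hypothetical helper, not part of any Docker or RHEL tooling: it only understands the ADD_REGISTRY and BLOCK_REGISTRY variables discussed in this section, accepts the flag either as “--flag value” or “--flag=value”, and skips commented-out lines, so a line beginning with “# BLOCK_REGISTRY=” blocks nothing.

```python
import shlex
from typing import Dict, List


def parse_registry_options(config_text: str) -> Dict[str, List[str]]:
    """Collect registries named by ADD_REGISTRY / BLOCK_REGISTRY lines.

    Hypothetical helper: accepts '--flag value' and '--flag=value' forms and
    ignores blank and commented-out lines.
    """
    flags = {"ADD_REGISTRY": ("add", "--add-registry"),
             "BLOCK_REGISTRY": ("block", "--block-registry")}
    result: Dict[str, List[str]] = {"add": [], "block": []}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key not in flags:
            continue
        bucket, flag = flags[key]
        # shlex.split removes the shell quoting; re-splitting separates the words.
        tokens = " ".join(shlex.split(value)).split()
        i = 0
        while i < len(tokens):
            token = tokens[i]
            if token.startswith(flag + "="):
                result[bucket].append(token.split("=", 1)[1])
            elif token == flag and i + 1 < len(tokens):
                result[bucket].append(tokens[i + 1])
                i += 1                     # skip the value just consumed
            i += 1
    return result


if __name__ == "__main__":
    sample = """
ADD_REGISTRY='--add-registry registry.access.redhat.com'
BLOCK_REGISTRY='--block-registry docker.io'
"""
    print(parse_registry_options(sample))
```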

Container Host

Once an image (aka repository) is pulled from a registry server to the local container host, it is said to be in the local cache.

To determine which repositories are synchronized to the local cache, use the following command:

docker images

Graph Driver

Every time a container is created on a container host, all of the dependent image layers are stacked together read-only. A read/write layer is then added on top so that the container can write data like a normal process. The graph driver is the piece of software that maps the different image layers in the repository to local storage. The local storage can be a filesystem or block storage, depending on the driver. Drivers include aufs, devicemapper, btrfs, zfs, and overlayfs. You can determine which graph driver you are using with the docker info command (look for the Storage Driver line in its output):

docker info
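Conceptually, the graph driver presents the read-only image layers plus the container’s read/write layer as a single filesystem. Python’s collections.ChainMap behaves similarly for reads and writes, which makes it a convenient toy model: lookups search the maps in order, and writes always go to the first map. This is an analogy only. It ignores how real drivers store data, and it does not model deleting a file that lives in a lower layer (union-style drivers such as aufs and overlay record such deletions with whiteout files). The file names and contents are hypothetical.

```python
from collections import ChainMap

# Read-only image layers, topmost first (hypothetical file contents).
app_layer = {"/opt/app/run.sh": "v1"}
base_layer = {"/etc/os-release": "RHEL 7", "/etc/motd": "welcome"}

# Each container gets its own empty read/write layer on top of the shared image layers.
container_a = ChainMap({}, app_layer, base_layer)
container_b = ChainMap({}, app_layer, base_layer)

container_a["/etc/motd"] = "changed in A"   # the write lands in A's read/write layer only

print(container_a["/etc/motd"])   # changed in A
print(container_b["/etc/motd"])   # welcome
print(base_layer["/etc/motd"])    # welcome: the image layer itself is untouched
```

Sharing the same read-only layers between containers is what lets many containers start from one image without copying it.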

Conclusion

People often use the words container, image, container image and repository interchangeably and the docker sub-commands don’t make a distinction between an image and a repository. The commands are quite easy to use, but once architecture discussions start, it’s important to understand that a repository is really the central data structure.

It’s also quite easy to misunderstand the difference between a namespace, repository, image layer, and tag. Each of these has an architectural purpose. While different vendors and users use them for different purposes, they are all tools in our toolbox.

The goal of this article is to leave you with a command of this nomenclature so that more sophisticated architectures can be created. For example, imagine that you have just been charged with building an infrastructure that uses business rules to limit, based on role, which namespaces, repositories, and even which image layers and tags can be pushed and pulled…