Cloud #04 – Container Design

Introduction

In previous papers in this series (“Cloud”), the historical trends that led up to the Cloud environment were discussed. As part of that discussion, the concepts of virtualization were covered and the roles of Type 1 and Type 2 Hypervisors were described. These hypervisors are capable of supporting any type of Operating System. These layers of virtualization are illustrated in the figure below.

Virtualization Layers

The Container community has made a significant effort to differentiate “Containers” from “Virtual Machines”. While this distinction is accurate from the perspective of differentiating the Container daemon from a Type 2 Hypervisor, it is also somewhat misleading. Containers can be seen as quite analogous to Java Virtual Machines. Both of these technologies create virtual environments in which to execute software.

Both JVM and Container technologies create an environment in which embedded software can execute. Both technologies isolate the embedded software from other similar runtime instances (e.g. other JVMs or Containers). In the case of a JVM, only Java software can be embedded. In the case of a Container, any software that can execute on a Linux platform can be embedded. Both technologies rely on an underlying Operating System to support their virtual environments.

For the purposes of this paper, I am going to introduce a new terminology and identify both Java Runtime Environments (JRE) and Container daemons as “Type 3 Hypervisors”. Please note that this is not currently industry accepted terminology. It is being introduced in this paper as a way of describing the virtualization provided by Containers. The function of a Type 3 Hypervisor is to provide a virtual environment in which a process or set of processes can execute. The function of the Docker daemon (Type 3 Hypervisor) is to run Docker Containers. A Docker Container can best be thought of as a “virtual thread”.

The “virtual thread” does not virtualize CPU. The CPU has already been virtualized through the multiprogramming/multitasking capabilities of the underlying Operating System (in this case Linux). The “virtual thread” provides a virtual environment in which the thread is to execute. This allows each container to customize its environment irrespective of any other executing processes. It also allows each instance of a Container, no matter which server it is running on or the presence of other Container instances, to have an identical runtime environment.

Each Container, or “virtual thread”, perceives that it controls the entire Operating System. Containers, for example, may modify some (but not all) kernel parameters, register communication ports, and perform other OS administrative functions. A container is not aware of other processes, containers, etc. executing on the Operating System on which it itself is executing. Since each instance of a Container image is identical, each instance will try to listen on the same port, write to the same file system directories, etc. It is the Container daemon (the Type 3 Hypervisor) that allows multiple Container instances to co-exist by executing them in a virtual, rather than a physical, environment.

Container Images vs Container Instances

A Container image is stored in either a public or private Container Repository (Docker Hub, for example). The stored Container image was built using a configuration file (a Dockerfile, in the case of Docker). The single Container image can thus be used on any server that both has a container engine (containerd or dockerd daemon) and has connectivity to the Repository. The “scope” of the Container image can thus be said to be Repository wide.

Container images can serve two distinct purposes. The first purpose (“Base”) is not deployment but, rather, serving as a base upon which other container images are built. In Docker, this is a Container image that would be referenced in a Dockerfile “FROM” directive. These types of images are designed to support “build” operations as opposed to runtime deployments.

The second purpose (“Template”) a Container image can be designed for is as a template for a runtime deployment. In this case, the Container image is analogous to a Java “.class” file. The Container image is a template for building a Container instance, just as a Java “.class” file is a template for building a running instance of that Class.
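The two purposes can be sketched with a pair of hypothetical Dockerfiles; the image names below are assumptions for illustration, not references to real published images:

```dockerfile
# "Base" image: never deployed directly; other images build on it via FROM.
# (hypothetical name when pushed to a Repository: mycompany/python-base)
FROM python:3.12-slim
RUN pip install --no-cache-dir requests

# --- a separate Dockerfile ---
# "Template" image: built FROM the base above, then deployed as runtime
# instances (the analogue of a Java ".class" file).
FROM mycompany/python-base
COPY app.py /app/app.py
CMD ["python", "/app/app.py"]
```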

Once an “instance” of the Container image is started on a Server, that instance is managed and controlled by the server’s Container daemon. The “scope” of a Container instance can thus be said to be Daemon wide. Obviously, the scope of a Container instance is much narrower than the scope of a Container image. This has a number of important design considerations in terms of what should be built into the Container image itself.

A Container instance can have a number of different “states”. Docker containers can have the following states:

Created (never executed).

Restarting.

Running.

Paused.

Exited (e.g. “stopped”).

The different Container instance states, as well as the Docker commands that move a Container instance from one state to another, are illustrated in the figure above.
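The state transitions above can be sketched as a simplified model. This is an illustration of the lifecycle, not an exhaustive copy of Docker's actual state machine, and the transition table below is an assumption for that purpose:

```python
# Simplified sketch of Docker container states and the commands that move
# an instance between them (illustrative, not exhaustive).
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "unpause"): "running",
    ("running", "stop"): "exited",
    ("exited", "start"): "running",
    ("exited", "restart"): "running",
}

def next_state(state: str, command: str) -> str:
    """Return the state after applying a docker command, or raise."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"'{command}' is not valid from state '{state}'")

# A container begins life as "created" (never executed).
state = next_state("created", "start")
state = next_state(state, "pause")
```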

Implications of Containers

Since Containers are designed to be “virtual threads”, they are extremely lightweight. They are simple and fast to both start and terminate. The overhead in launching a container is similar to the overhead in creating a new thread: negligible. Containers can thus be seen as relatively ephemeral. They can be dispatched quickly and torn down quickly. Additional instances, via Kubernetes for example, may be started and stopped as necessary across a number of different servers.

One implication of the ephemeral nature of Container instances is that, from a network perspective, they are “mobile” threads. The container may be shut down and restarted. It may be restarted on a different server. Note that this is also true for any containerized resources that an individual Container image may refer to. Containers must thus be designed not to depend upon a specific network location.

The implications of this loose coupling are significant and have led to the creation of additional technologies (Calico, Kubernetes, Helm, Istio, etc.), a design methodology (Twelve-Factor) and communities (Cloud Native Computing Foundation) dedicated to supporting Cloud centric design and development. While there are no new design principles required to deal with a containerized environment, there are a number of new design patterns that need to be understood and mastered. The Cloud Native Computing Foundation (CNCF) is dedicated to supporting the adoption of Cloud computing as well as to improving that infrastructure.

Twelve-Factor

The Twelve-Factor methodology is designed to facilitate the adoption of “Cloud Native” design patterns. It combines DevOps, Cloud design patterns, and Cloud operational realities into a “checklist” (the twelve factors) to evaluate when considering a Cloud design. A future installment in this series will cover the Twelve-Factor methodology in more depth. While the Twelve-Factor checklist covers twelve different factors to consider, these factors all fall into three major areas. These three areas at the heart of “Cloud Native” design are:

DevOps.

Availability and Capacity are both handled through horizontal scaling.

The location(s) of resources (including the application itself) are variable.

The loose coupling of a Container to a Server is true for Network, File System and External (e.g. Database) resources. The implications of this loose coupling are described in the following sections.

Design Implications – Container Contents

The first, and most basic, container design question is to determine what should actually be in the Container. There are three separate areas in which information required by a container can be placed. These three areas are:

The Container image.

The Container runtime instance.

The Cloud.

Container Image Content

Information placed in the Container image is available for all Container instances. Note that for a Container instance, this information is not immutable. It may be overwritten by that instance. Any modifications made will only be visible to that Container instance and will not affect any other Containers.

Take, for example, a NoSQL database. A document database embedded in the Container image would be available to every instance of the container. However, each Container instance would have its own separate and distinct database. If the purpose of the database is for each Container instance to cache information local to its own processing, then this document database is properly located. In this case, embedding the database in the Container image provides a number of advantages:

A local Database instance is available to all Container instances.

The Database will always be available because it’s local.

There is no additional container startup time to create a new database.

There is no network overhead for the database because it’s local.

If, however, the database is designed to be shared by multiple Container instances, then placing the Database in the Container image would not be appropriate.
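As a minimal sketch of an instance-local cache, the example below uses SQLite standing in for an embedded document database. The schema and key names are assumptions for illustration (an in-memory database is used here so the sketch is self-contained; in a container it would be a path inside the instance's file system):

```python
import sqlite3

# Each Container instance opens its own local database; no other
# instance can see this data, which is exactly what makes it suitable
# for instance-local caching and unsuitable for shared data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT)")

def cache_put(key: str, value: str) -> None:
    conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, value))

def cache_get(key: str):
    row = conn.execute(
        "SELECT value FROM cache WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None

cache_put("session:42", "last-seen=2024-01-01")
```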

Container Instance Content

Containers can execute both scripts and programs. A Container can thus both create its own local resources and create connections to remote resources. Any data maintained by a Container instance is local to that instance unless some kind of shared disk is used. Note that most of the data needed by a Container instance will either have been provided in the Container image or will be external and therefore located in the Cloud.

One significant exception is the data produced by the Container instance. Log files (e.g. stdout, stderr), audit logs, etc. are commonly generated by application software. Application programmers should treat these types of outputs as “streams” and simply write to a standard abstract location (e.g. stdout). Container builders need to take these streams into account.

Since there may be multiple Container instances, their multiple output streams need to be combined in a single location for management. There are multiple Open Source solutions for this, including Logstash from the Elastic Stack (ELK), which is used in the IBM Cloud. Stream consolidation may potentially need to be performed for each persistent local output produced by the Container instance.
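Treating output as a stream can be sketched as follows: the application writes structured records to stdout and never opens a log file itself, leaving routing and aggregation to the container runtime or a collector. The JSON-lines record format here is an assumption for illustration:

```python
import json
import sys
import time

def log_event(level: str, message: str, stream=sys.stdout) -> str:
    """Write one structured log record to an abstract stream (stdout by
    default) and return it; aggregation happens outside the container."""
    record = json.dumps({"ts": time.time(), "level": level, "msg": message})
    stream.write(record + "\n")
    return record

log_event("info", "container started")
```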

Cloud Content

Containers will often need to use resources that exist in the Cloud. An example of this would be a common NoSQL database. Containers should treat external resources like this as also being ephemeral. Connection information to these resources should not be hardcoded into the Container image, but rather be externalized in the environment in which the Container is executed.

This externalization allows the connection information to be modified at either deployment or runtime without requiring any modifications to either the container code or the Container image. There are multiple mechanisms that can be used to accomplish this externalization. They each have their own advantages and disadvantages. Some of the possible mechanisms are:

Use Operating System environment variables.

Use Shared Disk mounted to the container (e.g. Docker Volume).

Use an external configuration store (e.g. a NoSQL Key-Value database).
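The first mechanism (environment variables) can be sketched as below. The variable names (DB_HOST, DB_PORT) and defaults are assumptions for illustration:

```python
import os

def database_config() -> dict:
    """Read external database connection details from the environment at
    startup rather than from values baked into the Container image."""
    return {
        "host": os.environ.get("DB_HOST", "localhost"),
        "port": int(os.environ.get("DB_PORT", "5432")),
    }
```

At deployment time the same image can then be pointed at a different database without rebuilding, e.g. `docker run -e DB_HOST=db.internal ...`.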

Design Implications – Network

While containers can behave in an ephemeral manner, their underlying environment cannot. Both Container and Host environments are dynamic, but they operate on very different time frames. As an example, it is possible to assign a new TCP/IP address to an existing DNS entry; it may take a number of hours for this DNS change to propagate across geographical boundaries. Dynamic, in terms of Host DNS changes, means that changes occur across a time span of months to years. Dynamic, in terms of Containers, means that any necessary changes to a container’s environment are typically measured in sub-second time frames.

When a Container image is deployed, the Container daemon (Type 3 Hypervisor) looks at the declared network requirements for the Container and maps those requirements to actually available resources. These requirements limit the types and locations of changes that can be made to a network environment. Changes may be made to:

The Docker daemon itself.

The underlying local Operating System.

The network Gateway for the Host.

Docker Container Networking Model (CNM)

Docker has a design for managing the network environment of a container. This design is called the Container Networking Model (CNM) and will be covered in depth in a future installment in this series. Calico, Kubernetes and Istio extend the base networking capabilities of the Docker CNM.

The networking challenges in a containerized environment fall into several different general categories. These categories define the scope over which the challenge must be solved. The solutions in this area continue to rapidly evolve, so products and their capabilities are constantly changing. The general network challenges fall into these categories:

Communications between Container instances within the same container daemon.

Communications between Container instances across container daemons.

Incoming (Ingress) communications external to the container daemons.

Calico, Docker, Kubernetes, Istio, and other software all provide capabilities to solve these challenges. Configuring containers to solve these challenges using these software products will be covered in a future installment of this series.

The basic container networking design requirements, both for writing the software in the container as well as for building the container itself, fall into several broad categories. These categories are:

Publish the Container location for local ingress.

Publish the Container location for global ingress.

Locate external services.

Publish the Container location. A container provides a Service. If a Container’s clients cannot locate the Container, then the Container is useless. This Service is known by its public name: its URL, port, and Service Name. If the scope of the Service is limited to the container daemon(s), then the container software’s own service discovery may be all that is needed. More generally, the scope of the DNS support for the container defines the scope over which that Service is available.

From the client’s perspective, DNS is used to transform the URL into an IP address. For general web or multi-cloud ingress, standard DNS capabilities must be used. The network components hosting this IP address must therefore be updated whenever a new Container instance is created or deleted. This is accomplished using Destination Network Address Translation (DNAT). Note that this capability is not provided by the container software (e.g. Docker).

Locate external Services. Services may be made both discoverable and secured not through their DNS names but through a Service registry. Service Mesh products such as Istio provide these capabilities, among others. Embedding a Container instance within a Service Mesh can be an important part of the container design and deployment process.

Design Implications – I/O

One common misconception regarding Containers is that data cannot be preserved in a container. Docker Containers use a Union File System (UFS). A Union File System is a combination (hence “Union”) of multiple different file systems mounted on top of one another. The base of the UFS is the file system provided by the Linux kernel. On top of that base, any additional directories or files required are defined by the Dockerfile (instructions to build a Docker image) and stored in the Container image. Finally, any changes made to the file system of a Container instance are persisted with that particular Container instance.

A Container instance can be stopped and restarted and all of the file system changes made since the Container instance was first started will be intact. Of course, if a Container instance is deleted, all of the file system updates made by that Container instance are also deleted. File system I/O can be performed in three different “locations”:

The Container instance UFS file system.

The Host file system.

Shared disk / Network Attached Storage (NAS).

By default, all Container instance writes are applied to the local container UFS. These writes can be externalized from the container UFS to either the host or a NAS location. This is accomplished in Docker through the use of “Volumes”. Docker Volumes allow data to be saved locally on the server or to Network Attached Storage. This can allow one or more Container instances to share the same file system structure.
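As a hedged sketch, a docker-compose fragment can show two Container instances mounting the same named Volume so they share one file system tree. The service, image, and volume names are assumptions for illustration:

```yaml
# Hypothetical docker-compose fragment: two services sharing one named
# Volume, so both instances see the same directory at /app/data.
services:
  writer:
    image: example/app          # hypothetical image name
    volumes:
      - shared-data:/app/data
  reader:
    image: example/app
    volumes:
      - shared-data:/app/data
volumes:
  shared-data:
```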

The scope over which this information can be shared depends upon the nature of the file system itself. If file data is stored on a specific host, then any Container instances requiring access to that data must be able to mount that host filesystem. A similar, but broader, restriction would apply to any NAS storage used. This means that containers using Docker Volumes are limited in where they can be deployed.

Design Implications – External Resources

As mentioned previously, the location and connection information for external resources, such as databases, should not be included in the Container image. Instead, this information should be externalized and stored in the environment. As was also previously discussed, this externalization can be performed through a number of different mechanisms. It is important to note that each separate external resource should be supplied with its own configuration information. No assumptions should be made regarding the location or co-location of external resources.

Design Implications – OSI Levels

Most, but not all, software implemented in Containers will be Open Systems Interconnection (OSI) Layer 7 (Application) software. This layer of software can be considered as an abstraction of a function. The function processes input data and produces output data. All instances of the function behave in an identical manner. This is why horizontal scaling works with Layer 7 software. Multiple instances of the software provide additional capacity and availability with unchanged function.

OSI Layers

As an example, consider a trivial Application that calculates a restaurant tip. It doesn’t matter which container instance of that application is invoked, the same tip amount will be calculated.
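The tip calculator can be written as a pure, stateless function; any instance of the container produces the same output for the same input, which is what makes horizontal scaling safe. The function name and rounding rule are assumptions for illustration:

```python
def calculate_tip(bill: float, percent: float = 15.0) -> float:
    """Pure function: no local state, so every container instance
    returns an identical result for identical input."""
    return round(bill * percent / 100.0, 2)
```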

Now consider the implications of a third-party Application that locally persists state data. A Firewall Application would be an example of OSI Layer 3/4 software. If this Application were being built from the ground up as “Cloud Native”, that data would be persisted in an external data store such as a NoSQL database.

To containerize existing software, however, might require that each Container instance have different firewall rules. This could be accomplished in two ways:

Creating a separate Container Image for each separate firewall.

Using one Container Image but loading different firewall rules when started.
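The second alternative (one image, per-instance rules loaded at startup) can be sketched as below. The environment variable name (FIREWALL_RULES) and the JSON rule format are assumptions for illustration:

```python
import json
import os

def load_rules(env=os.environ) -> list:
    """One Container image, many configurations: each instance reads its
    own rule set from the environment when it starts."""
    raw = env.get("FIREWALL_RULES", "[]")
    return json.loads(raw)

# Each deployed instance would be started with a different value, e.g.
# docker run -e FIREWALL_RULES='[{"port": 22, "action": "deny"}]' ...
rules = load_rules({"FIREWALL_RULES": '[{"port": 22, "action": "deny"}]'})
```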

Both of these alternatives result in a number of distinct Container instances. These instances cannot be scaled horizontally because each instance is a “singleton” pattern. Each of these Container instances is persisting a significant amount of “state” data. This is normal functioning for this level of OSI software, but this is not the type of software that the container environment was designed to support.

The implication of all this is that multiple levels of OSI software can be supported via containers, but that not all of the advantages of a containerized environment can be leveraged by all software. The more “Cloud Native” the software design is, the more advantage there is in a containerized environment.

Summary

From the programmer’s perspective, there is little difference in developing software for containers. The biggest single difference is that Streams, rather than the File System, are preferred for some types of outputs (such as log files). From the designer’s perspective, however, there are quite a number of different design criteria for a Cloud Native target environment. These differences include:

Designing for horizontal scaling (multiple Container instances):

Service ingress.

Log aggregation.

Monitoring.

Designing for ephemeral locations:

Design containers to be stateless themselves.

Any stateful (e.g. Session) information required should be externalized.