Docker Data Containers and Named Volumes

Plenty of informative blog posts and articles explain Docker data management at great length; however, the introduction of the volume API in Docker 1.9 has caused some confusion. Managing persistent data within your containerized environment should not be difficult and, thankfully, with a little information, it’s not! There are, however, recommended ways of doing it that might save you some headaches in the future. Let’s take a quick look at what’s available to us now and what Docker recommends as the best path forward.

First, it might help to catch up on Docker’s use of the union file system and why volumes are necessary in the first place. An image is a series of read-only layers that together make up a file system. A container is merely an instantiation of those read-only layers with a single read-write layer on top. In fact, the primary difference between an image and a container is the read-write layer. When a file is changed within a container, it is first copied up from the read-only layer, and the change is made to that copy in the read-write layer. The version in the read-write layer hides the underlying file, but does not remove it. When a container is deleted, the read-write layer containing the changes is destroyed, and a new container created from the same image will not reflect those changes: they are gone forever! (If you’re interested in a more in-depth review, check out this webinar.)
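The layering behaviour described above can be mimicked with two ordinary directories. This is only a toy model in plain shell, not how Docker or any union file system is actually implemented; the directory, file, and function names are made up for illustration.

```shell
#!/bin/sh
# Toy model: two directories stand in for the image and container layers.
image_layer=$(mktemp -d)       # plays the read-only image layers
container_layer=$(mktemp -d)   # plays the container's read-write layer
echo "original" > "$image_layer/app.conf"

# Reads prefer the read-write layer and fall through to the image layer.
read_file() {
  if [ -f "$container_layer/$1" ]; then
    cat "$container_layer/$1"
  else
    cat "$image_layer/$1"
  fi
}

# Writes copy the file up first, then change only the copy.
write_file() {
  if [ ! -f "$container_layer/$1" ] && [ -f "$image_layer/$1" ]; then
    cp "$image_layer/$1" "$container_layer/$1"
  fi
  echo "$2" > "$container_layer/$1"
}

before=$(read_file app.conf)
write_file app.conf "modified"
during=$(read_file app.conf)
underlying=$(cat "$image_layer/app.conf")   # hidden by the copy, but still intact

# Removing the container throws away only the read-write layer.
rm -rf "$container_layer"
container_layer=$(mktemp -d)   # a new container starts with an empty layer
after=$(read_file app.conf)

echo "before=$before during=$during underlying=$underlying after=$after"
rm -rf "$image_layer" "$container_layer"
```

The change is visible only while the read-write directory exists; once it is removed, reads fall back to the untouched original, which is exactly the gap that volumes fill.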

Enter Docker data volumes: the ability to persist data in an organized way outside of the container. A data volume is a specially designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data (from the Docker User Guide):

Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization. (Note that this does not apply when mounting a host directory.)

Data volumes can be shared and reused among containers.

Changes to a data volume are made directly.

Changes to a data volume will not be included when you update an image.

Data volumes persist even if the container itself is deleted.

Data volumes are designed to persist data, independent of the container’s lifecycle. Docker therefore never automatically deletes volumes when you remove a container, nor will it “garbage collect” volumes that are no longer referenced by a container.
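Because volumes outlive their containers, cleaning them up is a manual step. A minimal sketch, assuming a Docker release that ships the docker volume subcommand (1.9 or later); the container name mycontainer is hypothetical.

```shell
# Remove a stopped container together with its anonymous volumes.
docker rm -v mycontainer

# List volumes that are no longer referenced by any container.
docker volume ls -f dangling=true

# Remove every dangling volume. This deletes data for good,
# so review the list printed by the previous command first.
docker volume rm $(docker volume ls -qf dangling=true)
```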

This is excellent functionality, but how do we take advantage of it? There are three ways to create volumes, with the last being the focus of this post. The short tutorial below walks through each of them and ends with Docker named volumes.

1. Initialize (and mount) at run-time with the -v flag:

$ docker run -P --name web -v /webapp training/webapp python app.py

This will create a new volume inside the container at /webapp. Anything written to the /webapp directory will be persisted to the host machine and will be available to the next container that mounts the volume. But where is the actual volume stored? By using docker inspect, we can find where it lives.
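One way to look up that location is to ask docker inspect for the container’s mount list. A sketch, assuming a Docker release recent enough to expose the Mounts field (older releases reported volumes under a different key):

```shell
# Print the mounts of the container named web as JSON.
# Each entry includes the host path (Source) and the path
# inside the container (Destination).
docker inspect -f '{{ json .Mounts }}' web
```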

This “anonymous” volume was created at /var/lib/docker/volumes/d87…05e on my host machine. This isn’t exactly the most convenient for organization purposes…

2. Using the VOLUME instruction inside a Dockerfile:

FROM ubuntu:latest
VOLUME /webapp

This has the same effect as using the run flag above: every container started from the image gets a volume at /webapp.

3. Create using the Docker Volume API introduced in Docker 1.9.

$ docker volume create --name webapp

This created a volume that I got to name and didn’t have to attach to anything right away. Seeing the flexibility here? More on this below.
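The same API can also list, inspect, and remove volumes. A short sketch using the webapp volume created above:

```shell
# Show all volumes known to the Docker daemon.
docker volume ls

# Show details for one volume, including its driver and mount point on the host.
docker volume inspect webapp

# Delete the volume once no container uses it any more.
docker volume rm webapp
```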

There are other options to mount volumes including specifying host paths to mount directly within a container. Read up on these to become familiar with what’s available.
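As one illustration of a host-path mount, the -v flag also accepts a host directory and a container path separated by a colon. In this sketch the host directory /src/webapp and the container names web-src and web-src-ro are assumptions, not part of the original examples.

```shell
# Mount the host directory /src/webapp at /webapp inside the container.
docker run -d -P --name web-src -v /src/webapp:/webapp training/webapp python app.py

# Append :ro to make the mount read-only inside the container.
docker run -d -P --name web-src-ro -v /src/webapp:/webapp:ro training/webapp python app.py
```

Unlike a regular volume, a host-path mount does not receive a copy of whatever the image already holds at the mount point.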

Data Containers

For a long time, the best practice for persisting data in Docker containers was the well-defined paradigm of “data containers”, as documented in the Docker User Guide. In short, a container is created (but not necessarily run) with a volume mounted using either of the first two methods above. When a container that needs persistent data storage is then created and run, the --volumes-from flag tells Docker to mount all of the volumes that are mounted in the container named in the flag.

Here’s what this looks like in practice using persistent data in postgres as an example:

Create a data container (using the same image to save on disk space) with a mounted volume, /dbdata.

$ docker create -v /dbdata --name dbstore training/postgres /bin/true

Then use the --volumes-from flag to mount /dbdata in a new postgres container:

$ docker run -d --volumes-from dbstore --name db1 training/postgres

db1 now has a data volume mounted at /dbdata that it inherited from the dbstore container.
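To check what was inherited, a throwaway container can mount the same volumes and list the directory. A sketch, assuming the ubuntu image is available locally or can be pulled:

```shell
# --rm removes this helper container as soon as ls exits.
# The volume itself stays, because dbstore still references it.
docker run --rm --volumes-from dbstore ubuntu ls -la /dbdata
```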

What does this give you that directly mounting a volume couldn’t do in the first place? Before Docker 1.9 and the inclusion of volume management, volumes couldn’t be named; such unnamed volumes were (and still are) known as anonymous volumes. Docker assigned each one a unique ID under /var/lib/docker/volumes that was not easy to recognize by sight. Creating a container specifically to hold the volumes provided an organizational structure in which the data container was named and easily referenced.

Named Volumes

Introduced in Docker 1.9, named volumes provide better volume management as well as the ability to use other drivers for volume storage. They can be created as previously shown in option 3 above, but also with the -v flag you are already familiar with, by putting a volume name in front of the container path (for example, -v webapp:/webapp).
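For example, the -v flag accepts a volume name where a host path would otherwise go. A minimal sketch, reusing the postgres image from the data container example; the volume name dbdata and the container name db2 are assumptions.

```shell
# Create the named volume up front. This step is optional:
# docker run creates the volume if it does not exist yet.
docker volume create --name dbdata

# Put the volume name on the left side of -v and the container path on the right.
docker run -d -v dbdata:/dbdata --name db2 training/postgres
```

No separate data container is involved: the volume has its own name and can be handed to any later container in the same way.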

This new ability to manage and name volumes directly from the CLI removes any further need for data containers. As seen in Docker GitHub issue 17798, the best practice from here on out is to use named volumes over data containers in most, if not all, use cases.

Now go out and try Docker named volumes to Deliver Software Faster!
