The environment has Docker installed and configured, running on a host called docker. Everything else we need, we'll launch as containers.

In this tutorial on Docker storage, we learn:

How Docker images are stored locally by the Docker engine.
How the copy-on-write mechanism and the union file system optimize storage and start-up time for Docker containers.
The variety of storage drivers compatible with Docker.
How volumes provide shared, persistent data for Docker containers.
With this knowledge you can mount a volume, attach it to and detach it from containers, and safely store your data.

Steps

Docker Storage Internals

Step 1

Before starting the tutorial, consider this question: how does Docker store images internally?

There are many places inside Docker, both at the engine level and the container level, that use or work with storage. Let's take a deep dive into storage.

Let’s imagine we want to pull a Docker image from a registry, like so:

docker pull nginx

When you hit Enter, Docker searches for the image in the local repository; if it is not found, Docker pulls the nginx image from Docker Hub. On Docker Hub, you can browse all available images, their versions, and their respective Dockerfiles. Once the pull completes, the image is added to your local repository and managed by the Docker engine.

We can verify this is the case by listing the local images:

docker images

Now if we launch the nginx image, it will spin up quickly because it is stored locally.

We can launch it like so:

docker run -d --name web1 -p 8081:80 nginx

This command maps port 80 of the container to port 8081 of the host machine. Once it is running, you can connect to port 8081 on the host (docker is the hostname in our scenario) to verify that nginx responds. Just run the next command:

curl http://docker:8081/

This step is easy to visualize, but what is happening behind the scenes with the container's file system? To understand that, we need to look at the copy-on-write mechanism.

Step 2

When we launch our image, the Docker engine does not make a full copy of the already stored image. Instead, it instantiates a container from the image using a technique called copy-on-write. This is a standard UNIX pattern that provides a single shared copy of data until that data is modified.

To do this, changes between the image and the running container are tracked. Just before any write operation is performed in the running container, a copy of the file that would be modified is placed on the writable layer of the container (the topmost layer is writable), and that is where the write operation takes place. Hence the name, “copy-on-write”.

If this wasn’t happening, each time you launched an image, a full copy of the filesystem would have to be made. This would add time to the startup process and would end up using a lot of disk space.

Because of the copy-on-write mechanism, running containers can take less than 0.1 seconds to start up, and can occupy less than 1MB on disk. Compare this to Virtual Machines (VMs), which can take minutes and can occupy gigabytes of disk space, and you can see why Docker has seen such fast adoption.
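A quick way to see how small a running container's on-disk footprint is (assuming the web1 container from earlier is still running) is the -s flag of docker ps:

```shell
# SIZE shows each container's writable layer; the "virtual" figure in
# parentheses adds the read-only image layers shared with other containers.
docker ps -s
```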

But how is the copy-on-write mechanism implemented? To understand that, we need to take a look at the Union File System.

Step 3

The Union File System (UFS) specializes in not storing duplicate data.

If two images have identical data, that data does not have to be recorded twice on disk. Instead, you can store the data once and then use it in many locations. This is possible with something called a layer.

Each layer is a file system, and as the name suggests, they can be layered on top of each other. Crucially, single layers containing shared files can be used in many images. This allows images to be constructed and deconstructed as needed, via the composition of different file system layers.

The layers that come with an image you pull from the Docker Hub are read-only. But when you run a container, you add a new layer on top of that. And the new layer is writable.

When you write to that layer, the entire stack is searched for the file you are writing to. And if a file is found, it is first copied to the writable layer. The write operation is then performed on that layer, not the underlying layer.

This works because when reading from a UFS volume, a search is done for the file that is being read. The first file that is found, reading from top to bottom, is used. So files on the writable layer of your container are always used.
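You can see the stack of read-only layers that makes up an image with docker history:

```shell
# One line per layer, topmost first, showing the instruction that
# created each layer and its size on disk.
docker history nginx
```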

If we were to run thousands of containers based on the same base layers we reap huge benefits in both startup time and disk space.

One example setup that would benefit is a web app that horizontally scales many identical web servers. Another would be a hosting company that provides the same basic image to all customers, and then only writes the data that customers add or change.

Simply pull two different tags of the Alpine image, alpine:3.3 and alpine:latest, and verify that any layers the two images have in common are stored only once on disk.

docker pull alpine:3.3

docker pull alpine:latest
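To check which layers, if any, the two tags actually share, you can compare their content-addressed layer digests (the -f template below assumes a reasonably recent Docker client):

```shell
# Any digest that appears in both lists is stored only once on disk.
docker inspect -f '{{.RootFS.Layers}}' alpine:3.3
docker inspect -f '{{.RootFS.Layers}}' alpine:latest
```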

Step 4

Docker has the benefit of being a complete product (the “batteries included” model) but also providing pluggability in case you want to add things.

By default, Docker ships with an OverlayFS-based storage driver (older versions shipped with the AUFS storage driver). Other storage drivers are pluggable, such as AUFS, Device Mapper, Btrfs, VFS, and ZFS. They all implement image composition and the copy-on-write mechanism, among other features.

To see what storage driver your Docker engine is using, run:

docker info

Notice the Storage Driver: overlay line in this output. That means we’re using the stock OverlayFS driver.
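If you only want the driver name, recent Docker clients (1.13 and later) can filter the output for you:

```shell
# Prints just the storage driver name, e.g. "overlay".
docker info --format '{{.Driver}}'
```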

Let’s look at the way Docker works with app generated data.

Step 5

A volume is a directory mounted inside a container that exists outside of the union file system. Volumes are created via a Dockerfile or the Docker CLI tool. A volume can map to an existing directory on the host machine, or to a remote NFS share.

The directory a volume maps to exists independently from any containers that mount it. This means you can create containers, write to volumes, and then destroy the containers again, without fear of losing any app data.

Volumes are great when you need to share data (or state) between containers, by mounting the same volume in multiple containers. Though take note: it’s important to implement locks or some other concurrent write access protection.

They’re also great when you want to share data between containers and the host machines, for example accessing source code.

Another common use of volumes is when you’re dealing with large files, such as logs or databases. That’s because writing to a volume is faster than writing to the union file system, which uses the (I/O expensive) copy-on-write mechanism.

To demonstrate the power of volumes and how to use them, let’s look at two scenarios.

1) RUNNING A CONTAINER WITH A VOLUME FLAG

Launch a container with -v, the volume flag:

docker run -d -v /code -p 8080:80 --name mynginx nginx

This creates a directory with a generated name (which we will look at shortly) on the host machine and maps it to the /code directory in the container. You can see that the volume has been created and mounted with this command:

docker inspect mynginx

You should see long JSON output in the terminal.

This output confirms the creation of the volume at the Docker engine level, as well as the mapping to the container’s /code directory. Also take note of /var/lib/docker/volumes/12f6[...]/_data, which is the volume path. We will use this path to access our data on the host machine.
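Rather than scanning the full JSON by eye, you can ask inspect for just the Mounts section with a Go template:

```shell
# Prints only the volume mounts: name, source path on the host,
# and destination inside the container.
docker inspect -f '{{json .Mounts}}' mynginx
```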

Okay, next, grab a shell inside the container:

docker exec -it mynginx /bin/bash

Check the /code directory exists:

ls

Change to the /code directory:

cd code

Write something to a test file, myfile:

echo Hello > myfile

And exit the container:

exit

Cool. So we just wrote some data to a file in the volume mounted inside our container. Let’s look in the directory on the host machine that we saw in the docker inspect output above, to see if we can find the data we wrote.
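The host-side commands are not shown in the text; a likely sequence, substituting the _data path from your own docker inspect output for the placeholder below (the 12f6[...] directory name differs on every machine), is:

```shell
# /var/lib/docker is only readable by root, so switch to a root shell.
sudo -i

# Substitute the volume path printed by `docker inspect mynginx`.
cd /var/lib/docker/volumes/<volume-id>/_data

# myfile, written from inside the container, should appear here.
ls
```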

You can even run cat myfile if you want to check the contents are the same. Or additionally, you could modify the contents here and then grab a shell inside the container and check that it has been updated there.

You can come out of the sudo shell with a simple command:

exit

2) CREATE ENGINE LEVEL VOLUMES AND STORAGE FOR TRANSIENT CONTAINERS

Since Docker 1.9, it is possible to create volumes using the Docker API.

First, we launch a busybox container and mount the myvolume volume to the /data directory. Then we execute a command inside the container that writes “Hello” to the /data/myfile.txt file. After that command has run, the container is stopped.
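The commands this step refers to are not printed in the text; based on the description, a likely sequence (the --name flag matches the Docker 1.9-era CLI) is:

```shell
# Create a named, engine-level volume.
docker volume create --name myvolume

# Inspect it and note the Mountpoint path for later.
docker volume inspect myvolume

# Mount the volume at /data in a throwaway busybox container and
# write "Hello" to a file on it; the container stops when sh exits.
docker run -v myvolume:/data busybox sh -c 'echo Hello > /data/myfile.txt'
```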

You can modify the above command to run cat /data/myfile.txt if you want to read the data from inside the container at any point.

Then change directory to the path listed as the Mountpoint in the output from the docker volume inspect myvolume command above.

cd /var/lib/docker/volumes/myvolume/_data

And again, check the contents:

ls

You can then check that the file myfile.txt is present, and you can read it, write to it, and so on. Everything you do will be reflected inside the container, and vice versa. This is how Docker storage works internally, and if you’re keen, you can dig deeper into your storage setup. You can also mount volumes whenever and however your requirements demand.

Debugging Scenarios

Help

Katacoda offers an interactive learning environment for developers. This course uses a command line and a pre-configured sandboxed environment. Below are useful commands when working with the environment.

cd <directory>

Change directory

ls

List directory

echo 'contents' > <file>

Write contents to a file

cat <file>

Output contents of file

Vim

In the case of certain exercises you will be required to edit files or text. The best approach is with Vim. Vim has two different modes, one for entering commands (Command Mode) and the other for entering text (Insert Mode). You need to switch between these two modes based on what you want to do. The basic commands are: