Docker Image Layers Are Like Git Commits

Posted by Tung Nguyen on Sep 24, 2017

One way to think about Docker image layers is as git commits. The two are technically different, but this article uses the analogy to point out an interesting property they share.

Whenever you add a new commit to a git repository, the repository gets larger. This is true even when the commit removes a file. The repository contains every historical change ever made to it, so removing a file just adds to that history.

For example, let’s say that you accidentally committed a large SQL dump of your database. That dump file bloats your git repo. In the very next commit, you remove the dump. But since the dump is still in the git history, anyone who clones the repo also downloads the commit that contains it. The database dump does not disappear from git history just because you removed it in a later commit.
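To see this for yourself, here is a minimal sketch that reproduces the scenario in a throwaway repo. The repo name `demo`, the file name `dump.sql`, and the 1 MiB of random bytes standing in for a real database dump are all made up for illustration.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet demo
cd demo
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

echo "hello" > README
git add README
git commit --quiet -m "Initial commit"

# Stand-in for the accidental SQL dump: 1 MiB of random bytes
# (random data barely compresses, so the effect on .git is easy to see).
head -c 1048576 /dev/urandom > dump.sql
git add dump.sql
git commit --quiet -m "Add database dump by accident"
size_with_dump=$(du -sk .git | cut -f1)

# "Fix" the mistake with a follow-up commit that deletes the file.
git rm --quiet dump.sql
git commit --quiet -m "Remove database dump"
size_after_removal=$(du -sk .git | cut -f1)

echo ".git size with dump (KB):     $size_with_dump"
echo ".git size after removal (KB): $size_after_removal"

# The dump is gone from the working tree but still stored in history.
# This prints the size in bytes of the blob in the previous commit.
git cat-file -s HEAD~1:dump.sql
```

The second size should be no smaller than the first, and the last command still finds the full dump inside the previous commit.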

Docker image layers work much like git commits. Each Dockerfile instruction that changes the filesystem creates a new layer, and each new layer adds to the size of the image. That holds even if the instruction removes files, just as a git commit that removes a database dump file does not shrink the repo. The image keeps a record of every layer that went into it, so a file added in one layer is still stored there after a later layer deletes it.

You often see Dockerfiles combining several concise instructions into one huge one-liner for this very reason. Dockerfiles without size optimization that look like this: