
Thursday, 25 August 2016

Reduce your Docker image size

When architecting Docker applications, keeping your images as lightweight as possible has a lot of practical benefits. It makes things faster, more portable, and less prone to breakage. Lightweight images also make it easier to use services like Jet, Codeship’s Docker CI/CD platform; they’re less likely to present complex problems that are hard to troubleshoot, and it takes less time to share them between builds.

With that in mind, let’s talk about some great ways to streamline your Docker images and keep them as small as possible. Thanks to everyone on the Codeship team and our friends who contributed these tips!

Use as few layers as required

Generally, the fewer the layers, the simpler the Dockerfile. This means you should combine related commands as much as you can, but don’t combine unrelated commands just for the sake of producing a tiny image, especially if you’re new to Docker.

It’s better to break up layers when adding files (to increase granularity and cacheability) but to combine layers when running related commands. For instance, run apt-get update && apt-get install in a single RUN instruction, so that the install step always runs against a freshly updated package index rather than a stale cached layer, and so that any cleanup from a command happens within the same layer.
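As a rough sketch of both ideas, assuming a hypothetical Rails-style app built on the official ruby:2.3 image (the nodejs package and the Gemfile/Gemfile.lock pair are placeholders for your own dependencies), related package commands share one RUN instruction while files are added in separate steps:

```dockerfile
# Hypothetical Rails-style app on the Debian-based official Ruby image.
FROM ruby:2.3

# Related commands combined into one layer: refresh the package index,
# install, and delete the index again, so it never persists in any layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends nodejs \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Files added in separate steps: the dependency manifests change rarely,
# so copying them first lets Docker reuse the cached bundle install layer.
COPY Gemfile Gemfile.lock ./
RUN bundle install

# Application source changes often, so it is copied last.
COPY . .
```

Because the manifests are copied before the rest of the source, editing application code only invalidates the final COPY layer.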

There are a few tools to inspect the composition of your Docker images to see which layers might be contributing to bloat. Try looking at your image’s layers on MicroBadger.

Clean up right away

When running commands, use the simplest chain of commands that gets things working. And, as mentioned earlier, always try to clean up in the same layer where you ran your original commands. It’s very common to download an archive, extract it, and then edit the file or move it into place, only to forget to remove the original archive afterward.

Deleting these kinds of files, as well as temporary logs and cache directories, can save a lot of space in your final image. However, if you don’t delete them within the same RUN instruction that created them, the files will still exist in that earlier image layer and still count toward the image size. This kind of cleanup applies to any interaction with the filesystem during a Docker image build.

For example:

RUN wget -O app.tar.gz http://mysite.com/app && tar -xzf app.tar.gz && rm app.tar.gz

Instead of:

RUN wget -O app.tar.gz http://mysite.com/app
RUN tar -xzf app.tar.gz
RUN rm app.tar.gz
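The same rule covers package caches and build-only tools. A minimal sketch, assuming a hypothetical Python app whose requirements.txt includes packages with C extensions: the compilers are installed, used, and removed in a single layer, so they never end up in the image.

```dockerfile
# Hypothetical Python app on the Alpine variant of the official image.
FROM python:3.5-alpine

COPY requirements.txt /tmp/requirements.txt

# Install compilers as a named virtual package, build the dependencies
# without keeping pip's download cache, then remove the compilers and the
# temporary file, all within this one layer.
RUN apk add --no-cache --virtual .build-deps gcc musl-dev \
    && pip install --no-cache-dir -r /tmp/requirements.txt \
    && apk del .build-deps \
    && rm /tmp/requirements.txt
```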

Use a stripped-down base image

Using an appropriate base image for your project can make a huge difference. It makes sense to use the official Ruby image for your Rails project, but if you’re just executing a binary in a container, do you need the full Ubuntu image? Maybe you can use a smaller image, like Alpine, or even build your image FROM scratch, the empty base image.
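For the binary-only case, a sketch assuming a hypothetical, statically linked executable named myapp that was built outside Docker. A scratch image contains nothing but the files you copy in, so there is no shell, libc, or package manager; a dynamically linked binary will not run this way.

```dockerfile
# Empty base image: the final image holds only the binary.
FROM scratch

# Hypothetical statically linked binary from the build context.
COPY myapp /myapp

# Exec form is required, since there is no shell to run the shell form.
ENTRYPOINT ["/myapp"]
```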

One possibility would be to create a custom base image with only the components you need. Don’t be afraid to take stripped-down images and add just your required components. Sometimes a smaller image customized to your needs is preferable to an out-of-the-box image.
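A custom base image can be as small as a stripped-down image plus a few packages. In this sketch, ca-certificates and tzdata are placeholder components; substitute whatever your services actually need.

```dockerfile
# Start from a stripped-down image.
FROM alpine:3.4

# Add only the required components; --no-cache keeps apk's package
# index out of the layer.
RUN apk add --no-cache ca-certificates tzdata
```

You might build it once with docker build -t mycompany/base . (a hypothetical tag) and have each service’s Dockerfile start FROM mycompany/base.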

Use the right image for the right service

Wherever it makes sense, go ahead and share images between services. But if one service needs only Ruby while all your other services require Rails, you can probably split them into multiple, contextually streamlined images.

Do:

- Use an existing similar image, or a more complex one, when:
  - that image is being built anyway; and
  - the extra time spent waiting for it does not slow down the overall build process.
- Use the simplest suitable base image for a service in all other cases.

Do not:

- Use a more complex image as a build artifact.
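For the Ruby-but-not-Rails service, a separate image can start from a much smaller base. A sketch assuming a hypothetical worker.rb script with no gem dependencies:

```dockerfile
# Ruby-only service: no Rails stack, so the Alpine variant is enough.
FROM ruby:2.3-alpine

WORKDIR /app

# Hypothetical standalone script; no Gemfile or bundle install needed.
COPY worker.rb ./

CMD ["ruby", "worker.rb"]
```

The Rails services keep their larger shared image, while this one carries only what it uses.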

Optimize .dockerignore

This one can make a big difference: Add as much as possible to your .dockerignore.

Depending on your application, you probably want to ignore the .git, log, and tmp folders at a minimum (although in some cases .git may need to be included). You can also update other parts of your pipeline to avoid dumping large binaries into directories that fall within the build context but aren’t needed by the build. Files matched by .dockerignore are never sent to the Docker daemon, so a broad COPY instruction can’t pull them into your image as unintended build artifacts.

One way to help figure out what to remove is to run your image while overriding the entrypoint to an interactive shell. This way, you can take a look at all the files that were added and note any that you don’t actually need.

You can poke around inside the running container to see if your image left behind any unnecessary files.

One great side effect of this optimization is faster builds: the build context sent to the Docker daemon is smaller, and your image layers contain fewer files that could invalidate the build cache.

Build versus bootstrap

One decision that shapes many of your other choices, and has a big impact on your final results, is whether to build assets into an image or to prepare them at runtime instead.

You could compile assets at build time and include them in your images, or you could generate them at runtime. You could even pull them from an external source where they were generated during the current build, during a previous build, or by an entirely out-of-band process. You can also add your assets to your image in a compressed format and uncompress them at runtime.
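A sketch of the last option, assuming a hypothetical pre-built bundle named assets.tar.gz and a hypothetical application binary named myapp. COPY (unlike ADD) does not auto-extract local archives, so the image stores only the compressed bundle; the extracted files are written to the container’s own writable layer when it starts.

```dockerfile
FROM alpine:3.4

# Hypothetical compressed asset bundle, stored in the image as-is.
COPY assets.tar.gz /opt/app/assets.tar.gz

# Hypothetical application binary.
COPY myapp /usr/local/bin/myapp

# At startup: uncompress the assets, then replace the shell with the app
# so it receives signals directly.
ENTRYPOINT ["/bin/sh", "-c", "tar -xzf /opt/app/assets.tar.gz -C /opt/app && exec myapp"]
```

The trade-off is a slower container start in exchange for a smaller image.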

The right path for you depends on your architecture and your goals, but you should factor in how heavy or light you ultimately want your images to be.

Conclusion

As you can see, there are a lot of great ways to reduce your image size. Some of them go beyond mere efficiency and rely on decisions that will permeate the rest of your application architecture.

Just remember: efficient images are ultimately a trade-off, balancing size against what you need to support your application reliably and easily.