AWS EC2 Container Registry and aws-ecr-gc

When you visit 99designs.com, you’re interacting with a collection of web applications running as Docker containers on AWS EC2 machines. We’re continually improving those applications, deploying updates many times per day. Our Buildkite continuous integration (CI) setup automatically builds and tests each update as a new Docker image, pushing it to a private Docker registry and deploying it to our production servers. Development of 99designs also happens in Docker containers, using images pulled from a private Docker registry.

Having tried a few Docker registry providers, we’ve landed on Docker Hub, run by Docker, Inc. But when AWS announced general availability of their own EC2 Container Registry (ECR), we were interested in the idea of running our Docker registry on the same provider as the rest of our infrastructure.

Image names on Amazon ECR (untagged and tagged) embed the AWS account ID and region, taking the form aws_account_id.dkr.ecr.region.amazonaws.com/repository:tag. They’re fine for automated processes, but unwieldy for humans in development environments.
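For comparison, pulling the same application image from Docker Hub and from ECR might look like this (the account ID, region, and repository names below are made up):

```shell
# Docker Hub: short, memorable image name
docker pull 99designs/example-app:latest

# ECR: the registry hostname embeds the AWS account ID and region
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/example-app:latest
```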

Authentication
—

Authentication to private Docker registries is normally done with docker login, which writes the credentials to ~/.docker/config.json, where they’re used for subsequent push/pull operations against that registry.
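The flow looks something like this (the registry hostname is illustrative); the stored entry is keyed by registry hostname and holds a base64-encoded username:password pair:

```shell
# Log in (prompts for username and password), then inspect what was stored
docker login registry.example.com

# The credentials end up in ~/.docker/config.json, roughly:
# {
#   "auths": {
#     "registry.example.com": { "auth": "bWU6c2VjcmV0" }
#   }
# }
```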

As a bridge between this mechanism and AWS IAM, the AWS Command Line Interface has an aws ecr get-login command which, assuming the requesting AWS user/role has the correct access, returns a ready-to-run docker login … command with generated credentials built in.

The generated credentials expire in twelve hours, after which new credentials must be requested. As with the complex image names, this is fine for automated processes but unwieldy for development environments.
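In practice the generated command is usually executed directly rather than copied by hand, e.g.:

```shell
# Ask AWS for temporary registry credentials and run the resulting
# "docker login ..." command in one step (AWS CLI v1; region illustrative)
eval "$(aws ecr get-login --region us-east-1)"
```

After twelve hours the stored credentials stop working and this step must be repeated.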

Image storage limits
—

It’s common to continually push new images with new tags to a Docker repository, e.g. build-20170303-152000, build-20170303-153100, etc. Even continually pushing to a single latest tag may lead to unbounded storage of untagged images.
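To see why, consider repeatedly pushing to a single tag (the registry and repository names here are illustrative): each push re-points the tag, and the image it previously pointed at remains in the registry, now untagged.

```shell
# First build and push: "latest" points at image A
docker build -t myregistry/myapp:latest .
docker push myregistry/myapp:latest

# ...new commit, rebuild, push again...
docker build -t myregistry/myapp:latest .
docker push myregistry/myapp:latest   # "latest" now points at image B;
                                      # image A remains, untagged
```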

Docker Hub seems to brush this under the carpet, presumably wearing the cost for now. AWS ECR, however, defaults to a limit of 1,000 images per repository. It’s possible to request a limit increase, but this highlights the reality that image storage needs to be accounted for eventually.
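The aws CLI can show how close a repository is to the limit; for instance, counting its untagged images (the repository name is illustrative):

```shell
# Count untagged images in a repository
aws ecr list-images \
  --repository-name myapp \
  --filter tagStatus=UNTAGGED \
  --query 'length(imageIds)'
```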

Introducing aws-ecr-gc
—

Our solution to staying under the ECR image limit while keeping a healthy number of previous image tags is aws-ecr-gc. It assumes that related tags in a repository share a common prefix. For example, a CI repository may contain build-a92d, build-71ba, build-321d as well as release-latest, release-previous, release-a92d, release-71ba, etc.

Given a list of tag prefixes, e.g. build and release, aws-ecr-gc deletes all but the newest N images matching each prefix. Images with tags not matching the listed prefixes are never deleted. Optionally, untagged images are also deleted.
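The selection logic amounts to: for each configured prefix, order that prefix’s images newest-first and mark everything past the first N for deletion. A rough sketch of that rule with plain shell tools, assuming timestamped tags (which sort newest-first under reverse lexical order):

```shell
# Tags for one repository, one per line (illustrative data)
tags='build-20170303-153100
build-20170303-152000
build-20170301-090000
release-latest
release-a92d'

# Keep the newest 2 "build-" images; everything else with that prefix
# becomes a deletion candidate. Tags with other prefixes (e.g. release-)
# are untouched.
delete=$(printf '%s\n' "$tags" | grep '^build-' | sort -r | tail -n +3)
echo "$delete"   # → build-20170301-090000
```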

Example: delete all untagged images, delete all but the latest 4 images with tags starting with release-production, and delete all but the latest 8 images with tags starting with build:
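The invocation might look something like the following; the flag names here are illustrative guesses based on the description above, not the tool’s documented interface — check the aws-ecr-gc README for the real usage:

```shell
# Hypothetical flags: delete untagged images, keep the newest 4
# release-production images and the newest 8 build images
aws-ecr-gc --repo=myapp \
  --keep=release-production:4 \
  --keep=build:8 \
  --delete-untagged
```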

Conclusion
—

AWS ECR is great for automated build and deploy processes, but less convenient for people working with the Docker images. So we’ve moved our CI and deployment processes from Docker Hub to ECR, but left our developer-facing Docker images on Docker Hub for simpler authentication and image naming.

Today we’re releasing aws-ecr-gc under the MIT open source license. Adding it as a CI build step cleans up old images while keeping some recent releases in case rollback or debugging are required.