Docker Healthchecks: Why Not To Use `curl` or `iwr`

Healthchecks are an important feature in Docker. They let you tell the platform how to test that your application is healthy, and the instructions for doing that are captured as part of your application package.

When Docker starts a container, it monitors the process that the container runs. If the process ends, the container exits. That's just a basic liveness check, because Docker doesn't know or care what your app is actually doing.

The container process could still be running but maxed out: a web server might respond with 503 to every request, but because the process is still running, the container stays up.

A healthcheck is how you tell Docker to test that your app really is healthy. If your web process is maxed out, Docker can mark the container as unhealthy and take corrective action (in swarm mode, Docker replaces unhealthy containers with new ones).

Sounds good, let's do it with curl

The healthcheck is captured in the image with a HEALTHCHECK instruction in the Dockerfile. There are some great blog posts on using healthchecks, and the typical example looks like this:

HEALTHCHECK CMD curl --fail http://localhost || exit 1

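That uses the curl command to make an HTTP request inside the container, which checks that the web app in the container responds. With --fail, curl returns a non-zero exit code if it can't connect or if the response status is 400 or higher, and || exit 1 turns any failure into exit code 1. Docker treats exit code 0 as healthy and 1 as unhealthy (2 is reserved), which is why curl's own error codes get mapped to 1.

On Windows images the equivalent is typically a PowerShell command that uses iwr (the alias for Invoke-WebRequest) to make the request and check the status code.

Neither of those options is great. Instead you should look at writing a custom healthcheck app.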

The problem with curl and iwr

The curl/iwr option is nice and simple, but it has some pretty significant drawbacks when you're working on a production-grade Docker image.

In Linux images, you need to have curl available. You can start FROM alpine and have a 4MB base image, but that doesn't have curl installed, and as soon as you RUN apk --update --no-cache add curl you add 2.5MB to the image, along with all of curl's attack surface.

In Windows images, you need to have PowerShell installed. The latest Nano Server images drop PowerShell to reduce image size and attack surface, and it would be a shame to add it back just to get iwr.

If you rely on a specific tool, your Dockerfile becomes less portable. If your apps are cross-platform and you use multi-arch images, a healthcheck that relies on an OS-specific tool ties your image back to a single platform. In the best case, your image fails to build. In the worst case, the image builds but has a healthcheck that always fails on one platform (because it's trying to use curl on Windows, or iwr on Linux).

There's a limit to what you can do with a simple HTTP tool. To flex your app and prove key features work, you can end up writing a /diagnostics endpoint which you curl. Diagnostics endpoints are a good thing to have, but you need to make sure that endpoint stays private.

By using an external tool to power your healthcheck, you take on the cost of installing and maintaining that tool in your image: if the healthcheck tool gets an update, suddenly you need to patch your app image too.

Instead you should think about writing your own healthcheck app, using the same application runtime as your own app.

Writing a custom healthchecker

A custom healthcheck app avoids all the issues of using an external tool:

You're using the same runtime as your actual app, so there are no additional prerequisites for your healthcheck.

If your app runtime is cross-platform, so is your healthcheck.

You can put whatever logic you want into your healthcheck, and it can stay private, so only the Docker platform can execute that code.
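That last point is where a custom check earns its keep. Here is a minimal sketch, in Node.js like the sample later in this post, of running several in-process checks, giving each one a time limit, and reporting healthy only if all of them pass. The names runChecks and withTimeout and the { name, fn } shape of each check are hypothetical; in a real app each fn would call into your own code.

```javascript
// A sketch of a healthcheck built from several in-process checks.
// runChecks, withTimeout and the { name, fn } check shape are hypothetical
// names for illustration; each fn would call into your own app's code.

// Reject if `promise` has not settled within `ms`, so one hung dependency
// cannot stall the whole healthcheck.
function withTimeout(promise, ms, name) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${name} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it doesn't keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run every check in parallel and report healthy only if all of them pass.
// A check passes if its fn returns (or resolves) without throwing.
async function runChecks(checks, timeoutMs) {
  const results = await Promise.all(
    checks.map(async ({ name, fn }) => {
      try {
        // Promise.resolve().then(fn) turns a synchronous throw into a rejection.
        await withTimeout(Promise.resolve().then(fn), timeoutMs, name);
        return { name, ok: true };
      } catch (err) {
        return { name, ok: false, error: err.message };
      }
    })
  );
  return { healthy: results.every((r) => r.ok), results };
}

module.exports = { runChecks, withTimeout };
```

An entry point would log each entry in results and exit with 0 when healthy is true and 1 otherwise. Because the logic runs inside the healthcheck process rather than behind an HTTP route, there is no diagnostics endpoint to keep private.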

The downside is that you now have a separate thing to write, maintain and package alongside your app. But it will be a thing written in the same language, and it should be simpler than crafting a complex curl statement.

Sample healthcheck in Node.js

This blog runs in Ghost with an Nginx front end, on a Docker swarm running in Azure. Ghost is a Node.js app, and the healthcheck for the blog containers uses a very simple Node.js script, healthcheck.js.

There isn't a huge amount of code in that script, but it gives me a lot of control over how the check runs. I set a timeout for the request call, I check the HTTP status code of the response, and I write log entries on success or failure (Docker records them, and you can see them with docker container inspect).