As I keep noting, user namespaces, when available in Docker, should be able to transparently hide any underlying mapping to a special user ID required by an underlying platform, allowing the Docker container to use whatever user ID it wants. We aren’t there yet, and given that user namespaces were first talked about as coming soon well over a year ago, we could well be waiting some time yet for all the necessary pieces to fall into place.

In the meantime, the best thing you can do to ensure Docker images are portable to different hosting environments, and are as secure as possible, is to design your Docker containers to run as a non ‘root’ user, while also being tolerant of running as an arbitrary user ID specified at the time the Docker container is started.

File system access permissions

In our prior post, we got to the point where running our IPython Docker container as a random user ID would fail even on some basic checks.

The problems basically boiled down to file system access permissions, caused by the fact that we were running as a different user ID to the one we expected.

The first specific problem was that the ‘HOME’ directory environment variable wasn’t set to what was expected for the user we anticipated everything to run as. This meant that instead of the home directory ‘/home/ipython’ being used, it was trying to use ‘/’ as the home directory.

As a first step, let’s simply try overriding the ‘HOME’ directory and forcing it to be what we want by adding an ‘ENV’ statement for it to the ‘Dockerfile’.

The ‘HOME’ directory environment variable is now correct, but we still cannot create files because the home directory is owned by the ‘ipython’ user and we are running with a different user ID.

Using group access permissions

The solution to file system access permission problems one often sees in Docker containers which try to run as a non ‘root’ user is to simply make files and directories world writable. That is, after setting up everything in the ‘Dockerfile’ as the ‘root’ user and before switching the user using a ‘USER’ statement, the ‘chmod’ command is run recursively on any directories and files which the running application might need to update.

I personally don’t like this approach of making everything world writable at all. To me it falls into that category of bad practices you wouldn’t use if you were installing an application directly on a host without Docker, so why start now? But what are the alternatives?

The more secure alternative that would normally be used to allow multiple users to update the same directories or files is UNIX groups. The big question is whether they are going to be useful in this case or not.

As it is, when the home directory for the ‘ipython’ user was created, the directories and files were created with the group ‘ipython’, being a personal group created for the ‘ipython’ user when the ‘adduser’ command was used to create the user account.

The problem with using a personal group as the primary group for the user, and thus for the directories and files created, is that it is impossible to know what the random user ID will be and so add it to the personal group in advance. Having the group of the directories and files be a personal group is therefore not going to work.

The question now is this: if the group would normally be set to whatever the primary group is for a named user, what group is actually going to be used when the user ID is overridden for the container at run time?

Let’s first look at the case where we override the user ID but still use one which does have a user defined for it.

Here we specify the user ID ‘5’, which corresponds to the ‘games’ user. That user happens to have a corresponding primary group which maps to its own personal group of ‘games’. In overriding the user ID, the primary group for the user is still picked and used as the effective group. Thus the ‘id’ command shows the ‘gid’ being ‘60’, corresponding to the ‘games’ group.

Do note that this is the case only when the user ID alone is overridden. It so happens that the ‘-u’ option to ‘docker run’ can also be used to override the effective group, by supplying the value in the form ‘uid:gid’.

That is, the effective group is set as the ‘gid’ of ‘0’, corresponding to the group for ‘root’.

The end result is that provided that we do not override the effective group as well using the ‘-u’ option, if the user ID specified corresponds to a user account, then the primary group for that user would be used. If instead a random user ID were used for which there did not exist a corresponding user account, then the effective group would be that for the ‘gid’ of ‘0’, which is reserved for the ‘root’ user group.
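The rule just described can be written down as a small model. This is only a sketch of the behaviour observed in the tests above, not Docker’s actual implementation. The function name ‘effective_gid’ and the lookup table are hypothetical, and the table holds only the two accounts mentioned in this post.

```python
from typing import Mapping, Optional

# Illustrative subset of /etc/passwd (user ID -> primary group ID).
# Only the 'root' and 'games' accounts discussed in the post are listed.
PASSWD_PRIMARY_GID = {0: 0, 5: 60}


def effective_gid(uid: int, gid_override: Optional[int] = None,
                  passwd: Mapping[int, int] = PASSWD_PRIMARY_GID) -> int:
    """Model which group a container process ends up running with.

    Hypothetical helper: it mirrors the observed behaviour only.
    """
    if gid_override is not None:
        # A group was supplied explicitly along with the user ID.
        return gid_override
    # Known account: use its primary group. Unknown user ID: gid 0.
    return passwd.get(uid, 0)


print(effective_gid(5))           # 60
print(effective_gid(5, 0))        # 0
print(effective_gid(1000000001))  # 0
```

The last call is the interesting one for a hosting service: a user ID with no account falls back to the ‘root’ group.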

Note that in a hosting service which is effectively using a randomly assigned user ID, it is assumed that it will never select one which overlaps with an existing user ID. This can’t be completely guaranteed, although so long as a hosting service uses user IDs starting at a very large number, it is a good bet it will not clash with an existing user. OpenShift at least appears to allocate user IDs starting somewhere above ‘1000000000’.

As to overriding the group as well as the user ID, it is also assumed that a hosting service would not do that. OpenShift at least doesn’t override the group, and this is probably the most sensible choice, as overriding the group with some random ID as well would make the use of UNIX groups inside the container impossible because nothing would be predictable. I would suggest that any hosting service going down this path of allocating user IDs follow OpenShift’s lead and not override the group ID, as doing so would likely just cause a world of hurt.
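Whether a given user ID clashes with an existing account can be checked from inside a container by querying the password database. The sketch below uses the standard ‘pwd’ module from Python. The helper name ‘has_account’ and its ‘lookup’ parameter are my own inventions for illustration, with the parameter there only so the check can be exercised against a fake table.

```python
import pwd
from typing import Callable


def has_account(uid: int,
                lookup: Callable[[int], object] = pwd.getpwuid) -> bool:
    """Return True if the user ID maps to a user account.

    Hypothetical helper. pwd.getpwuid() raises KeyError when there is
    no entry for the user ID in the password database.
    """
    try:
        lookup(uid)
    except KeyError:
        return False
    return True


# Checks against the real password database of the current system.
print(has_account(0))
print(has_account(1000000001))
```

On a typical image one would expect the first check to succeed and the second to fail, but that depends entirely on the contents of the password database in the image.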

Using a user with effective GID of 0

What now is going to be the most workable solution if we wish to rely on group access permissions?

In light of the behaviour observed above, what seems likely to work is to have the special user we created, which is the default user specified by the ‘USER’ statement of the ‘Dockerfile’, have a primary group with ‘gid’ of ‘0’. That is, we match the primary group that would be used if a random user ID were used which does not correspond to a user account.

By making such a choice for the effective group, it means that the group will be the same for both cases and we can now set up file system permissions correspondingly.

Unlike before, when overriding with a random user ID with no corresponding user account, attempts to create files in the file system now work okay.

What you will note though is that the file created is in this case owned by the user with user ID ‘10000’. This worked because the effective group of the random user ID was ‘root’, matching the group of the directory, and because the group permissions of the directory allowed updates by anyone in the same group. Thus it didn’t matter that the user ID was different to that of the owner of the directory.
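To make the reasoning concrete, here is a simplified model of the classic UNIX write permission check. It is a sketch only: it ignores supplementary groups, ACLs and capabilities, and the function name ‘can_write’ is hypothetical. The directory mode of ‘0775’ and the personal group ID of ‘1000’ are assumptions made for the example.

```python
import stat


def can_write(mode: int, owner_uid: int, owner_gid: int,
              uid: int, gid: int) -> bool:
    """Simplified owner/group/other write check (hypothetical helper)."""
    if uid == 0:
        # 'root' bypasses the permission bits.
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if gid == owner_gid:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Directory owned by uid 1000 with group 0, group writable (assumed 0775).
print(can_write(0o775, 1000, 0, uid=10000, gid=0))     # True
# Same directory, but with a personal group (assumed gid 1000).
print(can_write(0o775, 1000, 1000, uid=10000, gid=0))  # False
# Group 0 again, but without the group write bit.
print(can_write(0o755, 1000, 0, uid=10000, gid=0))     # False
```

Only the first combination lets the random user ID in, and it does so without needing the world writable bit.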

One thing you may note is that when the file ‘magic’ was created, the resulting file wasn’t itself writable to the group. This is because the default ‘umask’ set up by Docker when a container is run is ‘0022’. This particular ‘umask’ disables the setting of the ‘w’ flag for the group (and for others).
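The effect of the ‘umask’ is simple bit arithmetic, which the following sketch shows both as a calculation and by actually creating a file. The helper names are hypothetical, and the file is created in a temporary directory rather than in the container’s home directory.

```python
import os
import stat
import tempfile


def mode_after_umask(requested: int, umask: int) -> int:
    """Permission bits left once the umask clears the bits it names."""
    return requested & ~umask


def create_with_umask(umask: int) -> int:
    """Create a file requesting 0666 under the umask; return its mode."""
    previous = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "magic")
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        # Always restore the umask the process had before.
        os.umask(previous)


print(oct(mode_after_umask(0o666, 0o022)))  # 0o644
print(oct(create_with_umask(0o022)))
```

A requested mode of ‘0666’ therefore ends up as ‘0644’, which is why the group ‘w’ flag is missing on the new file.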

Even though this is the case, this is not a problem because from this point on any code that would run, such as the actual Jupyter Notebook application, would only ever run as the same allocated user ID. There is therefore no expectation of any processes running as the original ‘ipython’ user needing to be able to update the file.

In other words, making directories and files writable to the group only matters for the original directories and files created as part of the Docker build as the ‘ipython’ user. What happens after that, and what the ‘umask’ may be, is not important.

One final check to go: will this updated version of the ‘jupyter/notebook’ Docker image work on OpenShift? The answer is that it does indeed now start up okay and does not error out due to the access permission problems we had before.

If we access the running container on OpenShift we can perform the same checks as above okay.

Named user vs numeric user ID

Before we go on to further verify whether the updated Docker image does in fact work properly on OpenShift, I want to revisit the use of the ‘USER’ statement in the ‘Dockerfile’.

Right now the ‘USER’ statement is specifying a default user. This user would be used if you were running the Docker image directly with Docker yourself. As we have seen, if used with OpenShift, the user given by the ‘USER’ statement is actually ignored.

One reason that a hosting service such as OpenShift ignores the user specified by the ‘USER’ statement is that it cannot trust that the user is a non ‘root’ user when the user is specified by way of a name. Another is that where a hosting service provides the ability to mount shared persistent volumes into containers, it may want to ensure that running containers owned by a specific service account, or a project within a service account, have different user IDs. That way an application could not see data stored on a shared volume created by a different user, even if a volume were mounted against the wrong container.

Now one of the possibilities I did describe in a prior post was that a hosting service which only supported 12 factor applications and didn’t support persistent data volumes, although it should really still prohibit running a container as ‘root’, may allow a container to run as the user specified by the ‘USER’ statement so long as it knows it isn’t ‘root’. It can only know this, though, if a numeric user ID was defined with the ‘USER’ statement.
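The kind of check such a hosting service might make can be sketched as follows. This is purely a hypothetical illustration of the reasoning, not how OpenShift or any other service actually implements it, and the function name is my own. It assumes the value may carry an optional group after a colon, which is ignored.

```python
def is_verifiably_non_root(user_spec: str) -> bool:
    """Return True only if the USER value is a numeric, non-zero user ID.

    Hypothetical check. A name such as 'ipython' cannot be trusted,
    because the image's password database could map it to user ID 0.
    """
    # Drop any ':group' suffix and look only at the user part.
    user = user_spec.split(":", 1)[0].strip()
    # Require plain ASCII digits so the value is unambiguous.
    return user.isascii() and user.isdigit() and int(user) != 0


print(is_verifiably_non_root("ipython"))  # False
print(is_verifiably_non_root("1001"))     # True
print(is_verifiably_non_root("1001:0"))   # True
print(is_verifiably_non_root("0"))        # False
```

The name ‘ipython’ is rejected not because it is ‘root’, but because nothing outside the image can prove that it isn’t.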

To cater for that possibility, rather than use a user name with the ‘USER’ statement, let’s use its numeric user ID instead.

Now from the above tests we saw that the numeric user ID for the user ‘ipython’ created by ‘adduser’ was ‘1000’. We could therefore use it with the ‘USER’ statement. However, the user ID that ‘adduser’ picks is not technically deterministic: it can depend on what other user accounts have already been created, and on what operating system is used. We are therefore better off being explicit and telling ‘adduser’ what user ID to use.

The lowest recommended user ID for normal user accounts looks to be 500 on POSIX and Red Hat systems, and 1000 on openSUSE and Debian. Let’s therefore go with a number of 1000 or above, but in case an operating system image already includes a default user account, let’s skip 1000 and use 1001 instead.

All up this should give us the most portable solution. It works where the Docker container is hosted directly on Docker, and also on a hosting service such as OpenShift, which uses Docker under the covers but overrides the user ID containers run as. Using a numeric user ID for ‘USER’ also allows a hosting service that does not want to allow containers to run as ‘root’ to still use our preferred user, as it will know it can trust that the container will run as the user ID indicated.

Cannot find name for user ID

It would be great to say at this point that we are done and everything works fine. That is however not the case, as I will go into in the next post.

The remaining problem relates to what happens when we run the ‘whoami’ command: