About Docker

Docker is a platform for building, running and managing containers.
Containers run on top of a host machine. They share the host's kernel and hardware, but they can use a different Linux distribution or set of libraries on top.

We build container images, which are like snapshots of the environment we want: all the libraries, files, etc. We then run an image to create a container, a temporary instance that contains all our working files. Changes made inside a container live only in that container, so once it is removed (for example, when a container started with --rm is stopped) all data and any local changes are lost forever! To keep the output of a container we must write the data back to the host or somewhere else that's permanent.

This YouTube video is an excellent practical introduction to the world of containers. For this example I am assuming that you are working on a machine that already has Docker installed.

A Dockerfile describes how to build an image. We start FROM a base image, then COPY files in, RUN extra commands, or set specific ENV variables. The Dockerfile lives at the top level of the project and should be called Dockerfile with a capital D, which is the name docker build looks for by default.

In this example, we start from the rocker/rstudio image. Rocker images are public (not official Docker images), but they are solid and very well supported. Rocker also provides images for base R (rocker/r-base) and a geospatial suite (rocker/geospatial). The geospatial image has the basic spatial packages (sp, sf) installed, plus the system libraries they need outside of R (e.g. GDAL).

To install extra R packages we list them in requirements.R. During the build, this file is copied into the image and run to install the packages.
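A requirements.R can be as small as a call to install.packages(). The sketch below writes one from the shell; the package names (readr, ggplot2) are placeholders, not this project's actual list, and it assumes the rocker image already has a CRAN mirror configured so no repos argument is needed. Because install.packages() only warns when a package fails, the last line makes the build stop instead.

```shell
#!/usr/bin/env bash
# Write a hypothetical requirements.R next to the Dockerfile.
# The package names are placeholders: replace them with what your analysis needs.
cat > requirements.R <<'EOF'
pkgs <- c("readr", "ggplot2")
# Uses the CRAN mirror configured in the rocker image (assumption)
install.packages(pkgs)
# install.packages() only warns on failure, so check explicitly;
# a failing stopifnot() makes Rscript exit non-zero and the build stop
stopifnot(all(pkgs %in% rownames(installed.packages())))
EOF
```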

Finally, the build copies our project files over: the Analysis folder and the Data folder. We put these in the home directory of the default user, rstudio (/home/rstudio).
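Putting those steps together, a minimal Dockerfile might look like the one written by the sketch below. It assumes requirements.R, Analysis/ and Data/ sit next to the Dockerfile, and the /tmp/requirements.R location inside the image is an arbitrary choice. Leaving rocker/rstudio untagged means "latest"; pinning a specific R version tag makes rebuilds more reproducible.

```shell
#!/usr/bin/env bash
# Write a minimal Dockerfile for the layout described above (a sketch).
cat > Dockerfile <<'EOF'
# Start FROM the public rocker RStudio Server image
FROM rocker/rstudio

# COPY the package list into the image and RUN it to install the packages
COPY requirements.R /tmp/requirements.R
RUN Rscript /tmp/requirements.R

# COPY the project folders into the rstudio user's home directory
COPY Analysis /home/rstudio/Analysis
COPY Data /home/rstudio/Data
EOF
```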

Build it

Type the following command at the command line. You must be in the same directory as your Dockerfile (the final . tells Docker to use the current directory as the build context).

sudo docker build --rm --force-rm -t rstudio/hello-world .

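The --rm and --force-rm flags tell Docker to remove the intermediate containers it creates while building (--force-rm removes them even if the build fails). This stops us filling up the server with lots of containers doing nothing. Note that these are build flags: they do not make the containers you later start from the image delete themselves; for that, pass --rm to docker run. Once the build has finished, run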

sudo docker image list

to see your image added to the list. We've called it rstudio/hello-world but you can call it anything.
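The two steps can be wrapped in a small helper that builds the image and then lists only that image (docker image ls accepts a repository name to filter on). The function name build_hello_world and the DOCKER variable are hypothetical conveniences: DOCKER defaults to "sudo docker" and can be set to plain docker if your user is in the docker group.

```shell
#!/usr/bin/env bash
# Build rstudio/hello-world from the Dockerfile in the current directory,
# then show just that image. DOCKER is a hypothetical override (default: sudo docker).
build_hello_world() {
  ${DOCKER:-sudo docker} build --rm --force-rm -t rstudio/hello-world . &&
    ${DOCKER:-sudo docker} image ls rstudio/hello-world
}
```

Call it as build_hello_world from the project directory.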

Run it

We want to use this image to access RStudio, so we want the container running as a background service (i.e. in detached mode); the -d flag does this. If you instead want a bash shell or another interactive session, you need to specify -it.
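For the interactive case, a sketch: -i keeps standard input open and -t allocates a terminal, and naming bash at the end replaces the image's default command, so RStudio Server is not started in that container. The function name and the DOCKER override (default "sudo docker") are hypothetical.

```shell
#!/usr/bin/env bash
# Open a throwaway bash shell in a new container from the image.
# --rm deletes this container when you exit the shell.
shell_in_hello_world() {
  ${DOCKER:-sudo docker} run --rm -it rstudio/hello-world bash
}
```

To get a shell inside a container that is already running instead, docker exec -it <container name> bash is the usual route.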

RStudio Server listens on port 8787 inside the container. We need to map this to an unused port on the host machine with -p <host port>:<container port>. We will use 28787, but this can be any unused port.

We will call our container hello-world. This is the simple run command:
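Combining the flags described above gives a run command along these lines, shown as a sketch wrapped in a hypothetical run_hello_world function (DOCKER defaults to "sudo docker"). The --rm flag is our own addition, not necessarily part of the original command: it makes Docker delete the container as soon as it stops, so the hello-world name is free again next time.

```shell
#!/usr/bin/env bash
# Start RStudio Server from the image in the background (a sketch).
run_hello_world() {
  local host_port="${1:-28787}"  # any unused port on the host
  # -d: detached mode; --rm: delete the container once it stops
  # --name: refer to the container as hello-world later
  # -p host:container: RStudio Server listens on 8787 inside the container
  ${DOCKER:-sudo docker} run -d --rm --name hello-world \
    -p "${host_port}:8787" rstudio/hello-world
}
```

For example, run_hello_world 28787 is equivalent to typing sudo docker run -d --rm --name hello-world -p 28787:8787 rstudio/hello-world.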

Run this command and access the container through your web browser at http://<yourhostip>:28787. The username and password are both rstudio.

In RStudio, type

source("Analysis/hello.world.R")

You can see the Analysis and Data folders, but there are two problems.

First, to write to a file inside the container (through RStudio) you need the right user ID. With these rocker images you can get that by adding -e USERID=$UID to the run command. Then you can write files and save changes within the container.

Second, writing inside the container is all well and good, but this data won't be permanent. We can write our output back to the host by mounting a host directory as a volume in the container with -v /full/path/on/host:/path/in/container. This is also useful during development: changes you make in the permanent host folder are immediately available inside the container without rebuilding the image.
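Both fixes are extra flags on the run command. The sketch below is not the contents of run_docker.sh; it is a hypothetical function that adds -e USERID=$UID and a bind mount to the basic command. The mount target /home/rstudio/Output is a made-up folder name, the host path must be absolute, and DOCKER defaults to "sudo docker". Since $UID is expanded by your own shell before sudo runs, the container gets your user ID rather than root's.

```shell
#!/usr/bin/env bash
# Run the image with a matching user ID and a host folder mounted inside it.
# Usage: run_hello_world_with_mount /full/path/on/host
run_hello_world_with_mount() {
  local host_dir="${1:?usage: run_hello_world_with_mount /full/path/on/host}"
  # -e USERID=...: rocker sets the rstudio user's UID to yours
  # -v host:container: files written to /home/rstudio/Output (hypothetical
  #   path) land in host_dir and survive after the container is gone
  ${DOCKER:-sudo docker} run -d --rm --name hello-world \
    -p 28787:8787 \
    -e USERID="$UID" \
    -v "${host_dir}:/home/rstudio/Output" \
    rstudio/hello-world
}
```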

Before we fix these problems we need to stop the container that's running (it's no good to us):

sudo docker stop hello-world
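One catch: a container started without --rm still exists after it is stopped, so the name hello-world stays taken and a new docker run --name hello-world will be refused. A sketch of a hypothetical helper that stops the container and then removes it, ignoring the error if --rm already removed it (DOCKER defaults to "sudo docker"):

```shell
#!/usr/bin/env bash
# Stop the hello-world container and make sure its name is free again.
stop_hello_world() {
  ${DOCKER:-sudo docker} stop hello-world
  # Remove the stopped container; if it was started with --rm it is
  # already gone, so hide that error and carry on.
  ${DOCKER:-sudo docker} rm hello-world 2>/dev/null || true
}
```

docker ps -a lists stopped containers too, which is a quick way to check whether an old hello-world is still hanging around.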

Now let's try again. If you look in run_docker.sh you will see a better version and an explanation. Basically it's: