APIs are a convenient way to access data across devices, independently of the programming language used. Representational State Transfer (REST) APIs are the most common type of web API. REST is a software architectural style that defines a set of uniform and stateless operations. The uniform operations make it simple to define an interface, and statelessness (every request carries all the information the server needs to answer it) makes the service reliable, fast, and easy to modify and scale. The most commonly used exchange protocol is HTTP, whose methods GET, HEAD, POST, PUT, PATCH, DELETE, CONNECT, OPTIONS and TRACE are sent to an IP address or its associated URL.

We will build a REST API in R with the Plumber package to provide easy access to data from the public BigQuery Real-time Air Quality data set from OpenAQ. Older entries are dropped from the data set when new data is added. To keep a permanent record of the air quality, we therefore extract the data on a schedule with cron in a Docker container. The R Plumber API runs in its own Docker container, and both containers run together in a shared container network. This architecture has several advantages:

Portability of the whole service to other machines or cloud services

Clearly defined dependencies prevent functionality from breaking

Extracting and pre-aggregating the data enables fast API response times without the need for database queries

Enhanced data security, as the API operations and the database access run in separate containers

Modularity makes it easier to debug the service and to integrate additional components
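The following minimal base-R sketch makes the pre-aggregation advantage concrete. It shows what the extraction side could do before writing to the shared volume. The column names (city, pollutant, value) and the choice of per-city means are assumptions for illustration, not the project's actual schema. The point is that the API only has to read a small, already summarised file instead of querying a database on every request.

```r
# Minimal sketch (assumed schema): summarise raw measurements once,
# so that API requests only read a small pre-computed table.

# Hypothetical raw extract; in the project this would come from BigQuery
raw <- data.frame(
  city      = c("Delhi", "Delhi", "Mumbai", "Mumbai", "Delhi"),
  pollutant = c("pm25", "pm25", "pm25", "no2", "no2"),
  value     = c(120, 80, 60, 40, 30)
)

# Mean value per city and pollutant (one row per observed combination)
pre_aggregate <- function(df) {
  aggregate(value ~ city + pollutant, data = df, FUN = mean)
}

summary_tbl <- pre_aggregate(raw)
print(summary_tbl)

# The extraction container could then write the result to the shared
# volume, e.g. saveRDS(summary_tbl, "/shared-data/airquality-india.RDS")
```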

R Plumber script

Plumber allows you to decorate R code with special comments that define endpoints and their input parameters. You can then expose the decorated R code at a defined IP address and port. Install Plumber from CRAN or GitHub and open a new Plumber script to see some examples. If you use RStudio, you can click the "Run API" button to test your endpoints locally with Swagger. By default, Plumber sends its output as JSON. However, you can use other serializers, or create new ones, to make Plumber render the output in a different format. For more information, see the Plumber documentation.
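The following plumber.R file is a minimal illustration of the annotation syntax and is not part of this project. It defines a GET endpoint with two query parameters that returns JSON by default, plus a second endpoint that switches to a plain-text serializer. The sketch assumes plumber 1.x, where the #* @serializer text annotation is available. The endpoint paths and the add_numbers() helper are hypothetical. Query parameters arrive as character strings, which is why the helper converts them explicitly.

```r
# plumber.R: minimal sketch, assumes plumber >= 1.0

# Plain helper kept outside the endpoints so it can be unit-tested.
# Query-string parameters reach an endpoint as character strings.
add_numbers <- function(a, b) {
  as.numeric(a) + as.numeric(b)
}

#* Add two numbers (JSON output, the default serializer)
#* @param a First number
#* @param b Second number
#* @get /sum
function(a, b) {
  list(result = add_numbers(a, b))
}

#* Same calculation, rendered as plain text instead of JSON
#* @serializer text
#* @get /sum-text
function(a, b) {
  as.character(add_numbers(a, b))
}
```

To serve the file, run plumber::plumb("plumber.R")$run(host = "0.0.0.0", port = 3838) and request http://localhost:3838/sum?a=1&b=2. Binding to 0.0.0.0 instead of Plumber's default 127.0.0.1 makes the API reachable from outside a Docker container.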

The project's Plumber script exposes several R functions that return data from the extracted air quality data. This data is saved in the volume shared by the two Docker containers as /shared-data/airquality-india.RDS. The last function, at the endpoint /plot, returns a test histogram in PNG format, as specified by the serializer annotation #* @png. Instead of the chain of else if statements, you could parameterize the function to get more concise code. You can clone the whole project from its GitHub repository.
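The else if remark can be made concrete with a sketch like the one below. It is not the project's actual script. The column names (pollutant, value), the list of allowed pollutants and the /pollutant endpoint are assumptions. A single pollutant parameter replaces one branch per pollutant, and unknown values are rejected with HTTP status 400. The file is read on each request so that newly extracted data is picked up without restarting the API.

```r
# Sketch of parameterized Plumber endpoints (column names are assumed)

aq_file <- "/shared-data/airquality-india.RDS"

# Read the extracted data from the shared volume on each request
load_aq <- function(path = aq_file) {
  if (file.exists(path)) readRDS(path)
  else data.frame(pollutant = character(), value = numeric())
}

# One parameterized filter instead of an if / else if branch per pollutant
filter_pollutant <- function(data, pollutant) {
  allowed <- c("pm25", "pm10", "no2", "so2", "o3", "co")
  if (!pollutant %in% allowed) return(NULL)
  data[which(data$pollutant == pollutant), , drop = FALSE]
}

#* Measurements for one pollutant
#* @param pollutant One of pm25, pm10, no2, so2, o3, co
#* @get /pollutant
function(res, pollutant = "pm25") {
  out <- filter_pollutant(load_aq(), pollutant)
  if (is.null(out)) {
    res$status <- 400  # client error: unsupported parameter value
    return(list(error = paste("unknown pollutant:", pollutant)))
  }
  out
}

#* Histogram of all measured values, returned as a PNG image
#* @png
#* @get /plot
function() {
  hist(load_aq()$value, main = "Measured values", xlab = "value")
}
```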

In the directory containing the Dockerfile, run docker build -t openaq_api . to build the image from the Dockerfile and tag it as openaq_api. To test the dockerized API, run the container and bind host port 3838 to the exposed container port on which the API runs (3838 in this project), for example with

$ docker run -p 3838:3838 openaq_api

Creating the multi container service

We use docker-compose to define a service consisting of the API container and the data extraction container, with a volume shared between them. Docker Compose is a tool, installed in addition to the Docker engine, that makes it easy to set up a multi-container service declaratively through definitions in a YAML file. We define the shared volume under the volumes: key. We also define a shared network, which lets the containers reach each other's ports, under the networks: key. Compose already places all services of a project on a default network, so this is not strictly necessary here and is shown only for completeness. The containers themselves are defined under services:. There, the build: key specifies that each container image is built from the Dockerfile in the directory given by context:. The shared volume is mounted to a directory inside each container via the service-level volumes: key. The exposed port 3838 of the API container is bound to port 3838 of the host via ports:.

If you cloned the project's GitHub repository, you will find the docker-compose.yml file in its top-level directory. From that directory, build and start the containers with the command

$ docker-compose up

To run in detached mode, add -d. To force the recreation of existing containers, add --force-recreate; to force the images to be rebuilt, add --build. To stop and remove all containers and networks defined in the YAML file, run docker-compose down. Named volumes such as the shared data volume are kept unless you also pass -v.

The extraction process should now be up and running. You can check this in the Docker logs, because we tail the log of the scheduled cron job. Once the first extraction run has finished, you can query the Plumber API from R to receive the data.
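The following is a minimal client-side sketch. It assumes the API is reachable at http://localhost:3838, as configured above, and that it offers a JSON endpoint such as the hypothetical /pollutant?pollutant=pm25. Replace that endpoint with the ones defined in your Plumber script. The sketch uses jsonlite, whose fromJSON() can read JSON directly from a URL.

```r
# Client sketch: query the dockerized Plumber API from an R session.
# The base URL follows the port binding above; endpoint names are placeholders.

# Build a request URL with URL-encoded query parameters
aq_url <- function(endpoint, ..., base_url = "http://localhost:3838") {
  params <- list(...)
  if (length(params) == 0) return(paste0(base_url, endpoint))
  values <- vapply(
    params,
    function(p) utils::URLencode(as.character(p), reserved = TRUE),
    character(1)
  )
  paste0(base_url, endpoint, "?",
         paste(names(params), values, sep = "=", collapse = "&"))
}

# JSON endpoints: jsonlite turns the response into a data frame or list
get_aq <- function(endpoint, ...) {
  jsonlite::fromJSON(aq_url(endpoint, ...))
}
```

With the containers running, get_aq("/pollutant", pollutant = "pm25") would return the filtered measurements. Binary output such as the PNG from /plot is better saved to disk, e.g. with download.file(aq_url("/plot"), "histogram.png", mode = "wb").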

Where to go from here: Concluding remarks and additional notes

That's it: in this three-article series we built a robust service for extracting data from Google BigQuery and made the data easily accessible through a REST API with Docker and R.

Originally, I used the docker-compose.yml to mount the UNIX socket of the host's Docker daemon into the API container as a volume (- /var/run/docker.sock:/var/run/docker.sock), so that the Docker logs could be retrieved from the host via an API call. However, I removed this part because the practice is a major security risk, especially if the containers are used in production. Any process with access to the socket can control the host's Docker daemon and, through it, effectively gain root access to the host. See https://raesene.github.io/blog/2016/03/06/The-Dangers-Of-Docker.sock/ for more information.

From here, you could deploy this multi-container service to production, for example with cloud providers such as AWS, Google Cloud or DigitalOcean. It is useful to deploy a container orchestration tool such as Docker Swarm or Kubernetes to manage your Docker containers and their shared resources.

In a production setting you might want to use a reverse proxy server such as Nginx. It forwards requests arriving at a public URL to the exposed port of your API container and encrypts the traffic via HTTPS. You might also want to write unit tests for your API with the R package testthat, and load test your API under many concurrent requests with, e.g., the R package loadtest.
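A first set of unit tests could target the plain helper functions behind the endpoints rather than the HTTP layer. The sketch below assumes testthat (3rd edition). To keep it self-contained, it defines a hypothetical filter_pollutant() helper with assumed column names, standing in for whatever logic your endpoints use.

```r
library(testthat)

# Hypothetical helper under test (stands in for an endpoint's filter logic)
filter_pollutant <- function(data, pollutant) {
  allowed <- c("pm25", "pm10", "no2", "so2", "o3", "co")
  if (!pollutant %in% allowed) return(NULL)
  data[which(data$pollutant == pollutant), , drop = FALSE]
}

# Small fake data set instead of the real extracted file
fake <- data.frame(pollutant = c("pm25", "no2", "pm25"), value = c(10, 20, 30))

test_that("known pollutants are filtered", {
  expect_equal(nrow(filter_pollutant(fake, "pm25")), 2)
  expect_equal(filter_pollutant(fake, "no2")$value, 20)
})

test_that("unknown pollutants are rejected", {
  expect_null(filter_pollutant(fake, "xyz"))
})
```

In a larger project, such tests would live under tests/testthat/ and could be run with testthat::test_dir("tests/testthat"). Testing the endpoints over HTTP additionally requires the API to be running.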

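A Plumber API runs in a single R process and therefore handles API requests sequentially. If your API receives many calls, one option is to deploy several API containers and load-balance the incoming traffic across them with Nginx. To run four API containers, run docker-compose up with the --scale option, i.e. --scale SERVICE=4, where SERVICE is the name of the API service in your docker-compose.yml. A fixed host port binding such as 3838:3838 can only be used by one container. The scaled containers should therefore be reached through Nginx on the shared network rather than each binding host port 3838.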