Deploying JupyterHub on Kubernetes on Google Cloud

JupyterHub, a "multi-user server for Jupyter Notebooks," is an essential tool for teaching and training at scale with Jupyter. As described in "The course of the future – and the technology behind it," JupyterHub is being used to power an introductory class in data science taken by hundreds of students at Berkeley every semester.

JupyterHub is a complex piece of software, and setting it up and operating it has been out of reach for many organizations. Recent work by members of the Jupyter team (especially @CarolWilling, @choldgraf, @Mbussonn, @minrk, and @yuvipanda) has put JupyterHub within reach of a host of organizations and individuals.

Their new project consists of a Helm package for JupyterHub and an accompanying guide, Zero to JupyterHub, which describes the relatively straightforward steps needed to install and run JupyterHub on Google Cloud.

In this article, I follow along with the tutorial, adding detail on setting up gcloud and on preparing a Docker image that contains the content project you want to deploy, and providing more background on some of the tools used.

Introduction

Although there are a lot of steps, there are three core things to do:

Create a k8s cluster on gcloud. In a traditional ops setting, this is kind of like setting up a new server.

Install the JupyterHub application on the cluster using Helm, the k8s package manager.

Configure the new JupyterHub instance to serve a default content project. In this case, I'll have it serve Allen Downey's ThinkDSP project. See Computational Publishing with Jupyter for more background information on this step.
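To make the first of these steps concrete, here is a rough sketch of creating a cluster with gcloud. The cluster name and zone match the cleanup command at the end of this article; the node count and machine type are assumptions chosen only for illustration, so size them for your own class.

```shell
# Sketch of step 1: create a small Kubernetes cluster on Google Cloud.
# CLUSTER and ZONE match the delete command used later in this article.
# NUM_NODES and MACHINE_TYPE are assumptions, not recommendations.
CLUSTER="notebook-test"
ZONE="us-central1-b"
NUM_NODES=3
MACHINE_TYPE="n1-standard-2"

create_cluster() {
  gcloud container clusters create "$CLUSTER" \
    --zone="$ZONE" \
    --num-nodes="$NUM_NODES" \
    --machine-type="$MACHINE_TYPE"
}

# Confirm that kubectl can reach the new cluster by listing its nodes.
check_cluster() {
  kubectl get nodes
}

# Usage: create_cluster && check_cluster
```

The commands are wrapped in functions so that nothing is created (or billed) until you call create_cluster yourself.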

By the end of this tutorial, you'll have a public server (no DNS entry or https, which is something I'll need to figure out how to add) where anyone can log in and get a running instance of ThinkDSP.

What's not covered here:

How to set up an "authenticator" for JupyterHub so that you can control who can log into Jupyter and get a notebook. Right now, anyone can just log in with any username and password. Probably unwise.

You'll need to keep this config file around. Because it contains secrets, store it somewhere private (a private repo at the very least) rather than committing it to a public GitHub repo.
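Since the file holds secrets, one cautious option is to keep it out of version control altogether and back it up somewhere private. The sketch below assumes the file is named config.yaml and sits in the current directory; both are assumptions, and this is only one of several reasonable approaches.

```shell
# Assumption: the Helm config file is named config.yaml and lives in the
# current directory. Adjust CONFIG_FILE if yours is named differently.
CONFIG_FILE="config.yaml"

protect_config() {
  # Restrict the file so only the current user can read or write it.
  chmod 600 "$CONFIG_FILE"
  # Tell git to ignore the file, adding the entry only if it is missing.
  touch .gitignore
  grep -qxF "$CONFIG_FILE" .gitignore || echo "$CONFIG_FILE" >> .gitignore
}

# Usage: protect_config
```

Running protect_config more than once is harmless, because the .gitignore entry is only appended when it is not already present.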

Install JupyterHub with Helm

Now that we have helm, we can (finally!) use helm install to put the JupyterHub app on the cluster. We'll use the config file we created in the previous step, and use jupyterhub-test as both the release name and the namespace of the application (this is how Helm keeps track of the apps running on the cluster).
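As a sketch of what that command can look like: the release name and namespace below come from the text, while the chart reference and the config file name are assumptions (the chart is assumed to be available under a repo alias called jupyterhub). The form shown is Helm 2 syntax, which matches the log output in this article.

```shell
RELEASE="jupyterhub-test"      # release name and namespace, as in the text
CHART="jupyterhub/jupyterhub"  # assumption: chart repo added under the alias "jupyterhub"
CONFIG_FILE="config.yaml"      # assumption: name of the config file from the previous step

install_hub() {
  # Helm 2 syntax: the release name is passed with --name.
  helm install "$CHART" \
    --name="$RELEASE" \
    --namespace="$RELEASE" \
    -f "$CONFIG_FILE"
}

# Usage: install_hub
```

If you are on Helm 3, note that --name was removed and the release name is given as the first positional argument instead.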

This will run for a while. When it finishes, it will produce some helpful log data, as well as the release notes for the JupyterHub app:

    NAME: jupyterhub-test
    LAST DEPLOYED: Fri Jun 2 10:29:44 2017
    NAMESPACE: jupyterhub-test
    STATUS: DEPLOYED

    RESOURCES:
    ==> v1/PersistentVolumeClaim
    NAME        STATUS   VOLUME  CAPACITY  ACCESSMODES  STORAGECLASS                 AGE
    hub-db-dir  Pending                                 hub-storage-jupyterhub-test  1s

    ==> v1/Service
    NAME          CLUSTER-IP     EXTERNAL-IP  PORT(S)       AGE
    hub           10.11.246.154  <none>       8081/TCP      1s
    proxy-api     10.11.240.251  <none>       8001/TCP      1s
    proxy-public  10.11.254.221  <pending>    80:30746/TCP  1s

    ==> v1/Secret
    NAME        TYPE    DATA  AGE
    hub-secret  Opaque  2     1s

    ==> v1/ConfigMap
    NAME          DATA  AGE
    hub-config-1  14    1s

    ==> v1beta1/Deployment
    NAME              DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
    hub-deployment    1        1        1           0          1s
    proxy-deployment  1        1        1           0          1s

    ==> v1beta1/StorageClass
    NAME                                 TYPE
    single-user-storage-jupyterhub-test  kubernetes.io/gce-pd
    hub-storage-jupyterhub-test          kubernetes.io/gce-pd

    NOTES:
    Thank you for installing JupyterHub!

    Your release is named jupyterhub-test and installed into the namespace jupyterhub-test.

    You can find if the hub and proxy is ready by doing:

        kubectl --namespace=jupyterhub-test get pod

    and watching for both those pods to be in status 'Ready'.

    You can find the public IP of the JupyterHub by doing:

        kubectl --namespace=jupyterhub-test get svc proxy-public

    It might take a few minutes for it to appear!

    Note that this is still an alpha release! If you have questions, feel free to
      1. Come chat with us at https://gitter.im/jupyterhub/jupyterhub
      2. File issues at https://github.com/jupyterhub/helm-chart/issues

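As you can see in the release notes from the log, it will take a while for the app to initialize. You can monitor its progress by running kubectl --namespace=jupyterhub-test get pod until both pods are ready, and then find the public IP by running kubectl --namespace=jupyterhub-test get svc proxy-public.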
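The kubectl commands from the release notes can also be wrapped in a small polling helper, so you don't have to keep re-running them by hand. This is only a sketch: the retry count and delay are arbitrary defaults, and the jsonpath expression assumes the service is exposed through a cloud load balancer, as it is in this setup.

```shell
NAMESPACE="jupyterhub-test"

# List the hub and proxy pods, as suggested in the release notes.
show_pods() {
  kubectl --namespace="$NAMESPACE" get pod
}

# Print the external IP of the proxy-public service.
# Prints nothing while the IP is still pending.
external_ip() {
  kubectl --namespace="$NAMESPACE" get svc proxy-public \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null
}

# Poll until an external IP is assigned, then print the URL to open.
wait_for_hub() {
  tries="${1:-60}"   # number of attempts (arbitrary default)
  delay="${2:-10}"   # seconds to sleep between attempts
  while [ "$tries" -gt 0 ]; do
    ip="$(external_ip)"
    if [ -n "$ip" ]; then
      echo "http://$ip"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  echo "timed out waiting for an external IP" >&2
  return 1
}

# Usage: show_pods; wait_for_hub
```

With the defaults, wait_for_hub gives up after roughly ten minutes, which should be generous for a small test cluster.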

Once the service reports an external IP (104.155.179.31 in my case), you can open your browser to that address, such as http://104.155.179.31, and boom: notebooks.

Note that JupyterHub is running with a default dummy authenticator, so you can just enter any username and password. See extending jupyterhub for details on how to set up authentication, which I won't cover here.

Prepare Default Notebook to run on JupyterHub

By default, JupyterHub just gives you a blank Notebook. However, if you're teaching a class and you want to give your students access to something you've already created, you need to prepare a Docker image that will be served by default.

To make a Docker image you can deploy onto JupyterHub, you need to ADD the repo to the /home/jovyan directory, and then set the WORKDIR to /home/jovyan.

If you're using Launchbot or otherwise have an existing Dockerfile, you can create a new Dockerfile and call it Dockerfile.prod. For example:
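A minimal sketch of such a file follows, written from a shell heredoc so it can be pasted into a terminal at the root of your content repo. The ADD and WORKDIR lines follow the description above; the base image and the image tag are assumptions, so substitute whatever your project actually needs.

```shell
# Write a minimal Dockerfile.prod in the current directory.
# The base image is an assumption: any Jupyter image whose default
# user is jovyan should work in a similar way.
cat > Dockerfile.prod <<'EOF'
# Assumed base image from the Jupyter Docker Stacks project.
FROM jupyter/scipy-notebook

# Copy the content project into the notebook user's home directory.
ADD . /home/jovyan

# Start notebooks in that directory.
WORKDIR /home/jovyan
EOF

# IMAGE is a placeholder; replace it with your own registry path and tag.
IMAGE="example/my-notebook:v1"

build_and_push() {
  docker build -t "$IMAGE" -f Dockerfile.prod .
  docker push "$IMAGE"
}

# Usage: build_and_push
```

One thing to watch for: ADD copies files as root by default, so the notebooks may be read-only for the jovyan user unless you adjust ownership (newer Docker versions support ADD --chown for this).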

Once it's done building, you should be able to create a new notebook based on the base image.

Note that for now JupyterHub doesn't support persistent storage with the jupyter-stack images, but they're working on it.

Delete the cluster

Once you're done, delete your cluster in order to stop further billing!

gcloud container clusters delete notebook-test --zone=us-central1-b

Conclusion

Clearly, this is still a pretty technical process. However, by combining the ease of use of Helm with the cost-effectiveness and scalability of Kubernetes on gcloud, running a state-of-the-art JupyterHub deployment is within reach of most small organizations, or even an individual.

By removing the pain of installing and operating JupyterHub, this project opens the doors to the classroom of the future to everyone.

Andrew Odewahn is the CTO of O'Reilly Media and the co-chair of JupyterCon. He's into developer education, open source books, Jupyter, Docker, Go, React, and just generally lowering the barriers to entry on technology.