How we served 20k IPython notebooks for Nature readers

The IPython/Jupyter notebook is a wonderful environment for computations, prose, plots, and interactive widgets that you can share with collaborators. People use the notebook allovertheplace across many varied languages. It gets used by data scientists, researchers, analysts, developers, and people in between.

How does this temporary notebook service work?

tmpnb is a service that spawns new notebook servers, backed by Docker, for each user. Everyone gets their own sandbox to play in, assigned a unique path.

When a user visits a tmpnb, they’re actually hitting an http proxy which routes initial traffic to tmpnb’s orchestrator. From here a new user container is set up and a new route (e.g. /user/fX104pghHEha/tree) is assigned on the proxy.

Planning the Notebook Demo

When Brian Granger (Cal Poly) and Richard Van Noorden (Nature) asked for a demo, it was quite open what that could mean. Do we have people log in to a JupyterHub installation? Refer them to Wakari or Sage Math Cloud?

The goal that Richard stated was to provide at most 150 concurrent users. In the back of our minds, we (the IPython/Jupyter project) knew that the initial spike in traffic would be far greater and we should be able to handle the load.

You can tell this is a community with a lot of passion and aligned around a common format that has helped propel their research.

The other crazy benefit about being in London was that I got to go to the Nature offices to talk about the architecture backing the demo and make plans for operations. Chris Ryan, art editor at Nature, would put an iframe as part of the article in, expanding it into a lightbox for users when they click. For us, this just meant providing them one URL to rely on for content to get served to (as well as adjusting CSP or X-Frame restrictions).

Kicking the Nature Notebooks into Operation

On the day of launch, we watched as the notebooks started getting gobbled up and recycled.

After some smooth sailing, we watched as it ticked toward our 512 user mark. After reading comments on various social media sites, we decided to kick it up a notch and allow for 1000s of concurrent users while the demo had initially launched.

This bit us in a couple ways. In order to scale across hosts we’d need to put the proxy and tmpnb in front of multiple docker hosts (note: this is pre-docker swarm). Trying to swap largely untested bits out from underneath in production, while also dealing with the proxy issues did not sound ideal. Instead, Min RK quickly whipped up the tmpnb-redirector which uses the /stats endpoint to redirect users to new servers. This made rotating old nodes out easy as well.

Closing up

In the end we ended up serving more than 20,000 notebook servers and counting.

We love IPython notebooks, the overall architecture that has been built out here, and hope to keep supporting Open Source projects do interesting things on the internet in a way that benefits community, technology, and the whole ecosystem.