Jupyter on the Cluster

Running Jupyter on your local machine is straightforward, but sometimes you need more computational resources, which can mean hosting your work on a remote computer. This article shows you how to run Jupyter on a remote machine through an SSH tunnel so that you can interact with it in your local web browser.

On the Head Node

The first option is to run Jupyter on the head node. To do this, we first need to log in to the relevant head node (tigercpu in this case):
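A minimal sketch of the two steps, assuming the head node's full hostname is tigercpu.princeton.edu (substitute your cluster's address) and that Jupyter is already available in your environment:

```shell
# Log in to the head node (replace <user> with your username)
ssh <user>@tigercpu.princeton.edu

# On the head node, start Jupyter without opening a browser,
# binding it to a port of your choice (here 8889)
jupyter notebook --no-browser --port=8889
```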

Note that we selected port 8889 to connect to the notebook. If you don’t specify a port, Jupyter defaults to 8888, but that port may already be in use on either the remote machine or the local one (your laptop). If the port you selected is unavailable, you will get an error message, in which case just pick another. It’s best to keep it above 1024; I usually start with 8888 and increment by 1 if it fails, e.g. try 8888, 8889, 8890, and so on. In the remainder of this post we assume that you picked port 8889; if you are running on a different port, just substitute your port number for 8889.
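The tunnel itself is opened from your local machine. A sketch, again assuming the hostname tigercpu.princeton.edu (the flags are explained below):

```shell
# Forward local port 8889 to port 8889 on the head node,
# without running a remote command (-N), in the background (-f)
ssh -N -f -L 8889:localhost:8889 <user>@tigercpu.princeton.edu
```

With the tunnel up, pointing your local browser at http://localhost:8889 should reach the remote notebook.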

-N   Do not execute a remote command. This is useful for just forwarding
     ports.
-f   Requests ssh to go to background just before command execution. This
     is useful if ssh is going to ask for passwords or passphrases, but
     the user wants it in the background.
-L   Specifies that the given port on the local (client) host is to be
     forwarded to the given host and port on the remote side.

As the -f flag implies, the ssh tunnel will be running in the background. In order to kill the ssh tunnel, type lsof -i tcp:8889 to get the process id (PID) and use kill -9 <PID> to kill it.
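Concretely, on your local machine:

```shell
# Find the PID of the process listening on port 8889
lsof -i tcp:8889

# Kill the tunnel, substituting the PID reported above
kill -9 <PID>
```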

NOTE: If prompted for a “Password or token” the first time you connect, it can be found on the head node where Jupyter is running, as shown in the figure below.

On a Compute Node via salloc

If you need to run larger computations, you should not do so on the head node, but rather on one of the compute nodes. One way of doing that is to request an interactive session using salloc. Once a compute node has been allocated, we can run Jupyter and connect to it much as we did in the previous section. One difference is that the compute nodes are not connected to the internet, so we have to modify the local port forwarding slightly.

First, from the head node, we ask for an interactive session with a compute node on the cluster. Here we are asking for 1 node, 1 core, for 5 minutes:

salloc -N 1 -n 1 -t 00:05:00

Once the node has been allocated, type hostname to get the name of the node. In the figure below, we have been assigned tiger-h26c2n22.

On that node, we first need to unset the XDG_RUNTIME_DIR environment variable to avoid a permission issue, then we launch Jupyter:
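A sketch of those steps, assuming the assigned node is tiger-h26c2n22 as above (the --ip binding and the hostname tigercpu.princeton.edu are assumptions; substitute your own values):

```shell
# On the compute node: avoid a permission issue with Jupyter's runtime files
unset XDG_RUNTIME_DIR

# Launch Jupyter, binding to all interfaces so the head node can reach it
jupyter notebook --no-browser --port=8889 --ip=0.0.0.0

# On your LOCAL machine: since the compute node is not reachable directly,
# the tunnel forwards through the head node to the compute node
ssh -N -f -L 8889:tiger-h26c2n22:8889 <user>@tigercpu.princeton.edu
```

As before, the notebook is then available in your local browser at http://localhost:8889.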

During the Help Session today, someone was having trouble binding to 8889, which was already in use by a process on their macOS system. Killing the process that was using the port and then starting the tunnel seemed to break the authentication token, but restarting the notebook server on port 8890 worked. I suspect that a comment from https://github.com/jupyter/notebook/issues/3495 might explain that case:

“You may have two different notebook servers running, one on port 8888 and one on port 8889. If the first one was started before you set the password, it won’t accept the new password.”

But it might be good to include some more discussion of TCP port usage in this article.