Using Cloudant from Jupyter Notebooks

Jupyter notebooks are a popular way of exploring data sets by setting out your code, data and visualisations in an interactive, web-based notebook. Jupyter notebooks can be run on your own machine or as a service, as is the case with IBM Watson Studio. Often your data is in CSV format and is loaded into a data frame for analysis using Apache Spark or Pandas, but it is also possible to load data from a Cloudant database directly from the notebook. In this article I’ll demonstrate how this is done in Python and Node.js.

Not paper notebooks, Jupyter notebooks

Sprinkling some Pixiedust

We’re going to use the open-source Pixiedust library along the way. Pixiedust provides helper functions that allow data to be visualised with very little effort in a notebook. Throw it a Spark or Pandas data frame and Pixiedust will do the rest, whether you need a table, chart or map.

Setting up Watson Studio

Watson Studio allows Jupyter notebooks to be run as a service in the IBM Cloud. The notebooks can be backed by a choice of kernels: choose your Python version, number of CPU cores and memory allocation. The notebooks can be paired with other services in the IBM Cloud such as Apache Spark, IBM Cloudant and IBM Cloud Object Storage to create analytic workspaces and interactive dashboards that draw insights from your data.

In the project, create a new notebook: choose Python 2.7 and the 1 CPU / 4GB of RAM option to try the service for free.

Setting up Cloudant

To allow access to our Cloudant data from a notebook, we could use the Cloudant account’s admin credentials, but better practice is to create an API key/password pair that has read access to the database(s) needed.

In the Cloudant dashboard, select the database to be accessed and choose the “Permissions” tab. Click the “Generate API Key” button.

Make a note of the key and password. The new key is automatically given _reader access to your database and it can be given further permissions by checking boxes against the key.

Python

We’re going to use the official Cloudant libraries in our notebooks: first up, the Python library.

The first time only, we’re going to need to install the library with a shell command in a notebook cell:
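The install step might look like the following sketch. It assumes the library is installed from PyPI under the package name `cloudant`; in a notebook cell the usual shorthand is `!pip install cloudant`. The helper names `pip_install_command` and `install` are purely illustrative.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command that targets the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run pip for the given package; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(package))


# Run once per environment (assumed package name: "cloudant"):
# install("cloudant")
```

Calling pip through `sys.executable` makes it more likely that the package lands in the same environment the notebook kernel is running in.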

In the next cell, you can then import the library and make a connection to the Cloudant database, supplying the API key/password pair we generated earlier and a hostname (or, more precisely, the bit of your Cloudant account’s domain name before .cloudant.com).
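A minimal sketch of such a connection cell, assuming the python-cloudant client class `cloudant.client.Cloudant` and its `account` keyword. The helper `account_from_host` and the placeholder credentials are illustrative and not part of the library.

```python
def account_from_host(host):
    """Return the part of a Cloudant hostname before '.cloudant.com'."""
    host = host.strip()
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    host = host.rstrip("/")
    suffix = ".cloudant.com"
    if host.endswith(suffix):
        host = host[:-len(suffix)]
    return host


def connect(api_key, password, host):
    """Open a Cloudant session using an API key/password pair."""
    # Imported here so the helper above works without the library installed.
    from cloudant.client import Cloudant
    return Cloudant(api_key, password,
                    account=account_from_host(host), connect=True)


# Placeholder values; substitute the key and password generated earlier.
# client = connect("my-api-key", "my-api-password", "myaccount.cloudant.com")
```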

We can use the Cloudant client object to connect to a specific database and use the resulting object to fetch some data, in this case the first 500 documents.
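One way to express the fetch, sketched under the assumption that `client` is an already connected python-cloudant client and that the database object's `all_docs` method returns the parsed `_all_docs` response as a dictionary. The database name `cities` and the helper names are hypothetical.

```python
def docs_from_response(response):
    """Pull the document bodies out of an _all_docs style response."""
    return [row["doc"] for row in response.get("rows", []) if row.get("doc")]


def first_documents(db, limit=500):
    """Fetch up to `limit` documents, bodies included, from a database object."""
    response = db.all_docs(include_docs=True, limit=limit)
    return docs_from_response(response)


# db = client["cities"]          # hypothetical database name
# docs = first_documents(db)     # list of up to 500 dictionaries
```

Passing `include_docs=True` asks Cloudant to return each document body alongside its id and revision, so no second round trip is needed per document.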

The documents array can become a Pandas data frame, and the data frame can then be used in a Pixiedust visualization.
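A sketch of these two steps, assuming `docs` is a list of dictionaries fetched from Cloudant and that Pandas and Pixiedust are installed in the kernel. The `without_metadata` helper, which drops the `_id` and `_rev` bookkeeping fields, is an optional, illustrative addition.

```python
def without_metadata(docs, fields=("_id", "_rev")):
    """Copy each document, leaving out CouchDB bookkeeping fields."""
    return [{k: v for k, v in doc.items() if k not in fields} for doc in docs]


def to_frame(docs):
    """Build a Pandas data frame with one row per document."""
    import pandas as pd  # assumed to be installed in the notebook kernel
    return pd.DataFrame(without_metadata(docs))


# In a notebook, with Pixiedust installed:
# import pixiedust
# df = to_frame(docs)
# display(df)   # opens Pixiedust's interactive table/chart/map options
```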

The “display” function gives you access to tons of visualization options, including maps from Google and Mapbox.

Using Node.js in notebooks

The pixiedust_node project allows you to mix and match Node.js code with Python code in notebooks. Patrick Titzler wrote an excellent guide to running Node.js in Watson Studio notebooks. On a local installation of Jupyter, the Node.js executable simply needs to be available on your machine’s path.

Once your Jupyter environment is configured, we can install pixiedust_node and then the official Node.js Cloudant library.

At this point, any cells starting with %%node are interpreted as Node.js code. Firstly, we need to initialise our Cloudant connection:

We can use the db object to fetch some data and turn the data into an array of documents for display:

A cool feature of pixiedust_node is that global Node.js variables are automatically copied into the Python environment, so the docs array is immediately available to work with in Python in the next cell.

Further reading

There’s much more to be done with Notebooks & Cloudant. Here’s some more reading material: