Best practices for data science with the Jupyter Notebook

I recently listened to a really interesting talk by Jonathan Whitmore where he discussed the approach his company has to working with data using the Jupyter Notebook. I’d recommend watching it, but I’ve made a brief summary below for my own future reference.

Notebook Types

Use Jupyter notebooks to collaborate and share data/analyses amongst teams and clients. Utilise two types of notebooks:

1. Lab Notebooks

Use these to keep a record of day-to-day work such as exploratory analysis

You don’t change or update a lab notebook afterwards; it’s a historical record for each person

Naming notebooks

[Date]-[Initials]-[2-4 word description].ipynb

2016-12-18-JS-iris-dataset-exploration.ipynb
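To make the naming pattern above concrete, here is a minimal sketch of a helper that builds such a filename. The function name `lab_notebook_name` and the check on the 2-4 word limit are my own assumptions for illustration, not something shown in the talk.

```python
import re
from datetime import date


def lab_notebook_name(day: date, initials: str, description: str) -> str:
    """Build a '[Date]-[Initials]-[2-4 word description].ipynb' filename."""
    # Keep only alphanumeric words so the description is filename-safe.
    words = re.findall(r"[A-Za-z0-9]+", description.lower())
    if not 2 <= len(words) <= 4:
        raise ValueError("description should be 2-4 words")
    # date.isoformat() gives YYYY-MM-DD, so files sort chronologically.
    return f"{day.isoformat()}-{initials.upper()}-{'-'.join(words)}.ipynb"


print(lab_notebook_name(date(2016, 12, 18), "JS", "iris dataset exploration"))
```

Putting the ISO date first means a plain alphabetical listing of the folder is also a chronological one.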

Split a notebook when it reaches a certain size, or by topic

Example notebook:

Title: purpose of notebook

What is in the notebook

What you were trying to achieve/analyse/hypotheses

Can say whether different analyses worked or were a dead end

Import libraries, use magics

Use version_information package to output version numbers of libraries

Import data

Can link back to deliverable notebooks you’ve built on – e.g. notebook explaining how data was cleaned
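The talk recommends the version_information package for recording library versions. As a rough stand-in that needs nothing beyond the standard library, the sketch below collects versions with `importlib.metadata`. This is a substitute technique, not that package's API, and the helper name `library_versions` is hypothetical.

```python
import platform
from importlib import metadata


def library_versions(packages):
    """Return a dict mapping each package name to its installed version."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap rather than failing, so the cell always runs.
            versions[name] = "not installed"
    return versions


# Example: run near the top of a notebook, right after the imports.
for name, version in library_versions(["numpy", "pandas"]).items():
    print(f"{name}: {version}")
```

Keeping this output in the saved notebook means a later reader can see which versions produced the results.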

2. Deliverable Notebooks

Notebooks you’ll want to reference in the future

Processing and cleaning raw data: record of transformation

Use as evidence of analysis when making pull requests

Used and shared by entire team

Directory Organisation

data [backed up outside of version control]

deliver [notebooks to deliver/continually use]

develop [lab notebooks]

figures [where your figures are stored]

src [scripts/modules]
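The layout above could be created for a new project with a few lines of Python. This is only a sketch: the five folder names come from the list, while the function name `create_project_skeleton` is my own.

```python
from pathlib import Path

# Folder names taken from the directory layout described above.
PROJECT_DIRS = ["data", "deliver", "develop", "figures", "src"]


def create_project_skeleton(root):
    """Create the standard project folders under root and return their paths."""
    root = Path(root)
    created = []
    for name in PROJECT_DIRS:
        path = root / name
        # exist_ok=True makes the function safe to re-run on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Since data is backed up outside of version control, you would probably also want to exclude that folder from the repository.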

Teamwork/Version Control Recommendations

Each data scientist has a dev branch that they push to daily

Merge to master via pull request

Commit the .ipynb, .py and .html versions of each notebook, plus its figures (so the same work is saved in 3-4 different forms)
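One way to produce the .py and .html copies is the `jupyter nbconvert` command-line tool. The sketch below only builds the commands and leaves running them to you; the helper name `export_commands` is hypothetical, and the talk may well use a different mechanism.

```python
from pathlib import Path


def export_commands(notebook):
    """Return nbconvert commands that export a notebook to .py and .html."""
    nb = Path(notebook)
    if nb.suffix != ".ipynb":
        raise ValueError("expected a .ipynb file")
    # '--to script' writes the code cells as a script; '--to html' a static page.
    return [["jupyter", "nbconvert", "--to", fmt, str(nb)]
            for fmt in ("script", "html")]


for cmd in export_commands("2016-12-18-JS-iris-dataset-exploration.ipynb"):
    print(" ".join(cmd))
```

With Jupyter installed, each command could be passed to `subprocess.run(cmd, check=True)` before committing. The .py and .html copies are easier to diff and to read in a pull request than the raw .ipynb JSON.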

Benefits

Record the complete analysis, including dead ends, so it’s easy for others to review

Computational biology PhD candidate at the Australian National University. I love writing (both articles and software), learning more about the world around us, and beekeeping. I also write for BioSky.co