Diffs are hard to read

As you can see in the above picture, diffs are pretty hard to read (GitHub PR link). Images can’t be compared, code is inside JSON arrays, there are a lot of irrelevant metadata changes, and so on. It’s impossible to review pull requests with this kind of diff.
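To make the noise concrete, here is a minimal sketch using only Python’s standard library. It builds two versions of a tiny notebook as plain dicts and diffs their JSON. The helper names `make_notebook` and `notebook_diff`, the cell contents, and the truncated image strings are all made up for illustration; a real notebook carries far more metadata.

```python
import difflib
import json


def make_notebook(source, execution_count, image_data):
    """Build a minimal nbformat-4-style notebook as a plain dict (illustrative only)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        "data": {"image/png": image_data},
                        "metadata": {},
                        "output_type": "display_data",
                    }
                ],
                "source": source,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def notebook_diff(old, new):
    """Return unified diff lines between two notebooks serialized as JSON."""
    old_text = json.dumps(old, indent=1, sort_keys=True).splitlines()
    new_text = json.dumps(new, indent=1, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(old_text, new_text, "old.ipynb", "new.ipynb", lineterm="")
    )


# Hypothetical before/after: one number in the code is edited, then the cell is re-run.
old = make_notebook(
    ["import matplotlib.pyplot as plt\n", "plt.plot([1, 2, 3])"], 1, "iVBORw0KGgoAAAANSUhEUgAA"
)
new = make_notebook(
    ["import matplotlib.pyplot as plt\n", "plt.plot([1, 2, 4])"], 2, "iVBORw0KGgoAAAANSUhEUgAB"
)

diff = notebook_diff(old, new)

# Keep only added/removed lines, skipping the "---"/"+++" file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(diff))
```

Only one pair of changed lines reflects the actual code edit. The rest come from the execution counter and the re-rendered image, and in a real notebook the image string alone can run to thousands of characters.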

Tools like nbdime were created to show diffs in a more human-friendly way.

But nbdime is only useful locally, and doesn’t work with the code hosting platforms like GitHub/GitLab that people actually use. We need a way to look at proper diffs in GitHub pull requests, and to comment on and review them.

Merge is even harder

Let’s say you somehow manage to do without diff.

Now your teammate has pushed changes to a notebook and you wish to pull those in and merge with your local changes.

Good luck resolving the merge conflict in any text-based editor. You have to take care of JSON format integrity, image binary strings, numerous metadata changes, and so on.

Users typically fall back on git’s --theirs/--ours strategy, since the effort to actually merge is too high. You can also set up nbmerge as a git driver to manually resolve merge conflicts inside the Jupyter UI. This is definitely better than mucking around with the JSON in a text editor.
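A small sketch shows why a conflicted notebook is so fragile. The notebook fragment, the branch name `teammate-branch`, and the helpers `is_valid_notebook_json` and `keep_ours` are hypothetical; `keep_ours` only mimics the spirit of resolving every hunk in favour of your own side and is not how git or nbmerge works internally.

```python
import json

# A hypothetical notebook after git has written conflict markers into it.
conflicted = """{
 "cells": [
  {
   "cell_type": "code",
<<<<<<< HEAD
   "execution_count": 3,
=======
   "execution_count": 7,
>>>>>>> teammate-branch
   "metadata": {},
   "outputs": [],
   "source": ["print('hello')"]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}"""


def is_valid_notebook_json(text):
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def keep_ours(text):
    """Drop conflict markers and keep the HEAD side of each conflicted hunk."""
    kept = []
    side = None  # None outside a conflict, otherwise "ours" or "theirs"
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            side = None
        elif side != "theirs":
            kept.append(line)
    return "\n".join(kept)


print(is_valid_notebook_json(conflicted))  # the markers make the file unparseable
resolved = keep_ours(conflicted)
print(is_valid_notebook_json(resolved))
```

Until every marker is resolved the file is not valid JSON, so Jupyter cannot open it at all. Taking one side wholesale restores a loadable file, at the cost of discarding the other person’s changes in each conflicted hunk.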

Reproducibility

Notebook code (or any code for that matter) requires a certain environment to work correctly. This includes other packages, environment variables, and data sources.

Jupyter, by design, doesn’t capture the environment information anywhere. Given just Notebook files, it’s not always easy or possible to reproduce the result.

Binder is one attempt to address this. You provide it a GitHub URL, it looks for a requirements.txt or environment.yml file, builds a Docker image, and spins up a JupyterHub server with your notebooks in it. The project is still in beta, but might be your best bet for reproducible notebook environments at the moment.
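A requirements.txt file like the one mentioned above is just a list of pinned packages. As a rough sketch (not a replacement for `pip freeze`), the standard library’s `importlib.metadata` can list what is installed in the current environment. The function name `pinned_requirements` is an assumption made for this example.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.version
        if name and version:  # skip broken installs with missing metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


# Print the pinned list; saving this output next to the notebook as
# requirements.txt is one way to record the package side of the environment.
requirements = pinned_requirements()
print("\n".join(requirements))
```

This records packages only. Environment variables and data sources, which the notebook may equally depend on, still have to be documented separately.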

Conclusion

In all fairness, Jupyter was designed for individual use. But given its simplicity and popularity, it’s getting adopted by teams for sharing/collaboration workflows as well.

Certain design choices, probably made at its inception, don’t fit well with the requirements of a team workflow. I believe the community recognises the shortcomings, and we’ll see a lot of complementary tooling in the form of JupyterLab extensions.

That’s all! I’m also working on a tool that solves the version control problem and makes it easy to use notebooks with GitHub (and likely with GitLab later). If you’re interested in using it or have feature requests, drop me a note at [email protected].
