Before we get started here, I must confess something: I know very little about package verification, cryptography, or security. This is really a brain dump of everything I have learned from browsing the internet over the last few months.

The first thing to think about when considering package verification is not the technical implementation of anything; it's what we are actually trying to achieve by using crypto at all.

Is the package I am installing the one the package maintainer uploaded?

This, I think, is the question. I am the maintainer of the sunpy package, and if you are installing SunPy and you are security conscious, you probably want to be able to check that the thing you are installing on your machine (potentially with root access) is indeed the code I uploaded. This places an inherent trust in me, the maintainer, that I am not deliberately going to mess with your computer, but this is probably as good as you are going to get.

So, taking this a little further, what does trusting me actually mean? When using PGP (we will get to this later), people often talk about a web of trust: you trust Bob, Bob trusts me, so you can trust me. This model has many flaws, chief among them that not enough people use PGP to make it work. It also misses the point: what you are actually trusting is the git repo I am building the source tarball from. You are trusting that I trust the source I am releasing in the first place.

This, I think, is the core point I am making here: what you are trusting is effectively my GitHub account. Someone with my GitHub account could make changes to the SunPy repo and do something malicious that I might not notice when I do a release. (You are also trusting all the other people with merge rights, but one thing at a time.)

So you can know that the person who controls the PGP key with the fingerprint 60BC5C03E6276769 has access to my GitHub account, and can therefore change the source code you are installing. I also sign my commits with the same PGP key, so you know that they all came from the person with control of that key, and that that person has control of the Cadair account.
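As a sketch, signing a commit and inspecting its signature uses standard git commands (the commit message here is made up for illustration):

```shell
# Sign a commit with your configured GPG key
git commit -S -m "Example signed commit"
# Show the signature attached to the most recent commit
git log --show-signature -1
```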

I think this can be summarised as: "You trust me to write your software, so you trust me to run it on your computer".
Which, in terms of PGP, equates to: "This key signed these commits and is linked to this GitHub account; I therefore trust a package made with it."

This is subtly but importantly different from the normal trust model of PGP, where you are trying to verify that the key belongs to, and is controlled by, me, the real person Stuart Mumford. In this model, knowing that it is the same person who has control of the git repo and GitHub account is sufficient.

This is the second stage. We have determined that we are happy to trust a key that we can associate with a GitHub account and some of the commits in a repo, so how do we use this to verify that the same key is the one that signed the package we are about to install?

Verifying that the package you have downloaded was indeed signed by the correct PGP key currently has to be done by hand, which is silly. There is a little work going on to make this easier, e.g. pypa/pip#1035, but various efforts like this seem to be stalling over worries about the trust model and lulling users into a false sense of security.
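The manual dance looks something like this (the filenames are hypothetical; the key ID is the one mentioned above):

```shell
# Hypothetical manual verification: fetch the maintainer's key,
# then check the detached signature against the downloaded sdist.
gpg --recv-keys 60BC5C03E6276769
gpg --verify sunpy-1.0.tar.gz.asc sunpy-1.0.tar.gz
```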

Recently I started using the excellent conda-forge project to build conda binaries for SunPy. This makes it much easier for people to install and use SunPy, but now you not only have to trust the source file, you also have to trust the built binary package.

What does trusting the binary look like? Do you trust that the build bot has honestly built the package from source? Should conda-forge have a PGP key with which the build bot signs the packages, so you know it was indeed built by the build bot? How can conda-forge verify the integrity of the source file, and how does it pass that trust on to the end user?

I don't know the answers to any of these questions, and the trust model gets much trickier when you consider a build bot running on CI services, where even more people could interfere with the process. My current opinion is that conda-forge could do GPG verification on the source from PyPI and then sign the binaries, so that you know conda-forge trusted the original source and that you have downloaded a binary built on the conda-forge build bots. Is that enough?

TL;DR: Python package signing is hard, but I think possible from the pip perspective, once you quantify the minimum you want from a trust model. Conda binaries are a whole different problem, and I can't see an easy solution.

This post is going to be the first in a series of posts about the Jupyter Notebook and the supercomputing facility 'iceberg' at Sheffield University.

This post is about a plugin I have written for the Jupyter Notebook to make it easier to work with Jupyter and the conda Python package manager, specifically its fantastic environments feature, which allows you to have multiple versions of Python and different stacks of packages installed alongside one another.

When working with conda and the Jupyter Notebook, you can create a different environment, install Jupyter into it, and then use the notebook from within that environment. This might look something like this:
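A hypothetical session (the environment name "analysis" and the package versions are made up for illustration):

```shell
# Create an environment with Jupyter installed in it...
conda create -n analysis python jupyter
# ...then activate it and start the notebook from inside it.
source activate analysis
jupyter notebook
```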

This approach works fine, but what happens if you want to switch to running your current notebook in the "numpy-1.9" environment instead, to test it with a previous version of NumPy? You would have to do this:

Stop the notebook server, then:

source deactivate
source activate numpy-1.9
jupyter notebook

Then reload the notebook you had open before.

What my Notebook plugin does is enable you to switch environments from within a running notebook server, using the "kernel" feature of the Notebook.

Each entry in the kernel list above that starts with 'Environment' is a conda environment that has Jupyter installed in it, and you can start a notebook using any of those environments.

The plugin that enables this is jupyter_environment_kernels (catchy name, I know). It looks in the directories you specify for installed environments which have Jupyter installed (i.e. the ipython executable is in the bin/ directory) and lists them as kernels for Jupyter to find. This makes it easy to run one notebook instance and seamlessly access kernels with different versions of Python or different modules.

To solve our earlier problem of "live" switching the kernel we can use the Kernel > Change Kernel menu:

Install the plugin from within the environment in which you want to run the notebook server.

Then run:

jupyter notebook --generate-config

to generate a Jupyter notebook config file (if you already have one, skip this step). Finally, edit the config file it has generated (by default this is ~/.jupyter/jupyter_notebook_config.py) and add the following two lines:
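The two lines look something like this (option names assumed from the environment_kernels package; double-check against its README):

```python
# Assumed configuration for the environment_kernels plugin:
# use its kernel spec manager, and tell it where to look for environments.
c.NotebookApp.kernel_spec_manager_class = 'environment_kernels.EnvironmentKernelSpecManager'
c.EnvironmentKernelSpecManager.env_dirs = ['~/.conda/envs', '~/.virtualenvs']
```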

The first line tells the notebook to use the environment_kernels module to manage the kernels, and the second line lists the directories in which to look for environments with ipython executables. By default (i.e. if you don't provide the second line) it will look in ~/.conda/envs and ~/.virtualenvs, where each top-level directory is assumed to be the name of an environment, and it then looks inside that environment's bin directory for ipython.

It is also possible to configure the package to use the conda terminal command to find your environments. This will only work if conda is available from where you ran the notebook command (i.e. you installed the notebook using conda). To use this you just need the:

This implementation creates a list containing the first two elements (some consider the leading 0 as part of the sequence) and then loops, appending the next element as the sum of the previous two, until the output list is the correct length (+1 because of the 0).
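The original code is not shown here; a sketch matching that description might be:

```python
def fib(n):
    """Return the Fibonacci sequence up to the n-th number.

    The leading 0 is counted as part of the sequence here,
    hence the n + 1 in the loop condition below.
    """
    # Start with the first two elements.
    sequence = [0, 1]
    # Append the sum of the previous two numbers until the
    # output list is the correct length (+1 because of the 0).
    while len(sequence) < n + 1:
        sequence.append(sequence[-1] + sequence[-2])
    return sequence
```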

In [2]: %timeit fib(10000)
100 loops, best of 3: 7.39 ms per loop

This implementation is respectable, but not exactly fast.

Next up, we are going to use a little more modern Python magic to see if we can make a faster pure-Python implementation. This uses a generator, which stores its state when 'yield' is called; the next time the iterator is advanced, it resumes from where it left off.
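A minimal sketch of such a generator (the function name is my own, not necessarily the one used in the original benchmark):

```python
from itertools import islice

def fib_gen():
    """Yield Fibonacci numbers indefinitely.

    Execution pauses at 'yield' and resumes from the same point
    the next time the iterator is advanced.
    """
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take the first 11 numbers (including the leading 0):
first_eleven = list(islice(fib_gen(), 11))
```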

This website and blog are built using the Nikola project. It means that I can write the pages in markdown and compile them into a static site, which I then FTP to my web host. This blog I write using Jupyter (IPython) Notebooks, which, when combined with a separate blog metadata file, compile into HTML and appear here for you to read!