Probabilistic programming from scratch

2017-07-02

Update 2018-09-09: If you're interested in the topic of this post, you might
enjoy the talk I gave at QCon NY in June 2018. That talk combines the
pedagogical code in the O'Reilly Orioles discussed below with aspects of the
more production/enterprise-oriented work on probabilistic programming we did
at Fast Forward Labs.

O’Reilly just published a series of Orioles (interactive video tutorials) by me
about probabilistic programming in Python. You can read the highlights for
free. A Safari login is required for the Orioles themselves.

The approach: from scratch, by simulation

My goal in the series is that you understand every line of code. I attempt to
ensure this by implementing things from scratch, and choosing a Bayesian
inference algorithm that is particularly transparent and non-mathematical.

The “from scratch” approach means I don’t import any libraries. This was
inspired by Joel Grus’s great book Data Science from
Scratch, which uses
core data structures (lists, dictionaries, etc.) to implement fairly
sophisticated algorithms such as Naive Bayes and feed-forward neural networks.

The series also draws from Jake VanderPlas’s PyCon talk Statistics for
Hackers. In that talk, Jake shows
that, by taking advantage of the fact that computers can repeat things very
quickly, it’s often simplest to simulate your way to the answer to a
statistical question. With this approach, you avoid the need to memorize a zoo
of frequentist statistical tests of dubious applicability. Peter Norvig took a
similar approach in his beautiful notebooks on Probability (A Concrete
Introduction
and Probability, Paradox and the Reasonable Person
Principle).
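As a minimal illustration of that "simulate your way to the answer" idea (my own sketch, not code from the talks or notebooks above), suppose a coin lands heads 14 times in 20 flips and you want to know how surprising that would be if the coin were fair. Rather than reaching for a binomial test, you can just simulate many fair coins:

```python
import random

def n_heads(flips=20):
    """Simulate one run of flipping a fair coin and count the heads."""
    return sum(random.random() < 0.5 for _ in range(flips))

# How often does a fair coin do at least as well as the 14 heads we saw?
trials = 10_000
extreme = sum(n_heads() >= 14 for _ in range(trials))
print(extreme / trials)  # roughly 0.06: somewhat unusual, not conclusive
```

No formula to remember, no test to look up: the answer is just the fraction of simulated worlds at least as extreme as the one we observed.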

If you have a non-mathematical programming background, or your calculus is a
little rusty, or you’re the kind of person who likes to compose simple,
well-understood tools to answer complex questions, you might like this
approach.

Approximate Bayesian Computation

In real-world problems, with lots of data, lots of unknown parameters, and
hierarchical structure, inference problems are often solved with industrial
strength probabilistic programming systems such as
PyMC3 or Stan.
These systems embed well-tested and extremely fast implementations of
algorithms such as Hamiltonian Monte Carlo
with the No U-Turn Sampler, or Automatic
Differentiation Variational Inference. These
algorithms are great, but they require a deep mathematical background to truly
understand.

Instead I use Approximate Bayesian Computation. ABC may not be the fastest,
but it’s not wrong, and if you can wait long enough, it always works. And
crucially, given my goal that you understand every line of code, it’s a great
example of the “simulate your way to the answer” approach that relies on the
computer’s ability to repeat calculations.

Here’s the entire Approximate Bayesian Computation algorithm:

1. Draw a sample from the prior.

2. Simulate the process you observed, assuming the sample from the prior is
the correct value.

3. If the outcome of the simulation matches the data you observed, the sample
from the prior is now a sample from the posterior.

4. Repeat 1-3 until you have enough samples from the posterior to answer a
concrete statistical question.
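In the spirit of the series, the algorithm fits in a few lines of library-free Python. Here's my sketch for the coin-flip setting (a uniform prior on the heads probability; the particular names and numbers are illustrative, not taken from the Orioles):

```python
import random

def posterior_sample(n_flips, heads_observed):
    """Draw one sample from the posterior over a coin's heads
    probability, given heads_observed heads in n_flips flips."""
    while True:
        # 1. Draw a sample from the prior (uniform on [0, 1] here)
        p = random.random()
        # 2. Simulate the observed process, assuming p is correct
        simulated_heads = sum(random.random() < p for _ in range(n_flips))
        # 3. If the simulation matches the data, p is a posterior sample
        if simulated_heads == heads_observed:
            return p

# 4. Repeat until we have enough samples to answer a concrete question,
# e.g. "what is the expected heads probability?"
samples = [posterior_sample(n_flips=20, heads_observed=14) for _ in range(1000)]
print(sum(samples) / len(samples))  # posterior mean, roughly 0.68
```

Note the rejection in step 3 is what makes ABC slow: most prior samples produce simulations that don't match the data exactly and are thrown away. But every line is plain Python, and that's the point.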

And if you have a Safari login and want to see this algorithm used to solve
real problems (including online learning), the connection with Bayes’s Theorem
made more explicit, and comparison to PyMC3, check out the
Orioles.