Dynamic Bayesian Inference

The machine depicted illustrates the fundamental calculation of Bayesian inference: the determination of a
posterior distribution from a prior distribution and a likelihood function. It performs the computation
with gravity as the motive force. The top layer shows the prior distribution as a population of beads
representing, for example, potential neurological sensations, from weak (left) to strong (right). When the
platform supporting the beads is withdrawn by pulling the upper knob at the right, the beads fall either in front of or
behind the vertical screen just below, which represents the likelihood function. Beads falling in front are retained
as shown; those falling behind are rejected and discarded. In the final stage, the second support platform
is removed and the retained beads slide forward on the slanted platform until they are displayed with uniform depth.
This simply rescales them, giving a distribution proportional to the posterior distribution.
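
The bead-dropping step described above can be imitated in a few lines of Python. This is only a sketch under
stated assumptions, not a model of the physical device: the names normal_pdf and drop_beads, the bead count, the
seed, and the choice of unit standard deviations are all illustrative and do not come from the original. Each bead
gets a left-right position drawn from the prior and a uniformly random front-to-back position, and it is retained
only if it lands in front of the screen.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd, at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def drop_beads(n_beads, prior_mean, prior_sd, observed, noise_sd, seed=0):
    """Return the left-right positions of the beads that land in front of the screen.

    All parameter names are hypothetical; they are chosen for this sketch only.
    """
    rng = random.Random(seed)
    # Depth of the box: the screen's greatest distance from the front,
    # reached where the prior value equals the observed value.
    box_depth = normal_pdf(observed, observed, noise_sd)
    retained = []
    for _ in range(n_beads):
        x = rng.gauss(prior_mean, prior_sd)          # left-right position, drawn from the prior
        depth = rng.uniform(0.0, box_depth)          # front-to-back position, uniform (assumption)
        screen = normal_pdf(observed, x, noise_sd)   # screen's distance from the front at x
        if depth < screen:                           # bead fell in front of the screen
            retained.append(x)
    return retained


# Illustrative settings: standard normal prior, observation at the central value.
beads = drop_beads(100_000, prior_mean=0.0, prior_sd=1.0, observed=0.0, noise_sd=1.0)
mean = sum(beads) / len(beads)
variance = sum((b - mean) ** 2 for b in beads) / len(beads)
print(len(beads), mean, variance)
```

With these settings the retained beads should have a mean near 0 and a variance near 0.5, which is the posterior
for a standard normal prior combined with a unit-variance normal likelihood centered on the observation.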

The distance of the vertical likelihood screen from the front is proportional to the probability density of the
observed value (for example, the recorded sensation) given each possible prior value (e.g. each possible actual
sensation).
As shown, both prior and likelihood are normal distributions, with the observed value taken as the central
(a priori most likely) value, but the mechanism is perfectly general. If a weaker value is observed, the vertical
screen should be moved to the left by the appropriate amount; if a stronger value is observed, the screen should
be moved to the right. The vertical likelihood screen operates like a cookie cutter, automatically cutting
the appropriate proportion out of the prior distribution to give (after a simple rescaling) the posterior distribution.
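
The cookie-cutter picture can also be checked numerically by multiplying prior and likelihood on a grid and
rescaling. The following is a minimal sketch, assuming normal prior and likelihood with unit standard deviations;
the names grid_posterior and posterior_mean and the grid spacing are illustrative choices, not part of the
original description. Changing the observed value plays the role of sliding the screen left or right.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd, at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def grid_posterior(grid, prior_mean, prior_sd, observed, noise_sd):
    """Multiply prior height by screen distance at each grid point, then rescale to sum to 1."""
    prior = [normal_pdf(x, prior_mean, prior_sd) for x in grid]
    # Screen distance at x: density of the observed value if x were the actual value.
    screen = [normal_pdf(observed, x, noise_sd) for x in grid]
    unnormalized = [p * s for p, s in zip(prior, screen)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def posterior_mean(grid, weights):
    """Weighted average of the grid points under the rescaled weights."""
    return sum(x * w for x, w in zip(grid, weights))


# Grid of possible actual values from -6 to 6 (illustrative range and spacing).
grid = [-6.0 + 0.01 * i for i in range(1201)]

# A weaker, a central, and a stronger observed value.
for observed in (-1.0, 0.0, 1.0):
    weights = grid_posterior(grid, 0.0, 1.0, observed, 1.0)
    print(observed, posterior_mean(grid, weights))
```

Because the prior and likelihood are given equal variances here, the posterior mean should come out close to half
the observed value in each case, so the rescaled distribution shifts in the same direction as the observation.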

This machine repurposes a device created by Francis Galton in 1877 and used in a public lecture on
February 9 of that year. Galton thought of it as displaying the action of natural selection in a model for the inheritance
of quantitative characteristics; see S. Stigler, J. Roy. Stat. Soc. (A) 173: 469–482 for a full explanation
with citations. The device may be seen as performing a simple multiplication (of prior times likelihood) or
as an analog of a rejection sampling algorithm. As repurposed here, it shows how one may visualize the
fundamental operation of Bayesian inference, and it illustrates how a complex mathematical algorithm may
be seen as a simple instance of thresholding.