Occam's razor - exercises

Exercise 1. The Number Game

When we reasoned about continuations of sequences in the Occam's razor exercise, our hypothesis space was defined over rules: abstract arithmetic functions.

In a related task called the number game, participants were presented with sets of numbers and asked how well different numbers completed them. A rule-based generative model accurately captured responses for some stimuli: for certain sets, participants assigned high fit to powers of two or to multiples of ten, respectively. But it failed to capture others. For instance, given an ambiguous set, what numbers seem like good completions? How good is 18, relative to 13, relative to 99?

a)

We’ve implemented a rule-only model of this task for you below. Examine the posterior over rules for each of the given input sets. For the example of feeding in just a single number, why are some rules so strongly preferred over others, even though they are assigned equal probability under the prior? (HINT: think about the likelihood; if you’re stuck, read the section on the size principle in the linked number game paper.)
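To see the size principle in action, here is a minimal sketch of a rule-only number game model. The hypothesis names, their extensions, and the observed set are illustrative assumptions, not the exercise's actual rule space or stimuli.

```python
# A minimal sketch of the size principle in a rule-based number game model.
# The hypothesis extensions below are illustrative assumptions.
from fractions import Fraction

HYPOTHESES = {
    "powers_of_two": [2 ** k for k in range(1, 7)],   # 2 .. 64
    "multiples_of_ten": list(range(10, 101, 10)),     # 10 .. 100
    "even_numbers": list(range(2, 101, 2)),
    "all_numbers": list(range(1, 101)),
}

def posterior(data):
    # Uniform prior over rules. Under strong sampling, each observation has
    # likelihood 1/|h| if it is consistent with h, and 0 otherwise -- so
    # smaller consistent hypotheses win: the size principle.
    scores = {}
    for name, extension in HYPOTHESES.items():
        if all(x in extension for x in data):
            scores[name] = Fraction(1, len(extension)) ** len(data)
        else:
            scores[name] = Fraction(0)
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}
```

Even a single observation like `posterior([16])` concentrates the posterior on the smallest consistent rule, although every rule starts with equal prior probability.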

c)

Now examine the remaining sets. Sweep across all integers as testQueries to see the ‘hotspots’ of the model’s predictions. What do you observe?
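The sweep can be sketched as follows. This is a toy version: the hypotheses and the observed set are illustrative assumptions, and the real model in the exercise has a richer rule space.

```python
# A sketch of sweeping every integer as a test query to find the 'hotspots'
# of the model's predictions. Hypotheses and data are illustrative assumptions.

HYPOTHESES = {
    "powers_of_two": [2 ** k for k in range(1, 7)],
    "even_numbers": list(range(2, 101, 2)),
    "all_numbers": list(range(1, 101)),
}

def predictive(data, query):
    # P(query in concept | data) = sum over rules of P(rule | data) * [query in rule]
    weights = {
        name: (1 / len(ext)) ** len(data) if all(x in ext for x in data) else 0.0
        for name, ext in HYPOTHESES.items()
    }
    z = sum(weights.values())
    return sum(w / z for name, w in weights.items() if query in HYPOTHESES[name])

# Hotspots: queries the model rates as likely completions of {16, 8}.
hotspots = [n for n in range(1, 101) if predictive([16, 8], n) > 0.5]
```

The hotspots cluster on the extension of the smallest consistent rule, with much lower, graded probability elsewhere.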

d)

Look at some of the data in the large-scale replication of the number game here. Can you think of an additional concept people might be using that we did not include in our model?

e) Challenge! [Extra credit problem]

Can you replicate the results from the paper (reproduced in the figure below) by adding the other hypotheses from the paper?

Exercise 2: Causal induction revisited

In a previous exercise we explored the Causal Power (CP) model of causal learning. Griffiths and Tenenbaum (2005), “Structure and strength in causal induction”, hypothesized that when people do causal induction they are not estimating a power parameter (as in CP) but instead deciding whether there is a causal relation at all – they called this model Causal Support (CS).

Hint: In the CP model the effect was generated from var E = (datum.C && flip(cp)) || flip(b). You will need to extend this to capture the idea that the cause can only make the effect happen if there is a causal relation at all.

b)

Inference with the MCMC method will not be very efficient for the model you wrote above because the MCMC algorithm is using the single-site Metropolis-Hastings procedure, changing only one random choice at a time. (To see why this is a problem, think about what happens when you try to change the choice about whether there is a causal relation.)

To make this more efficient, construct the marginal probability of the effect directly and use it in an observe statement:

Hint: You can do this either by figuring out the noisy-or marginal probabilities using math, or by asking WebPPL to do so using Infer.
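Working the marginal out by math gives a small closed form. Below is a hedged Python sketch of that algebra; the function name and parameterization are assumptions, matching the CP variables in the text.

```python
# A sketch of the noisy-or marginal P(E = 1 | C, relation), computed by hand
# so it can be scored directly in an observe statement instead of sampled.

def p_effect(C, cp, b, relation):
    # Generative model: E = (relation && C && flip(cp)) || flip(b)
    # If the relation holds and the cause is present, E fails only when both
    # the cause's effect and the background fail:
    if relation and C:
        return 1 - (1 - cp) * (1 - b)   # = cp + b - cp * b
    # Otherwise only the background can produce the effect:
    return b
```

A quick Monte Carlo comparison against the sampled version is an easy way to validate the algebra before relying on it for inference.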

c)

Fig. 1 of Griffiths and Tenenbaum (2005) shows a critical difference in the predictions of CP and CS: when the effect happens just as many times with the cause absent as with the cause present. Show, by running simulations, the difference between CP and CS in these cases.
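A rough version of such a simulation can be sketched as below. It uses a coarse grid approximation with uniform priors on cp and b; the grid, priors, and data are illustrative assumptions, not the paper's exact setup.

```python
# Sketch comparing CP and CS on data where the effect occurs equally often
# with and without the cause (Delta-P = 0). Grid approximation; the priors
# and data are illustrative assumptions.

def likelihood(data, cp, b, relation):
    # data: list of (C, E) trials under the noisy-or model.
    p = 1.0
    for C, E in data:
        pe = 1 - (1 - cp) * (1 - b) if (relation and C) else b
        p *= pe if E else 1 - pe
    return p

GRID = [i / 20 for i in range(21)]

def marginal(data, relation):
    # Average the likelihood over the grid (uniform priors on cp and b).
    total = sum(likelihood(data, cp, b, relation) for cp in GRID for b in GRID)
    return total / len(GRID) ** 2

# Effect on half the trials whether or not the cause is present:
data = [(1, 1), (1, 0), (0, 1), (0, 0)] * 4

# Causal support as a Bayes factor between the two structures:
support = marginal(data, True) / marginal(data, False)
```

Here CP's power estimate is 0, since P(E|C) - P(E|not C) = 0, whereas CS yields a Bayes factor below 1: active evidence against a causal relation.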

d)

Hint: Recall that CS is selecting between two models (one where there is a causal relation and one where there isn’t).

Exercise 3 (Challenge! [Extra credit problem])

Try an informal behavioral experiment with several friends as experimental subjects to see whether the Bayesian approach to curve fitting given on the wiki page corresponds with how people actually find functional patterns in sparse noisy data. Your experiment should consist of showing each of 4-6 people 8-10 data sets (sets of x-y values, illustrated graphically as points on a plane with x and y axes), and asking them to draw a continuous function that interpolates between the data points and extrapolates at least a short distance beyond them (as far as people feel comfortable extrapolating). Explain to people that the data were produced by measuring y as some function of x, with the possibility of noise in the measurements.

The challenge of this exercise comes in choosing the data sets you will show people, interpreting the results, and thinking about how to modify or improve a probabilistic program for curve fitting to better explain what people do. Of the 8-10 data sets you use, devise several (“type A”) for which you believe the WebPPL program for polynomial curve fitting will match the functions people draw, at least qualitatively. Come up with several other data sets (“type B”) for which you expect people to draw qualitatively different functions than the WebPPL polynomial fitting program does. Does your experiment bear out your guesses about type A and type B? If yes, why do you think people found different functions to best explain the type B data sets? If not, why did you think they would? There are a number of factors to consider, but two important ones are the noise model you use and the choice of basis functions: not all functions that people can learn, or that describe natural processes in the world, are well described by polynomials; other families of functions may need to be considered.

Can you modify the WebPPL program to fit curves of qualitatively different forms besides polynomials, but of roughly equal complexity in terms of numbers of free parameters? Even if you can’t get inference to work well for these cases, show some samples from the generative model that suggest how the program might capture classes of human-learnable functions other than polynomials.
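Forward samples from an alternative family can be sketched quickly. The sketch below, in Python rather than WebPPL, contrasts a polynomial family with a sine-series family; the coefficient priors and function names are illustrative assumptions, not a calibrated model of human priors.

```python
import math
import random

# A sketch of forward samples from two function families with comparably few
# free parameters: degree-3 polynomials and 3-term sine series.
# Coefficient priors are illustrative assumptions.

def sample_polynomial(rng, degree=3):
    coeffs = [rng.gauss(0, 1) for _ in range(degree + 1)]
    return lambda x: sum(c * x ** k for k, c in enumerate(coeffs))

def sample_sine_series(rng, terms=3):
    # Only the amplitudes vary here; adding phases or frequencies as random
    # choices would enrich the family at the cost of more parameters.
    amps = [rng.gauss(0, 1) for _ in range(terms)]
    return lambda x: sum(a * math.sin((k + 1) * x) for k, a in enumerate(amps))

rng = random.Random(0)
f = sample_sine_series(rng)
# Evaluate on a grid, e.g. to plot forward samples and compare families:
curve = [f(x / 10) for x in range(-30, 31)]
```

Plotting a handful of such forward samples side by side is one way to argue which family better matches the shapes people draw, even before inference works.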

You should hand in the data sets you used for the informal experiment, discussion of the experimental results, and a modified WebPPL program for fitting qualitatively different forms from polynomials plus samples from running the program forward.