The Hypothesis Space in Gweon, Tenenbaum, and Schulz (2010)

Ernest Davis, Dept. of Computer Science, New York University
Gary Marcus, Dept. of Psychology, New York University

April 22, 2014; final paragraph added October 2014.

Gweon, Tenenbaum, and Schulz (2010) (henceforth GTS)
carried out the following experiment: 15-month-old
infants were shown a box containing blue balls and yellow balls. In
one condition of the experiment, 3/4 of the balls were blue; in the other
condition, 3/4 were yellow. In both conditions, the experimenter took out
three blue balls in sequence and demonstrated that all three balls squeaked
when squeezed (phase 1 of the experiment).
The experimenter then took out a yellow ball and, without squeezing it, handed
it to the baby (phase 2).
The experimental finding was that, in condition 1, 80% of the
babies squeezed the yellow ball to try to make it squeak, whereas in condition
2, only 33% of the babies did so.

The explanation of this finding given by GTS is as
follows. The babies are considering two possible hypotheses about the relation
of color to squeakiness: Hypothesis A is that all balls squeak; hypothesis B
is that all and only blue balls squeak. (The obvious third alternative, that
only yellow balls squeak, is ruled out by the observations and can therefore be
ignored.) Thus if A is true, then the yellow ball will squeak; if B is true,
it will not. The babies are also considering two possible hypotheses about the
experimenter's selection rule for the first three balls. Hypothesis C is that
the experimenter is picking at random from the set of all balls; hypothesis D
is that she is picking at random from the set of all balls that squeak.

It is assumed that A and B are independent of C and D, and that A, B, C, and D
all have prior probability 1/2.

The model thus posits that the babies are considering a hypothesis space where
there are two dimensions and two alternatives in each dimension.
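The GTS model is small enough to work through explicitly. The Python sketch below is our own illustration rather than GTS's implementation, and it rests on assumptions not stated above: the three phase-1 balls are treated as independent draws (sampling with replacement), and a ball drawn at random from the whole box is blue with probability equal to the fraction of blue balls. The function name `p_yellow_squeaks` is hypothetical. Under B combined with D, the experimenter can only draw blue balls, so seeing three blue balls is certain; under the other three combinations its probability is the cube of the fraction of blue balls. The probability that the yellow ball squeaks is then the posterior probability of A.

```python
from fractions import Fraction
from itertools import product


def p_yellow_squeaks(frac_blue: Fraction) -> Fraction:
    """Posterior probability of hypothesis A (all balls squeak) after three
    blue, squeaky balls are drawn in phase 1.

    Assumptions (ours, not GTS's): independent draws, and a uniform prior of
    1/4 on each (color rule, selection rule) combination, i.e. A, B, C, D
    each have probability 1/2 and the two dimensions are independent.
    """
    prior = Fraction(1, 4)
    joint = {}
    for color_rule, selection_rule in product("AB", "CD"):
        if color_rule == "B" and selection_rule == "D":
            # Sampling only among squeaky balls = sampling only blue balls.
            p_blue = Fraction(1)
        else:
            # Sampling from all balls (under A with D, every ball squeaks,
            # so sampling among squeaky balls is sampling among all balls).
            p_blue = frac_blue
        # Likelihood of three blue balls in a row, times the prior.
        joint[(color_rule, selection_rule)] = prior * p_blue ** 3
    evidence = sum(joint.values())
    return (joint[("A", "C")] + joint[("A", "D")]) / evidence
```

Calling `p_yellow_squeaks(Fraction(3, 4))` and `p_yellow_squeaks(Fraction(1, 4))` gives 54/145 (about 0.37) and 2/67 (about 0.03), respectively: the model predicts more squeezing in condition 1 than in condition 2, in the same direction as the observed 80% vs. 33%. Given the simplifying assumptions, these numbers should not be read as the quantities GTS themselves report.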

This seems to us entirely arbitrary. It seems to us that, from the point of
view of the babies' observations, it would be just as plausible to posit
four dimensions:

A. The relation between color and squeakiness.

B. The selection criterion for phase 1 of the experiment.

C. The selection criterion for phase 2 of the experiment.

D. The rule governing whether or not the experimenter squeezes the ball
in phase 2. (The rule governing the experimenter's decision to squeeze
in phase 1 turns out not to matter, since she always squeezes and it always
squeaks.)

Moreover, within each dimension there are further hypotheses
that are just as plausible as those that GTS consider.

Thus, it seems to us that the following Bayesian model would be equally well
motivated (we exclude hypotheses that are inconsistent with the observations).

Dimension A: Relation of squeak to color.
Hypotheses:

1. All balls squeak.

2. All and only blue balls squeak.

3. Balls squeak randomly (by default with probability 1/2).

Dimension B: Selection criterion in phase 1.
Hypotheses:

1. The experimenter chooses toys at random.

2. The experimenter chooses randomly among squeaky toys.

3. The experimenter chooses randomly among blue toys.

4. The experimenter chooses randomly among squeaky blue toys.

Dimension C: Selection criterion in phase 2.
Hypotheses:

1. The experimenter chooses a toy at random.

2. The experimenter chooses randomly among squeaky toys.

3. The experimenter chooses randomly among non-squeaky toys.

4. The experimenter chooses randomly among yellow toys.

5. The experimenter chooses randomly among squeaky yellow toys.

6. The experimenter chooses randomly among non-squeaky yellow toys.

Dimension D: Rule governing whether the experimenter squeezes the toy.
Hypotheses:

1. The experimenter always squeezes the toy in phase 1 and
never squeezes in phase 2.

2. In both phases, the experimenter squeezes the toy if and only if it squeaks.

There could be additional options in D, e.g.:

3. If the toy in phase 2 squeaks, the experimenter squeezes it;
otherwise she chooses randomly whether to squeeze it.

But that is arguably more complicated and therefore reasonably excluded.

There would thus be 3 * 4 * 6 * 2 = 144 combinations; however, not all of these
are possible or distinct. In particular, assuming that the experimenter knows
the truth of dimension A, there are the following logical constraints:

A.1 is inconsistent with C.3, C.6, and D.2.

If A.1 is true, B.1 and B.2 are identical; B.3 and B.4 are identical;
C.1 and C.2 are identical; and C.4 and C.5 are identical.

A.2 is inconsistent with C.2 and C.5.

If A.2 is true, B.2, B.3, and B.4 are identical; and C.3, C.4, and C.6
are identical.
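The constraints above can be checked mechanically. The following Python sketch (the names `DIMENSIONS`, `consistent`, and `canonical` are ours) enumerates the 144 combinations, discards those that the listed constraints rule out, and merges combinations that the listed identities say are the same. It encodes only what is listed above: it adds no constraints or identities involving A.3, and it keeps D.1 and D.2 separate. A different treatment of those cases would change the counts.

```python
from itertools import product

# Hypothesis numbers for each dimension, as in the text.
DIMENSIONS = {
    "A": [1, 2, 3],           # relation of squeak to color
    "B": [1, 2, 3, 4],        # selection criterion, phase 1
    "C": [1, 2, 3, 4, 5, 6],  # selection criterion, phase 2
    "D": [1, 2],              # squeezing rule
}


def consistent(a: int, b: int, c: int, d: int) -> bool:
    """Apply only the logical constraints listed in the text."""
    if a == 1 and (c in (3, 6) or d == 2):
        return False
    if a == 2 and c in (2, 5):
        return False
    return True


def canonical(a: int, b: int, c: int, d: int) -> tuple:
    """Map a combination to one representative of its class of identical
    hypotheses, using only the identities listed in the text."""
    if a == 1:
        b = {2: 1, 4: 3}.get(b, b)  # B.1 = B.2, B.3 = B.4
        c = {2: 1, 5: 4}.get(c, c)  # C.1 = C.2, C.4 = C.5
    elif a == 2:
        b = {3: 2, 4: 2}.get(b, b)  # B.2 = B.3 = B.4
        c = {4: 3, 6: 3}.get(c, c)  # C.3 = C.4 = C.6
    return (a, b, c, d)


all_combos = list(product(*DIMENSIONS.values()))
kept = [combo for combo in all_combos if consistent(*combo)]
distinct = {canonical(*combo) for combo in kept}
print(len(all_combos), len(kept), len(distinct))
```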

It is by no means clear
what the best way is to assign priors to these.

The truth is A.2, B.2=B.3=B.4, C.3=C.4=C.6, and D.1=D.2.
In GTS's model, the babies consider
A.1 vs. A.2 and B.1 vs. B.2, and assume C.4 and D.1.
We do not see any principled reason, from the babies' point
of view, why any of the other alternatives should a priori be considered less
likely, let alone considered impossible.

One can also question whether the assumptions in different categories are
independent, aside from the logical constraints. In particular, if the
distinction between the two phases of the experiment is not very clear to the
babies (and it is not obvious that it would be), then they might assign
a higher probability to the combinations where the same rule is used
in both phases (i.e., either B.1 and C.1 or B.2 and C.2), and they might assign
a higher probability to D.2, which applies uniformly to both phases, than to
D.1, which creates an arbitrary distinction between the two phases.
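One way to express such a dependence is a prior that is not a product of per-dimension priors. In the following Python sketch, the boost factors `SAME_RULE_BOOST` and `UNIFORM_D_BOOST` are hypothetical values of our own choosing (the argument above says only that these combinations might be more probable); dimension A is omitted, and the logical constraints are not applied.

```python
from fractions import Fraction
from itertools import product

B_OPTIONS = [1, 2, 3, 4]
C_OPTIONS = [1, 2, 3, 4, 5, 6]
D_OPTIONS = [1, 2]

# Hypothetical weights; the text gives no values for these.
SAME_RULE_BOOST = Fraction(3)  # multiplier for (B.1, C.1) and (B.2, C.2)
UNIFORM_D_BOOST = Fraction(2)  # multiplier for D.2 relative to D.1


def correlated_prior() -> dict:
    """Normalized prior over (B, C, D) triples that favors using the same
    selection rule in both phases and favors D.2 over D.1."""
    weights = {}
    for b, c, d in product(B_OPTIONS, C_OPTIONS, D_OPTIONS):
        w = Fraction(1)
        if (b, c) in {(1, 1), (2, 2)}:
            w *= SAME_RULE_BOOST
        if d == 2:
            w *= UNIFORM_D_BOOST
        weights[(b, c, d)] = w
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


prior = correlated_prior()
```

With these weights, P(B.1, C.1) differs from P(B.1) * P(C.1), so B and C are no longer independent; any positive `SAME_RULE_BOOST` other than 1 has this effect. In this sketch the D boost only shifts the marginal probability of D.2; D itself stays independent of B and C.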

A Bayesian model is a non-empty
set of hypotheses; for instance, the model
used in GTS is the set { [A1,B1,C1,D1], [A1,B1,C4,D1],
[A2,B1,C1,D1], [A2,B1,C1,D2], [A2,B1,C3,D1], [A2,B1,C3,D2], [A2,B2,C1,D1],
[A2,B2,C1,D2], [A2,B2,C3,D1], }
In principle, therefore, there could be as many as 2^43 - 1 different
Bayesian models (about 8.8 trillion) that a theorist might consider; however,
most of these are entirely unmotivated and quite implausible. A reasonable
Bayesian model, let us say, is one in which the theorist chooses a set of
alternatives for A, a set for B, and so on, and takes all of their combinations.
To compute
a lower bound on the size of this collection,
let us consider only hypothesis spaces in which
A.3 and D.1 are included. If A.3 and D.1 are true,
then all the hypotheses in B and C are distinct
and consistent. Therefore, any set of hypotheses of the form
"[some subset of the A's containing A.3] and
[some non-empty subset of the B's] and [some non-empty subset
of the C's] and [some subset of the D's containing D.1]"
is a reasonable Bayesian model, and these are all distinct.
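Since there are 4 subsets of the A's containing A.3, 15 non-empty subsets of
the B's, 63 non-empty subsets of the C's, and 2 subsets of the D's containing
D.1, there are therefore more than 4 * 15 * 63 * 2 = 7560 reasonable distinct
Bayesian models. There are also additional models, either not including
A.3 or not including D.1, but the number of these is probably quite small.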
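The lower bound can also be checked by brute force. The Python sketch below (the helper `subsets` is ours) builds every model of the form described above as a set of combined hypotheses and counts the distinct sets. It also evaluates the 2^43 - 1 figure stated earlier, taking the exponent 43 from the text rather than deriving it.

```python
from itertools import combinations, product


def subsets(items, required=None):
    """Yield the non-empty subsets of items, optionally only those containing required."""
    for k in range(1, len(items) + 1):
        for subset in combinations(items, k):
            if required is None or required in subset:
                yield subset


A = ["A1", "A2", "A3"]
B = ["B1", "B2", "B3", "B4"]
C = ["C1", "C2", "C3", "C4", "C5", "C6"]
D = ["D1", "D2"]

models = set()
for a_set, b_set, c_set, d_set in product(
    subsets(A, "A3"), subsets(B), subsets(C), subsets(D, "D1")
):
    # A model is the set of all combined hypotheses built from the chosen options.
    models.add(frozenset(product(a_set, b_set, c_set, d_set)))

print(len(models))         # 7560, matching 4 * 15 * 63 * 2
print(f"{2 ** 43 - 1:,}")  # 8,796,093,022,207, i.e. about 8.8 trillion
```

Because every chosen subset is non-empty, each Cartesian product determines the subsets it was built from, which is why no two choices collapse into the same model.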