Using your Head is Permitted

March 2009 solution

Part 1:

Let us sort the random variables. Call the minimal valued one min, the
next one mid and the largest max. A simple function that results
in a uniform random variable distributed between 0 and 1 is
(mid-min)/(max-min). Proving that this is, indeed,
a solution formula can be done formally by calculating the relevant integrals.
However, here are two ways to gain some intuition on the solution. First,
consider the distribution of mid if it is known that min=m
and that max=M. Essentially, two of the variables are spoken for,
and the one remaining free, mid, is basically a uniformly distributed
variable that has been constrained to be no smaller than m and no larger
than M. This is still a uniformly distributed variable, but between the
new bounds. When using the formula
(mid-min)/(max-min) we rescale the distribution
to be uniform between 0 and 1. As this is always the same distribution,
regardless of the choice of m and M, this is the solution.

A different, perhaps simpler intuition leads to a weaker result: consider
X, Y and Z as (b-a)*x+a,
(b-a)*y+a and
(b-a)*z+a, respectively. In
this description, they are independent, uniformly distributed variables
ranging between 0 and 1 that have been rescaled to be between
a and b. Considering the solution formula, it is clear that
all a's and all b's cancel out, leading to a new random variable
whose distribution is independent of a and b and whose support
is [0,1]. Suppose that the probability of this value to be lower than any
u is defined as p(u), then
p((mid-min)/(max-min)) is clearly a solution
to the riddle. (Proving that p is the identity function is not covered
by this intuition and requires another step.)

Note: The suggested calculation fails if all variables are equal, and therefore
max=min and we have a division by zero, but such an event happens
only with probability zero, so we can pick any value between 0 and 1 as the
result chosen in this case. The variable will remain uniformly distributed
regardless of which value we choose.

Part 2:

The answer is that this is possible if and only if n>1 and k=1.

If n=1 this is clearly not possible.

If n=2, we can simply pick a pair (X,1-X) [where X
is a random variable uniformly distributed between 0 and 1], and this will give
a valid solution for k=1.

If n=3 there are many different solutions for k=1. Here is one
example: Let X be a random variable uniformly distributed between 0 and
1. If X<0.5, let X1=X,
X2=X+0.5 and X3=1-2X.
Otherwise, let X1=X,
X2=X-0.5 and X3=2-2X.

A more fancy solution is to consider the base 3 representation of the three
variables and to choose the variables as uniformly distributed given the
constraint that for all i, the i'th digit of no two is equal.

Another solution is to choose a uniform point on the unit sphere, then take
the three variables to be its projections on the x-coordinate after
rotation by 0, 2π/3 and 4π/3 radians, respectively.

If n is larger than 3, the variables can be divided into sets of pairs
and triplets, and a solution for pairs/triplets can be implemented on each
set separately: the sum of constants is a constant.

All these solutions work for k=1.

A solution for values of k larger than 1 is not possible. Consider the
case k=2. (If all sets of any size k>1 are independent, in
particular this means that all pairs are independent.) We know that

X1+...+Xn=constant

Consider, now, the variance of both sides of this equality. The right side,
being a constant, has variance zero. The left side has variance
equal to the sum of the variances of the individual variables. (The variance
of the sum equals the sum of variances for pairwise independent variables.)
However, as the distribution of each variable is known, their individual
variances are also known. They are 1/12 each. The total variance is therefore
n/12, not zero. A contradiction.

Part 3:

Such an f exists for all values of n. If n is greater
than 1, k must be smaller than n.

Let us scale up all values by a factor of n!. The Xi,
Yi and Zi variables will henceforth be
considered as distributed between 0 and n!, instead of between 0 and 1.

Define W as the sum of floor(Yi) for all
i, taken modulo n!. This is a variable uniformly
distributed among the integers between 0 and n!-1. We will use it to
pick a permutation, w, with equal probability for all n!
permutations.

Let us now define Ti to be the Yi variables
sorted by the size of their fractional parts. One algorithm to produce
Zi according to the requirements of the riddle is to use
Zi=Tw(i).

It is not difficult to see that any set of k<n values of
Zi are independent according to this algorithm.
Consider a vector where n-1 of the Xi values are
known. We wish to prove that the probability of seeing these n-1 values
as the values of (w.l.o.g.) Z1 through
Zn-1 is independent of the actual values of
Xi chosen.

Let us suppose that U is the fractional value of the remaining X
variable. If U is known, then there are exactly n! values that
the remaining X variable can take, each choosing a different w
permutation. This means that in exactly one case (that is, in probability
1/n!) the Zi variables will be as desired.

Because this result is independent of U, any U value is equally
good, meaning that if we don't have any constraint over the remaining X
variable, the probability will remain 1/n!. The result is also
independent of the subset of Xi chosen and of their specific
values, which means that any set of Z1 through
Zn-1 is equally probable, meaning that these
n-1 Z values are independent. The same reasoning works for any
other subset of n-1 Z values, which is what we wanted to prove.

This method of choosing Zi
provides a solution with k=n-1
for any n (and therefore, of course, also for any smaller value of
k). This leaves the case of k=n. This case, however, is
impossible (unless n=1, in which case it is trivial).

To prove this, consider the [0,1]n cube of all possible
values of n independent random variables uniformly distributed between
0 and 1. Because the variables are independent, all parts of this cube must be
covered. Because they are also uniformly distributed, the covering must be of
equal density.

Now, consider some n-tuple,
(Z1,...,Zn) at the output of
f. Because f can output this ordering, it cannot output any other
ordering of the same values. (In f's input, all such reorderings
appear as the same n-tuple of sorted Yi values.)
This means that for any point in the [0,1]n cube covered in the
range of f, there are n!-1 that are not. More generally (and
speaking now of events with positive probability), only 1/n! of the
cube is covered by the range of f.

Imagine a person seeing all Zi and having to decide
whether they are independent or whether they are the output of f. All
this person needs to do is to check whether the n-tuple
(Z1,...,Zn) is in the range of
f. If the Zi are the output of f, this
will occur in probability 1. If they are independent, this will occur in
probability 1/n!. The difference is distinct.