Intransitive dice VI: sketch proof of the main conjecture for the balanced-sequences model

I have now completed a draft of a write-up of a proof of the following statement. Recall that a random -sided die (in the balanced-sequences model) is a sequence of length of integers between 1 and that add up to , chosen uniformly from all such sequences. A die beats a die if the number of pairs such that exceeds the number of pairs such that . If the two numbers are the same, we say that ties with.

Theorem.Let and be random -sided dice. Then the probability that beats given that beats and beats is .

In this post I want to give a fairly detailed sketch of the proof, which will I hope make it clearer what is going on in the write-up.

The first step is to show that the theorem is equivalent to the following statement.

Theorem.Let be a random -sided die. Then with probability , the proportion of -sided dice that beats is .

We had two proofs of this statement in earlier posts and comments on this blog. In the write-up I have used a very nice short proof supplied by Luke Pebody. There is no need to repeat it here, since there isn’t much to say that will make it any easier to understand than it already is. I will, however, mention once again an example that illustrates quite well what this statement does and doesn’t say. The example is of a tournament (that is, complete graph where every edge is given a direction) where every vertex beats half the other vertices (meaning that half the edges at the vertex go in and half go out) but the tournament does not look at all random. One just takes an odd integer and puts arrows out from to mod for every , and arrows into for every . It is not hard to check that the probability that there is an arrow from to given that there are arrows from to and to is approximately 1/2, and this turns out to be a general phenomenon.

So how do we prove that almost all -sided dice beat approximately half the other -sided dice?

The first step is to recast the problem as one about sums of independent random variables. Let stand for as usual. Given a sequence we define a function by setting to be the number of such that plus half the number of such that . We also define to be . It is not hard to verify that beats if , ties with if , and loses to if .

So our question now becomes the following. Suppose we choose a random sequence with the property that . What is the probability that ? (Of course, the answer depends on , and most of the work of the proof comes in showing that a “typical” has properties that ensure that the probability is about 1/2.)

It is convenient to rephrase the problem slightly, replacing by . We can then ask it as follows. Suppose we choose a sequence of elements of the set , where the terms of the sequence are independent and uniformly distributed. For each let . What is the probability that given that ?

This is a question about the distribution of , where the are i.i.d. random variables taking values in (at least if is odd — a small modification is needed if is even). Everything we know about probability would lead us to expect that this distribution is approximately Gaussian, and since it has mean , it ought to be the case that if we sum up the probabilities that over positive , we should get roughly the same as if we sum them up over negative . Also, it is highly plausible that the probability of getting will be a lot smaller than either of these two sums.

So there we have a heuristic argument for why the second theorem, and hence the first, ought to be true.

There are several theorems in the literature that initially seemed as though they should be helpful. And indeed they were helpful, but we were unable to apply them directly, and had instead to develop our own modifications of their proofs.

The obvious theorem to mention is the central limit theorem. But this is not strong enough for two reasons. The first is that it tells you about the probability that a sum of random variables will lie in some rectangular region of of size comparable to the standard deviation. It will not tell you the probability of belonging to some subset of the y-axis (even for discrete random variables). Another problem is that the central limit on its own does not give information about the rate of convergence to a Gaussian, whereas here we require one.

The first problem is dealt with for many applications by local central limit theorems, about which Terence Tao has blogged in the past. These tell you not just about the probability of landing in a region, but about the probability of actually equalling some given value, with estimates that are precise enough to give, in many situations, the kind of information that we seek here.

What we did not find, however, was precisely the theorem we were looking for: a statement that would be local and 2-dimensional and would give information about the rate of convergence that was sufficiently strong that we would be able to obtain good enough convergence after only steps. (I use the word “step” here because we can think of a sum of independent copies of a 2D random variable as an -step random walk.) It was not even clear in advance what such a theorem should say, since we did not know what properties we would be able to prove about the random variables when was “typical”. That is, we knew that not every worked, so the structure of the proof (probably) had to be as follows.

1. Prove that has certain properties with probability .

2. Using these properties, deduce that the sum converges very well after steps to a Gaussian.

3. Conclude that the heuristic argument is indeed correct.

The key properties that needed to have were the following two. First, there needed to be a bound on the higher moments of . This we achieved in a slightly wasteful way — but the cost was a log factor that we could afford — by arguing that with high probability no value of has magnitude greater than . To prove this the steps were as follows.

Let be a random element of . Then the probability that there exists with is at most (for some such as 10).

The probability that is at least for some absolute constant .

It follows that if is a random -sided die, then with probability we have for every .

The proofs of the first two statements are standard probabilistic estimates about sums of independent random variables.

The second property that needed to have is more difficult to obtain. There is a standard Fourier-analytic approach to proving central limit theorems, and in order to get good convergence it turns out that what one wants is for a certain Fourier transform to be sufficiently well bounded away from 1. More precisely, we define the characteristic function of the random variable to be

where is shorthand for , , and and range over .

I’ll come later to why it is good for not to be too close to 1. But for now I want to concentrate on how one proves a statement like this, since that is perhaps the least standard part of the argument.

To get an idea, let us first think what it would take for to be very close to 1. This condition basically tells us that is highly concentrated mod 1: indeed, if is highly concentrated, then takes approximately the same value almost all the time, so the average is roughly equal to that value, which has modulus 1; conversely, if is not highly concentrated mod 1, then there is plenty of cancellation between the different values of and the result is that the average has modulus appreciably smaller than 1.

So the task is to prove that the values of are reasonably well spread about mod 1. Note that this is saying that the values of are reasonably spread about.

The way we prove this is roughly as follows. Let , let be of order of magnitude , and consider the values of at the four points and . Then a typical order of magnitude of is around , and one can prove without too much trouble (here the Berry-Esseen theorem was helpful to keep the proof short) that the probability that

is at least , for some positive absolute constant . It follows by Markov’s inequality that with positive probability one has the above inequality for many values of .

That’s not quite good enough, since we want a probability that’s very close to 1. This we obtain by chopping up into intervals of length and applying the above argument in each interval. (While writing this I’m coming to think that I could just as easily have gone for progressions of length 3, not that it matters much.) Then in each interval there is a reasonable probability of getting the above inequality to hold many times, from which one can prove that with very high probability it holds many times.

But since is of order , is of order 1, which gives that the values are far from constant whenever the above inequality holds. So by averaging we end up with a good upper bound for .

The alert reader will have noticed that if , then the above argument doesn’t work, because we can’t choose to be bigger than . In that case, however, we just do the best we can: we choose to be of order , the logarithmic factor being there because we need to operate in many different intervals in order to get the probability to be high. We will get many quadruples where

and this translates into a lower bound for of order , basically because has order for small . This is a good bound for us as long as we can use it to prove that is bounded above by a large negative power of . For that we need to be at least (since is about ), so we are in good shape provided that .

The alert reader will also have noticed that the probabilities for different intervals are not independent: for example, if some is equal to , then beyond that depends linearly on . However, except when is very large, this is extremely unlikely, and it is basically the only thing that can go wrong. To make this rigorous we formulated a concentration inequality that states, roughly speaking, that if you have a bunch of events, and almost always (that is, always, unless some very unlikely event occurs) the probability that the th event holds given that all the previous events hold is at least , then the probability that fewer than of the events hold is exponentially small in . The proof of the concentration inequality is a standard exponential-moment argument, with a small extra step to show that the low-probability events don’t mess things up too much.

Incidentally, the idea of splitting up the interval in this way came from an answer by Serguei Popov to a Mathoverflow question I asked, when I got slightly stuck trying to prove a lower bound for the second moment of . I eventually didn’t use that bound, but the interval-splitting idea helped for the bound for the Fourier coefficient as well.

So in this way we prove that is very small if . A simpler argument of a similar flavour shows that is also very small if is smaller than this and .

Now let us return to the question of why we might like to be small. It follows from the inversion and convolution formulae in Fourier analysis. The convolution formula tells us that the characteristic function of the sum of the (which are independent and each have characteristic function ) is . And then the inversion formula tells us that

What we have proved can be used to show that the contribution to the integral on the right-hand side from those pairs that lie outside a small rectangle (of width in the direction and in the direction, up to log factors) is negligible.

All the above is true provided the random -sided die satisfies two properties (the bound on and the bound on ), which it does with probability .

We now take a die with these properties and turn our attention to what happens inside this box. First, it is a standard fact about characteristic functions that their derivatives tell us about moments. Indeed,

,

and when this is . It therefore follows from the two-dimensional version of Taylor’s theorem that

plus a remainder term that can be bounded above by a constant times .

Writing for we have that is a positive semidefinite quadratic form in and . (In fact, it turns out to be positive definite.) Provided is small enough, replacing it by zero does not have much effect on , and provided is small enough, is well approximated by .

It turns out, crucially, that the approximations just described are valid in a box that is much bigger than the box inside which has a chance of not being small. That implies that the Gaussian decays quickly (and is why we know that is positive definite).

There is a bit of back-of-envelope calculation needed to check this, but the upshot is that the probability that is very well approximated, at least when and aren’t too big, by a formula of the form

.

But this is the formula for the Fourier transform of a Gaussian (at least if we let and range over , which makes very little difference to the integral because the Gaussian decays so quickly), so it is the restriction to of a Gaussian, just as we wanted.

When we sum over infinitely many values of and , uniform estimates are not good enough, but we can deal with that very directly by using simple measure concentration estimates to prove that the probability that is very small outside a not too large box.

That completes the sketch of the main ideas that go into showing that the heuristic argument is indeed correct.

Any comments about the current draft would be very welcome, and if anyone feels like working on it directly rather than through me, that is certainly a possibility — just let me know. I will try to post soon on the following questions, since it would be very nice to be able to add answers to them.

1. Is the more general quasirandomness conjecture false, as the experimental evidence suggests? (It is equivalent to the statement that if and are two random -sided dice, then with probability , the four possibilities for whether another die beats and whether it beats each have probability .)

2. What happens in the multiset model? Can the above method of proof be adapted to this case?

3. The experimental evidence suggests that transitivity almost always occurs if we pick purely random sequences from . Can we prove this rigorously? (I think I basically have a proof of this, by showing that whether or not beats almost always depends on whether has a bigger sum than . I’ll try to find time reasonably soon to add this to the draft.)

Of course, other suggestions for follow-up questions will be very welcome, as will ideas about the first two questions above.

13 Responses to “Intransitive dice VI: sketch proof of the main conjecture for the balanced-sequences model”

It occurs to me that we could follow up (in addition to #3) by looking at a different small change that breaks the intransitivity. In particular, the “improper dice” model in the paper that allowed values > n, but still constrains the average value. They suggested that this improper dice model leads to a more transitive “beats” relation.

To better fit the improper dice model with this proof technique, it can be viewed as looking at [n]^n sequences, and conditioning on the sum being m(m+1)/2 for some m<n. Since this is just conditioning on a different V value, maybe this is just a small step from the current results.

Having one model where average value constraint isn't sufficient, along with one showing removing the average value constraint breaks things (if the proof for #3 works out), it feels like the constraint is necessary but must be "balanced". Another nugget pointing this way is that the multiset model also appears to work, and the constraints appear "balanced" here as well.

Notice in the unconstrained [n]^n sequence, the "complementary die" involution often has a different average value. And notice for the "improper dice" model this involution does not exist (the complementary die may not even be in the set). I wonder if these two conditions together are necessary.

Are there any models that both have a constrained average, and the involution exists, yet the dice are transitive? I think someone mentioned there is a "continuous" model of the dice which is transitive, which would be an example. Has this been shown? And are there any finite models of dice that also displays this?

Here is an idea to show the tournament is not quasirandom. For two dice and , let . If and are independent typical dices, this has order of magnitude . Assume that and are very close to each other but not too close from the “trivial” dice in the sense that but with small compared to . If is uniform in , then is approximately Gaussian with , so we should have . Hence, if is conditioned to be balanced, with high probability, while , which is much larger. Therefore, with high probability, and have the same sign, so the probability that beats both and is not .

The harder part would be to show that if and are uniform balanced dice, they are close from each other (and not too close from ) with macroscopic probability. It seems more or less equivalent to proving that the sequence of processes is tight and does not converge to (it should actually converge to some Brownian motion-related process, the most natural candidate is a Brownian bridge conditioned to have integral on ).

Sorry to be so slow to respond, but this seems like a very promising approach. Also, I feel reasonably optimistic that what you describe as the harder part may in fact be not too hard. Suppose, for instance, that we split into equal-sized intervals for some suitably chosen absolute constant . Then setting I think it is the case that, with macroscopic probability, in each of the first intervals there are approximately points from and in each of the last intervals there are approximately points. (Here “approximately” means something like “to within .)

Maybe it would work out simpler to follow the approach in this post. For this comment, I’ll assume enough familiarity with that post to know what I mean by the random variables and .

The condition for the strong conjecture set out in that post is that with probability we should have

.

But it seems to me that that could easily be false: with macroscopic probability we should have small, and therefore with macroscopic probability both and are small. But with macroscopic probability that should not imply that is small.

Actually, it looks as though proving these assertions will be of roughly equivalent difficulty to proving the ones you wanted: perhaps we just want a general theorem that says that certain broad shapes occur with macroscopic probability.

This is very nice and I really like the sequence model. Now that we know that balancedness implies (asymptotic) random outcomes for three dice but not for 4 dice we can wonder what additional condition may imply randomness for 4 dice (which if I understood correctly implies pseudorandomness for any number of dice). A natural guess would be that the dice are 1) balanced 2) have the same variance. In other words if the sum of entries of the dice is precisely the expectation () and the sum of squares of entries is precisely the expectation ().

We can also compare (balanced and non balanced) dice based on other Boolean functions.

Assuming by “balancedness” you mean constraining the expectation value of a dice roll to be the same for all dice, then balancedness does not imply random outcomes for three dice. For example if we use the sequence model, but constrain on an average other than n(n+1)/2, transitiveness can come back. Or at least that is what is suggested by the “improper” dice in the original paper.

So to get random outcomes for three dice, maybe there is some combination of constrained expectation + “condition X” that would be sufficient. I’m not sure what condition X is, but it still feels to me like the “complementary die” involution plays an important role. I’ve toyed with some ideas but haven’t managed to come up with anything convincing. Such a result would also apply directly to the multiset case (which also meets those conditions).

Hmm… maybe this should be obvious, but I don’t understand this intuitively: For some reason a constraint on the sum of squares is compatible with the existence of the “complementary die” involution.

If we denote the values of the die faces as v_i, then the constraint on the sum is:
sum v_i = n(n+1)/2
And the “complementary die” will also have the same sum of squares:
sum (n+1-v_i)^2 = sum [ (n+1)^2 – 2(n+1)v_i + (v_i)^2 ] = n(n+1)^2 – 2(n+1)n(n+1)/2 + sum (v_i)^2 = sum (v_i)^2

I’m not sure of a good way to generate random dice with the usual average constraint along with the sum of squares constrained to that of the standard die, but for small n it isn’t too hard to just calculate all of them. With n=14, there are 3946 proper dice with the additional sum of squares constraint. Picking a random die, it beats roughly as many dice as it loses to. But if we pick two at random and check out how it separates the remaining dice into four quadrants, it does not appear to split the space evenly. There still appears to be a lot of correlation.

I think there may not be any nice condition that guarantees quasirandomness, though I would be happy to be wrong about this.

The reason is connected with the ideas in the comment thread started by Thomas Budzinski just above. If those ideas are roughly correct, then what they tell us is that quasirandomness fails to occur because there is a non-negligible probability that two dice are “close” in a suitable sense. In particular, I currently believe that a proof of non-quasirandomness along the following lines should work, though clearly there will be details to be sorted out.

1. Choose some fixed positive integer such as 10.

2. Prove that with probability bounded away from zero, the values of are approximately when and when .

3. Prove that if and both satisfy the above condition, then there is significant correlation between the events “ beats ” and “ beats “.

Assuming a proof like that can be made to work, I find it very hard to see what extra condition could hope to stop it working. The proof would be telling us that the set of dice is in some sense “low-dimensional”, and that seems hard to cure by passing to a naturally defined subset.

We looked at that book, and it is not impossible that it contains a result we can use. However, it wasn’t clear that there was a result we could quote directly. The problem is that (if my understanding is correct) there is a constant involved that depends on the distribution. And the dependence is necessary, since one could have a distribution that is almost entirely, but not quite, concentrated in a sublattice. What we need is an estimate after steps of a random walk, where the random walk itself depends on . So in the end we had to prove explicit bounds on the characteristic function outside a certain small region and go through what I imagine is essentially the standard characteristic-functions approach to LCLT making sure that the final bound obtained is good enough.

However, I don’t rule out that there is in the literature, and perhaps even in the Lawler-Limic book, a result we could quote that would save us from having to prove this small modification of existing results. It would be nice to be able to do that.

[…] This post is to note that the polymath13 project has successfully settled one of the major objective. Reports on it can be found on Gower’s blog especially in this post Intransitive dice IV: first problem more or less solved? and this post Intransitive dice VI: sketch proof of the main conjecture for the balanced-sequences model. […]