30 November 2007

Pattern avoidance redux: explaining part of Alon's conjecture

Not long ago I wrote a post about pattern avoidance, in which I mentioned the following conjecture of Noga Alon:Conjecture (Alon): The threshold length t(k), for a random permutation to contain all k-permutations with substantial probability, has t(k) ~ k^2/4. The paper by Richard Arratia from which I learned this conjecture doesn't really give any idea where it comes from; some rationale for the conjecture may be in one of Alon's papers (most of his post-1990 papers are available online) but I'm not particularly interested in tracking down the source.Anyway, there are two things to understand about that conjecture -- the "2" and the "4". I have no idea where the 4 comes from, but I think I can heuristically explain the 2. The problem can be looked at as an instance of the classical "coupon collecting" problem. In order to see all k! of the k-permutations, we need there to be about (k! log k!) different places in which we could see a k-permutation. That is, we need

but by Stirling's approximation, the right-hand side is

Now, note that if n is much larger than k, then we have

since we can represent as a quotient of products; the numerator contains k factors which are each approximately n, and the denominator is k!. So we have

Solving for t(k), we get

and applying Stirling's approximation, we get

But for large k, (k \log k)1/k is very close to 1. However, we get the wrong constant: e-2, not 1/4. This is because the dependence problem is quite serious -- if we pick two subsets of length k from a set of size (k/e)2, the probability that they intersect doesn't go to zero as $k$ goes to infinity is nonneglibible, and is something we can compute. Consider the probability that two randomly chosen k-subsets of a set of size ak2 do not intersect. This will be given by the quotient

which is just the number of ways we can choose the second subset not to intersect the first, divide by the total number of ways to choose the second. But recalling our approximation , we see that this is approximately

and as k gets large, this approaches e-1/a. In our case, a = e-2, so this probability is e-e2, or about 0.0006. So the probability that two randomly chosen subsets of size k do intersect is about 0.9994. So the independence assumption is so wrong it's not even funny; the dependence makes it more likely that we see the same patterns over and over again, explaining why we need to go further than (k/e)2 to expect to see all the patterns.