It is well known that if $\omega=\omega(n)$ is any function such that $\omega \to \infty$ as $n \to \infty$, and if $p \ge (\log{n}+\omega) / n$, then the Erdős–Rényi random graph $G(n,p)$ is asymptotically almost surely connected. The way I know how to prove this is to (1) count the expected number of components of order $2, 3, \dots, \lfloor n/2 \rfloor$ and check that it tends to zero, and then (2) show that the expected number of isolated vertices also tends to zero.

This approach also allows more precise results, such as: if $p = (\log{n}+c) / n$ with $c \in \mathbb{R}$ constant, then $\Pr[G(n,p) \text{ is connected}] \to e^{-e^{-c}}$ as $n \to \infty$. This follows once we know that in this regime the number of isolated vertices converges in distribution to a Poisson random variable with mean $e^{-c}$.
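A quick simulation can make the limiting law $e^{-e^{-c}}$ concrete. This is an illustrative sketch only: the helper names are hypothetical, it samples $G(n,p)$ at $p=(\log n + c)/n$ and tests connectivity with a union-find structure, and at moderate $n$ the agreement with the limit should be expected to be rough.

```python
import math
import random


def is_connected_gnp(n: int, p: float, rng: random.Random) -> bool:
    """Sample one G(n, p) and report whether it is connected (union-find)."""
    parent = list(range(n))

    def find(v: int) -> int:
        # Path halving: point each visited node at its grandparent.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # edge uv is present with probability p
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    components -= 1
    return components == 1


def estimate_connectivity(n: int, c: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[G(n, p) is connected] at p = (log n + c) / n."""
    rng = random.Random(seed)
    p = (math.log(n) + c) / n
    hits = sum(is_connected_gnp(n, p, rng) for _ in range(trials))
    return hits / trials


def limiting_probability(c: float) -> float:
    """The limit exp(-exp(-c)) coming from the Poisson approximation."""
    return math.exp(-math.exp(-c))


if __name__ == "__main__":
    for c in (-1.0, 0.0, 1.0, 2.0):
        est = estimate_connectivity(n=120, c=c, trials=100)
        print(f"c={c:+.1f}  simulated={est:.2f}  limit={limiting_probability(c):.3f}")
```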

I am wondering if it is possible to give an easier proof (of a coarser result) along the following lines. There are $n^{n-2}$ spanning trees on the complete graph (Cayley's formula), and $G$ is connected if and only if one of these trees appears. Each tree has $n-1$ edges, so by linearity of expectation the expected number of spanning trees is $n^{n-2}p^{n-1}$. One might expect that if this function is growing quickly enough, then with high probability $G(n,p)$ is connected.
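For small $n$ both ingredients of this heuristic can be checked directly: Cayley's count $n^{n-2}$ (here via the Prüfer bijection) and the expectation $n^{n-2}p^{n-1}$. A minimal Python sketch; the function names are hypothetical and it is only practical for very small $n$.

```python
import itertools
from fractions import Fraction


def prufer_to_edges(seq: tuple[int, ...], n: int) -> frozenset:
    """Decode a Prüfer sequence over {0, ..., n-1} into a labelled tree's edge set."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        # Join the smallest current leaf to v, then delete that leaf.
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append(frozenset((leaf, v)))
        degree[leaf] -= 1
        degree[v] -= 1
    # Exactly two vertices of degree 1 remain; they form the last edge.
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.append(frozenset((u, w)))
    return frozenset(edges)


def all_spanning_trees(n: int) -> set:
    """Every labelled spanning tree of K_n, via the Prüfer bijection (n >= 2)."""
    return {prufer_to_edges(s, n) for s in itertools.product(range(n), repeat=n - 2)}


def expected_spanning_trees(n: int, p: Fraction) -> Fraction:
    """Linearity of expectation: n^(n-2) trees, each present with probability p^(n-1)."""
    return n ** (n - 2) * p ** (n - 1)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, len(all_spanning_trees(n)), n ** (n - 2))
    print(expected_spanning_trees(10, Fraction(1, 5)))
```

Exact `Fraction` arithmetic is used so that the expectation can be compared with hand calculations without rounding.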

I think I remember reading somewhere that this approach doesn't quite work; for example, the variance is too large to apply Chebyshev's inequality. What I am wondering is whether there is some way to fix this if we are willing to make $p$ a little bit bigger. In particular, what about $p = C \log{n} / n$ for some large enough constant $C > 1$, or even $p = n^{-1 + \epsilon}$ for fixed but arbitrarily small $\epsilon >0$?
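For reference, here is the standard second-moment bound that the Chebyshev remark refers to. Write $X$ for the number of spanning trees of $G(n,p)$ and sum over ordered pairs $(T,T')$ of spanning trees of $K_n$; since $T \cup T'$ is present with probability $p^{2(n-1)-|T\cap T'|}$,

$$\Pr[X=0] \le \frac{\operatorname{Var} X}{(\mathbb{E}X)^2} = \frac{\mathbb{E}[X^2]}{(\mathbb{E}X)^2} - 1, \qquad \frac{\mathbb{E}[X^2]}{(\mathbb{E}X)^2} = \frac{1}{n^{2(n-2)}}\sum_{T,T'} p^{-|T\cap T'|}.$$

So the Chebyshev bound tends to $0$ exactly when, for a uniformly random pair of spanning trees, $\mathbb{E}\bigl[p^{-|T\cap T'|}\bigr] = 1+o(1)$.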

4 Answers

A nice question. Here's a strategy that occurs to me, though it could fail miserably.

The basic problem seems to be what you said about variance: the appearances of different spanning trees are far from independent, since it is possible to make local modifications to a spanning tree and get another one. (For example, if $x$ is a leaf joined to $y$, and $y$ is joined only to $x$ and $z$, then we can replace the path $zyx$ by the path $zxy$.)

One way we might try to defeat this is to choose a random set $\Sigma$ of spanning trees, where each spanning tree is chosen independently with probability $\alpha^{n-1}$ for some carefully chosen $\alpha$ (which I imagine as a small negative power of $n$). Then the expected number of trees from $\Sigma$ in a $p$-random graph is $(\alpha p)^{n-1}n^{n-2}$, which is still large even when $p$ is fairly close to $n^{-1}$. But now we might expect that any two trees in $\Sigma$ are quite well separated, so perhaps it is possible to get a decent estimate for the variance.
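To see why $\alpha$ can be a small negative power of $n$, take (as an illustrative assumption) $\alpha = n^{-\delta}$ and $p = n^{-1+\epsilon}$. Then

$$(\alpha p)^{n-1} n^{n-2} = \frac{(\alpha p n)^{n-1}}{n} = n^{(\epsilon-\delta)(n-1)-1},$$

which tends to infinity whenever $\epsilon > \delta$.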

Actually, it's not clear to me what passing to the random set really achieves here: maybe a simpler method (but not wholly simple) is to work out the expected number of pairs of spanning trees by carefully classifying what they can look like. The hope would be that if you pick one tree at random, then the proportion of trees that overlap with it to any great extent is usually so small that the expected number of pairs is not significantly bigger than the square of the expected number of spanning trees. With $p=n^{-1+\epsilon}$ something like this might work, but you've probably already thought about this.
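For very small $n$ the expected number of pairs can simply be computed by brute force, which at least shows how the overlap profile enters. The sketch below (hypothetical helper names; exact arithmetic with `fractions.Fraction`) tallies $|T\cap T'|$ over all ordered pairs of spanning trees of $K_n$ and returns $\mathbb{E}[X^2]/(\mathbb{E}X)^2$, where $X$ is the number of spanning trees of $G(n,p)$. The method in the paragraph above needs this ratio to be $1+o(1)$. Brute force is only feasible up to about $n=6$ and says nothing about asymptotics.

```python
import itertools
from collections import Counter
from fractions import Fraction


def prufer_to_edges(seq, n):
    """Decode a Prüfer sequence over {0, ..., n-1} into a tree's edge set."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        # Join the smallest current leaf to v, then delete that leaf.
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append(frozenset((leaf, v)))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.append(frozenset((u, w)))
    return frozenset(edges)


def overlap_profile(n):
    """Counter mapping j to the number of ordered tree pairs (T, T') with |T & T'| = j."""
    trees = [prufer_to_edges(s, n) for s in itertools.product(range(n), repeat=n - 2)]
    return Counter(len(t1 & t2) for t1 in trees for t2 in trees)


def second_moment_ratio(n, p):
    """E[X^2] / (E X)^2 for X = number of spanning trees of G(n, p).

    A pair (T, T') is present with probability p^(2(n-1) - |T & T'|), so the
    ratio is the average of p^(-|T & T'|) over ordered pairs of trees.
    """
    profile = overlap_profile(n)
    total = sum(profile.values())  # equals (n^(n-2))^2
    return sum(Fraction(count) / p ** j for j, count in profile.items()) / total


if __name__ == "__main__":
    for n in (4, 5, 6):
        p = Fraction(2, n)
        print(n, float(second_moment_ratio(n, p)))
```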

The expected number of spanning trees becomes large when $p > 1/n$, whereas the expected number of Hamilton paths becomes large when $p > e/n$. Controlling the possible kinds of overlap between two Hamilton paths is much easier than for general trees, so if the method you describe in the last paragraph is going to work it would probably be much easier to implement after restricting attention to Hamilton paths. The best bound one could then hope for would be above the threshold for Hamiltonicity, but this is only a little bigger than the threshold for connectivity anyway.
–
Louigi Addario-Berry, Mar 31 '11 at 8:26
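A quick numerical comparison of the two first moments in Louigi's comment, assuming $p=c/n$: the logarithm of $n^{n-2}p^{n-1}$ for spanning trees and of $\tfrac{n!}{2}p^{n-1}$ for Hamilton paths ($K_n$ has $n!/2$ undirected Hamilton paths). The function names are hypothetical; for large $n$ the first should become positive once $c>1$, the second only once $c>e$.

```python
import math


def log_expected_trees(n: int, p: float) -> float:
    """log of n^(n-2) p^(n-1), the expected number of spanning trees of G(n, p)."""
    return (n - 2) * math.log(n) + (n - 1) * math.log(p)


def log_expected_hamilton_paths(n: int, p: float) -> float:
    """log of (n!/2) p^(n-1): K_n has n!/2 Hamilton paths (n >= 2), each with n-1 edges."""
    return math.lgamma(n + 1) - math.log(2) + (n - 1) * math.log(p)


if __name__ == "__main__":
    n = 10**6
    for c in (0.9, 1.5, 2.5, 3.0):
        p = c / n
        print(f"c={c}: trees {log_expected_trees(n, p):+.0f}, "
              f"Hamilton paths {log_expected_hamilton_paths(n, p):+.0f}")
```

Working with logarithms (and `math.lgamma` for $\log n!$) keeps the numbers finite even for $n = 10^6$.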

If so, then with $p=\frac{3}{2n}$ the expected number of edges is only $\frac{3(n-1)}{4}$, which is nowhere near the $n-1$ edges needed for even one spanning tree, but the expected number of spanning trees is still large because of a sort of St. Petersburg paradox.
–
Aaron Meyerowitz, Mar 31 '11 at 16:24
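The arithmetic behind Aaron's example, with edge probability $p = \frac{3}{2n}$:

$$\binom{n}{2}\cdot\frac{3}{2n} = \frac{3(n-1)}{4} < n-1, \qquad n^{n-2}\left(\frac{3}{2n}\right)^{n-1} = \frac{(3/2)^{n-1}}{n} \to \infty.$$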


Thanks for this answer, and also for the comment, Louigi. Thinking about paths was fruitful, and it looks like one can make the following approach work. Set $p \gg \log^2{n}/n$, and consider the number of paths of length $\approx \log{n}$ between a pair of vertices $x$ and $y$. One can show that the expected number of such paths is large, but since the paths are short they don't intersect too often, and then Janson's inequality (for example) ensures that with high probability every pair of vertices is connected by such a path.
–
Matthew Kahle, Apr 20 '11 at 0:48
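The first-moment part of Matthew's path argument is easy to compute. A path with $\ell$ edges between fixed vertices $x$ and $y$ is an ordered choice of $\ell-1$ interior vertices, so the expected number of such paths is $(n-2)(n-3)\cdots(n-\ell)\,p^{\ell} \approx (np)^{\ell}/n$. A minimal sketch with hypothetical names and parameter choices; the Janson-inequality step is not attempted.

```python
import math


def expected_xy_paths(n: int, p: float, length: int) -> float:
    """Expected number of paths with `length` edges between two fixed vertices of G(n, p).

    Such a path is determined by an ordered choice of length - 1 interior
    vertices from the other n - 2, and all `length` of its edges must be present.
    """
    return math.perm(n - 2, length - 1) * p ** length


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        p = 2 * math.log(n) ** 2 / n  # hypothetical choice with p >> log^2 n / n
        length = round(math.log(n))
        print(n, length, expected_xy_paths(n, p, length))
```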

Define a cut of a graph $G$ to be a partition of the vertices of $G$ into two nonempty sets with no edges between them. So a graph has a cut if and only if it is disconnected. We will show that the expected number of cuts of $G$ goes to $0$; since the probability that $G$ is disconnected is at most the expected number of cuts, our graph is connected with probability tending to $1$.

For any particular partition of the vertices of $G$ into two sets of sizes $k$ and $n-k$, all $k(n-k)$ edges between the two sets must be absent, so the probability that this partition is a cut is $(1-p)^{k(n-k)}$; there are $\binom{n}{k}$ such partitions. So the expected number of cuts is at most
$$\sum_{k=1}^{n/2} (1-p)^{k(n-k)} \binom{n}{k}.$$
We only have to go up to halfway, because we can always take $k$ to be the size of the smaller side of the cut. (I'm going to be sloppy and write non-integer bounds for my summations, as I've done here. You can fix it, if you like.)
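Numerically, with $p = C\log n/n$ as in the question, the sum above can be evaluated directly. This is a sketch with a hypothetical function name; it works with logarithms of the binomial coefficients (via `math.lgamma`) so that huge binomials never have to be converted to floats.

```python
import math


def cut_sum(n: int, p: float) -> float:
    """Evaluate sum_{k=1}^{floor(n/2)} C(n, k) (1 - p)^(k(n - k)).

    This upper-bounds the expected number of cuts, and hence the probability
    that G(n, p) is disconnected.  Requires 0 <= p < 1.
    """
    total = 0.0
    for k in range(1, n // 2 + 1):
        # log C(n, k) via lgamma, so huge binomials never become floats.
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += math.exp(log_binom + k * (n - k) * math.log1p(-p))
    return total


if __name__ == "__main__":
    for C in (1.5, 2.0, 3.0):
        for n in (100, 1000, 5000):
            p = C * math.log(n) / n  # the scaling from the question
            print(f"C={C}, n={n}: {cut_sum(n, p):.3e}")
```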

Now take $p = C\log n/n$, as in the question. Using $1-p \le e^{-p}$ and $\binom{n}{k} \le n^k$, we have the following crude bound:
$$\sum_{k=1}^{n/2} (1-p)^{k(n-k)} \binom{n}{k} \leq \sum_{k=1}^{n/2} e^{-p k(n-k)} n^k.$$

So, if $C>2$, the sum is bounded by $\sum_{k=1}^{n/2} e^{(1-C/2) k \log n}$. This is a geometric series with ratio $n^{1-C/2} \to 0$, so its sum is bounded by a constant multiple of its first term, $n^{1-C/2}$. So the sum goes to $0$ and we are done.
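The exponent bound used here can be spelled out: with $p = C\log n/n$ and $1 \le k \le n/2$, so that $(n-k)/n \ge 1/2$,

$$e^{-pk(n-k)}\,n^k = \exp\!\Bigl(k\log n\,\Bigl(1 - C\,\tfrac{n-k}{n}\Bigr)\Bigr) \le n^{(1-C/2)k},$$

and with ratio $r = n^{1-C/2} < 1$ the geometric series satisfies $\sum_{k\ge 1} r^k = \frac{r}{1-r} \le 2r$ once $r \le 1/2$.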

Now, what if $C>1$, but not as large as $2$? Let $a$ be a positive real number such that $1-C(1-a) < 0$, that is, $0 < a < 1 - 1/C$. The preceding argument shows that the contribution of the terms with $k<an$ is negligible. (If $C\leq 1$, there is no such number and this proof breaks.)

We now consider the remaining terms, and use a different crude bound:
$$\sum_{k=an}^{n/2} (1-p)^{k(n-k)} \binom{n}{k} \leq \sum_{k=an}^{n/2} e^{-pk(n-k)} 2^n$$
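The answer stops at this point. One way to finish (a completion offered here, not part of the original answer): since $a < 1 - 1/C < 1/2$ when $C<2$, for $an \le k \le n/2$ we have $k(n-k) \ge an(n-an) \ge an\cdot\tfrac{n}{2}$, so with $p = C\log n/n$

$$\sum_{k=an}^{n/2} e^{-pk(n-k)}\,2^n \le \frac{n}{2}\,2^n\,e^{-Can\log n/2} = \frac{n}{2}\exp\!\Bigl(n\log 2 - \frac{Ca}{2}\,n\log n\Bigr) \to 0,$$

because $n\log n$ eventually dominates both $n\log 2$ and $\log n$.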

David, thanks for the reply. However, the proof you gave is exactly the proof I describe above: counting nontrivial cuts is equivalent to counting small connected components. (And this proof can be refined to give the more precise result I described.) What I am looking for is a proof that directly makes use of the fact that the expected number of spanning trees is tending to infinity very quickly.
–
Matthew Kahle, Mar 31 '11 at 17:37

Can you estimate how fast the expected number of spanning trees tends to infinity if $p=\frac{\log n}{n}$? I'd expect "really fast", although the chance of being connected is only about $1/3$. You would have to beat that rate.
–
Aaron Meyerowitz, Apr 1 '11 at 20:46
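For what it's worth, the expected number in question is explicit: at $p = \log n/n$,

$$n^{n-2}\left(\frac{\log n}{n}\right)^{n-1} = \frac{(\log n)^{n-1}}{n} = \exp\bigl((n-1)\log\log n - \log n\bigr),$$

which is superexponential, while by the result quoted in the question (with $c=0$) the probability of being connected tends to $e^{-1} \approx 0.37$.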

Dear Matthew, this is not really an answer to your question, just a related matter. As you point out, the expected number of spanning trees in a random graph already exceeds 1 when $p = c/n$ for a constant $c > 1$ (and $n$ large), and is very large when $p = \log n / n$, so the hope is that this can be used to show that with high probability the random graph contains a spanning tree. There is a collection of conjectures by Jeff Kahn and me that tries to suggest a very general connection of this type. These conjectures are presented in the paper *Thresholds and expectation thresholds* by Kahn and me, and are mentioned in this MO question. If true, these conjectures would imply that the threshold for connectivity is at most a constant times $\log n / n$ (of course, we do not need them for this case...), and the resulting proof would at best be much, much more complicated than existing proofs.

I should mention that the sharp threshold property, which was proved by Erdős and Rényi for connectivity, can also be derived (with harder proofs) from more general principles. One is the Margulis–Talagrand theorem, which applies to thresholds for random subgraphs of highly edge-connected graphs; another is Friedgut's result, which characterizes graph properties with coarse thresholds.

I think perhaps the problem with the variance is not the overlap of the different trees but the fact that the number of spanning trees can be much, much larger than 1 but cannot be much less (it is never negative).

With $p=\frac{c}{n}$ the expected number of spanning trees is exactly $n^{n-2}\left(\frac{c}{n}\right)^{n-1} = \frac{c^{n-1}}{n}$, which grows exponentially when $c>1$, while the probability of being connected goes to $0$ exponentially fast.

Maybe with $p=\frac{\log n-\log \log n}{n}$ the chance of being connected decreases like $e^{-n}$ but the number of spanning trees of the single large component grows something like $(\log n)^{n}$. Then the rare occasions that the giant component is the whole graph (i.e. the graph is connected) would still make the expected number of spanning trees grow something like $(\frac{\log n}{e})^n$ which is superexponential.