Right at least once with high probability



It’s been a while since I’ve posted, so I figured I’d post something short to get back in the swing of things. I spent a week chasing some results in spectral graph theory without knowing anything about it. I still have hope of finding something there, but first I need to fill in some gaps in my knowledge. Anyway, I recently came across this problem set. I really like all the problems, but the question about random multigraphs caught my attention, so I figured I would look into it.

Apparently this is also a standard model, but I hadn’t heard of it. A graph is generated by taking $n$ vertices and randomly placing $m$ edges in the graph. The most notable difference from $G(n,p)$ is that vertex pairs are selected with replacement, so there is a possibility of parallel edges. The problem from that assignment also allows for self-loops, which is just insanity. I also don’t want to post solutions to other people’s problems, but mostly it is insanity, so we’ll stick to the more standard model.

Actually, some natural questions about this model are well-known results in disguise. We can ask for what value of $m$ we get a double edge with high probability; this is just the birthday paradox. We can ask for what value of $m$ every possible edge appears, so that the underlying simple graph is complete; this is coupon collector. Another question we can ask is: if $m = \binom{n}{2}$, what is a bound on the maximum number of parallel edges?
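To make the correspondence concrete, here is a quick simulation sketch (the helper names `first_duplicate` and `cover_all_pairs` are mine, not from any source): with $N = \binom{n}{2}$ possible edges, the first repeated edge should show up after roughly $\sqrt{\pi N/2}$ draws, and covering every pair should take roughly $N\ln N$ draws.

```python
import math
import random

def random_pair(n, rng):
    """Draw one edge uniformly from the C(n, 2) vertex pairs (no self-loops)."""
    u = rng.randrange(n)
    v = rng.randrange(n)
    while v == u:
        v = rng.randrange(n)
    return (min(u, v), max(u, v))

def first_duplicate(n, rng):
    """Number of draws until some edge repeats (birthday paradox)."""
    seen = set()
    while True:
        e = random_pair(n, rng)
        if e in seen:
            return len(seen) + 1
        seen.add(e)

def cover_all_pairs(n, rng):
    """Number of draws until every pair has appeared (coupon collector)."""
    seen, draws = set(), 0
    target = math.comb(n, 2)
    while len(seen) < target:
        seen.add(random_pair(n, rng))
        draws += 1
    return draws

rng = random.Random(3)
n = 60
N = math.comb(n, 2)  # 1770 possible edges
dup = sum(first_duplicate(n, rng) for _ in range(200)) / 200
cov = cover_all_pairs(n, rng)
print(f"first duplicate: ~{dup:.0f} draws, sqrt(pi*N/2) = {math.sqrt(math.pi * N / 2):.0f}")
print(f"full coverage: {cov} draws, N ln N = {N * math.log(N):.0f}")
```

Both empirical numbers land near the back-of-the-envelope predictions, which is all we need here.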

That last one probably isn’t as common, but it was actually a homework problem for me once [1]. It was phrased as follows: if we toss $n$ balls into $n$ bins, selecting the bin for each ball uniformly at random, show that no bin contains more than $\frac{3\ln n}{\ln\ln n}$ balls with high probability, more specifically with probability at least $1 - \frac{1}{n}$.

This is an exercise in Chernoff bounds, which are essential in basically every randomized algorithm I’ve ever seen. A Chernoff bound places bounds on the probability that a random variable with a binomial distribution is really far from its mean. Actually it can be the sum of negatively associated random variables, but that’s not really important here. The really nice thing is that the bound is exponentially small. There are a bunch of different forms, but they’re all derived in a similar manner. The version we’re going to use is best when we want to bound the probability that a random variable is really, really far above its mean. Moving past this extremely technical terminology, the bound states that if $X$ is a sum of independent indicator random variables with $\mu = E[X]$, then for any $\delta > 0$,

$\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^\mu.$
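As a sanity check, here is a small script comparing the exact binomial upper tail against this bound in the setting we care about ($n$ balls, $n$ bins, so $\mu = 1$); the helper names are mine.

```python
import math

def binom_upper_tail(n, p, k):
    """Exact Pr[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def chernoff_bound(mu, delta):
    """Pr[X >= (1 + delta) * mu] <= (e^delta / (1 + delta)^(1 + delta))^mu."""
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

n = 1000
p = 1 / n  # one expected ball per bin, so mu = 1
for k in (3, 6, 9):
    delta = k - 1  # k = (1 + delta) * mu with mu = 1
    exact = binom_upper_tail(n, p, k)
    bound = chernoff_bound(1, delta)
    print(f"k={k}: exact tail {exact:.2e} <= bound {bound:.2e}")
    assert exact <= bound
```

The bound is loose by an order of magnitude or two, but it decays exponentially in $k$, which is the property the proof below leans on.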

The overall plan, as per usual, is to bound the probability that any single bin gets more than $\frac{3\ln n}{\ln\ln n}$ balls and then complete the problem with a union bound. For this to work we aim to bound the probability for a given bin by $\frac{1}{n^2}$. We let $X$ be the number of balls in a given bin, where $X_i$ is $1$ if the $i$th ball is thrown into that bin and $0$ otherwise. Clearly $X = \sum_{i=1}^{n} X_i$ and the $X_i$ are independent, so we can apply Chernoff. I’m also going to be really lazy with constants; it’s definitely possible to be more precise. We see that $\mu = E[X] = n \cdot \frac{1}{n} = 1$, so if we set $1+\delta = \frac{3\ln n}{\ln\ln n}$ and use $e^\delta \le e^{1+\delta}$ then

$\Pr\left[X \ge \frac{3\ln n}{\ln\ln n}\right] \le \frac{e^\delta}{(1+\delta)^{1+\delta}} \le \left(\frac{e}{1+\delta}\right)^{1+\delta} = \left(\frac{e\ln\ln n}{3\ln n}\right)^{\frac{3\ln n}{\ln\ln n}} \le \frac{1}{n^2}.$

That last step wasn’t obvious at all to me, so here’s a short justification. Let $k = \frac{3\ln n}{\ln\ln n}$ and take logarithms:

$\ln\left(\left(\frac{e}{k}\right)^{k}\right) = k\left(1 - \ln k\right) = \frac{3\ln n}{\ln\ln n}\left(1 - \ln 3 - \ln\ln n + \ln\ln\ln n\right).$

For sufficiently large $n$ we have $1 - \ln 3 + \ln\ln\ln n \le \frac{1}{3}\ln\ln n$, so the right-hand side is at most $\frac{3\ln n}{\ln\ln n}\cdot\left(-\frac{2}{3}\ln\ln n\right) = -2\ln n$. Exponentiating gives $\left(\frac{e}{k}\right)^{k} \le \frac{1}{n^2}$.

At this point we’re done, since the union bound implies that the probability that any bin has more than $\frac{3\ln n}{\ln\ln n}$ balls is less than $n \cdot \frac{1}{n^2} = \frac{1}{n}$, and so the probability that no bin exceeds $\frac{3\ln n}{\ln\ln n}$ balls is at least $1 - \frac{1}{n}$.
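Here is a quick empirical check of the whole statement; the constant $3$ comes from the proof sketch above, and the trial count is an arbitrary choice of mine.

```python
import math
import random
from collections import Counter

def max_load(n, rng):
    """Throw n balls into n uniformly random bins; return the fullest bin's count."""
    return max(Counter(rng.randrange(n) for _ in range(n)).values())

rng = random.Random(0)
n = 100_000
bound = 3 * math.log(n) / math.log(math.log(n))
trials = 20
violations = sum(max_load(n, rng) > bound for _ in range(trials))
print(f"3 ln n / ln ln n = {bound:.2f}; trials over the bound: {violations}/{trials}")
```

In practice the maximum load sits well below the bound, which matches how lazy we were with constants.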

That didn’t actually have anything to do with the random graph model specifically, I suppose, but the balls in bins idea has obvious applications elsewhere, like hashing and job assignments. Returning to the random multigraph model, we’re just going to look at the number of isolated vertices in a very brief and incomplete fashion.

A vertex is isolated with probability $\left(1 - \frac{2}{n}\right)^m$, since generating the graph is basically $m$ repeated attempts to hit one of the $n-1$ edges adjacent to that vertex out of $\binom{n}{2}$ possible, and $\frac{n-1}{\binom{n}{2}} = \frac{2}{n}$. Thus the expected number of isolated vertices is $n\left(1 - \frac{2}{n}\right)^m$. If we let $m = \frac{1}{2}cn\ln n$ and take the limit we see that

$\lim_{n\to\infty} n\left(1 - \frac{2}{n}\right)^{\frac{1}{2}cn\ln n} = \lim_{n\to\infty} n e^{-c\ln n} = \lim_{n\to\infty} n^{1-c},$

and so if $c > 1$ there is almost surely no isolated vertex. If this looks familiar it’s because it’s basically the same thing as $G(n,p)$ with $p = \frac{c\ln n}{n}$, which gives the same expected number of edges. This is kind of obvious anyway; at least from the perspective of isolated vertices, the fact that you can have multiple edges makes very little difference. The only change is that the probability of getting an edge adjacent to a given vertex is now fixed and we’re altering the number of attempts. The variance computation also appears to be identical, so I won’t put it here.
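A small simulation of the multigraph model with $m = \frac{1}{2}cn\ln n$ matches this reasonably well; the sampler below rejects self-loops, per the model we settled on, and `isolated_count` is my own helper name.

```python
import math
import random

def isolated_count(n, m, rng):
    """Place m edges with replacement (no self-loops) among n vertices;
    return how many vertices end up with no incident edge."""
    touched = set()
    for _ in range(m):
        u = rng.randrange(n)
        v = rng.randrange(n)
        while v == u:
            v = rng.randrange(n)
        touched.update((u, v))
    return n - len(touched)

rng = random.Random(1)
n = 2000
results = {}
for c in (0.5, 1.5):
    m = int(0.5 * c * n * math.log(n))
    results[c] = isolated_count(n, m, rng)
    print(f"c={c}: m={m}, isolated={results[c]}, predicted n^(1-c) = {n ** (1 - c):.2f}")
```

With $c = 0.5$ we expect around $\sqrt{n} \approx 45$ isolated vertices, and with $c = 1.5$ essentially none.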

When I was trying to make my own modifications to $G(n,p)$ for weighted graphs I was trying all kinds of strange things. I tried generating each edge with probability $p$ and then selecting a weight for it from various distributions. Strange things happened. This multigraph model is also pretty nice and easy to understand, so I’ll come back to it once I have learned something about spectral graph theory.

Sources:
[1] I looked around for the website of the class I took the balls and bins problem from, but I can’t seem to find it. The class was Probability and Computing, 15-359, taught by professors Harchol-Balter and Sutner in the spring of 2012.

One of the popular models for random graphs is Erdos-Renyi. We generate $G(n,p)$ by taking $n$ nodes and, between every pair, generating an edge independently with probability $p$. This is nice because it makes computing properties of the graph really easy. The downside is that very few real structures actually follow this model. For instance, in social networks, if $u$ and $v$ are adjacent and $v$ and $w$ are adjacent, then $u$ and $w$ are more likely to be adjacent; this is clearly not the case in $G(n,p)$. It’s a shame, but at least it’s fun to play with.

A monotone property of a graph is one that is always preserved by adding more edges. For a monotone property $P$ it is then natural to ask: for what values of $p$ does $G(n,p)$ have $P$ with probability one as $n$ goes to infinity? Since the property is monotone, there will be some threshold function of $n$ past which $G(n,p)$ almost surely has property $P$. There are in fact sharp thresholds for many properties (as always, for a more complete discussion of this, check out [1]). We’ll focus on connectivity, clearly a monotone property.

We’re going to take a shot at bounding the threshold value from below. Our first approach is an extremely loose bound. Let $X$ be the random variable representing the number of edges in $G(n,p)$ (we’ll just call it $G$ from now on). If $E[X]$ goes to zero then with high probability $G$ has no edges at all. This is evident from Markov’s inequality (Hopcroft and Kannan call it the First Moment Method). Since $\Pr[X \ge 1] \le E[X]$, if $E[X]$ drops to zero then the probability there are any edges at all drops to $0$. Since the expected number of edges in the graph is $\binom{n}{2}p$, we see that if $p = \frac{f(n)}{n^2}$ where $f(n) \to 0$ then $E[X] \to 0$. So if $p = o\left(\frac{1}{n^2}\right)$, $G$ almost surely has no edges and is therefore disconnected with high probability.

Actually we can do a little better than this even with our super-naive approach. We don’t need $E[X]$ to drop to zero. A graph is disconnected if it has fewer than $n-1$ edges, so if $\Pr[X \ge n-1]$ goes to zero, $G$ is almost surely disconnected. Therefore all we need is that $\frac{E[X]}{n-1} \to 0$. This revision suggests that $p = o\left(\frac{1}{n}\right)$ will do the trick since $E[X] = \binom{n}{2}p \le \frac{n^2 p}{2}$, and so $\Pr[X \ge n-1] \le \frac{n^2 p}{2(n-1)} \le np$. For $p = o\left(\frac{1}{n}\right)$ this drops to $0$, so $G$ is almost surely disconnected.

It should come as no surprise that this is still a terrible, terrible bound. There’s a long way between having $n-1$ edges and being connected, since a graph with $n-1$ randomly placed edges is very unlikely to be connected. We can achieve a marginally better bound without doing too much work, though. A graph is connected if and only if it has a spanning tree, so if a graph almost surely does not have a spanning tree then it is disconnected with high probability. Let $T$ be the number of spanning trees in $G$. If $E[T]$ drops to $0$ then $G$ almost surely does not have a spanning tree and so is almost surely disconnected. By Cayley’s formula $K_n$ has $n^{n-2}$ spanning trees, and each survives in $G$ with probability $p^{n-1}$, so $E[T] = n^{n-2}p^{n-1} = \frac{(np)^{n-1}}{n}$. If $p = \frac{1}{n}$ then $E[T] = \frac{1}{n}$, which goes to $0$. So our new lower bound for the threshold is $\frac{1}{n}$, better by literally an infinitesimal amount.
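To double-check $E[T] = n^{n-2}p^{n-1}$ on a small case, we can enumerate every graph on $5$ vertices and count spanning trees exactly via Kirchhoff’s theorem (a cofactor of the Laplacian); exact rational arithmetic keeps the expectation exact. This is my own brute-force sketch, not anything from the sources.

```python
import itertools
from fractions import Fraction

def spanning_tree_count(n, edges):
    """Kirchhoff's theorem: the number of spanning trees equals any
    cofactor of the graph Laplacian."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    # determinant of L with row/column 0 deleted, via exact Gaussian elimination
    M = [row[1:] for row in L[1:]]
    det = Fraction(1)
    for i in range(n - 1):
        pivot = next((r for r in range(i, n - 1) if M[r][i] != 0), None)
        if pivot is None:
            return 0  # singular minor: no spanning tree
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            det = -det
        det *= M[i][i]
        for r in range(i + 1, n - 1):
            f = M[r][i] / M[i][i]
            for c in range(i, n - 1):
                M[r][c] -= f * M[i][c]
    return int(det)

n, p = 5, Fraction(1, 2)
pairs = list(itertools.combinations(range(n), 2))
expected = Fraction(0)
for bits in itertools.product((0, 1), repeat=len(pairs)):
    edges = [e for e, b in zip(pairs, bits) if b]
    prob = p ** sum(bits) * (1 - p) ** (len(pairs) - sum(bits))
    expected += prob * spanning_tree_count(n, edges)
print(expected, "vs", n ** (n - 2) * p ** (n - 1))  # both 125/16
```

The exhaustive expectation and the closed form agree exactly, as linearity of expectation over the $5^3 = 125$ trees of $K_5$ promises.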

The actual optimal lower bound is $p = \frac{c\ln n}{n}$ where $c < 1$. In fact, connectivity experiences what is called a sharp threshold. That is, the threshold sits at $\frac{\ln n}{n}$: writing $p = \frac{c\ln n}{n}$, $G$ is almost surely disconnected when $c < 1$ and almost surely connected when $c > 1$. The proof of the connected half is a little messy for my tastes, so we’ll just finish the process of bounding from below. The following proof is taken from Hopcroft and Kannan.

We will now investigate the expected number of isolated vertices (vertices with degree $0$). Clearly if $G$ almost surely has an isolated vertex then $G$ is almost surely disconnected. Let $X$ be the number of isolated vertices, so $E[X] = n(1-p)^{n-1}$. If $p = \frac{c\ln n}{n}$, then $(1-p)^{n-1} \approx e^{-c\ln n} = n^{-c}$, so $E[X] \approx n^{1-c}$. So if $c > 1$ then $E[X] \to 0$ and $G$ almost surely has no isolated vertices.

Unfortunately, this is not actually enough to tell us that for $c < 1$ there are almost surely isolated vertices: even though $E[X] \to \infty$, it could be that the isolated vertices are concentrated in some small set of graphs while the rest have none at all. To fix this we need what Hopcroft and Kannan call the second moment method, which is basically using Chebyshev’s inequality to show that if $\frac{\operatorname{Var}(X)}{E[X]^2}$ goes to $0$ then $X$ is almost surely not zero, since $\Pr[X = 0] \le \Pr\left[|X - E[X]| \ge E[X]\right] \le \frac{\operatorname{Var}(X)}{E[X]^2}$. An immediate consequence of this, by the definition of variance, is that if $\frac{E[X^2]}{E[X]^2} \to 1$ then $X$ is almost surely greater than zero.

This is generally a little messier since variance computations are not quite as nice as computing the mean, but in this case it isn’t too bad. We want $E[X^2]$. Splitting $X$ up into the typical indicator random variables gives us $E[X^2] = \sum_i E[X_i^2] + \sum_{i \ne j} E[X_i X_j]$. Since $X_i^2 = X_i$, the first sum is just $E[X] = n(1-p)^{n-1}$. There are $n(n-1)$ terms in the second sum and each term is $E[X_i X_j] = (1-p)^{2n-3}$; the exponent is $2(n-1) - 1$ because we avoided double-counting the edge between vertices $i$ and $j$. Now we can just compute

$\frac{E[X^2]}{E[X]^2} = \frac{n(1-p)^{n-1} + n(n-1)(1-p)^{2n-3}}{n^2(1-p)^{2n-2}} = \frac{1}{n(1-p)^{n-1}} + \frac{n-1}{n(1-p)},$

substitute in $p = \frac{c\ln n}{n}$ where $c < 1$, and send $n$ to infinity. The first term goes to $0$ (since $n(1-p)^{n-1} \approx n^{1-c} \to \infty$) and the second goes to $1$.

So $G$ almost surely has an isolated vertex, and is therefore almost surely disconnected, when $p = \frac{c\ln n}{n}$ with $c < 1$.
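Plugging the closed forms above into a few lines of code shows the ratio $\frac{E[X^2]}{E[X]^2}$ creeping toward $1$ as $n$ grows; `second_moment_ratio` is just my name for it.

```python
import math

def second_moment_ratio(n, c):
    """E[X^2] / E[X]^2 for the number X of isolated vertices in G(n, p),
    using E[X] = n(1-p)^(n-1) and E[X^2] = E[X] + n(n-1)(1-p)^(2n-3)."""
    p = c * math.log(n) / n
    ex = n * (1 - p) ** (n - 1)
    ex2 = ex + n * (n - 1) * (1 - p) ** (2 * n - 3)
    return ex2 / ex ** 2

for n in (10**3, 10**4, 10**5, 10**6):
    print(f"n={n}: E[X^2]/E[X]^2 = {second_moment_ratio(n, 0.5):.4f}")
```

The convergence is slow (the $\frac{1}{n(1-p)^{n-1}} \approx n^{c-1}$ term decays like a small polynomial), but it is clearly heading to $1$.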

As mentioned before, if $p = \frac{c\ln n}{n}$ and $c > 1$, it is possible to show that $G$ is almost surely connected (instead of just that it almost surely has no isolated vertices). This is shown by Hopcroft and Kannan by considering the “giant component.” In $G(n,p)$, as $p$ increases, a single component quickly contains most (all but $o(n)$) of the vertices, and the rest are isolated vertices. Once $p$ is $\frac{c\ln n}{n}$ with $c > 1$ the isolated vertices disappear, because the giant component has swallowed the whole graph.
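The threshold is easy to see empirically, even at modest $n$ where some finite-size fuzz is expected. This sketch samples $G(n,p)$ directly and tracks components with a small union-find; the parameters are arbitrary choices of mine.

```python
import math
import random

def gnp_is_connected(n, p, rng):
    """Sample G(n, p) edge by edge and track components with union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    components = n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    components -= 1
    return components == 1

rng = random.Random(2)
n, trials = 400, 30
fracs = {}
for c in (0.5, 1.5):
    p = c * math.log(n) / n
    fracs[c] = sum(gnp_is_connected(n, p, rng) for _ in range(trials)) / trials
    print(f"c={c}: connected in {fracs[c]:.0%} of {trials} trials")
```

Below the threshold ($c = 0.5$) the graph is essentially never connected, and above it ($c = 1.5$) it almost always is.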

Erdos-Renyi is an interesting model to play around with, mostly because it’s the only model I can easily follow most of the math for. For a while I was trying to apply some basic spectral graph theory techniques to it, but I couldn’t make it hold up except for the most basic of facts. For instance the expected Laplacian of $G(n,p)$ is obvious, and if you apply Kirchhoff’s Theorem to it then you get the expected number of spanning trees. What I couldn’t do was figure out the expected second smallest eigenvalue, which is positive if and only if the graph is connected. Perhaps a little more creativity was needed on that front. Hopefully I get a chance to revisit it.