教学方

Tim Roughgarden

脚本

So at this point we understand, Prim's algorithm and we also know why it's correct. That is why it always computes the minimum cost spanning tree of a graph. So in this video, we're going to turn to implementation issues and running time analysis. We'll begin by just analyzing the straightforward implementation of Prim's algorithm. That's already reasonable. It's polynomial running time, but not especially close to linear. Then we'll see how a suitable deployment of heaps very much in the way that we did for Dijkstra's algorithm leads to a blazingly fast, near linear running time. So let's briefly review the pseudocode for Prim's algorithm. Recall that Prim grows a tree one edge at a time spanning one new vertex at each iteration. So it maintains two sets, capital X, a set of vertices that have spanned so far, and capital T, these are the edges we've committed to thus far. They start out by just being some arbitrary vertex, little s, and the empty set, and in each iteration of this main while-loop, we add one new edge to the tree. And whatever new vertex that edge spans, we add that to capital X. The algorithm terminates when we're spanning all of the vertices and as we've seen, it halts not just with a spanning tree but with a minimum cost spanning tree. So suppose we just literally implemented this algorithm as is, what would be the running time? Well, the initialization stick, step takes only constant time, so let's ignore that. So let me just have this one loop. So let's just ask how many iterations does the loop take and how much time is needed to execute each iteration? Well, the number of loop iterations is going to be exactly n - 1, so, where n is the number of vertices. X starts out with one vertex and terminates when it has all n vertices. How much work do we need to implement each iteration? Well, essentially, what each iteration is doing is a brute-force search through the edges. It's looking for the edges that have one endpoint inside X and one endpoint outside, and amongst those, it just remembers the cheapest. And it's easy to see that you could implement each iteration in O of m time, where M is the number of edges. For example, you can just, with each vertex associate a Boolean variable that keeps track of whether or not it's in this capital X, that way when you see an edge, you know whether it's crossing the frontier or not in constant time. So putting it together, O of m iterations with O of m works for each gives us a running time of O of m times n. So this running time is already nothing to sneeze at. As we discussed, graphs can have an exponential number of spanning trees. So, this algorithm is doing far less work than examining each of these spanning trees. It's homing in in polynomial time, to the minimum cost point. So that's pretty cool. But remember the mantra of any algorithm designer worth their salts, confronted with a solution, you should always ask but can we do better? And can we do better than running time O of m times n? We can as we'll see in the rest of this video. The big idea for speeding up Prim's Algorithm is exactly the big idea we used in part 1 to speed up Dijkstra's algorithm, namely we're going to deploy a suitable data structure. So, what data structure seems like it might be a good idea for making Prim run faster? Well, what's happening in the main workhorse while-loop of Prim's algorithm? Over and over again, we keep meaning to do a minimum computation amongst all edges crossing the frontier, we need to find the cheapest one. So, the question we should ask ourselves is what kind of data structure would facilitate, would speed-up repeated minimum computations. And if you recall from part 1, we have a data structure where that's exactly what it's raison d'etre is, the heap, the meaning of life for a heap is to speed-up repeated minimum computations, just like in Prim's algorithm. So let me just remind you briefly, what are the operations exported by heap data structure and what is the running time? So first recall that a heap contains a bunch of objects, and each of those objects should have some key value from some totally ordered set, like a number, like for example, an edge cost. So what can you do with a heap? Well, the salient operations for our purposes today are, first of all, you can insert new stuff into the heap with their, whatever their key value is. You can extract the object with the minimum key value. And you can also delete stuff from the middle of the heap. And all of these can be done in logarithmic time, logarithmic in a number of objects stored by the heap. So it's not going to be important for us today to know how heaps are implemented and what they look like under the hood. We're just going to be clients of them. We're just going to make use of these operations and the fact that they run in logarithmic time. But you know, just for those of you who are curious, and/or want to have your memory jogged. Under the hood, heaps are implemented logically as complete binary tree. They're actually laid out in an array, but you sort of think of them conceptually as being in a complete binary tree. And they, they, they satisfy what's called the heap property. And the heap property is to make sure that you know where the object with the minimum key value is. So the actual definition is, every parent should have a key value which is less than that of its children. So as you go up the tree, the key value can only drop and that means you know where the minimum is got to be. It's got to be at the root of this tree orr the front of the array. So that's great. That's how you locate the minimum so quickly in a heap. Now, what do you do when you want to extract the minimum? So you rip off the root of this tree, and now, you have to rearrange the tree to restore the heap property. So you swap the last leaf up to where the root was, you bubble-down as needed to restore the heap property. how do you insert? You put the new object as the new last leaf and you bubble it up as needed to restore the heap property. To delete from the middle of a heap, you just sort of rip it out and then bubble things up or down as necessary to restore the heap property. Again, that's not supposed to, if you're hearing this for the first time, I know this is too fast, this is just to jog your memory for those of you who already learned this in part 1 of the course. For more details, you can go review the appropriate videos there. So now that I've reminded you about the salient properties of heaps. Let's return to the question of how do we deploy them cleverly to speed-up Prim's algorithm. So our intuition is that because we're doing repeated minimum computations in Prim's algorithm, each time that it's while-looped, compute the cheapest edge cross in your frontier, that's sort of in the wheelhouse of heaps. So how should we use heaps? Well, the first idea, which is a pretty good idea, is to use the heap to store edges, right? Because our minimum computation should result in us choosing an edge, so when we EXTRACT-MIN from a heap, we want it to hand us an edge on a silver platter. So it would seem this would be your first thought, that the heap should store edges and that the key value that you use should just be the edge cost, because you want to find the cheapest edge. So this already a quite good idea using heaps in this manner. We'll already definitely speed-up Prim's algorithm relative to the naive implementation. And in fact. and I'll leave this as an exercise for you to work out. using heaps in this way results in an implementation that has, that runs in time big O of m log n. What I'm going to show you instead is not that implementation, but an even cleverer implementation of Prim using heaps. We're not going to see a benefit in the asymptotic running time. This more sophisticated version will also give us m log n running time, but it would give you better constants and it is the version you would want to implement in practice. [SOUND] So, the one slightly tricky point in this exercise is remembering Prim's algorithm, you don't just want the cheapest edge overall [INAUDIBLE] You want the cheapest edge which crosses the current cut that has one endpoint in each of x and v - x. And, when you use heaps in this way, it might hand you in a silver platter and edge which is cheap, but isn't necessarily crossing the frontier. So, you need some extra checks to ensure that you're always finding the minimum edge and that that edge crosses the frontier between x and v - x. So I'll leave it to you to work out the details of this implementation in the privacy of your own home. What I want to spend our time together on instead is this somewhat more sophisticated, more practical way to use heaps. And for those of you who remember our fast implementation of Dijkstra, this will be very familiar to you. It will be the same kinds of ideas that we used for Dijkstra, and the keypoint is, instead of using the heap to store edges, were going to use it to store vertices. So, in a bit more detail, our plan is going to be to maintain two invariants. The first invariant is going to describe what the heap contains. The second invariant is going to be what the key values of those heap object are. So as I said, we're now going to be starting at vertices in the heap, not edges. Which vertices? Exactly the vertices that we don't yet span. The vertices of v - x. The motivation here is that rather than getting on a silver platter, the edge in which to add next to the tree, we're going to get from a heap on a silver platter, the vertex, that we're next going to add to capital X. So the second invariant tells us what the key values of these vertices in v - x are supposed to be. And we're going to define them to be the cheapest cost of an edge incident of this vertex that crosses the current frontier. So, I think a picture will make this definition clear. So, consider some snapshot of Prim's algorithm at some iteration. We have our vertices X that were already spanning. We have our vertices v - x that were not spanning. And remember, the elements of the heap by invariant 1 are exactly the vertices on the right-hand side, the vertices of v - x. So were trying to find the key value for some vertex in the heap. So some vertex v, which is on the right side, which is not in x. And so, what we do is we look at the edges incident on this vertex v that go back to the left-hand side, so, edges incident to v that are crossing the frontier and there may be of course be many such edges. And the invariant we want to maintain is that the key value for this vertex V is the cheapest of all the incident edges crossing their frontier or in this picture the key should be equal to two. There is the niggling issue of how do you define the key if there are no incident edges at all that are crossing the frontier. So maybe you have a vertex w, which is buried deep inside of v - x, and actually, none of the incident edges go back to the left blob at all. So in that case we just define the key to be plus infinity. So given this high level approach to implementing Prim's algorithm using heaps, we now have a few things to think through. So first of all we have to think about how to initialize the heap so that these invariants are satisfied at the beginning of Prim's algorithm. Second of all, we have to check that if these invariants are satisfied, then we can faithfully simulate each iteration of the while-loop in Prim's algorithm, hopefully very quickly. And then third, we have to think about how do we make sure these invariants are maintained throughout the course of Prim's algorithm, so let's do those in turn. So the first thing is how do we set up the heap at the beginning of Prim's algorithm and a preprocessing step, so that both of these invariants are satisfied. Well, at the beginning, X consists just of this single arbitrary star vertex S. V minus X contains the other n - 1 vertices. The key value of a vertex other than S at the beginning of Prim's algorithm, is just the cheapest edge between that vertex and S if there is one, or plus infinity otherwise. So, the thing to think through and make sure you believe this, is that first of all, with a single scan through the edges, so an O of m time, we can compute the key value for each vertex that needs to go in the heap. And then, we just have to insert those n - 1 vertices into the heap. So that's going to cost us O of m time for an edge scan, and then, m log n for the inserts. In fact, for those of you that really know heaps very well, you might know about a heapify operation which allows you to do a batch of n inserts in O of n time because we can do this even faster in linear time but we're not going to need that in this lectures. And also, I claim that this expression m + n log n is bounded above by the expression m log n, at least an asymptotic notation. To see that, remember two things. First of all we're assuming that the input graph is connected, otherwise there's no spanning trees and it's not interesting to talk about minimum spanning trees. Second of all, in any connected graph, the number of edges m is at least n - 1. So asymptotically, m is always at least as big as n and it can be bigger. So you can always replace an n by an m and get a valid upper bound, so that's what we're doing here. The second issue we need to think through is how do we faithfully simulates each iteration of the while loop in Prim's Algorithm, given that these two invariants halt. So this issue is going to work out beautifully really, by construction. We set up our heap and we set up our definition of keys, so that extracting min from the heap and iteration is a faithful simulation of the brute-force search in the naive implementation of Prim's alogrithm. So specifically, assuming that these two invariants hold when we invoke EXTRACT-MIN from this heap, what it provides to us on a silver platter is the next vertex, not yet in X. The next vertex that we should add to X in this iteration. And moreover, the cheapest edge incident to that vertex crossing the frontier is the one that we should be adding to the set T in this iteration. And the way to think about this fact is to think of us as essentially simulating the brute-force search and the naive implementation using a 2-round knockout tournament. So, in the straightforward implementation of Prim, the way we think of it is we just do a scan through all the edges crossing the frontier and we remember the winner, we remember the smallest cost of them all. Here, with a heap, we're doing it in two steps. So first of all, for each vertex on the right-hand side of the cut, for each vertex in v - x, it locally remembers what is its best candidate so what is the cheapest edge incident on that vertex crossing the frontier. So that's kind of round one, so for an edge to be chosen as the winner, at the very least, it'd better be a local winner. It'd better be the cheapest edge crossing the cut that ends at this particular vertex on the right-hand side of the cut. So that's just in a definition of the key of each vertex and encodes the value of the winner localed in that vertex. And then this EXTRACT-MIN is envoking the second round of this 2, 2-round elimination tournament. It's saying, well, amongst all the proposals from the 1st round, amongst all the crossing edges that are locally minimum given it's endpoint, which of them is the cheapest overall? And that's going to be the cheapest edge crossing this cut, the result of this exact min computation.