In this capstone project we’ll combine all of the skills from all four specialization courses to do something really fun: analyze social networks!
The opportunities for learning are practically endless in a social network. Who are the “influential” members of the network? What are the sub-communities in the network? Who is connected to whom, and by how many links? These are just some of the questions you can explore in this project.
We will provide you with a real-world data set and some infrastructure for getting started, as well as some warm up tasks and basic project requirements, but then it’ll be up to you where you want to take the project. If you’re running short on ideas, we’ll have several suggested directions that can help get your creativity and imagination going. Finally, to integrate the skills you acquired in course 4 (and to show off your project!) you will be asked to create a video showcase of your final product.

Рецензии

CR

Cannot express how grateful i am to this course series. Will recommend to anyone who has needs!

SS

Apr 07, 2016

Filled StarFilled StarFilled StarFilled StarFilled Star

I really like the fact that we get to decide the capstone and we get to showcase our project.

Из урока

Introduction and Warm up

Welcome to our capstone project! In the last four courses in this specialization you've learned many core data structures and algorithms, and applied them to three different real-world projects. In this capstone project you'll be doing a project very much like the projects from these other courses, only it will be almost entirely directed by you! In this first week you'll get warmed up by playing around with the data that will form the backbone of this project: social network data. Then you'll get back into writing code by implementing a couple of graph algorithms to answer questions about this data.

Преподаватели

Christine Alvarado

Mia Minnes

Leo Porter

Текст видео

All right, so let's move onto that algorithm. Now that you have this concept of what a strongly connected component is, how do we find them? Well, it's really cool because there was often an algorithm in which you can just basically apply depth first search twice and magically, the depth first to the strongly connected components will pop out of that algorithm. So essentially, you apply depth first search once on the graph as it is, and then you take the transpose of the graph, the transpose I'll talk about in a little bit but basically you just reverse all the edges in the graph. And then do depth first search again in a particular order, and you'll get the strongly connected components at the end of that algorithm. So let's be a bit more formal. Here's an overview that algorithm. So step 1, as I mentioned, you do a depth first search on this graph, keeping track of the order in which the nodes finish in the search. And we'll be more precise about that in just a moment. Then you compute the transpose of the graph. Then you do depth first search again, but the order in which you explore your nodes is in reverse order of the order in which they've finished in step 1. All right, so let's take a look at the algorithm in a bit more detail. We're going to dive into step 1. So, step 1 says perform a depth first search on the whole graph. Now this is a very similar depth first search to what we did in our previous courses, except for our goal here is not to find a path from a start to a goal. It's actually just to visit all the nodes in the graph and keep track of the order in which we finish exploring from each particular node. So we're dividing this approach up into two different methods. So we have our DFS method at the top, and what that's doing is it's just iterating over all the nodes in the graph and visiting each one with this DFS-VISIT, which is this algorithm I'll talk about in a second, if it hasn't yet been visited. And the reason we need this is because if the nodes aren't reachable from that first node where you start, your DFS-VISIT won't actually reach all the nodes in the graph so you might need to iterate over the remaining nodes to visit them in this DFS algorithm. And you'll see how that works in just a second. Ang then the DFS-VISIT is really the core. That's where most of the work is doing with the actual depth first search. It's using recursion, don't be too alarmed by that. I'll talk a little bit about how recursion works as we get into this algorithm in a little more detail. But basically, all you need to know is that when you see that call to DFS-VISIT from the middle of DFS-VISIT, it just goes back to the top of the function as if it were any other function. All right, let me just dive in and show you how it works on this particular example. So importantly, this is going to maintain a couple of data structures. So DFS is going to have a stack of vertices that's going to determine the order in which those vertices are explored, visited in that main DFS loop. And then it's going to have a finished stack. So as we finish each vertex, as we completely finish exploring from that vertex, we're going to put it onto this stack called the finished stack, and you'll see a line there down at the end of the DFS-VISIT that adds nodes to the stack. All right, so let's get started. The first thing we do is, while this vertices stack is not empty, we're going to take the first node off of that stack, and in this case the first node is 18. So, we took it off the stack, this is the node that we're going to now do DFS-VISIT on because it hasn't yet been visited. So, call DFS-VISIT, jump down into DFS-VISIT. We're going to visit it. So, ordinarily, we might add it to some set a visited notes, but I'll just turn it grey for the purpose of this example. So 18 is now grey, it's been visited. But now we're going to do our dept first search from node 18. So we're going to find its neighbors, there they are, and for each of those neighbors, if it already hasn't been visited, it's going to call DFS-VISIT recursively on that neighbor. So let's take the neighbor, for example, 23. And so 23 is going to go back into DFS-VISIT as now the new argument v. So now we're doing DFS-VISIT from node 23. The first thing we do is we mark it as visited, and then we're going to get each of its neighbors. So, it has two neighbors, 25 and 18. And the first thing it's going to do is notice that 18 has already been visited. So, it's going to skip that one and it's going to go on to visit 25. So it's going to now go into that loop. It's going to call DFS-VISIT again, but this time using 25. It's going to mark 25 as visited, and then again go to that loop. So it's going to find a neighbor of 25, go visit it. And now we're in this situation where we're looking at 65. Now 65 doesn't have any unvisited neighbors. So 65 actually is not going to do anything in that for loop. It's going to skip all the way to the end, and finally end up pushing 65 onto that finished stack. So 65 is the first node that we've actually finished with. All these other nodes are in some sort of pending state, where we're not really sure if that loop is finished, if there are any more neighbors to explore. But 65, we concluded that, nope, we're out of neighbors, we're going to add it to the finished stack and we're done with 65 forever. All right, now that we're done with 65, we have to go back from where we came from. So how do we get to 65 when we're in the middle of exploring neighbors from 25? So we go back to 25 and check, does 25 have any more unvisited neighbors? No, it doesn't, so 25 is now finished. So we're going to take 25, put in onto the finished stack on top of 65, so 25 finished after 65. All right, how did we get to 25? Well, we got there from exploring from 23. So does 23 have any more neighbors? No it doesn't, so we'll finish off 23, put it on the finished stack. How did we get to 23? We came from 18. Well, we're in the middle of exploring 18's neighbors, and 18 does have another neighbor, 44. So now we're going to go explore from 44. We're going to do DFS-VISIT on 44, mark it as visited, and then explore its neighbors. It has one neighbor, 50, so let's go explore it. We're going to mark it as visited, doesn't have any more neighbors, so it gets on the finished stack. We go back to 44, we're done exploring from there, it goes on the finished stack. And finally, we go back to 18, we don't have any more neighbors to explore, it goes on the finished stack. So that was one call starting at 18 from DFS to DFS-VISIT. So now DFS-VISIT on 18 is completely done and we return back to DFS. So now what are we going to do in DFS? Well, we've got this loop going that's looping through all the vertices in the graph. So now we just need to continue our loop. We need to pop vertices off of that vertices stack and determine if they've been visited already. Well, 23's been visited, so we don't do anything with 23. 25 has been visited, so we don't do anything with 25. But 32 now has not been visited, so we have to do DFS-VISIT on 32. So, let's look at how this works. We are going to visit 32, do DFS-VISIT. Well, it's going to be visited, but it doesn't have any unexplored neighbors. And so we'll just finish it off, add it to the finished stack and at this point we're done. We'll just go ahead and pop all of those notes off of the vertices stack inside DFS and that's it. This run of DFS is finished, and we have our finished stack. This is the reverse order, so from top to bottom, it's the reverse order of how those nodes finished as we are doing our DFS exploration. Okay, so that's step 1. Let's now go on to step 2. Luckily, step 2 is a lot more straightforward. It says, all I need to do now is compute the transpose of the graph. With the transpose, it's just the same graph with the edges flipped in the other direction. So let's flip all the edges. There we go, the edges are flipped and we're done with step 2. Now onto step 3, were almost there. Step 3 is actually the same as step 1, only we're working now with the transpose of the graph. And we're going to pass in that finished stack from step 1 as the vertices stack, the order in which we're going to explore in step 3. So let's take a look at how that works. So here's my vertices stack. Notice it's the finished stack from step 1. That's the order in which we're going to explore it, in reverse order from how we finished in step 1. And every time we start a DFS-VISIT in this case, it's going to form the root of a tree which is going to be a strongly connected component. So I'm going to list the strongly connected components. Down there at the bottom, you can see that 32 is now starting to build a strongly connected component. All right, so 32 is a strongly connected component, let's do the DFS-VISIT on it. Well, it doesn't go anywhere because there's no outgoing edges. So we can jump right to 18. 18 is now its own strongly connected component because it's a separate call to DFS-VISIT from DFS, and now we need to do DFS-VISIT on 18. So when we explore out, I won't trace through the algorithm in great detail again, but it's enough to notice that what you're going to find is you're going to find all the nodes 25, 23, and 65, and those are all going to be found from our DFS-VISIT from node 18. Okay, we've got a few more to do. So 44, we take 44 off of our vertex stack and it's its own strongly connected component. We can't go anywhere from 44 that hasn't already been visited, so it just goes into its strongly connected component. And then, finally, 50 is again its own strongly connected component because there's nowhere to go that hasn't already been visited. And that's it, those are the strongly connected components in this graph. Now it's just a matter of getting rid of the other vertices which have already been visited and so we just simply ignore them. Okay, so that's the algorithm. That's all you need to do and that will find the strongly connected components in this graph. It's really cool. The intuition behind it is not actually that complicated. The idea is that by going into this reverse order, you're essentially saying that this is the root of the tree in our first exploration. That's going to be explored first. And if we reverse all the edges, then we're finding the nodes that can also reach it. So that's a very handwavy intuition about why this algorithm works but it really does work. And that's the algorithm that we recommend that you implement to find the strongly connected components in this graph.