Want to understand your data network structure and how it changes under different conditions? Curious to know how to identify closely interacting clusters within a graph? Have you heard of the fast-growing area of graph analytics and want to learn more? This course gives you a broad overview of the field of graph analytics so you can learn new ways to model, store, retrieve and analyze graph-structured data.
After completing this course, you will be able to model a problem as a graph database and perform analytical tasks over the graph in a scalable manner. Better yet, you will be able to apply these techniques to understand the significance of your data sets for your own projects.

Reviews

JA

I found a new love in this course: Neo4j. Graphs are really powerful. You should expect very intensive theoretical and hands-on knowledge to take away from this course. Think like a vertex...

AV

Jul 17, 2017

Rating: 5 out of 5 stars

The course was amazing. It was very helpful in gaining insights, and it developed my skills in analyzing data using graphs. Thank you, Coursera and the UC San Diego team.

From the lesson

Graph Analytics Techniques

Welcome to the 4th module in the Graph Analytics course. Last week, we got a glimpse of a number of graph properties and why they are important. This week we will use those properties for analyzing graphs using a free and powerful graph analytics tool called Neo4j. We will demonstrate how to use Cypher, the query language of Neo4j, to perform a wide range of analyses on a variety of graph networks.

Taught by:

Amarnath Gupta

Transcript

Next, we will talk about path analytics using Cypher, the query language for Neo4j. Here's the list of queries we will be demonstrating. There are some things that Cypher does very well, and other things that require a little creativity to get the results you're looking for, and we'll show you examples of both. It's also important to keep in mind that because we're working with paths, which are a formal structure in graph networks, each of these examples includes a new variable. In this case, we're using the letter p to represent the path objects that we're going to return. You may also see the complete word path used instead of the single letter p.

We're going to continue to use the simple road network data set from previous demonstrations, which contains 11 nodes and 14 edges.

The first query we're going to demonstrate finds a path between specific nodes. This is very much like trying to find a route between two locations in our road network. In this case, we're going to find a path between the node named H and the node named P. To do this, we use the match command: match p, the variable representing our path, = node a going through an edge to node c, where a and c are the query variables standing for nodes H and P. There's something slightly different about this edge: we're using a star to represent an arbitrary number of edges in sequence between a and c, and we return all of the edges needed to complete the path. We only want to return a single path. When we submit this query, we see a path that consists of eight nodes and seven edges, beginning with H and ending with P.

Another common operation on paths is finding the length between two specific nodes.
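The path query described above might be written roughly as follows. This is a sketch reconstructed from the spoken description, not the lecturer's exact script: the relationship type `TO`, the node property `Name`, and the form of the `WHERE` clause are assumptions, since the transcript does not spell them out.

```cypher
// Sketch only. Assumed: relationship type TO and node property Name.
// The star in [:TO*] matches any number of consecutive TO edges.
MATCH p = (a)-[:TO*]-(c)
WHERE a.Name = 'H' AND c.Name = 'P'
RETURN p
LIMIT 1
```

Without `LIMIT 1`, a variable-length pattern like this returns every path between the two nodes, which can be a very large result on anything bigger than a toy graph.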
We issue the same two lines of code and then use a new command, length, to return a value. We won't be returning an actual path; we just want a single value. When we submit this query, we get the result seven, which we can confirm by visually inspecting the graph: there are seven edges. But because most networks are much more complex than this one, we need to know the query that returns the length.

Ideally, in the case of our road network, we would want to find the shortest path between those two nodes. For this we introduce another new command specific to paths, called shortestPath. We use the same variable, p, and the same syntax for connecting node a with node c. In this case, we're going to look for the shortest path between node A and node P, and we return that path as well as its length. Again we return just a single path. When we submit this query, we get a path that's five nodes and four edges long, and if we look at the text results, we'll see the value 4 in the length column, which we can confirm by visually inspecting our graph.

The next query illustrates that there may be more than one shortest path, so we may want to know all of the possible shortest paths in order to choose the one we prefer. We'll use a command built into Neo4j called allShortestPaths. The query is similar to the previous one: we're going to find all of the shortest paths between node A and node P. Instead of the letters a and c, we're using the variable names source and destination. The results will come back in the form of an array, which we build with a new term, extract.
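The three queries described in this passage might look something like the sketch below. As before, the relationship type `TO` and the property `Name` are assumptions rather than details given in the transcript. The last query uses a list comprehension in place of `extract`, which was removed in Neo4j 4.0; the lecture's form would have been `extract(n IN nodes(p) | n.Name)`, and the paragraph that follows explains what it does.

```cypher
// Sketch only. Assumed: relationship type TO and node property Name.

// Length (number of edges) of one path from H to P.
MATCH p = (a)-[:TO*]-(c)
WHERE a.Name = 'H' AND c.Name = 'P'
RETURN length(p)
LIMIT 1;

// A single shortest path from A to P, together with its length.
MATCH p = shortestPath((a)-[:TO*]-(c))
WHERE a.Name = 'A' AND c.Name = 'P'
RETURN p, length(p)
LIMIT 1;

// Every shortest path from A to P, returned as a list of node names.
MATCH p = allShortestPaths((source)-[:TO*]-(destination))
WHERE source.Name = 'A' AND destination.Name = 'P'
RETURN [n IN nodes(p) | n.Name] AS Paths;
```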
Assuming we have matched our path p, we identify all of the nodes in p and extract their names. We return these names as a list, which we'll call paths. If there's more than one shortest path, we get multiple lists of node names. When we submit this query, the results are listed in the rows display, and we see there are actually two shortest paths. They each have five nodes and four edges.

Now, we may want to find the shortest path subject to particular constraints or conditions. In this case we still want the shortest path, but we constrain the path length to be greater than a particular value, here 5. We return essentially the same results as in the previous query, plus the length of the resulting path so that we have that information conveniently. When we issue this command, we get a path of length six between node A and node P, clearly longer than the shortest path we found earlier.

Now that we are somewhat familiar with the two shortest-path commands, shortestPath for a single path and allShortestPaths for multiple shortest paths, we're going to use them in a slightly creative way to return the diameter of the graph. If you remember from a previous lecture, the diameter of the graph is the length of the longest of the shortest paths between any two nodes in the graph. So by finding the shortest paths between all pairs of nodes, we're guaranteed that the longest of them is included in the results. If we look carefully at this script, it is a little different from our previous scripts. In this case our match command matches all nodes of type MyNode and assigns them to the variable n. We also match all nodes of type MyNode and assign them to the variable m.
So these two matches are the same. But we place a constraint that the nodes in n are not the same as the nodes in m, and then we find the shortest paths between distinct nodes in n and m, returning the names of those nodes as well as the length of each resulting path. The trick is to use the order by clause. Those of you who are already familiar with SQL will recognize order by, and you'll also recognize the descending keyword, desc. If we order the resulting paths by their length in descending order and return only one, that path is the longest of the shortest paths, and its length is the diameter of the graph. When we submit this query, we get a path between node E and node L with length seven.

Now it may occur to you that this might not be the only path of length seven. So we can modify our query slightly and change the limit from one to five. Sure enough, we get five paths, and three of those have length 7. So there are actually three distinct paths that qualify as a diameter of this particular graph.
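The constrained shortest-path query and the diameter query described above could be sketched as follows. The `MyNode` label comes from the lecture; the relationship type `TO`, the property `Name`, and the undirected patterns are assumptions. The diameter sketch uses `shortestPath` rather than `allShortestPaths`, on the reasoning that one shortest path per pair is enough to find the maximum length.

```cypher
// Sketch only. Assumed: relationship type TO, node property Name,
// and undirected patterns.

// Shortest paths from A to P that are longer than 5 edges.
// Neo4j may fall back to a slower exhaustive search for this predicate.
MATCH p = allShortestPaths((source)-[:TO*]-(destination))
WHERE source.Name = 'A' AND destination.Name = 'P' AND length(p) > 5
RETURN [n IN nodes(p) | n.Name] AS Paths, length(p);

// Diameter: one shortest path per ordered pair of distinct nodes,
// sorted longest first. Change LIMIT 1 to LIMIT 5 to check for ties.
MATCH (n:MyNode), (m:MyNode)
WHERE n <> m
WITH n, m
MATCH p = shortestPath((n)-[*]-(m))
RETURN n.Name, m.Name, length(p)
ORDER BY length(p) DESC
LIMIT 1;
```

Because the first `MATCH` forms every pair of nodes, the work grows roughly with the square of the node count. That is fine for an 11-node road network but worth keeping in mind on larger graphs. With an undirected pattern, each pair also appears in both orders, so tied rows come in mirrored pairs.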