An Overview of Graph Algorithms in Neo4j

In my previous blog, we saw about neo4j with some basic example. In this post, let’s see an overview graph algorithms in neo4j.

Here is a list of the many algorithms that Neo4j uses in its graph analytics platform, along with an explanation of what they do.

Centralities

These algorithms determine the importance of distinct nodes in a network:

PageRank – Estimates a current node’s importance from its linked neighbors and then again from their neighbors. A node’s rank is derived from the number and quality of its transitive links to estimate influence. PageRank is used in quite a few ways to estimate importance and influence. It’s used to suggest Twitter accounts to follow and for general sentiment analysis. PageRank is also used in machine learning to identify the most influential features for extraction.

Betweenness Centrality – Measures the number of shortest paths (first found with breadth-first search) that pass through a node. Nodes that most frequently lie on shortest paths have higher betweenness centrality scores and are the bridges between different clusters. It is often associated with the control over the flow of resources and information. Betweenness centrality applies to a wide range of problems in network science and is used to pinpoint bottlenecks or likely attack targets in communication and transportation networks. Betweenness Centrality has also be used to evaluate information flows between multiplayer online gamers and expertise sharing communities of physicians.

Closeness Centrality – Measures how central a node is to all its neighbours within its cluster. Nodes with the shortest paths to all other nodes are assumed to be able to reach the entire group the fastest. Closeness centrality is applicable in a number of resources, communication, and behavioural analysis, especially when interaction speed is significant. It has been used for identifying the best location of new public services for maximum accessibility. In social network analysis, it is used to find people with the ideal social network location for faster dissemination of information.

Community detection

These algorithms evaluate how a group is clustered or partitioned:

Louvain – Measures the quality (i.e. presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network. It’s often used to evaluate the organization of complex networks and community hierarchies in particular. It’s also useful for initial data preprocessing in unsupervised machine learning. Louvain is used to evaluate social structures on Twitter, LinkedIn, and YouTube. It’s used in fraud analytics to evaluate whether a group has just a few bad behaviors or is acting as a fraud ring that would be indicated by a higher relationship density than average.

Label Propagation – Spreads labels based on neighbourhood majorities as a means of inferring clusters. This extremely fast graph partitioning requires little prior information and is widely used in large-scale networks for community detection. It’s a key method for understanding the organisation of a graph and is often a primary step in other analysis. Label propagation has diverse applications, from understanding consensus formation in social communities to identifying sets of proteins that are involved together in a process (functional modules) for biochemical networks. It’s also used in semi- and unsupervised machine learning as an initial pre-processing step.

(Weakly) Connected Components – Finds groups of nodes where each node is reachable from any other node in the same group, regardless of the direction of relationships. It provides near constant-time operations (independent of input size) to add new groups, merge existing groups, and determine whether two nodes are in the same group.Union-find/connected components is often used in conjunction with other algorithms, especially for high-performance grouping. As a preprocessing step for undirected graphs, it helps quickly identify disconnected groups.

Strongly Connected Components – Locates groups of nodes where each node is reachable from every other node in the same group following the direction of relationships. It’s often applied from a depth-first search. Strongly connected is often used to enable running other algorithms independently on an identified cluster. As a preprocessing step for directed graphs, it helps quickly identify disconnected groups. In retail recommendations, it helps identify groups with strong affinities that then are used for suggesting commonly preferred items to those within that group who have not yet purchased the item.

Triangle Count / Clustering Coefficient – Measures how many nodes have triangles and the degree to which nodes tend to cluster together. The average clustering coefficient is 1 when there is a clique and 0 when there are no connections. For the clustering coefficient to be meaningful, it should be significantly higher than a version of the network where all of the relationships have been shuffled randomly.

Path finding

These algorithms help find the shortest path or evaluate the availability and quality of routes:

Minimum Weight Spanning Tree – Calculates the paths along a connected tree structure with the smallest value (weight of the relationship such as cost, time, or capacity) associated with visiting all nodes in the tree. Minimum weight spanning tree is widely used for network designs: least-cost logical or physical routing such as laying cable, fastest garbage collection routes, capacity for water systems, efficient circuit designs, and much more.

All Pairs Shortest Path – Calculates a shortest path forest (group) containing all shortest paths between the nodes in the graph. It’s commonly used for understanding alternate routing when the shortest route is blocked or becomes sub-optimal. All-pairs shortest path is used to evaluate alternate routes for situations, such as a freeway backup or network capacity. It’s also key in logical routing to offer multiple paths; for example, call routing alternatives.

Single Source Shortest Path – Calculates a path between a node and all other nodes whose summed value (weight of relationships such as cost, distance, time, or capacity) to all other nodes is minimal. Single-source shortest path is often applied to automatically obtain directions between physical locations, such as driving directions via Google Maps.

Share this:

Like this:

Related

2 Comments

Jan Zac
· June 10, 2018 at 10:49 pm

I saw your tweets and thought I will check your website. Have to say it looks very good!
I’m also interested in this topic and have recently started my journey as young entrepreneur.

I’m also looking for the ways on how to promote my website. I have tried AdSense and Facebok Ads, however it is getting very expensive.
Can you recommend something what works best for you?

Would appreciate, if you can have a quick look at my website and give me an advice what I should improve: http://janzac.com/
(Recently I have added a new page about FutureNet and the way how users can make money on this social networking portal.)

I have subscribed to your newsletter. 🙂

Hope to hear from you soon.

P.S.
Maybe I will add link to your website on my website and you will add link to my website on your website? It will improve SEO of our websites, right? What do you think?

Social Profiles

Related Posts

We know that we can store different data types in a database and can query it. In this post, we will see about storing and querying geo-location data type. There are different data stores like elasticsearch, druid, MongoDB which Read more…

Graph Analytics: Data is great, but true insight comes from examining relationships between data points. Hiding somewhere in your data are patterns of relationships representing money on left on the table – if only they Read more…