Friday, April 27, 2012

Basic graph analytics using igraph

Social Network Site such as Facebook, Twitter becomes are integral part of people's life in. People interact with each other in different form of activities and a lot of information has been captured in the social network. Mining such a network can reveal some very useful information that can help an organization to gain competitive advantages.

I recently come across a powerful tools called igraph that provides some very powerful graph mining capabilities. Following are some interesting things that I have found.

Create a Graph

Graph is composed of Nodes and Edges, both of them can be attached with a set of properties (name/value pairs). Furthermore, edges can be directed or undirected and weights can be attached to it.

iGraph also provides 2 graph generation mechanism. "Random graph" is to generate an edge randomly between any two nodes. "Preferential attachment" is to assign a higher probably to create an edge to an existing node which has a high in-degree already (the rich gets richer model).

Connected Component algorithms is to find the island of nodes that are interconnected with each other, in other words, one can traverse from one node to another one via a path. Notice that connectivity is symmetric in undirected graph, it is not the necessary the case for directed graph (ie: it is possible that nodeA can reach nodeB, then nodeB cannot reach nodeA). Therefore in directed graph, there is a concept of "strong" connectivity which means both nodes are considered connected only when it is reachable in both direction. A "weak" connectivity means nodes are connected

Shortest Path is almost the most commonly used algorithm in many scenarios, it aims to find the shortest path from nodeA to nodeB. In iGraph, it use "breath-first search" if the graph is unweighted (ie: weight is 1) and use Dijkstra's algo if the weights are positive, otherwise it will use Bellman-Ford's algorithm for negatively weighted edges.

From his studies, Drew Conway has found that people with low Eigenvector centrality but high Betweenness centrality are important gate keepers, while people with high Eigenvector centrality but low Betweenness centrality has direct contact to important persons. So lets plot Eigenvector centrality against Betweenness centrality.

Thanks. Very insightful post. "Drew Conway has found that people with low Eigenvector centrality but high Betweenness centrality are important gate keepers, while people with high Eigenvector centrality but low Betweenness centrality has direct contact to important persons."

This got confirmed to me when I had a node with all edges going to it but no edges going out of it... hence the eigenvector centrality was 1 and the betweeness was 0 :-)

Hi,i have a problem, i want to plot the degree distribution of my adjacency matrix; i already coerced in object "graph" in order to work with "igraph". The matrix is correct, the graph (g) is undirected, with 88 nodes and 227 edges. I want just the degree distribution and its "plotting" but the function "degree.distribution(g)" return "NULL"..it doesn't work, neither the "plot" what can i do? i'm not an expertthanks