Visualizing your LinkedIn graph

A few years ago, LinkedIn released a nice product called "InMap" that let you visualize your connections graph. This feature has since been removed but,
as it was great, we are going to try to reproduce it (or at least a simplified version of it) in DSS.

Before you start

Make sure you have a working Internet connection, since we are going to get data from an external API.

In your DSS instance, create a new project, called "LinkedIn" for instance.

Getting your LinkedIn credentials

As we are going to use the LinkedIn REST API, you first need proper credentials to make calls against it.
The LinkedIn API requires OAuth 2 authentication, so you'll need to follow the instructions on this LinkedIn page to get an access token.

Once you have retrieved your credentials, you may want to store them as custom variables in DSS, since you'll use them several times throughout the project. Go to the Administration panel from the top bar,
and under Settings, click on Variables.
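As a minimal sketch of how a recipe might consume these variables, the helper below checks that the expected keys are present before any API call is made. The variable names are hypothetical (use whatever keys you defined), and the dictionary is assumed to come from `dataiku.get_custom_variables()` in a real recipe; a plain dict stands in for it here so the snippet runs anywhere.

```python
# Hypothetical variable names; use whatever keys you defined in DSS.
REQUIRED_KEYS = ("linkedin_client_id", "linkedin_client_secret", "linkedin_access_token")


def load_credentials(variables, required=REQUIRED_KEYS):
    """Return only the required credentials, failing early if one is missing or empty."""
    missing = [key for key in required if not variables.get(key)]
    if missing:
        raise KeyError("Missing DSS variables: %s" % ", ".join(sorted(missing)))
    return {key: variables[key] for key in required}


# In a DSS recipe, `variables` would come from dataiku.get_custom_variables()
# (assumed here); a plain dict stands in for it so the sketch is self-contained.
example_variables = {
    "linkedin_client_id": "my-client-id",
    "linkedin_client_secret": "my-client-secret",
    "linkedin_access_token": "my-access-token",
    "unrelated_setting": "ignored",
}
credentials = load_credentials(example_variables)
```

Failing early on a missing key avoids discovering a configuration problem halfway through a long-running recipe.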

Click on "Run", and wait for your job to complete (note that this may take a while, as the recipe makes calls to an external API). Once it has finished, you'll have a dataset storing the list
of your first degree connections with 9 columns:

Getting relationships between your connections

Once your first degree connections are fetched, the most difficult part is getting the actual relationships between them. The LinkedIn API provides us with a
somewhat hackish way to get this data.

Go back to your Flow screen and create another Python recipe, this time taking the "first_degree_connections" dataset as input and outputting a new dataset that
we'll call "related_connections".

Load the required libraries, the DSS dataset into a Pandas dataframe, and create the OAuth client:

Define helper functions that build the paginated API URL and retrieve the related connections of a given user:

    import json
    import pandas as pd

    # `client` is the OAuth client created in the previous step.

    def make_url(user_id, start):
        # The actual API call to get relationships between your connections, paginated
        a = "https://api.linkedin.com/v1/people/%s" % user_id
        b = ":(relation-to-viewer:(related-connections))"
        c = "?format=json&count=20&start=%s" % start
        url = a + b + c
        return url

    def get_count(user_id):
        # Get the total number of connections for a given user id
        url = make_url(user_id, 0)
        resp, content = client.request(url)
        rels = json.loads(content)
        if 'relationToViewer' in rels and 'relatedConnections' in rels['relationToViewer']:
            return rels['relationToViewer']['relatedConnections']['_total']
        # No related connections exposed for this user
        return 0

    def get_data(from_url):
        # Retrieves the list of related connections for a user id
        resp, content = client.request(from_url)
        rels = json.loads(content)
        res = pd.DataFrame(rels['relationToViewer']['relatedConnections']['values'])
        return res

    def get_user_data(user_id):
        # Looping through the pages to get all the results
        results = pd.DataFrame()
        resultCount = get_count(user_id)
        offset = 0
        count = 20
        while offset < resultCount:
            url = make_url(user_id, offset)
            data = get_data(url)
            data['from_id'] = user_id
            results = pd.concat((results, data), axis=0)
            offset += count
        return results
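The pagination logic in these helpers can be checked without touching the network. The sketch below isolates the offset arithmetic behind the `start` and `count=20` parameters; `page_offsets`, `fetch_all` and the fake page fetcher are hypothetical names used only for illustration and are not part of the LinkedIn API.

```python
def page_offsets(total, page_size=20):
    """Return the start offsets needed to cover `total` results."""
    return list(range(0, total, page_size))


def fetch_all(fetch_page, total, page_size=20):
    """Call fetch_page(start, count) for every offset and concatenate the results."""
    results = []
    for start in page_offsets(total, page_size):
        results.extend(fetch_page(start, page_size))
    return results


# Hypothetical stand-in for the API: 45 fake connection ids served page by page.
FAKE_IDS = ["user-%d" % i for i in range(45)]


def fake_fetch_page(start, count):
    return FAKE_IDS[start:start + count]


all_ids = fetch_all(fake_fetch_page, total=len(FAKE_IDS))
```

With 45 results and a page size of 20, three requests are issued (offsets 0, 20 and 40), and the last page is simply shorter than the others.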

Loop through your connections to get their related connections:

    connections = pd.DataFrame()
    for c, user_id in enumerate(df['id']):
        # Just a bit of progress tracking
        if c > 0 and c % 10 == 0:
            print("[+] Done %i..." % c)
        try:
            # Use a separate name so the input dataframe `df` is not overwritten
            user_df = get_user_data(user_id)
            connections = pd.concat((connections, user_df), axis=0)
        except Exception:
            print("No data for %s" % user_id)

You'll need to wait for a fair amount of time while the recipe runs, especially if you have a large number of connections. Hopefully, you will end up with the edges of your graph:

We are interested in the "from_id" - "to_id" pairs. These are the actual relationships between your connections, the ones we'll use to build the graph.
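Because each connection is queried in turn, the same relationship may show up twice, once from each endpoint. As a minimal sketch (not part of the original recipe), the hypothetical helper below treats edges as undirected, keeps each pair once and drops self-loops.

```python
def unique_edges(pairs):
    """Deduplicate undirected edges and drop self-loops, preserving first-seen order."""
    seen = set()
    edges = []
    for from_id, to_id in pairs:
        if from_id == to_id:
            continue  # a node linked to itself carries no information
        key = tuple(sorted((from_id, to_id)))  # same key for (a, b) and (b, a)
        if key not in seen:
            seen.add(key)
            edges.append(key)
    return edges


raw_pairs = [("alice", "bob"), ("bob", "alice"), ("alice", "carol"), ("carol", "carol")]
edges = unique_edges(raw_pairs)
```

Whether to deduplicate is a design choice: the force layout will draw duplicate links on top of each other, so removing them mainly reduces the amount of data sent to the browser.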

Building the Graph dataset

It is now pretty straightforward to build the final dataset supporting our graph visualization application. This is basically the set of relationships (edges) between
you and your connections, or among your connections (the nodes, represented by both an ID and a label).

Start by creating a visual data preparation recipe on your "first_degree_connections" dataset that will create the graph of your direct relationships
(which is simply done by creating an edge between you and each of your first degree connections):

Create a similar data structure from your "related_connections" dataset:

Finally, concatenate these two datasets using a Stack recipe:

Your final workflow might look like this:

Build the "graph" dataset. You now have the complete list of nodes and edges of your LinkedIn graph, ready to be visualized.
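For readers who prefer code to visual recipes, the same three steps can be sketched in plain Python. This is only an illustration of the data structure: the column names `from_label` and `to_label`, the sample ids and the helper functions are assumptions, not the output of the actual recipes.

```python
def direct_edges(me, connections):
    """One edge from you to each first degree connection."""
    return [{"from_id": me["id"], "from_label": me["label"],
             "to_id": c["id"], "to_label": c["label"]} for c in connections]


def related_edges(pairs, labels):
    """Edges among your connections, with labels looked up by id."""
    return [{"from_id": a, "from_label": labels.get(a, a),
             "to_id": b, "to_label": labels.get(b, b)} for a, b in pairs]


# Hypothetical sample data standing in for the two input datasets.
me = {"id": "me", "label": "Me"}
connections = [{"id": "u1", "label": "Alice"}, {"id": "u2", "label": "Bob"}]
labels = {c["id"]: c["label"] for c in connections}

# Equivalent of the Stack recipe: append one list of edges to the other.
graph = direct_edges(me, connections) + related_edges([("u1", "u2")], labels)
```

Both lists share the same four columns, which is what makes stacking them possible.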

Visualizing your LinkedIn graph

Even though you may want to use a tool like Gephi to create a nice visualization of your graph, you can also create a
custom webapp hosted directly in DSS, and hence shareable with other people.

Under Insights, create a new Python-enabled webapp. Also, you'll want to make d3.js and Bootstrap available to the webapp using the menu from the top right.
You can now fill the different components of the webapp:

The most complex part is the JavaScript code. We'll make use of the famous D3.js library to build our graph, interacting with the Python backend:

    // You'll need to change the APIKey and insight ID as well below
    dataiku.setAPIKey('bhSzyIHEO7A5lnYLZA6onKprv4loxPts');
    dataiku.setDefaultProjectKey('LINKEDIN');

    $("#graph-container").empty();

    $.getJSON("/html-apps-backends/sjAeKN2/draw_graph", function(data) {

        console.info(data);

        var width = 900;
        var height = 900;
        var color = d3.scale.category20();

        var force = d3.layout.force()
            .linkDistance(25)
            .charge(-65)
            .size([width, height]);

        var svg = d3.select("#graph-container").select("svg");
        if (svg.empty()) {
            svg = d3.select("#graph-container").append("svg")
                .attr("width", width)
                .attr("height", height);
        };

        force.nodes(data.graph.nodes)
            .links(data.graph.links)
            .start();

        var link = svg.selectAll(".link")
            .data(force.links())
            .enter().append("path")
            .attr("class", "link");

        var node = svg.selectAll(".node")
            .data(force.nodes())
            .enter().append("g")
            .attr("class", "node")
            .on("mouseover", mouseover)
            .on("mouseout", mouseout)
            .call(force.drag);

        node.append("circle")
            .style("fill", function(d) { return color(parseInt(d.community)); })
            .attr("r", function(d) { return 4 });

        node.append("text")
            .attr("x", 12)
            .attr("dy", ".35em");

        force.on("tick", function() {
            link.attr("d", function(d) {
                var dx = d.target.x - d.source.x,
                    dy = d.target.y - d.source.y,
                    dr = Math.sqrt(dx * dx + dy * dy);
                return "M" + d.source.x + "," + d.source.y + "A" + dr + "," + dr + " 0 0,1 " + d.target.x + "," + d.target.y;
            });
            node.attr("transform", function(d) {
                return "translate(" + d.x + "," + d.y + ")";
            });
        });

        function mouseover() {
            d3.select(this).select("text")
                .attr("x", 12)
                .attr("dy", ".35em")
                .style({'font-size': '16px'})
                .text(function(d) { return d.name });
        };

        function mouseout() {
            d3.select(this).select("text")
                .text(function(d) { return "" });
        };

    });
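The JavaScript reads `data.graph.nodes` and `data.graph.links`, and uses each node's `name` and `community` fields. The sketch below shows one way a backend could shape such a payload from a list of edges. It is an assumption about the structure the frontend expects, not the tutorial's actual `draw_graph` backend, and the function name and sample data are hypothetical. Links refer to nodes by their index in the nodes list, which the D3 force layout accepts; how communities are computed is not covered here, so every node defaults to community 0 unless one is supplied.

```python
import json


def to_d3_graph(edges, names, communities=None):
    """Convert (from_id, to_id) edges into {"graph": {"nodes": [...], "links": [...]}}."""
    communities = communities or {}
    index = {}   # node id -> position in the nodes list
    nodes = []
    links = []
    for from_id, to_id in edges:
        for node_id in (from_id, to_id):
            if node_id not in index:
                index[node_id] = len(nodes)
                nodes.append({"name": names.get(node_id, node_id),
                              "community": communities.get(node_id, 0)})
        # Links point at nodes by index, as the force layout expects
        links.append({"source": index[from_id], "target": index[to_id]})
    return {"graph": {"nodes": nodes, "links": links}}


# Hypothetical sample data
payload = to_d3_graph(
    edges=[("me", "u1"), ("me", "u2"), ("u1", "u2")],
    names={"me": "Me", "u1": "Alice", "u2": "Bob"},
    communities={"u1": 1, "u2": 1},
)
body = json.dumps(payload)  # what a backend endpoint would return as JSON
```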

That's it. Save your Insights, publish to the Dashboard, and you can now see your LinkedIn graph in a webapp running in your browser, and hosted on DSS:

Conclusion

That's a wrap! Building a visualization of our LinkedIn graph is not easy, but with this tutorial you should have the keys to reproduce it
in your DSS instance.

The webapp is pretty basic: you'll probably want to make it nicer and fine-tune the settings.
Using the node color (reflecting communities) and the overall graph layout (force), you should be able to pinpoint some of the main clusters of your relationships, just like I did:

Hope you enjoyed this tutorial. Feel free to get in touch with us if you have questions or comments, or if you want to understand how
DSS can be used to build other custom webapps.