Thursday, 3 March 2016

The Neo4j Knowledge Graph

A couple of days ago, I wrote a graphgist about creating a true Knowledge Graph for the Neo4j ecosystem, based on the fantastic Awesome Neo4j resource created by our friends at Neueda/Neueda4j. You can access it in a separate window over here.

In this post, however, I will go into a bit more detail about how I created that graph.

Google Spreadsheet is my friend

I mentioned already that I started from the awesome Awesome Neo4j GitHub resource. And while it's a great idea to manage pages etc. collaboratively on GitHub, I can't help but feel that there should be other, nicer ways of structuring that information. So I spent a couple of hours converting that information into a spreadsheet (which is publicly accessible over here):

This sheet contains:

info about the resource (name and comments)

the URL where you can find the resource

info about the author (individual or organisation) that created/manages the resource

The script looks at the "Tags" column of the spreadsheet (https://docs.google.com/spreadsheets/d/1X6DpFZoS01V1crgRED4dRz2UkbiYR8FJMPf9xey9Lwc) and then creates the tags, and the relationships between tags and resources, by iterating through each "tag cell" of the spreadsheet.

So for example, if a "tag cell" contains the following tags

code, rdbms, tool, integration, import

separated by commas, then the script splits them up into individual tags (using the split(text,", ") command), then looks at the number of tags available (using size(words)-2 as an index to iterate over), and then merges the individual tags and the relationships.
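To make the split-and-merge idea a bit more tangible, here is a minimal Python sketch of the same logic. It is not the Cypher script itself: the function names, the resource names and the in-memory sets are all hypothetical, and the sketch simply visits every tag in the cell rather than reproducing the exact index range of the original script. The sets stand in for what a merge does in the graph, which is to create each tag and each resource-to-tag relationship only once.

```python
def split_tags(tag_cell):
    """Split a "tag cell" such as "code, rdbms, tool" into individual tags."""
    # Same separator as split(text, ", "): a comma followed by a space.
    # Empty strings are dropped so that an empty cell yields no tags.
    return [tag for tag in tag_cell.split(", ") if tag]


def merge_tags(rows):
    """Collect unique tags and unique (resource, tag) pairs.

    Using sets mimics merge semantics: adding the same tag or the same
    pair a second time has no effect.
    """
    tags = set()
    tagged = set()
    for resource_name, tag_cell in rows:
        for tag in split_tags(tag_cell):
            tags.add(tag)
            tagged.add((resource_name, tag))
    return tags, tagged


# Hypothetical rows: (resource name, content of the "Tags" cell).
rows = [
    ("Hypothetical import tool", "code, rdbms, tool, integration, import"),
    ("Hypothetical driver", "code, tool"),
]

tags, tagged = merge_tags(rows)
print(sorted(tags))
print(sorted(tagged))
```

Note that the tags "code" and "tool" appear in both rows but end up only once in the tag set, while each resource still gets its own relationship to them.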