We are going to follow their recipe, but we are going to add a little spice. Instead of creating a small graph of two nodes and one relationship, I am going to show you how to leverage the power of Gremlin and Groovy to build a much larger graph from a set of files.

Let’s start by cloning the Neoflix Sinatra application, and instead of installing and starting Neo4j locally, we are going to create a Heroku application, and add Neo4j.

The xs and ys are our username and password. We can use the address given in NEO4J_URL to take a look at the server. For part two, it would be wise to keep an eye on the “dashboard” as we create new nodes and relationships. The Neoflix project layout:

neoflix.rb
public/movies.dat
public/users.dat
public/ratings.dat

Let’s take a look at the source code in neoflix.rb: We require our gems and use the NEO4J_URL variable to tell Neography how to reach the Neo4j server.
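To make the shape of that address concrete, here is a minimal sketch that uses only Ruby's standard library to split a NEO4J_URL-style value into its parts. The helper name `neo4j_settings` and the localhost fallback are assumptions for illustration, not code from neoflix.rb, and the sketch stops short of calling Neography itself.

```ruby
require 'uri'

# Hypothetical fallback for local development; on Heroku the real value
# comes from the NEO4J_URL config variable.
DEFAULT_NEO4J_URL = 'http://localhost:7474'

# Break a NEO4J_URL-style address into the pieces a REST client needs.
def neo4j_settings(url)
  uri = URI.parse(url)
  {
    server:   uri.host,
    port:     uri.port,
    username: uri.user,      # nil when the URL carries no credentials
    password: uri.password
  }
end

settings = neo4j_settings(ENV['NEO4J_URL'] || DEFAULT_NEO4J_URL)
puts settings.inspect
```

The username and password sit between the scheme and the host, which is why the same address works both in a browser and as the connection string handed to the REST client.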

Since we wiped everything clean, we set up automatic indexing on all vertices and all properties.

if neo.execute_script("g.indices;").empty?
neo.execute_script("g.createAutomaticIndex('vertices', Vertex.class, null);")
end

We are going to create a lot of data, so we set our graph to commit every 1000 changes in an automatic transaction.

g.setMaxBufferSize(1000);

Here comes some magic. We do not have access to the file system of the server running our Neo4j instance, but since we have the full power of Groovy at our disposal, we simply grab the file from Sinatra instead. Anything you put in the public directory will be automatically served for you. The fields of movies.dat are delimited by “::” and the genres are delimited by “|”.

1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
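As a quick illustration of the record format, here is a small Ruby sketch that splits one of these lines into its parts. The post does this work in Groovy on the server, so the helper `parse_movie` is a hypothetical stand-in that only shows how the two delimiters nest.

```ruby
# Split one movies.dat record into its id, title, and list of genres.
def parse_movie(line)
  id, title, genres = line.chomp.split('::')
  # Ruby's String#split treats a String argument literally,
  # so the pipe needs no escaping here.
  { id: id.to_i, title: title, genres: genres.to_s.split('|') }
end

sample = "1::Toy Story (1995)::Animation|Children's|Comedy"
movie = parse_movie(sample)

puts movie[:title]
puts movie[:genres].join(', ')
```

Groovy's `split` is Java's `String.split`, which takes a regular expression rather than a literal string, and that difference is what makes the pipe awkward on the Gremlin side.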

So for each line in our file, we are going to create a movie vertex and link it to one or more genres. We are sending this Gremlin script inside a Ruby String, so we must escape the backslashes that escape the | in the final script. As we go along, we also create vertices for the genres if they don’t already exist.

If you are a Rubyist, you should be able to read that Groovy code, but let me point out a few things. In Groovy variable definitions, it is mandatory to either provide a type name explicitly or use “def” in its place.

And this funky piece of code is an unfortunate stack of escapes: the pipe character is escaped by a backslash, that backslash must itself be escaped in Groovy, and because the script lives inside our Ruby String, both backslashes must be escaped once more.

components[2].split('\\\\|').each { def genera ->
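The layers are easier to see when peeled off one at a time. The sketch below is plain Ruby: the first step is what Ruby really does to the double-quoted string, while the second step only simulates Groovy's handling of a single-quoted literal, so treat it as an illustration rather than output from a Gremlin server.

```ruby
# The Gremlin fragment as written inside a double-quoted Ruby string.
script = "components[2].split('\\\\|')"

# Layer 1: Ruby collapses each pair of backslashes into one.
puts script                                        # components[2].split('\\|')

# Layer 2 (simulated): Groovy collapses the remaining pair inside its
# single-quoted string, leaving a backslash followed by a pipe.
groovy_literal = script[/'(.*)'/, 1]
regex_source   = groovy_literal.gsub('\\\\') { '\\' }
puts regex_source                                  # \|

# Layer 3: as a regular expression, \| matches a literal pipe character.
puts "Comedy|Romance".split(Regexp.new(regex_source)).inspect
```

Four backslashes in the Ruby source become two in the script that is sent, and one in the pattern that finally does the splitting.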

This next bit of code looks up the genre in our index, and if it doesn’t exist, creates it.

This Hash-inside-an-Array-inside-an-Array-looking construct is Gremlin’s way of querying the index. We are telling it to return a node if it has a property genera that matches the genera variable we parsed after splitting the components[2] field.

g.idx(Tokens.T.v)[[genera:genera]].iterator();

We do this a few more times to load the users and ratings into our graph and end with this:

g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);")

Which commits any leftover items in our transaction buffer.

In Part Two, we’ll bring up our Heroku app, load the data, possibly add movie posters from a third-party API, and visualize some of the implicit relationships in the graph as outlined in the original blog post… and I’ll probably do a Part Three, which will use the fresh-off-the-presses CSV File Importer to reload the graph with a bigger set of movie data using Heroku. In between, however, I think it’s time we looked at Neo4j Spatial. You’ll know when new posts are published by following me on Twitter.