Why You Should Use Neo4j in Your Next Ruby App

I have needed to store a lot of data in my time and I’ve used a lot of the big contenders: PostgreSQL, MySQL, SQLite, Redis, and MongoDB. While I’ve built up extensive experience with these tools, I wouldn’t say that any of them have ever made the task fun. I fell in love with Ruby because it was fun and because it let me do more powerful things by not getting in my way. While I didn’t realize it, the usual suspects of data persistence were getting in my way. But I’ve found a new love: let me tell you about Neo4j.

What is Neo4j?

Neo4j is a graph database! That means that it is optimized for managing and querying connections (relationships) between entities (nodes) as opposed to something like a relational database which uses tables.

Why is this great? Imagine a world with no foreign keys. Each entity in your database can have many relationships referring directly to other entities. If you want to explore the relationships there are no table or index scans, just a few connections to follow. This matches up well with the typical object model. It is more powerful, though, because Neo4j, while providing a lot of the database functionality that we expect, gives us tools to query for complex patterns in our data.
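To make the "no foreign keys" idea concrete, here is a minimal plain-Ruby sketch of nodes that hold direct references to their relationships. It is only an illustration of the model, not how Neo4j stores data internally; the GraphNode and GraphRelationship structs, the HAS_CATEGORY type, and the sample category name are all assumptions made up for this example.

```ruby
# Illustrative in-memory model only; all names here are hypothetical.
GraphNode = Struct.new(:label, :properties, :relationships)
GraphRelationship = Struct.new(:type, :from, :to)

# Link two nodes; both ends keep a direct reference to the relationship.
def connect(from, type, to)
  relationship = GraphRelationship.new(type, from, to)
  from.relationships << relationship
  to.relationships << relationship
  relationship
end

# Follow outgoing relationships of one type: a walk over references,
# with no key lookup or table scan involved.
def outgoing(node, type)
  node.relationships
      .select { |rel| rel.type == type && rel.from.equal?(node) }
      .map(&:to)
end

RUBY_NODE = GraphNode.new(:Asset, { title: "Ruby" }, [])
LANGUAGES_NODE = GraphNode.new(:Category, { name: "Programming Languages" }, [])
connect(RUBY_NODE, :HAS_CATEGORY, LANGUAGES_NODE)

puts outgoing(RUBY_NODE, :HAS_CATEGORY).map { |node| node.properties[:name] }.inspect
```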

Introducing ActiveNode

To connect to Neo4j we’ll be using the neo4j gem. You can find instructions for connecting to Neo4j in your Rails application in the gem’s documentation. The code shown below is also available as a running Rails app in this GitHub repository (use the sitepoint Git branch). Once your database is up and running, use the rake load_sample_data command to populate it.

Here is a basic example of an Asset model from an asset management Rails app:

Creating Recommendations

So what just happened? ActiveNode generated a query to the database which specified a path from our asset to all other assets which share a category. The database then returned just those assets to us. Here’s the query that it used:

This is a query language called Cypher, which is Neo4j’s equivalent to SQL. Note particularly the ASCII art style of parentheses surrounding node definitions and arrows representing relationships. This Cypher query is a bit more verbose because ActiveNode generated it algorithmically. If a human were to write the query it would look something like:

I find Cypher easier and more powerful than SQL, but we won’t worry too much about Cypher in this article. If you want to learn more later you can find great tutorials and a thorough refcard.

As you can see, we can use Neo4j to span across our entities. Big deal! We can also do this in SQL with a couple of JOINs. While Cypher seems cool, we’re not breaking any major ground yet. What if we wanted to use this query to make some asset recommendations based on shared categories? We’ll want to sort the assets to rank those with the most categories in common. Let’s create a method on our model:

We are defining variables as part of our chain to use later (c and asset).

We are using the Cypher collect function to give us a result column containing an array of the shared categories (see the table below). Also note that we are getting full objects, not just columns/properties:

| asset    | collect(c)                    | count(c) |
|----------|-------------------------------|----------|
| #<Asset> | [#<Category>]                 | 1        |
| #<Asset> | [#<Category>, #<Category>, …] | 4        |
| #<Asset> | [#<Category>, #<Category>]    | 2        |
| …        | …                             | …        |

Did you notice that there is not a GROUP BY clause? Neo4j is smart enough to realize that collect and count are aggregation functions and it groups by the non-aggregation columns in our result (in this case that’s just the asset variable).

Take that SQL!
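If the implicit grouping feels abstract, the following plain-Ruby sketch mimics it on an in-memory list of matched paths. It does not use the neo4j gem or Cypher; the method names and the sample titles and categories are made up for illustration. Each row plays the role of one matched path, group_by stands in for grouping on the asset column, and the map step plays the part of collect and count.

```ruby
# One row per matched path: another asset plus one category it shares with ours.
# Titles and category names are made-up sample data.
SAMPLE_PATHS = [
  { asset: "Ruby on Rails", category: "Programming" },
  { asset: "Sinatra",       category: "Programming" },
  { asset: "Ruby on Rails", category: "Open Source" },
  { asset: "Sinatra",       category: "Web" },
  { asset: "Ruby on Rails", category: "Web" },
  { asset: "RSpec",         category: "Programming" }
].freeze

# Group rows by asset (the non-aggregated column), then collect and count per group.
def aggregate_by_asset(paths)
  paths.group_by { |row| row[:asset] }.map do |asset, rows|
    { asset: asset, categories: rows.map { |row| row[:category] }, count: rows.size }
  end
end

# Rank the assets with the most shared categories first.
def ranked_recommendations(paths)
  aggregate_by_asset(paths).sort_by { |rec| -rec[:count] }
end

ranked_recommendations(SAMPLE_PATHS).each do |rec|
  puts "#{rec[:asset]}: #{rec[:count]} shared (#{rec[:categories].join(', ')})"
end
```

In the real app the database does this work for us; the sketch only shows what "group by the non-aggregation columns" means.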

As a last step we can make recommendations on more than just categories in common. Imagine that we have the following sub-graph in Neo4j:

In addition to shared categories, let’s account for how many creators and viewers assets have in common:

Here we delve deeper and start forming our own query. The structure is the same but, rather than finding just one path between two assets via a shared category, we also specify two more optional paths. We could make all three paths optional, but then Neo4j would need to compare our asset with every other asset in the database. By using a match rather than an optional_match for our path through Category nodes we require that there be at least one shared category. This vastly limits our search space.

In the diagram there is one shared category, zero shared creators, and two shared viewers. This means that the score between “Ruby” and “Ruby on Rails” would be:

(1 * 2) + (0 * 4) + (2 * 0.1) = 2.2

Also note that we’re doing a calculation (and sorting) on a count aggregation of these three paths. That’s so cool to me that it makes me tingle a little to think about it…
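The arithmetic of that score can be sketched in plain Ruby. The weights (2 per shared category, 4 per shared creator, 0.1 per shared viewer) are taken from the worked example above; the method name and the hash layout are assumptions for illustration, and in the real app the calculation happens inside the query.

```ruby
# Weights taken from the worked example: category 2, creator 4, viewer 0.1.
SCORE_WEIGHTS = { categories: 2, creators: 4, viewers: 0.1 }.freeze

# Multiply each shared count by its weight and add the results.
# Kinds that are missing from the input count as zero.
def recommendation_score(shared_counts)
  SCORE_WEIGHTS.sum { |kind, weight| shared_counts.fetch(kind, 0) * weight }
end

# "Ruby" vs. "Ruby on Rails": one shared category, no shared creators, two shared viewers.
score = recommendation_score(categories: 1, creators: 0, viewers: 2)
puts score.round(2)
```

Because a shared creator is weighted twice as heavily as a shared category, a single creator in common would outrank the whole score above.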

Easy Authorization

Let’s tackle another common problem. Suppose your CEO comes by your desk and says “We’ve built a great app, but customers want to be able to control who can see their stuff. Could you build in some privacy controls?” It seems simple enough. Let’s just throw on a flag to allow for private assets:

With this you can display all of the assets which a user can see either because the asset is public or because the viewer owns it. No problem, but again not a big deal. In another database you could just do a query on two columns/properties. Let’s get a bit crazier!
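Before things get crazier, here is the public-or-owned rule written out as a plain-Ruby predicate. This is a sketch over in-memory objects rather than a database query; SketchAsset, the is_private flag name, and the "Public Notes" asset are hypothetical stand-ins.

```ruby
# Hypothetical stand-in for the Asset model; `is_private` is an assumed flag name.
SketchAsset = Struct.new(:title, :creator, :is_private, keyword_init: true)

# An asset is visible when it is public or when the viewer created it.
def visible_to?(asset, user)
  !asset.is_private || asset.creator == user
end

def visible_assets(assets, user)
  assets.select { |asset| visible_to?(asset, user) }
end

SAMPLE_ASSETS = [
  SketchAsset.new(title: "Ruby", creator: "Matz", is_private: true),
  SketchAsset.new(title: "Ruby on Rails", creator: "DHH", is_private: true),
  SketchAsset.new(title: "Public Notes", creator: "DHH", is_private: false) # made-up asset
].freeze

puts visible_assets(SAMPLE_ASSETS, "Matz").map(&:title).inspect
```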

The Product Manager comes to you and says “Hey, thanks for that, but now people want to be able to give other users direct access to their private stuff”. No problem! You can build a UI to let users add and remove VIEWABLE_BY relationships for their assets and then query them like so:

That would have been a join table otherwise. Here you just throw in another path by which users can have access to an asset. You take a moment to appreciate Neo4j’s schemaless nature.

Satisfied with your day’s work, you lean back in your chair and sip your afternoon coffee. Of course, that’s when the Social Media Customer Care Representative drops by to say “Users love the new feature, but they want to be able to create groups and assign access to groups. Can you do that? Oh, also, could you allow for an arbitrary hierarchy of groups?” You stare deeply into their eyes for a few minutes before responding: “Sure!”. Since this is starting to get complicated, let’s look at an example:

If both of the assets are private, your code so far gives Matz and tenderlove access to Ruby and DHH access to Ruby on Rails. To add group support you start by following directly assigned groups:

That was pretty easy, since you just needed to add another path. It’s two hops, sure, but that’s old hat for us by now. Tenderlove and Yehuda will be able to see the “Ruby on Rails” asset because they are members of the “Railsists” group. Also note: now that some users have multiple paths to an asset (like Matz to Ruby via the Rubyists group and via the CREATED relationship) you need to return DISTINCT asset.
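To see why DISTINCT matters, here is a plain-Ruby sketch that walks the three access paths (creator, direct viewer, group member) over in-memory hashes and removes duplicates at the end. The data is an assumed reading of the example: that Matz and DHH created their assets, that tenderlove has direct access to Ruby, and the constant and method names are all made up for this illustration.

```ruby
# Assumed sample data; each hash stands in for one relationship type.
CREATED_BY_USER   = { "Matz" => ["Ruby"], "DHH" => ["Ruby on Rails"] }.freeze
VIEWABLE_BY_USER  = { "Ruby" => ["tenderlove"] }.freeze                      # asset => users
VIEWABLE_BY_GROUP = { "Ruby" => ["Rubyists"], "Ruby on Rails" => ["Railsists"] }.freeze
GROUP_MEMBERS     = { "Rubyists" => ["Matz"], "Railsists" => ["tenderlove", "Yehuda"] }.freeze

# Collect assets reachable through any of the three paths, then de-duplicate.
def accessible_assets(user)
  created   = CREATED_BY_USER.fetch(user, [])
  direct    = VIEWABLE_BY_USER.select { |_asset, users| users.include?(user) }.keys
  my_groups = GROUP_MEMBERS.select { |_group, members| members.include?(user) }.keys
  via_group = VIEWABLE_BY_GROUP.select { |_asset, groups| (groups & my_groups).any? }.keys

  # Without uniq, Matz would get "Ruby" twice: once as creator, once via Rubyists.
  (created + direct + via_group).uniq
end

%w[Matz tenderlove Yehuda DHH].each do |user|
  puts "#{user}: #{accessible_assets(user).inspect}"
end
```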

Specifying an arbitrary path through a hierarchy of groups takes you a bit more time, though. You look through the Neo4j documentation until you find something called “variable-length relationships” and give it a shot:

Here you’ve done it! This query will find assets accessible to a group and traverse any set of zero to five HAS_SUBGROUP relationships, finally ending on a check to see if the user is in the last group. You’re the hero of the story and your company showers you with bonuses for getting the job done so quickly!
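For intuition about what a zero-to-five hop traversal does, here is a plain-Ruby sketch of the same idea: start from each group the asset is viewable by, follow HAS_SUBGROUP links at most five times, and check whether the user belongs to any group reached along the way. This is an in-memory approximation, not the database query; the method names, keyword arguments, and the "Rails Core" subgroup are hypothetical.

```ruby
require "set"

# Every group reachable from `start` by following zero to `max_hops` subgroup links.
def groups_within(start, subgroups, max_hops: 5)
  reached  = Set[start] # zero hops: the starting group itself
  frontier = [start]
  max_hops.times do
    frontier = frontier.flat_map { |group| subgroups.fetch(group, []) }
                       .reject { |group| reached.include?(group) }
                       .uniq
    break if frontier.empty?
    reached.merge(frontier)
  end
  reached
end

# A user can view an asset if they belong to a group it is viewable by,
# or to a subgroup at most five levels below such a group.
def can_view?(user, asset, viewable_by:, subgroups:, members:)
  viewable_by.fetch(asset, []).any? do |group|
    groups_within(group, subgroups).any? { |g| members.fetch(g, []).include?(user) }
  end
end

viewable_by = { "Ruby on Rails" => ["Railsists"] }
subgroups   = { "Railsists" => ["Rails Core"] }            # hypothetical subgroup
members     = { "Railsists" => ["Yehuda"], "Rails Core" => ["tenderlove"] }

puts can_view?("tenderlove", "Ruby on Rails",
               viewable_by: viewable_by, subgroups: subgroups, members: members)
```

The visited set keeps the walk from looping forever if the group hierarchy ever contains a cycle.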

Conclusion

There are many awesome things that you can do with Neo4j (including using its amazing web interface to explore your data with Cypher) which I’m not able to cover. Not only is it an easy and intuitive way to store your data, it also provides a lot of benefits for efficient querying of highly connected data (and believe me, your data is highly connected, even if you don’t realize it). I encourage you to check out Neo4j and give it a try for your next project!

Brian Underwood is a developer advocate for Neo4j and one of the maintainers of the neo4j.rb project. He is currently traveling the world with his wife and three-year-old son. You can find him as cheerfulstoic on GitHub, Twitter, Google+, or his website.