Musings on software development, open source, PHP and current projects.

2011-06-15

Neo4j for PHP

Update 2011-09-14: I've put a lot of effort into Neo4jPHP since this post was written. It's pretty full-featured and covers almost 100% of the Neo4j REST interface. Anyone interested in playing with Neo4j from PHP should definitely check it out. I would love some feedback!

Lately, I've been playing around with the graph database Neo4j and its application to certain classes of problems. Graph databases are meant to solve problems in domains where data relationships can be multiple levels deep. For example, in a relational database, it's very easy to answer the question "Give me a list of all actors who have been in a movie with Kevin Bacon":

Excuse my use of sub-queries (rewrite it as a JOIN in your head if you wish).
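The query itself isn't reproduced here. As a stand-in, here is a minimal PHP sketch of the same one-degree question. It assumes a hypothetical `roles` table, loaded into an array of actor/movie rows with made-up names. The SQL in the comment is illustrative and is not the original query. The inner filter plays the part of the sub-query.

```php
<?php
// Hypothetical "roles" table: one row per (actor, movie) pair. Names are placeholders.
$roles = [
    ['actor' => 'Kevin Bacon', 'movie' => 'Movie 1'],
    ['actor' => 'Actor A',     'movie' => 'Movie 1'],
    ['actor' => 'Actor A',     'movie' => 'Movie 2'],
    ['actor' => 'Actor B',     'movie' => 'Movie 2'],
    ['actor' => 'Kevin Bacon', 'movie' => 'Movie 3'],
    ['actor' => 'Actor C',     'movie' => 'Movie 3'],
];

// Roughly equivalent to:
//   SELECT DISTINCT actor FROM roles
//   WHERE movie IN (SELECT movie FROM roles WHERE actor = :name)
//     AND actor <> :name
function coStars(array $roles, string $name): array
{
    // Inner "sub-query": every movie the given actor appeared in.
    $movies = array_column(
        array_filter($roles, fn($r) => $r['actor'] === $name),
        'movie'
    );

    // Outer query: everyone else who appeared in one of those movies.
    $actors = [];
    foreach ($roles as $r) {
        if (in_array($r['movie'], $movies, true) && $r['actor'] !== $name) {
            $actors[$r['actor']] = true;   // array keys de-duplicate, like DISTINCT
        }
    }
    return array_keys($actors);
}

echo implode(', ', coStars($roles, 'Kevin Bacon')), "\n";   // Actor A, Actor C
```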

But suppose you want to get the names of all the actors who have been in a movie with someone who has been in a movie with Kevin Bacon. Suddenly, you have yet another JOIN against the same table. Now add a third degree: someone who has been in a movie with someone who has been in a movie with someone who has been in a movie with Kevin Bacon. As you continue to add degrees, the query becomes increasingly unwieldy, harder to maintain, and less performant.
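To make the "one more JOIN per degree" point concrete, here is a hedged sketch over the same kind of hypothetical actor/movie rows. Each pass of the loop does the work of one more self-join. It looks up the movies of the current set of actors, then every new actor in those movies. The function name `actorsWithinDegrees` and the data are illustrative only.

```php
<?php
// Illustrative (actor, movie) rows standing in for a hypothetical roles table.
$roles = [
    ['Kevin Bacon', 'Movie 1'], ['Actor A', 'Movie 1'],
    ['Actor A', 'Movie 2'],     ['Actor B', 'Movie 2'],
    ['Actor B', 'Movie 3'],     ['Actor C', 'Movie 3'],
];

// Returns actor => degree for everyone within $degrees of $start.
// Each loop iteration corresponds to one more roles-to-roles JOIN in SQL.
function actorsWithinDegrees(array $roles, string $start, int $degrees): array
{
    $seen = [$start => 0];
    $frontier = [$start];
    for ($d = 1; $d <= $degrees && $frontier; $d++) {
        // Movies of everyone reached in the previous pass.
        $movies = [];
        foreach ($roles as [$actor, $movie]) {
            if (in_array($actor, $frontier, true)) {
                $movies[$movie] = true;
            }
        }
        // Actors in those movies who have not been reached yet.
        $next = [];
        foreach ($roles as [$actor, $movie]) {
            if (isset($movies[$movie]) && !isset($seen[$actor])) {
                $seen[$actor] = $d;
                $next[] = $actor;
            }
        }
        $frontier = $next;
    }
    unset($seen[$start]);
    return $seen;
}

foreach (actorsWithinDegrees($roles, 'Kevin Bacon', 3) as $actor => $degree) {
    echo "$actor: $degree\n";
}
```

In SQL, the degree count is baked into the query's shape. Here it is just a loop bound.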

This is precisely the type of problem graph databases are meant to solve: finding paths between pieces of data that may be one or more relationships removed from each other. They solve it very elegantly by modeling domain objects as graph nodes and edges (relationships in graph db parlance) and then traversing the graph using well-known and efficient algorithms.

The above example can be very easily modeled this way: every actor is a node, every movie is a node, and every role is a relationship going from the actor to the movie they were in:
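Here is a minimal in-memory sketch of that model. A hypothetical `Graph` class stands in for the database; it is not Neo4jPHP's API. Each actor and movie is a node holding arbitrary properties, and each role is an `IN` relationship pointing from the actor node to the movie node.

```php
<?php
// Hypothetical stand-in for a graph database: nodes carry properties,
// relationships are typed and point from a start node to an end node.
final class Graph
{
    /** @var array<int, array<string, mixed>> node id => properties */
    public array $nodes = [];
    /** @var list<array{start:int, end:int, type:string}> */
    public array $relationships = [];

    public function addNode(array $properties): int
    {
        $this->nodes[] = $properties;
        return array_key_last($this->nodes);   // the new node's id
    }

    public function relate(int $start, int $end, string $type): void
    {
        $this->relationships[] = ['start' => $start, 'end' => $end, 'type' => $type];
    }
}

$g = new Graph();
$bacon  = $g->addNode(['name' => 'Kevin Bacon']);
$actorA = $g->addNode(['name' => 'Actor A']);
$movie1 = $g->addNode(['title' => 'Movie 1']);

// Each role becomes an actor -IN-> movie relationship (no join table needed).
$g->relate($bacon, $movie1, 'IN');
$g->relate($actorA, $movie1, 'IN');

echo count($g->nodes), " nodes, ", count($g->relationships), " relationships\n";   // 3 nodes, 2 relationships
```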

Now it becomes very easy to find a path from a given actor to Kevin Bacon.
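One standard way to find such a path is plain breadth-first search, which returns a shortest path in an unweighted graph. The sketch below is only an illustration of that idea. It is not necessarily how Neo4j finds paths internally. It follows relationships in either direction, and `shortestPath` and the sample data are hypothetical.

```php
<?php
// Relationships as [start, end] pairs: actor -IN-> movie (illustrative data).
$in = [
    ['Actor C', 'Movie 3'], ['Actor B', 'Movie 3'],
    ['Actor B', 'Movie 2'], ['Actor A', 'Movie 2'],
    ['Actor A', 'Movie 1'], ['Kevin Bacon', 'Movie 1'],
];

// Breadth-first search ignoring relationship direction.
// Returns the node path from $from to $to, or null if they are not connected.
function shortestPath(array $rels, string $from, string $to): ?array
{
    $adjacent = [];
    foreach ($rels as [$start, $end]) {
        $adjacent[$start][] = $end;   // outgoing
        $adjacent[$end][] = $start;   // incoming: direction is not significant
    }

    $previous = [$from => null];      // node => node we reached it from
    $queue = [$from];
    while ($queue) {
        $node = array_shift($queue);
        if ($node === $to) {
            // Walk the predecessor chain back to the start.
            $path = [];
            for ($n = $to; $n !== null; $n = $previous[$n]) {
                array_unshift($path, $n);
            }
            return $path;
        }
        foreach ($adjacent[$node] ?? [] as $next) {
            if (!array_key_exists($next, $previous)) {
                $previous[$next] = $node;
                $queue[] = $next;
            }
        }
    }
    return null;
}

echo implode(' -> ', shortestPath($in, 'Actor C', 'Kevin Bacon')), "\n";
// Actor C -> Movie 3 -> Actor B -> Movie 2 -> Actor A -> Movie 1 -> Kevin Bacon
```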

Neo4j is an open source graph database, with both community and enterprise licensing structures. It supports transactions and can handle billions of nodes and relationships in a single instance. It was originally built to be embedded in Java applications, and most of the documentation and examples are evidence of that. Unfortunately, there is no native PHP wrapper for talking to Neo4j.

Luckily, Neo4j also has a built-in REST server and PHP is very good at consuming REST services. There's already a good Neo4j REST PHP library out there, but I decided to write my own to get a better understanding of how the REST interface actually works. You can grab it here and all the code examples below are written using it. The concepts can easily be ported to any Neo4j REST client.

First, we need to initialize a connection to the database. Since this is a REST interface, there is no persistent connection, and in fact, no data communication happens until the first time we need to read or write data:
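The "no traffic until needed" behavior can be sketched as follows. `LazyRestClient` is a hypothetical class, not the library's actual client. The transport is an injected callable, and the example uses a fake one, so nothing touches the network. The base URL assumes a default local Neo4j server of that era (port 7474, REST root `/db/data/`).

```php
<?php
// Hypothetical sketch: constructing the client only records where the server is;
// the first HTTP round trip happens on the first read or write.
final class LazyRestClient
{
    private string $baseUrl;
    /** @var callable(string $method, string $url): array */
    private $transport;
    private int $requests = 0;

    public function __construct(string $baseUrl, callable $transport)
    {
        $this->baseUrl = rtrim($baseUrl, '/');   // no request is sent here
        $this->transport = $transport;
    }

    public function get(string $path): array
    {
        $this->requests++;                       // first real traffic happens here
        return ($this->transport)('GET', $this->baseUrl . $path);
    }

    public function requestCount(): int
    {
        return $this->requests;
    }
}

$calls = [];
$client = new LazyRestClient('http://localhost:7474/db/data/', function (string $method, string $url) use (&$calls): array {
    $calls[] = "$method $url";   // a real transport would issue the HTTP request
    return [];
});

echo $client->requestCount(), "\n";   // 0
$client->get('/node/0');
echo $calls[0], "\n";                 // GET http://localhost:7474/db/data/node/0
```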

Each node has `setProperty` and `getProperty` methods that allow storing arbitrary data on the node. No server communication happens until the `save()` call, which must be called for each node.
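The same deferred-write idea applies at the node level. The following is a hedged sketch, using a hypothetical `SketchNode` class rather than the library's own node class. Properties are held locally, and a write happens only when `save()` is called on that particular node. The `$persist` callable stands in for the HTTP request.

```php
<?php
// Hypothetical node: setProperty/getProperty work on local state;
// only save() triggers a write, and only if something changed.
final class SketchNode
{
    private array $properties = [];
    private bool $dirty = false;
    private $persist;   // callable(array $properties): void

    public function __construct(callable $persist)
    {
        $this->persist = $persist;
    }

    public function setProperty(string $key, $value): self
    {
        $this->properties[$key] = $value;   // local only, no server traffic
        $this->dirty = true;
        return $this;
    }

    public function getProperty(string $key)
    {
        return $this->properties[$key] ?? null;
    }

    public function save(): self
    {
        if ($this->dirty) {
            ($this->persist)($this->properties);   // the single write
            $this->dirty = false;
        }
        return $this;
    }
}

$writes = [];
$recorder = function (array $props) use (&$writes): void { $writes[] = $props; };

$bacon = (new SketchNode($recorder))->setProperty('name', 'Kevin Bacon');
$movie = (new SketchNode($recorder))->setProperty('title', 'Movie 1');
echo count($writes), "\n";   // 0: nothing saved yet
$bacon->save();
$movie->save();
echo count($writes), "\n";   // 2: one write per saved node
```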

Linking an actor to a movie means setting up a relationship between them. In RDBMS terms, the relationship takes the place of a join table or a foreign key column. In the example, the relationship always starts with the actor pointing to the movie, and is tagged as an "acted in" type relationship:

The `relateTo` call returns a Relationship object, which is like a node in that it can have arbitrary properties stored on it. Each relationship is also saved to the database.
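A relationship can be pictured as a typed, directed link that carries its own properties. In this hypothetical sketch, `SketchRelationship` and the free function `relateTo` only mirror the idea and are not the library's signatures. The `character` property is an assumed example of data you might store on a role.

```php
<?php
// Hypothetical relationship: start node, end node, type, plus arbitrary properties.
final class SketchRelationship
{
    public string $start;
    public string $end;
    public string $type;
    private array $properties = [];

    public function __construct(string $start, string $end, string $type)
    {
        $this->start = $start;
        $this->end = $end;
        $this->type = $type;
    }

    public function setProperty(string $key, $value): self
    {
        $this->properties[$key] = $value;
        return $this;
    }

    public function getProperty(string $key)
    {
        return $this->properties[$key] ?? null;
    }
}

// Hypothetical helper: relate an actor to a movie with a given relationship type.
function relateTo(string $actor, string $movie, string $type): SketchRelationship
{
    return new SketchRelationship($actor, $movie, $type);
}

$role = relateTo('Actor A', 'Movie 1', 'IN')->setProperty('character', 'Character X');
echo "{$role->start} -{$role->type}-> {$role->end} as {$role->getProperty('character')}\n";
// Actor A -IN-> Movie 1 as Character X
```

In relational terms, `character` would be an extra column on the join table that the relationship replaces.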

The direction of the relationship is totally arbitrary; paths can be found regardless of which direction a relationship points. You can use whichever semantics make sense for your problem domain. In the example above, it makes sense that an actor is "in" a movie, but it could just as easily be phrased that a movie "has" an actor. The same two nodes can have multiple relationships to each other, with different directions, types and properties.

The relationships are all set up, and now we are ready to find links between any actor in our system and Kevin Bacon. Note that the maximum length of the path is going to be 12 relationships: six degrees of separation, each of which takes two hops (from an actor to a movie, and from that movie to the next actor).
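The arithmetic behind that limit can be written down directly. A path alternates actor, movie, actor, and so on. Its length is the number of relationships, one fewer than the number of nodes, and each degree uses two of them. `degreesOfSeparation` is a hypothetical helper for illustration.

```php
<?php
const MAX_DEGREES = 6;
const MAX_PATH_LENGTH = MAX_DEGREES * 2;   // actor -> movie -> actor per degree

// Given a node path (actor, movie, actor, ...), return the degrees of separation,
// or null if the path is too long or does not end on an actor.
function degreesOfSeparation(array $nodePath): ?int
{
    $length = count($nodePath) - 1;   // relationships = nodes - 1
    if ($length < 0 || $length > MAX_PATH_LENGTH || $length % 2 !== 0) {
        return null;
    }
    return intdiv($length, 2);
}

$path = ['Actor C', 'Movie 3', 'Actor B', 'Movie 2', 'Actor A', 'Movie 1', 'Kevin Bacon'];
echo degreesOfSeparation($path), "\n";   // 3
```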

`getRelationships` returns an array of all relationships that a node has, optionally limiting to only relationships of a given type. It is also possible to specify only incoming or outgoing relationships. We've set up our data so that all 'IN' relationships are from an actor to a movie, so we know that the end node of any 'IN' relationship is a movie.
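Filtering by type and direction can be sketched over plain arrays. `relationshipsOf` and the `DIRECTION_*` constants below are hypothetical and not the library's actual signature. The `KNOWS` relationship is an assumed extra type, included only to show that type filtering matters.

```php
<?php
const DIRECTION_ALL = 'all';
const DIRECTION_IN  = 'in';
const DIRECTION_OUT = 'out';

// Return the relationships touching $node, optionally limited to some types
// and to one direction (outgoing: $node is the start; incoming: $node is the end).
function relationshipsOf(array $rels, string $node, array $types = [], string $direction = DIRECTION_ALL): array
{
    return array_values(array_filter($rels, function (array $r) use ($node, $types, $direction): bool {
        if ($types && !in_array($r['type'], $types, true)) {
            return false;
        }
        $out = $r['start'] === $node;
        $in  = $r['end'] === $node;
        if ($direction === DIRECTION_OUT) {
            return $out;
        }
        if ($direction === DIRECTION_IN) {
            return $in;
        }
        return $out || $in;
    }));
}

$rels = [
    ['start' => 'Actor A', 'end' => 'Movie 1', 'type' => 'IN'],
    ['start' => 'Actor A', 'end' => 'Movie 2', 'type' => 'IN'],
    ['start' => 'Actor B', 'end' => 'Actor A', 'type' => 'KNOWS'],
];

// Every IN relationship runs actor -> movie, so the end node of each of an
// actor's outgoing IN relationships is one of that actor's movies.
$movies = array_column(relationshipsOf($rels, 'Actor A', ['IN'], DIRECTION_OUT), 'end');
echo implode(', ', $movies), "\n";   // Movie 1, Movie 2
```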

There is more available in the REST interface, including node and relationship indexing, querying, and traversal (which allows more complicated path-finding behaviors). Transaction/batch operation support over REST is marked "experimental" for now. I'm hoping to add wrappers for more of this functionality soon. I'll also be posting more on the different types of problems that can be solved very elegantly with graph databases.

The next post dives into a bit more detail about path-finding and some of the different algorithms available.

@Shay, I didn't do much (any) performance testing. I was more trying to get things working so I could play with the database. At some point it may make sense for someone to make a C extension of the REST client since there are no C client libraries for Neo4j, only Java.

You can't and you shouldn't. The node id is used internally by Neo4j. If your application needs to have ids that can be altered, you should assign your own ids to an indexed node property, and query using the index.
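The idea of keeping your own id in an indexed property can be sketched with an exact-match index keyed by property name and value. `ExactIndex` is a hypothetical in-memory stand-in, not Neo4j's index API. The internal id (here 42) never changes. "Changing" the application id means removing the old index entry and adding a new one.

```php
<?php
// Hypothetical exact-match index: key => value => list of internal node ids.
final class ExactIndex
{
    private array $entries = [];

    public function add(int $nodeId, string $key, $value): void
    {
        $this->entries[$key][(string) $value][] = $nodeId;
    }

    public function find(string $key, $value): array
    {
        return $this->entries[$key][(string) $value] ?? [];
    }

    public function remove(int $nodeId, string $key, $value): void
    {
        $ids = $this->find($key, $value);
        $this->entries[$key][(string) $value] = array_values(array_diff($ids, [$nodeId]));
    }
}

$index = new ExactIndex();
$index->add(42, 'appId', 'user-1001');      // 42: the database's internal id

// Re-key the node under a new application id; the internal id is untouched.
$index->remove(42, 'appId', 'user-1001');
$index->add(42, 'appId', 'user-2002');

echo implode(',', $index->find('appId', 'user-2002')), "\n";   // 42
```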

Here's one more question. I am creating a node with a property "tags" whose value is an array of strings, and I want to index "tags". I tried doing that using `$tagIndex->add($node, "tags", $node->getProperty("tags"));` but the individual string values of the tags are not indexed separately.

I have been following Neo4j for the past year and am quite impressed with the whole idea of a graph database. We have been developing a product using Neo4j with Node.js for quite some time now. Hopefully, we will get it live soon.

Personally, I am working on another pet project (a simple mobile app). For this one I wanted to use Neo4j with PHP (to create APIs for my app), as I am more comfortable programming in PHP than in Node.js (which my fellow developers prefer). I started experimenting with the Neo4jPHP API a couple of days ago and I have to admit that you have done a great job. The whole API is very easy to understand, especially since I have worked with Neo4j's core Java library before, and your API is structured in a very object-oriented manner like the Java library.

Thanks a lot for creating Neo4jPHP, and even more thanks for providing great guidelines for using it (including the API documentation).

Hey there, friends. I need to know how to pass a value taken from an HTML form as a parameter to the `setProperty` function. I tried to extract `$_POST` and assign the form variable, but it doesn't work. Please help.

Thanks for the Neo4j PHP wrapper. I have been writing tests that import our current database into a graph database using your wrapper, and it works great. But I have a question. I can create all the nodes, indexes and relationships between them, but I can't work out how to prevent duplicate relationships. I could iterate through all the relationships and check them one by one, but that's not a good solution.

I am trying to use the relationship index, but I can't find any resources on using it. How can I add a relationship to the relationship index, and how do I query it? That would be great and would prevent me from duplicating relationships.

Documentation for creating and using indexes, including relationship indexes, is available here: https://github.com/jadell/neo4jphp/wiki/Indexes

Also, if you are using Neo4j 2.0, you might want to look at the documentation on Cypher query language "CREATE UNIQUE" syntax. It will only create a relationship if one does not already exist. http://docs.neo4j.org/chunked/stable/query-create-unique.html
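The core of "create only if it doesn't already exist" is identifying a relationship by its (start, type, end) triple. Checking a set keyed on that triple avoids scanning every existing relationship. Below is a minimal in-memory sketch of that idea. `RelationshipSet` and `relateOnce` are hypothetical names. This is not Cypher and not the wrapper's API. Within one import run, though, it avoids the one-by-one scan described in the comment above.

```php
<?php
// Hypothetical de-duplicating relationship store keyed on (start, type, end).
final class RelationshipSet
{
    private array $byKey = [];

    /** Returns true if a new relationship was created, false if it already existed. */
    public function relateOnce(int $start, string $type, int $end): bool
    {
        $key = "$start|$type|$end";   // direction matters: 1->2 differs from 2->1
        if (isset($this->byKey[$key])) {
            return false;
        }
        $this->byKey[$key] = ['start' => $start, 'type' => $type, 'end' => $end];
        return true;
    }

    public function count(): int
    {
        return count($this->byKey);
    }
}

$rels = new RelationshipSet();
var_dump($rels->relateOnce(1, 'IN', 2));   // bool(true)
var_dump($rels->relateOnce(1, 'IN', 2));   // bool(false): duplicate skipped
var_dump($rels->relateOnce(2, 'IN', 1));   // bool(true): opposite direction is distinct
```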

Thanks for the pointer, Josh. Now I am getting segmentation faults when including the neo4jphp.phar file. I think it has to do with the PHP version change, but I'm not sure. Still trying to get it working again.

Thank you! I got everything working with Composer and Neo4j 2.0. Your PHP wrapper is working great, but I have one question. I am currently trying to import 20 million rows from a SQL database, but creating nodes and relationships individually is too slow.

Are the Batch operations of the REST Api exposed in the php wrapper? That would be great for writing db import operations.

Hi Josh, the Batch operations are working fine, but I get a memory limit error for larger batches, and occasionally even without batch operations. I tried increasing the PHP `memory_limit`, but for some reason the change is not reflected in the error message, and the memory error is always the same.
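One common way to keep memory bounded during a large import is to stream source rows and send them in fixed-size batches, so that you never build one huge batch. Below is a minimal sketch using PHP generators. `inBatches` and `fakeRows` are hypothetical helpers. The actual batch call is left as a comment because it depends on the wrapper. A suitable batch size is something to measure, not a given.

```php
<?php
// Split any iterable into arrays of at most $size items, holding one batch at a time.
function inBatches(iterable $rows, int $size): Generator
{
    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];   // drop the previous batch before building the next one
        }
    }
    if ($batch) {
        yield $batch;      // final partial batch
    }
}

// Stand-in row source; a real import would stream rows from a SQL cursor
// instead of fetching the whole result set into memory.
function fakeRows(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield ['id' => $i];
    }
}

$sizes = [];
foreach (inBatches(fakeRows(2500), 1000) as $batch) {
    // Each $batch would be sent to the server as one batch request here.
    $sizes[] = count($batch);
}
echo implode(',', $sizes), "\n";   // 1000,1000,500
```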