After being annoyed for a long time by the performance of the Neo4j REST protocol, I decided to have a look at streaming JSON last night. It seemed simple enough.

Today Peter pushed me to follow through and use the lab day to finish the lab project.

So I started to create a server-extension project that does two things differently. First, it uses a more compact format for the Cypher results than the current RESTful representation. Second, it uses streaming JSON, returning a StreamingOutput in a Jersey Response so that rows are written as they are produced.
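To illustrate the streaming half of that idea, here is a minimal sketch that writes a result to an OutputStream one row at a time instead of building the whole JSON document in memory. It uses only java.io so it runs without Jersey; in a server extension the body of write(...) would sit inside the write(OutputStream) method of a JAX-RS StreamingOutput. The class name RowStreamer and the "columns"/"rows" layout are assumptions made for illustration, not the exact format of the extension.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: streams {"columns":[...],"rows":[[...],...]} row by row.
class RowStreamer {

    static void write(List<String> columns, Iterator<List<Object>> rows, OutputStream out)
            throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write("{\"columns\":[");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) w.write(',');
            w.write(quote(columns.get(i)));
        }
        w.write("],\"rows\":[");
        boolean first = true;
        while (rows.hasNext()) {
            if (!first) w.write(',');
            first = false;
            writeRow(w, rows.next());
            w.flush(); // hand each row to the transport as soon as it is serialized
        }
        w.write("]}");
        w.flush();
    }

    private static void writeRow(Writer w, List<Object> row) throws IOException {
        w.write('[');
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) w.write(',');
            w.write(value(row.get(i)));
        }
        w.write(']');
    }

    // Only null, finite numbers, booleans and strings are handled in this sketch.
    private static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) throws IOException {
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList(1, "Neo"),
                Arrays.<Object>asList(2, null));
        write(Arrays.asList("id", "name"), rows.iterator(), System.out);
    }
}
```

Because the rows arrive through an Iterator, the writer never needs the full result set at once, which is the property that matters for large Cypher results.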

For a query like this it would return:

start n=node(*) match p=n-[r]-m return n as first,r as rel,m as second,m.name? as name,r.foo? as foo,ID(n) as id, p as path , NODES(p) as all

8 Comments

Hi Michael, very interesting, thanks for sharing. Nice to know we may have this support for large graphs over the wire.

One question (pardon my possible ignorance): would it mean that we may one day see long polling (Comet-like) in Neo4j?

I mean, it would be nice to have a persistent connection to the server and receive events over it. I don’t know what’s behind the scenes in Neo4j, but combining this with Netty, for instance, should be very interesting.

I just started researching solutions for a problem I have when I came across your blog entry. I have a long running process that will be building a graph, with nodes added at indeterminate intervals. I would like to have a web page that automatically, with no user interaction, updates whenever a new node is entered. Essentially I have a graph that is being built up slowly over time and that graph needs to be displayed as it is built. Will the streaming work you present here solve this problem? Does your streaming mechanism push updates from multiple transactions?

What you want is a kind of audit stream, which is not really what the streaming discussed here does.
But it should be pretty simple to add to Neo4j as a server extension that uses a transaction event handler, streaming JSON and a long connection timeout.
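A rough sketch of that shape, using only the JDK: commits push small JSON events into a queue, and a long-lived response drains the queue, writing one JSON object per line until the connection has been idle for too long. Everything here is hypothetical (the class NodeEventStream, the method names, the event layout and the END sentinel). In a real extension, nodeCreated(...) would presumably be called from a Neo4j transaction event handler after a commit, and streamTo(...) would be the body of the streamed HTTP response.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical bridge between a commit hook (producer) and a long-lived response (consumer).
class NodeEventStream {
    private static final String END = "__END__"; // sentinel that stops the stream
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();

    // Producer side: assumed to be called once per created node after a commit.
    void nodeCreated(long nodeId) {
        events.add("{\"created\":" + nodeId + "}");
    }

    // Ends the stream explicitly, e.g. on server shutdown.
    void close() {
        events.add(END);
    }

    // Consumer side: writes one JSON object per line and flushes after each event.
    // Returns the number of events sent; stops on close() or when no event
    // arrives within the idle timeout.
    int streamTo(OutputStream out, long idleTimeoutMillis)
            throws IOException, InterruptedException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        int sent = 0;
        while (true) {
            String event = events.poll(idleTimeoutMillis, TimeUnit.MILLISECONDS);
            if (event == null || END.equals(event)) break;
            w.write(event);
            w.write('\n');
            w.flush(); // push the event to the client immediately
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) throws Exception {
        NodeEventStream stream = new NodeEventStream();
        stream.nodeCreated(1);
        stream.nodeCreated(2);
        stream.close();
        stream.streamTo(System.out, 1000);
    }
}
```

One JSON object per line keeps the client simple: a browser or script can parse each line as it arrives without waiting for a closing bracket.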

[...] If you are planning on passing a large amount of data to and from a Neo4j instance, it is definitely worth considering using the native Embedded Database in Java to achieve this instead of using their REST API. Although language independent, their REST API as of 1.7 is considerably slower by several orders of magnitude than their embedded database. However, their new streaming REST API for 1.8-SNAPSHOT at the moment plans on being considerably faster than the current API (though still slower than the embedded database, naturally) – those interested should check out an article by the developer on the project. [...]

We have a Neo4j 1.7.1 client and are looking into using Jackson to implement client-side Cypher streaming.

Our problem now is that we sometimes get a very large result from the Neo4j server, and holding it in a single Java object is a problem.

The system is in production so the hope is to implement a quick client fix using the non-streamed Neo4j 1.7.1 server and later upgrade to 1.8 and make use of server-side streaming.

I see that the sample output for the streamed server has “rows” whereas our non-streamed server has “data”. I also see that the sample streamed server output has type:value pairs whereas our non-streamed server results have only values.

My question is:

Can a Jackson Cypher streaming client work correctly with a non-streamed Neo4j server?

If you have streaming in the client, you can easily consume a result from a non-streaming server, and the client will use less memory (at least if you don’t keep the results around but just use them to calculate something).

In Neo4j Server 1.8 we implemented streaming for the format that was there before (aka columns:[], data: [[row1],[row2]] ). So you should use that format.