Neo4j is an embedded, high performance and lightweight persistence solution based on the network database model that has recently been gaining a lot of interest:

Neo is a netbase — a network-oriented database — that is, an embedded, disk-based, fully transactional Java persistence engine that stores data structured in networks rather than in tables. A network (or graph, in mathematical lingo) is a flexible data structure that allows a more agile and rapid style of development.

You can think of Neo as a high-performance graph engine with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables — yet enjoys all the benefits of a fully transactional, enterprise-strength database.

What makes Neo interesting is its use of the so-called "network-oriented database" model. In this model, domain data is expressed in a "node space": a network of nodes, relationships and properties (key-value pairs), in contrast to the relational model's tables, rows and columns. Relationships are first-class objects and may also be annotated with properties, revealing the context in which nodes interact. The network model is well suited to problem domains that are naturally hierarchically organized, for example Semantic Web applications. The creators of Neo found that hierarchical and semi-structured data is not well suited to the traditional relational database model:
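As a rough illustration of the node space idea, the plain-Java sketch below models nodes and relationships that both carry key-value properties. The class names (`SketchNode`, `SketchRelationship`, `NodeSpaceSketch`) and the sample data are invented for this example; this is not the Neo4j API, only a minimal picture of the data model described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration classes; these are NOT part of the Neo4j API.
class SketchNode {
    final Map<String, Object> properties = new HashMap<>();
    final List<SketchRelationship> relationships = new ArrayList<>();

    // Sets a property on the node and returns the node for chaining.
    SketchNode with(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    // Connects this node to another one with a typed relationship and
    // returns the relationship so it can carry properties of its own.
    SketchRelationship relateTo(SketchNode other, String type) {
        SketchRelationship rel = new SketchRelationship(this, other, type);
        relationships.add(rel);
        other.relationships.add(rel);
        return rel;
    }
}

// A relationship is a first-class object: it has a type, two end points
// and its own property map.
class SketchRelationship {
    final SketchNode start;
    final SketchNode end;
    final String type;
    final Map<String, Object> properties = new HashMap<>();

    SketchRelationship(SketchNode start, SketchNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }

    // Sets a property on the relationship and returns it for chaining.
    SketchRelationship with(String key, Object value) {
        properties.put(key, value);
        return this;
    }
}

class NodeSpaceSketch {
    // Builds two nodes joined by one annotated relationship (sample data).
    static SketchRelationship example() {
        SketchNode alice = new SketchNode().with("name", "Alice");
        SketchNode bob = new SketchNode().with("name", "Bob");
        return alice.relateTo(bob, "KNOWS").with("since", 2005);
    }

    public static void main(String[] args) {
        SketchRelationship knows = example();
        System.out.println(knows.start.properties.get("name")
                + " -[" + knows.type + " " + knows.properties + "]-> "
                + knows.end.properties.get("name"));
    }
}
```

The `since` property on the relationship is the kind of annotation the article refers to: it describes the context of the connection rather than either node.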

The object relational impedance mismatch makes it unnecessarily difficult and time consuming to squeeze an object oriented “round object” into a relational “square table.”

The static, rigid and inflexible nature of the relational model makes it difficult to evolve schemas in order to meet changing business requirements. For the same reasons, the database often holds a team back when they try to apply agile software development methodologies by rapidly evolving the object oriented layer.

The relational model is exceptionally poor at capturing semi structured data, a type of information that industry analysts and researchers alike agree will be the next “big thing” in information management.

A network is a very efficient data storage structure. It's not a coincidence that the human brain is one huge network or that the world wide web is structured as an ad hoc network. The relational model can capture network-oriented data, but it is very weak when it comes to traversing that network in order to extract information.
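To make the traversal point concrete, here is a minimal plain-Java sketch of a depth-limited breadth-first traversal over directed relationships. The `TraversalSketch` class and its methods are hypothetical names made up for this illustration and are not Neo4j's traversal API. Each hop follows stored references directly; in a relational schema each hop would typically require another join.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy graph, not Neo4j code: node name -> directly related nodes.
class TraversalSketch {
    private final Map<String, List<String>> adjacency = new LinkedHashMap<>();

    // Adds a directed relationship from one node to another.
    void relate(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        adjacency.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Breadth-first traversal: returns every node reachable from 'start'
    // within 'maxDepth' hops, in visit order, excluding 'start' itself.
    List<String> reachable(String start, int maxDepth) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add(start);
        List<String> frontier = Collections.singletonList(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                List<String> neighbours =
                        adjacency.getOrDefault(node, Collections.<String>emptyList());
                for (String neighbour : neighbours) {
                    // Set.add returns false for nodes that were already visited.
                    if (visited.add(neighbour)) {
                        next.add(neighbour);
                        result.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        return result;
    }
}
```

For example, after relating `a` to `b`, `b` to `c` and `c` to `d`, calling `reachable("a", 2)` visits `b` and then `c`, but stops before `d`.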

While Neo is a relatively new open source project, it has been used in production applications with over 100 million nodes, relationships and properties, satisfying enterprise robustness and performance requirements:

full support for JTA and JTS, 2PC distributed ACID transactions, configurable isolation levels and battle tested transaction recovery. These aren't just words: Neo has been in production for more than three years in a highly demanding 24/7 environment. It is mature, robust and ready to be deployed.

The Java API consists of 12 classes. Creating a node is straightforward:

Neo4j has a dual licensing model: free software (GPL Style) and commercial (although no pricing information is available on the web page). Currently at version 1.0 beta 6, the next release of Neo4j is expected to be Release Candidate 1. Ruby and Python wrappers for Neo4j are also under development.

I'm one of the founders of the Neo4j.org community and Neo Technology -- the commercial entity backing it. Thanks for the post! We're a bit behind on our documentation (as usual), but it's great to see that people can still pick up the concepts as well as the code! We have one person working full time this summer just to improve community documentation, so if you start playing around with Neo4j then feel very welcome to join the mailing list and give feedback on what's good/bad/missing/etc with the documentation. It will be much appreciated!

Before we regress into licensing flame wars -- Neo4j is dual licensed with a viral open source license, much like for example Db4o and BerkeleyDB. We're using the Free Software Foundation's AGPLv3, which very simply put is an "extension" of the GPL that closes the so-called "ASP loophole." (For more info, google it or drop me a mail privately.)

I know there are many views on this, both legal-technical as well as ethical, and they flounder when a bunch of non-lawyers (like myself!) debate it. Suffice to say, our intent with Neo's licensing is this:

if you write free or open source software: great, we want you to be able to use Neo gratis and under a free software license,

if you write proprietary software: you're not unlikely to be making money off of it, and then we think it's fair to ask you to purchase a commercial license.

It's not perfect, but we believe it's a good approximation of fair and ethical as well as commercially viable. YMMV.

As for our corporate web site, it is scheduled to be released with full information on June 30. You guys caught us a bit off guard! :)

I'm not an expert on JCR by far, but I believe that JCR's core abstraction is a hierarchical view (i.e. tree) of the world whereas Neo sees the world as a graph/network. So that's at least one fundamental difference that would make a big impact on how you model your domain.

I don't think you need to be too defensive Emil; I think the open source community is starting to get to grips with the need for open source vendors to have the dual license model to protect against aggressive OEMing by closed source vendors - I for one think you've made a big enough leap going GPL and wish you great success. I am aware of but haven't (yet) used your product and think it's a great idea. Certainly there seems to be an ever growing need for associative data in highly distributed systems; the constant increase in the number and types of embedded devices (never mind RFID) is undoubtedly going to fuel the need for graph based data and ultimately the relational databases just won't scale to massive numbers of data sources with highly volatile data.

I think graph based data and temporal queries (like CEP) are going to be two of the biggest technologies of the next decade as we go for massively distributed event based systems to deal with the ever increasing volumes and volatility of information.

I appreciate the information about Neo; it meshes very well with a project I'm currently working on. Do you have anything to say about how this fits in with RDF datastores which are also a graph/network based model, comparisons between the approaches, etc.?

I'm currently using Sesame for my datastore and API, and while it provides support for a lot of the constructs I need, there isn't anything like the traversal API of Neo, and such an API would be very useful for some use cases we have in mind.

Conversely, there are obviously a number of features supported by Sesame, by virtue of it being RDF based, that aren't supported in Neo.

I think graph based data and temporal queries (like CEP) are going to be two of the biggest technologies of the next decade

I couldn't agree more. Interesting connection with regards to how embedded devices will drive associative data, hadn't really thought about it that way before.

In addition to associativity and temporality, I believe another big emerging trend for the next decade is that of data semi-structure. We're entering an age now when anyone can add any tag to any photo, where any user can attach RDF meta data to any resource on the web without consulting a central authority. As the content creation process is becoming increasingly decentralized, every content item is getting a potentially unique schema. I think we'll be hard pressed to squeeze that into tables.

Do you have anything to say about how this fits in with RDF datastores which are also a graph/network based model

Hi Jonny,

Neo's data model actually maps very well to RDF. We're a graph, after all. In fact, we already have some components for exposing Neo as an RDF store, for example through a SAIL API and a SPARQL endpoint. Now, be warned: these components aren't yet productified and documented. But we have two commercial customers using them already to dump RDF/XML-formatted data into Neo and execute SPARQL queries to get it out.

We expect them to be productified and documented in early fall, but I think they're usable now with some patience. Feel free to join the mailing list and we'll be glad to get you up and running!

There is a plugin to Qi4j (www.qi4j.org) so that you can use Neo4j as the datastore for your entities. You would then model your domain using Qi4j (i.e. Java), and that would map more or less automatically to a Neo4j database. This gives a good tradeoff between the need for stability while still being able to use the dynamicity of Neo4j.

Does Neo4J support multiple independent JVMs (potentially on different machines) all accessing and manipulating the same graph concurrently? If so, how does it handle cross-cluster communication (for distributed locking, transactional semantics, cache management, etc.)?

Rickard, I can see why you would want to integrate Qi4j with Neo4j; the traversing mentality works much better than set queries for quite a few problem domains, doesn't it? It's funny how a particular technology (RDBMS and SQL especially) trains your mind to think in a particular way, so we get various OO style SQL dialects rather than graph traversal languages. It's certainly piqued my interest; now if you had a single query language which supported graph traversal and set queries....(Cue: response from someone who knows one :-) )

I tell you InfoQ does seem to pick some decent technologies to highlight.

Well, when it comes to RDF-based data and "graphy" traversal of it, there are some interesting, yet RDF-centric, approaches, e.g. Ripple, coming from here. However, at Neo4j the view is that OO graph manipulation, representation and traversal is more powerful, type safe and convenient for a developer than RDF based programming even if there are very interesting approaches around.

Cool links btw, much appreciated, a little more research for me to do.

However, at Neo4j the view is that OO graph manipulation, representation and traversal is more powerful, type safe and convenient for a developer than RDF based programming

Indeed; however, the value of DSLs like SQL and so forth is that the queries are easier to comprehend and can be highly succinct. A fantastic example of this of course is Esper, whose query language is, well, bloody amazing to be honest. So in Einstein I can write code like:

(Which basically causes a widget with a value to be emitted every second and passed to Esper). With the Esper query giving me a running total over 30 seconds, I think it's pretty easy to follow what Esper is doing here.

However I agree there are times when you want to be as safe as humanly possible and have finer control.

We're working on a distributed Neo4j, which will support partitioning the graph onto multiple JVMs (so potentially on multiple machines). But this is not our top priority right now and it won't have production quality until 2.0. Our main focus atm is getting a kickass 1.0-final out the door. But we have several commercial customers who will need this in the 1-2 year time frame so it's definitely a direction where we will go.

It may be interesting to know that Neo4j can be deployed in the distributed OSGi runtime Newton as well as in its commercial counterpart Infiniflow. They are excellent environments for distributed apps and work well with Neo4j today.

Indeed, Einstein is also being built to work seamlessly with Newton/Infiniflow. I came across this project/product a year ago; they were still a bit ahead of the market back then, but now it seems the market has very much caught up. It's a quick way of turning a non-distributed app into a distributed app; you'll see what I mean if you try it.

Neo4j is not an object-oriented database and not about transparent persistence tied to a specific OO environment. We believe in separating data and logic much like the brains behind the relational model did. But unlike them, we think that a graph model is a better "natural" representation of most domains than relational sets ("tables") are.

We think that it's a LOT easier to map OO abstractions to a graph model though. I think this is because the OO abstractions are typically modeled after the real world (at least the entity abstractions) and many domains seem to actually be graphs.

Since we don't couple the persistent state of an application (i.e. its data) to the current OO implementation of it, the data layer becomes much more independent from the business layer. I think this is a sound architecture. The fact that we're not tied to a specific OO environment also allows us to integrate very well with systems that aren't strictly OO like Qi4j, Python and Ruby.

In our current (very prototype) distribution code line, you can declaratively assign segments of the node space to different machines. Every node space segment has a "master," which "owns" that segment (though it may be replicated on multiple other machines). For an HA scenario with two machines, you would designate one machine the owner of the ENTIRE node space and then writes would be synchronously or asynchronously cascaded to the slave, while reads would be dispatched to either one.

So HA is a functional subset of the fully distributed Neo 2.0 kernel. Our current roadmap has that HA functionality as part of the 2.0 release. We've had tentative discussions with some customers about pushing it ahead a bit (i.e. shipping only the HA bit as a standalone part before the 2.0 kernel), but nothing final. But if you need it for a commercial project, feel free to drop me a note and we can discuss if there's anything we can do.
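For readers trying to picture the two-machine HA scenario described above, here is a minimal plain-Java sketch. It is only an illustration under stated assumptions: the class names `ReplicaSketch` and `HaPairSketch` are made up, writes are cascaded synchronously (the asynchronous option is not shown), and reads simply alternate between the two machines. None of this is the prototype distribution code itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy classes, not Neo4j code. Each replica holds a full copy
// of the data, standing in for one machine's copy of the node space.
class ReplicaSketch {
    final Map<String, String> data = new HashMap<>();
}

class HaPairSketch {
    final ReplicaSketch master = new ReplicaSketch(); // owns the entire node space
    final ReplicaSketch slave = new ReplicaSketch();  // receives cascaded writes
    private boolean readFromMaster = true;

    // Every write is applied on the master first and then cascaded to the
    // slave. Assumption: the cascade is synchronous in this sketch.
    void write(String key, String value) {
        master.data.put(key, value);
        slave.data.put(key, value);
    }

    // Reads are dispatched to either machine; here they simply alternate.
    String read(String key) {
        ReplicaSketch target = readFromMaster ? master : slave;
        readFromMaster = !readFromMaster;
        return target.data.get(key);
    }
}
```

With a synchronous cascade, a read served by either machine sees the latest write; an asynchronous cascade would trade that guarantee for lower write latency.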
