In a nutshell: to make it work, scale like the web, not like a database.

Let me explain:

If you start out with a single relational database server, and you want to scale it horizontally to thousands of servers, you have a problem: it doesn’t really work. SQL makes too many assumptions that one simply cannot make for a massively distributed system. Which is why instead of running Oracle, Google runs BigTable (and Amazon runs Dynamo etc.), which were designed from the ground up for thousands of servers.

If you start out with a single hypertext system, and you want to scale it horizontally to millions of servers, you have a problem, too: it doesn’t really work either. Which is why we got the world-wide-web, which has been inherently designed for a massively decentralized architecture from the get-go, instead of a horizontally-scaled version of Xanadu.

So he we are, and people are trying to scale graph databases from one to many servers. They are finding it hard, and so far I have not seen a single credible idea, never mind an implementation, even in an early stage. Guess what? I think massively scaling a graph database that’s been designed using a one-server philosophy will not work either, just like it didn’t work for relational databases or hypertext. We need a different approach.

With InfoGrid, we take such a fundamentally different approach. To be clear, its current re-implementation in the code base is early and is still missing important parts. But the architecture is stable, the core protocol is stable, and it has been proven to work. Turning it back on across the network is a matter of “mere” programming.

To explain it, let’s lay out our high-level assumptions. They are very similar to the original assumptions of the web itself, but unlike traditional database assumptions:

Traditional database assumptions

Web assumptions

InfoGrid assumptions

All relevant data is stored in a centrally administered database

Let everybody operate their own server with their own content and create hyperlinks to each other

Let everybody operate their own InfoGrid server on their own data and share sub-graphs with each other as needed (see also note on XPRISO protocol below)

Data is “imported” and “exported” from/to the database from time to time, but not very often: it’s an unusual act. Only “my” data in “my” database really matters.

Data stays where it is, a web server makes it accessible from/to the web. Through hyperlinks, that server becomes just one in millions that together form the world-wide-web.

Data stays where it is, and a graph database server makes it look like a (seamless) sub-graph of a larger graph, which conceivably one day could be the entire internet

Mashing up data from different databases is a “web services” problem and uses totally different APIs and abstractions than the database’s

Mashing up content from different servers is as simple as an <a…, <img… or <script… tag.

Application developers should not have to worry which server currently has the subgraph they are interested in; it becomes local through automatic replication, synchronization and garbage collection.

This approach is supported by a protocol we came up with called XPRISO, which stands for eXtensible Protocol for the Replication, Integration and Synchronization of distributed Objects. (Perhaps “graphs” would have been a better name, but then the acronym wouldn’t sound as good.) There is some more info about XPRISO on the InfoGrid wiki.

Simplified, XPRISO allows one server to ask another server for a copy of a node or edge in the graph, so they can create a replica. When granted, this replica comes with an event stream reflecting changes made to it and related objects over time, so the receiving server can obtain the received replica in sync. When not needed any more, the replication relationship can be canceled and the replicas garbage collected. There are also features to move update rights around etc.

For a (conceptual) graphical illustration, look at InfoGrid Core Ideas on Slideshare, in particular slides 15 and 16.

Code looks like this:

// create a local node:
MeshObject myLocalObject = mb.getMeshObjectLifecycleManager().createMeshObject();
// get a replica of a remote node. The identifier is essentially a URL where to find the original
MeshObject myRemoteObject = mb.accessLocally( remote_identifier );
// now we can do stuff, like relate the two nodes:
myLocalObject.relate( myRemoteObject );
// or set a property on the replica, which is kept consistent with the original via XPRISO:
myRemoteObject.setProperty( property_type, property_value );
// or traverse to all neighbors of the replica: they automatically get replicated, too
MeshObjectSet neighbors = myRemoteObject.traverseToNeighbors();

Simple enough? [There are several versions of this scenario in the automated tests in SVN. Feel free to download and run.]

What does this contrarian approach to graphdb scaling give us? The most important result is that we can assemble truly gigantic graphs, one server at a time: each server serves a subgraph, and XPRISO makes sure that the overlaps between the subgraphs are kept consistent in the face of updates by any of the servers. Almost as important is that it enables a bottom-up bootstrapping of large graphs: everybody who feels like participating sets up their own server, populates it with the data they wish to contribute (which doesn’t even need to be “copied” by virtue of the InfoGrid Probe Framework), and link to others.

Now, if you think that makes InfoGrid more like a web system than a database, we sympathize. There’s a reason we call InfoGrid an “internet graph database”. Or it might look more like a P2P system. But other NoSQL projects use peer-to-peer protocols, too, like Cassandra’s gossip protocol. And I predict: the more we distribute databases, the more decentralized they become, the more the “database” will look like the web itself. That’s particularly so for graph databases: after all, the web is the largest graph ever assembled.

We do not claim that this approach addresses all possible use cases for scaling graph databases. For example, if you need to visit every single node in a gazillion-node graph in sequence, this approach is just as good or bad as any other: you can’t afford the memory to get the entire graph onto your local server, and so things will be slow.

However, the InfoGrid approach elegantly addresses a very common use case: adding somebody else’s data to the graph. “Simply” have them set up their own graph server, and create the relationships to objects in other graph servers: XPRISO will keep them maintained. Leave data in the place where its owners have it already is a very powerful feature; without it, the web arguably could not have emerged. It further addresses control issues and privacy/security issues much better than a more database’y approach, because people can remain in control over their subgraph: just like on the web, where nobody needs to trust a single “web server operator” with their content; you simply set up your own web server and then link.

Comments:

The replication policy is pluggable, with the same API towards the application. So different strategies can be used and improved on without impacting application code. For example, if you have a chain of replicas (object X on server A replicated to server B, from there to C), you can configure whether it should rearrange itself so that the replica at C cancels its lease with B and instead directly connects to A.

I think this is pretty interesting and sort of a logical place to sit for a few reasons. Personally, I wonder if it could be pushed any further towards something that is truly decentralized like RDF (which is arguably more like the web). Though networks of RDF servers don’t really solve some of the really gnarly traversal issues.

Without having looked into the code, I’m assuming you are requiring that a node get replicated locally before it can be linked locally? (as in your example) This would obviously limit scalability for really connected information but I think it’s still much better than sharding-based approaches for flexibility. Or is it URI-like under the covers and only caching locally as needed for performance reasons?

I would think the really tricky problem is keeping the replicated subgraphs in sync with their sources… which in turn may be dependent on other replicated sources, and so on. I assume the event stream is somehow robust in the face of network outages, etc..

It seems a truly scalable approach would combine something completely decentralized like RDF’s URI style linking with some smart local caching for performance. Which is maybe what you are doing.