It's official: graph databases are a thing. That's the consensus here on Big on Data among fellow contributors Andrew Brust and Tony Baer. When AWS enters a domain, it officially signals the upward slope of the hype cycle. It's a bit like newfound land: first it's largely unknown and inhabited by natives, then the pioneers show there are opportunities, then the heavyweights try to colonize it.

The recent unveiling of AWS Neptune seems to have convinced even once self-proclaimed graph skeptics such as Brust and Baer. Why now, you ask? As with machine learning, it's not so much that there has been a major breakthrough in the technology; it's mostly a matter of maturation.

Several hardware and software developments have contributed to the perfect graph storm: cheap storage and processing capacity in the cloud and on premises, a better understanding of the challenges in distributed indexing and querying of graphs, and the realization that datasets are now big and connected enough.

Your videos are probably doing quite nicely living in the object store where you currently have them. A sales ledger system built using a relational database is probably doing just fine where it is and likewise a document store is quite possibly just the right place to be storing your documents. So "use the right tool for the job" remains as valid a phrase here as elsewhere.

That said, part of the reason behind graph's appeal is that in many cases, it's a natural way to model the world. More natural than the good old relational model? For certain domains and use cases, when the data you are storing is intrinsically linked by its nature, yes. For one thing, querying a graph database certainly feels easier, and tends to perform better, than querying a relational one for use cases involving many hops.

In connected datasets, such as the ones from social domains for example, graph makes lots of sense. Image: Amazon

Having to go through a series of joins in relational algebra to do things such as finding friends of friends of friends is cumbersome to write and maintain and degrades performance. A graph model and query language can be more natural and efficient -- but the key word in there is "can". Not everything that looks like a graph is in fact a graph, and not all graphs come with the same querying facilities.
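To make the friends-of-friends-of-friends point concrete, here is a minimal Python sketch. It is not taken from any graph database: the friendship data and the function names are made up for illustration. The first function mimics a three-way self-join over an edge table, where every extra hop means another join clause. The second walks an adjacency index, where the number of hops is just a parameter.

```python
from collections import defaultdict

# Hypothetical friendship data; the names are illustrative only.
FRIENDSHIPS = [
    ("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
    ("ann", "eve"), ("eve", "cat"), ("dan", "fay"),
]

def three_hops_by_joins(pairs, start):
    """Relational style: self-join the edge table three times.

    Returns the end points of every walk of exactly three edges.
    A fourth hop would need a fourth join clause.
    """
    rows = pairs + [(b, a) for a, b in pairs]  # store both directions
    return {
        r3[1]
        for r1 in rows if r1[0] == start
        for r2 in rows if r2[0] == r1[1]
        for r3 in rows if r3[0] == r2[1]
    }

def build_adjacency(pairs):
    """Graph style: index edges so each person maps straight to their friends."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def within_hops(adjacency, start, hops):
    """Everyone reachable from `start` in at most `hops` steps, excluding `start`."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # Expand one hop, skipping people already visited.
        frontier = {n for person in frontier for n in adjacency[person]} - seen
        seen |= frontier
    return seen - {start}

adjacency = build_adjacency(FRIENDSHIPS)
print(sorted(three_hops_by_joins(FRIENDSHIPS, "ann")))
print(sorted(within_hops(adjacency, "ann", 3)))
```

The two functions answer slightly different questions (walks of exactly three edges versus anyone within three hops), which is part of the point: in the join version, changing the question means rewriting the query, while in the traversal version it is a matter of changing an argument. This toy says nothing about real-world performance, which depends on the storage and indexing underneath.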

To quote Tony Baer: "I always felt graph was better suited being embedded under the hood because it was a strange new database without standards de facto or otherwise. But I'm starting to change my tune -- every major data platform provider now has either a graph database or API/engine". This highlights two important points: the difference between a native graph and a graph API, and the lack of standards.

Going native

Different people will use different definitions of engines and APIs, but in the end it's all about data structures. If your database relies on data structures that are not a natural fit for a graph, and does not have all the right indexing in place, then although your queries may be easier to write using a graph API on top of it, their performance can only be as good as your database.

To give an example from the Microsoft world, quoting Andrew Brust: "The graph processing capabilities in SQL Server 2017 are clearly an abstraction layer and not native. Although node and edge table types are a real thing. But what about Cosmos DB? Graph is just one mode of operation, but I would still consider it native".

We don't really know, but probably not. AWS speaks of the ability for continuous backup from Neptune to S3, which is quite telling. If S3 were the storage used for Neptune, backing up to S3 would be pointless, as the data would already be there and all it would take would be to enable replication. But there is another hint there.

AWS is selling the option to use JanusGraph with Amazon DynamoDB as its storage backend. DynamoDB is a key-value database, and the key-value metaphor and structure lend themselves well to graph. That is in fact what Titan and now JanusGraph are using as a back-end store for their graphs, so it would make sense for AWS to have built Neptune on DynamoDB.

To return to the Big on Data contributor graph showdown and quote Andrew Brust, "in the database world, everything comes down to key-value pairs. So if you have a database with that as the core construct, you have the potential to do almost anything you want. Although, out of the box, you may not be able to do much".

So could it be that AWS Neptune really is an elaborate layer over DynamoDB that adds a graph metaphor and API to an underlying key-value store? That may sound like oversimplifying, but it seems plausible.
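What might "a graph metaphor and API over a key-value store" look like in miniature? The following Python sketch is purely illustrative. It is not how Neptune, JanusGraph, Titan or DynamoDB actually lay out data, which is not something we know. The class name, the key prefixes ("v:" and "out:") and the method names are all assumptions made up for this example, and a plain dict stands in for the key-value store.

```python
class KeyValueGraph:
    """Toy graph API layered on a plain key-value store (a dict here).

    Hypothetical key layout:
      "v:<id>"   -> dict of vertex properties
      "out:<id>" -> list of (edge_label, target_id) pairs
    """

    def __init__(self, store=None):
        self.store = {} if store is None else store

    def add_vertex(self, vertex_id, **properties):
        self.store[f"v:{vertex_id}"] = dict(properties)
        self.store.setdefault(f"out:{vertex_id}", [])

    def add_edge(self, source, label, target):
        # One key per vertex holds all of its outgoing edges.
        self.store.setdefault(f"out:{source}", []).append((label, target))

    def out(self, vertex_id, label):
        """Follow outgoing edges with the given label: one key lookup per hop."""
        edges = self.store.get(f"out:{vertex_id}", [])
        return [target for edge_label, target in edges if edge_label == label]

    def properties(self, vertex_id):
        return self.store[f"v:{vertex_id}"]


graph = KeyValueGraph()
graph.add_vertex("ann", name="Ann")
graph.add_vertex("bob", name="Bob")
graph.add_vertex("cat", name="Cat")
graph.add_edge("ann", "knows", "bob")
graph.add_edge("bob", "knows", "cat")

# Friends of friends: two hops, each resolved by a key lookup.
friends_of_friends = [
    fof
    for friend in graph.out("ann", "knows")
    for fof in graph.out(friend, "knows")
]
print(friends_of_friends)
```

Even in this toy, Brust's point shows: the key-value core can carry a graph, but everything that makes it useful as one (traversals, labels, and in a real system indexing and distribution) has to be built on top.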

One could argue that Titan and its offspring, JanusGraph and DSE Graph, are similar in nature, and AWS makes a point of emphasizing how Titan's pluggable architecture makes it easy to start using DynamoDB without changing applications. But how efficient is that?

Standards, too many or none

What we do know however about AWS Neptune, which brings us to the second important point -- standards -- is this: Neptune supports the popular graph query languages Apache TinkerPop Gremlin and W3C's SPARQL, allowing users to easily build queries that efficiently navigate highly connected datasets.

In a world that seems to be lacking the equivalent of what SQL is in the relational world -- a de facto standard for querying -- this is pretty important. It means that Neptune offers maximum flexibility for its users, and it's a move that is both smart and pragmatic from AWS.

In graph, there are competing models and query languages, and offering the ability to query Neptune using two of the most popular ones widens Neptune's potential user base and use cases. AWS is not alone in this, but being vocal about it and making it easy to use could make a difference.
