From Relational to Neo4j

Goals

This article explores the differences between relational and graph databases and data models.
In addition, it explains how to integrate graph databases with relational databases and how to import data from a relational store.

Relational Databases

Relational databases have been the workhorse of software applications since the 1980s and remain so to this day.
They store highly structured data in tables with predetermined, typed columns and many rows of the same kind of information. Thanks in part to this rigid organization, they require developers to strictly structure the data their applications use.

In relational databases, references to other rows and tables are indicated by referring to their (primary-)key attributes via foreign-key columns.
Foreign-key constraints can enforce these references, although an optional reference (a NULL foreign key) is not checked.
Joins are computed at query time by matching the primary and foreign keys of the many (potentially indexed) rows of the tables being joined.
These operations are compute- and memory-intensive, and their cost grows with the size of the tables and the number of joins in the query.

If you use many-to-many relationships, you have to introduce a join table (or junction table) that holds the foreign keys of both participating tables, which further increases the cost of join operations.
These costly joins are usually addressed by denormalizing the data to reduce the number of joins needed.
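
To make the junction-table pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the SQL statement used later in this article; the sample rows and the helper name employees_of are made up for illustration. Note that a single many-to-many lookup already needs two joins.

```python
import sqlite3


def employees_of(conn, department_name):
    """Return the names of people linked to a department via the junction table."""
    rows = conn.execute(
        """
        SELECT Person.name
        FROM Person
        JOIN Person_Department ON Person.Id = Person_Department.PersonId
        JOIN Department ON Department.Id = Person_Department.DepartmentId
        WHERE Department.name = ?
        ORDER BY Person.name
        """,
        (department_name,),
    ).fetchall()
    return [name for (name,) in rows]


# Illustrative schema and data (assumptions, not taken from a real system).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (Id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Department (Id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The junction table holds one foreign key per participating table.
    CREATE TABLE Person_Department (
        PersonId INTEGER NOT NULL REFERENCES Person(Id),
        DepartmentId INTEGER NOT NULL REFERENCES Department(Id),
        PRIMARY KEY (PersonId, DepartmentId)
    );
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO Department VALUES (10, 'IT Department'), (20, 'Sales');
    INSERT INTO Person_Department VALUES (1, 10), (2, 10), (2, 20), (3, 20);
    """
)

print(employees_of(conn, "IT Department"))
```

Bob appears in both departments, which is exactly the case a single foreign-key column on Person could not represent.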

Although not every use-case is a good fit for this type of stringent data model, in the past, the lack of viable alternatives and the great support for relational databases have made it difficult for alternative models to break into the mainstream.

Meet graph databases.

From Relational to Graph Databases

Relationships are first-class citizens of the graph data model, unlike in other database management systems, which require us to infer connections between entities using special properties such as foreign keys, or out-of-band processing like map-reduce.
By assembling the simple abstractions of nodes and relationships into connected structures, graph databases enable us to build sophisticated models that map closely to our problem domain.

In some regards, graph databases are like the next generation of relational databases, but with first-class support for “relationships”: the connections that traditional relational databases express only implicitly through foreign keys.

Each node (entity or attribute) in the graph database model directly and physically contains a list of relationship-records that represent its relationships to other nodes.
These relationship records are organized by type and direction and may hold additional attributes.
Whenever you run the equivalent of a JOIN operation, the database just uses this list and has direct access to the connected nodes, eliminating the need for an expensive search / match computation.
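
The idea can be sketched in a few lines of Python. This is a conceptual model only, not Neo4j's actual storage format; the class names, the connect and neighbours helpers, and the WORKS_IN relationship type are all hypothetical. Each node keeps its relationship records grouped by type and direction, so finding connected nodes means following references rather than matching keys.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    labels: set
    properties: dict
    # Relationship records grouped by (type, direction).
    relationships: dict = field(default_factory=lambda: defaultdict(list))


@dataclass(eq=False)
class Relationship:
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def connect(start, rel_type, end, **properties):
    """Create one relationship record and register it on both end nodes."""
    rel = Relationship(rel_type, start, end, properties)
    start.relationships[(rel_type, "OUT")].append(rel)
    end.relationships[(rel_type, "IN")].append(rel)
    return rel


def neighbours(node, rel_type, direction):
    """Follow the stored records; no key matching or index lookup is involved."""
    other_end = "end" if direction == "OUT" else "start"
    return [getattr(rel, other_end) for rel in node.relationships[(rel_type, direction)]]


alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})
it_dept = Node({"Department"}, {"name": "IT Department"})
connect(alice, "WORKS_IN", it_dept, since=2015)
connect(bob, "WORKS_IN", it_dept)

print([n.properties["name"] for n in neighbours(it_dept, "WORKS_IN", "IN")])
```

The work done per lookup depends on how many relationships the node has, not on how many nodes exist overall, which is the property the paragraph above describes.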

This ability to pre-materialize relationships into database structures allows Neo4j to deliver performance gains of several orders of magnitude, especially for join-heavy queries: the minutes-to-milliseconds advantage that many users leverage.

The resulting data models are much simpler and at the same time more expressive than those produced using traditional relational or other NoSQL databases.

Graph databases support a very flexible and fine-grained data model that allows you to model and manage rich domains in an easy and intuitive way.

You more or less keep the data as it is in the real world: small, normalized, yet richly connected entities.
This allows you to query and view your data from any imaginable point of interest, supporting many different use-cases.

The fine-grained model also means that there is no fixed boundary around aggregates, so the scope of update operations is provided by the application during the read or write operation.
The well-known and tested concept of transactions groups a set of updates of nodes and relationships into an atomic, consistent, isolated, and durable (ACID) operation.
Graph databases like Neo4j fully support the transactional concepts including write-ahead logs and recovery after abnormal termination.
So you never lose data that has been committed to the database.

If you’re used to modeling with relational databases, remember the ease and beauty of a well done, normalized entity-relationship diagram: a simple, easy to understand model you can quickly whiteboard with your colleagues and domain experts.
A graph is exactly that, a clear model of the domain, focused on the use-cases you want to efficiently support.

Let’s take a model of the organizational domain and show how it would be modeled in a relational database vs. the graph database:

Working with Neo4j

Querying relational databases is easy with SQL, a declarative query language that allows both easy ad-hoc querying in a database tool and specifying use-case-related queries in your code.
Even object-relational mappers use SQL under the hood to talk to the database.

Do graph databases have something similar?
Cypher, Neo4j’s declarative graph query language, is built on the basic concepts and clauses of SQL but has a lot of additional graph-specific functionality to make it simple to work with your rich graph model without being too verbose.
It allows you to query and update graph structures with concise statements.
Cypher is centered around the graph patterns that are core to your use-cases and represents them visually as part of its query syntax.

If you have ever tried to write a SQL statement with a large number of joins, you know that you quickly lose sight of what the query actually does, due to all the technical noise.

In Cypher the syntax stays clean and focused on domain concepts as the structural connections to find or create are expressed visually.
The other clauses besides the pattern matching should be very familiar for everyone who has used SQL before.

In the organizational domain depicted in the model above – what would a SQL statement that lists the employees in the “IT Department” look like, and how does that statement compare to a Cypher statement?

SQL Statement

SELECT Person.name FROM Person
LEFT JOIN Person_Department
  ON Person.Id = Person_Department.PersonId
LEFT JOIN Department
  ON Department.Id = Person_Department.DepartmentId
WHERE Department.name = 'IT Department';
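
For comparison, a Cypher counterpart might look like the query string in the sketch below, wrapped in a small Python helper. The WORKS_IN relationship type and its direction are assumptions, since the model figure is not reproduced here; the labels and the name property mirror the table and column names of the SQL statement above. The helper name is hypothetical, and the session argument is assumed to behave like a session from the official neo4j Python driver, whose run method takes a query plus keyword parameters.

```python
# Cypher counterpart of the SQL statement above. WORKS_IN and its direction
# are assumed; Person, Department and name mirror the SQL tables and columns.
EMPLOYEES_OF_DEPARTMENT = (
    "MATCH (p:Person)-[:WORKS_IN]->(d:Department) "
    "WHERE d.name = $department "
    "RETURN p.name AS name"
)


def list_department_employees(session, department):
    """Run the query through a driver session and collect the returned names.

    `session` is assumed to offer run(query, **parameters) and to yield
    records that can be indexed by column name.
    """
    result = session.run(EMPLOYEES_OF_DEPARTMENT, department=department)
    return [record["name"] for record in result]
```

The pattern in the MATCH clause replaces both joins and the junction table: the connection between person and department is written as a single arrow.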

Language Drivers

Of course, you don’t want to connect to Neo4j manually, but with a driver or connector library designed for your stack or programming language.
Thanks to the Neo4j community, there are drivers for Neo4j for almost all popular programming languages, most of which mimic existing database driver idioms and approaches.

For instance, the Neo4j JDBC driver would be used like this to query the database for John’s departments:

Importing Data from a Relational Database

When you have a good enough understanding of the shape of your graph model, i.e. what data will be represented as nodes or relationships and how the labels, relationship-types, and attributes are named, you’re ready to go.

The easiest way to import data from your relational database is to create a CSV dump of individual entity-tables and join-tables.
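
As a minimal sketch of such a dump, the helper below writes one table to CSV with a header row, using Python's built-in sqlite3 and csv modules. SQLite stands in for whatever relational database you actually use, and the helper name and sample table are hypothetical; most databases also ship their own export tools that do the same job.

```python
import csv
import io
import sqlite3


def dump_table_to_csv(conn, table, out):
    """Write one table, header row first, as CSV to the file-like object `out`."""
    # Table names cannot be bound as query parameters, so only pass trusted names.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Illustrative source table (an assumption for this example).
source = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE Person (Id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob');
    """
)

buffer = io.StringIO()
dump_table_to_csv(source, "Person", buffer)
print(buffer.getvalue())
```

You would call the helper once per entity table and once per join table, writing each to its own file.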

Then you can take the CSV file(s) and use Cypher’s LOAD CSV power tool to:

Ingest the data, accessing columns by header name or offset

Convert values from strings to different formats and structures (toFloat, split, …)

Skip rows to be ignored

MATCH existing nodes based on attribute lookups

CREATE or MERGE nodes and relationships with labels and attributes from the row data
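
The steps above can be sketched in plain Python to show how they fit together. This is an analogy on in-memory dictionaries, not Cypher and not how Neo4j executes LOAD CSV; the column names (id, name, salary, departments), the semicolon separator, and the WORKS_IN relationship type are all assumptions made for the example.

```python
import csv
import io


def import_people(csv_text, departments):
    """Mimic the import steps on in-memory data structures.

    `departments` maps a department name to an already existing node (a dict).
    Returns (people, relationships).
    """
    people = {}            # Person nodes keyed by id, so repeated rows merge
    relationships = set()  # (person_id, "WORKS_IN", department_name) triples
    for row in csv.DictReader(io.StringIO(csv_text)):    # ingest by header name
        if not row["name"]:                               # skip rows to be ignored
            continue
        person_id = int(row["id"])                        # convert from string
        person = people.setdefault(person_id, {"id": person_id})  # merge the node
        person["name"] = row["name"]
        person["salary"] = float(row["salary"])
        for dept_name in row["departments"].split(";"):   # split into a list
            if dept_name in departments:                  # match an existing node
                relationships.add((person_id, "WORKS_IN", dept_name))
    return people, relationships


data = (
    "id,name,salary,departments\n"
    "1,Alice,5200.5,IT Department;Sales\n"
    "2,,0,Sales\n"
    "1,Alice,5200.5,IT Department\n"
)
existing = {"IT Department": {"name": "IT Department"}, "Sales": {"name": "Sales"}}
people, relationships = import_people(data, existing)
print(people)
print(sorted(relationships))
```

The second row is skipped because its name is empty, and the repeated row for Alice does not create a duplicate, which is the behaviour MERGE gives you in contrast to CREATE.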
