
While we’re cooperating with Databricks in other areas, like the implementation of openCypher on Spark and as an industry partner of AMPLab, today I want to focus on the Neo4j Spark Connector.

Enabled by Neo4j 3.0

One of the important features of Neo4j 3.0 is Bolt, the new binary protocol with accompanying official drivers for Java, JavaScript, .NET and Python. That prompted me to try implementing a connector to Apache Spark, and to see how fast I could transfer data from Neo4j to Spark and back again.

The implementation was really straightforward. All the interaction with Neo4j is as simple as sending parameterized Cypher statements to the graph database to read, create and update nodes and relationships.
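To make "sending parameterized Cypher statements" concrete, here is a minimal sketch in Python. It is not the connector's own code: the query text, the parameter names `minAge` and `limit`, and the helper `fetch_people` are illustrative assumptions. The helper only needs an object with a `run(statement, parameters)` method, which is how a session of the official Python driver is typically used.

```python
# Illustrative sketch: the statement, parameter names, and helper are assumptions.
# Note: current Neo4j versions write parameters as $minAge; Neo4j 3.0-era
# Cypher wrote them as {minAge}.
FIND_PEOPLE = (
    "MATCH (p:Person) WHERE p.age >= $minAge "
    "RETURN p.name AS name, p.age AS age LIMIT $limit"
)


def fetch_people(session, min_age, limit=10):
    """Send the parameterized statement and return the rows as plain dicts."""
    # Values travel separately from the statement text, so the same
    # statement can be reused with different parameters.
    result = session.run(FIND_PEOPLE, {"minAge": min_age, "limit": limit})
    return [{"name": record["name"], "age": record["age"]} for record in result]


# Possible usage against a running server (URL and credentials are placeholders):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<password>"))
# with driver.session() as session:
#     print(fetch_people(session, min_age=30))
```

Keeping the values out of the statement text means the statement stays constant while only the parameter map changes from call to call.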

Features of the Spark Connector

I started by implementing a Resilient Distributed Dataset (RDD) and then added the other Spark features, including GraphFrames.

Quickstart

For a simple dataset of connected people, run the following two Cypher statements that create 1M people (with :Person labels and id, name and age attributes) and 1M :KNOWS relationships, all in about a minute.
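A minimal sketch of what two such statements could look like, held in Python strings so they can be sent through a driver session. The property values (`'name' + id`, `id % 100`), the `$count` parameter, and the helper `create_dataset` are assumptions for illustration, not necessarily the statements used for the timing quoted above.

```python
# Illustrative sketch: statement details and helper name are assumptions.
# Neo4j 3.0-era Cypher wrote parameters as {count} and named the
# conversion function toInt(); current versions use $count and toInteger().

# Statement 1: create `count` :Person nodes with id, name and age.
CREATE_PEOPLE = """
UNWIND range(1, $count) AS id
CREATE (:Person {id: id, name: 'name' + id, age: id % 100})
"""

# Statement 2: give every person one :KNOWS relationship to a random person.
CREATE_KNOWS = """
UNWIND range(1, $count) AS id
MATCH (n:Person {id: id}), (m:Person {id: toInteger(rand() * $count) + 1})
CREATE (n)-[:KNOWS]->(m)
"""


def create_dataset(session, count=1_000_000):
    """Run both statements in order and return how many were sent."""
    params = {"count": count}
    sent = 0
    for statement in (CREATE_PEOPLE, CREATE_KNOWS):
        # consume() waits until the server has finished the statement.
        session.run(statement, params).consume()
        sent += 1
    return sent


# Possible usage (URL and credentials are placeholders):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<password>"))
# with driver.session() as session:
#     create_dataset(session, count=1000)  # start small before trying 1M
```

An index or uniqueness constraint on `:Person(id)` is likely needed for the second statement to run quickly, since it looks up two people by `id` for every relationship it creates.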

Please Help

The connector, like our official drivers, is licensed under the Apache License 2.0. The source code is available on GitHub, and the connector and its releases are also listed on Spark Packages.

I would love to get some feedback on the things you liked (and didn’t) and that worked (or didn’t). That’s what the release candidate versions are meant for, so please go ahead and raise GitHub Issues.
