Cypher: LOAD JSON from URL AS Data

Update: Much of this is far easier today with user-defined procedures such as apoc.load.json, which add this kind of capability to Cypher directly.

Neo4j’s query language Cypher supports loading data from CSV directly but not from JSON files or URLs.
Almost every site offers some kind of API or endpoint that returns JSON and we can also query many NOSQL databases via HTTP and get JSON responses back.
It’s quite useful to be able to ingest document-structured information from all those different sources into a more usable graph model.
I want to show here that retrieving that data and ingesting it into Neo4j using Cypher is straightforward and takes only a little effort.
Because Cypher is already good at deconstructing nested documents, a tiny program is all it takes.
Today I want to show you how to do this from Python, JavaScript, Ruby, Java, and Bash.

The Domain: Stack Overflow

As a developer I love Stack Overflow; I just crossed 20k reputation purely by answering 1100 Neo4j-related questions :). You can do that too. That’s why I want to use Stack Overflow users with their questions, answers, comments and tags as our domain today.

Graph Model

So what does the graph model look like? We can develop it by looking at the questions we want to answer and the entities and relationships they refer to.
We need this model up front to know where to put our data when we insert it into the graph. After all, we don’t want to have loose ends.

Cypher Import Statement

The Cypher query to create that domain is also straightforward. You can deconstruct maps with dot notation map.key and arrays with slices array[0..4]. You’d use UNWIND to convert collections into rows and FOREACH to iterate over a collection with update statements. To create nodes and relationships we use MERGE and CREATE commands.
My friend Mark just published a blog post explaining in detail how you apply these operations to your data.
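To get a feel for these operations, here is a rough Python analogy (not Cypher itself). The sample document and its field names (items, question_id, owner, tags) are assumptions, modeled loosely on what a Stack Overflow API response might contain; the comments name the Cypher construct each line mimics.

```python
# Hypothetical sample, shaped loosely like a Stack Overflow API response.
# All field names and values here are assumptions made for illustration.
data = {
    "items": [
        {"question_id": 1, "title": "How do I use UNWIND?",
         "owner": {"user_id": 10, "display_name": "alice"},
         "tags": ["neo4j", "cypher", "json", "import", "python", "rest"]},
        {"question_id": 2, "title": "MERGE vs CREATE?",
         "owner": {"user_id": 11, "display_name": "bob"},
         "tags": ["neo4j"]},
    ]
}


def unwind_questions(data):
    """Mimic `UNWIND data.items AS q`: produce one row per question."""
    rows = []
    for q in data["items"]:                        # UNWIND data.items AS q
        rows.append({
            "id": q["question_id"],                # q.question_id
            "owner": q["owner"]["display_name"],   # q.owner.display_name (nested map)
            "tags": q["tags"][0:4],                # q.tags[0..4] (end index exclusive)
        })
    return rows


if __name__ == "__main__":
    for row in unwind_questions(data):
        print(row)
```

Like a Python slice, a Cypher slice excludes its end index, so [0..4] keeps at most the first four tags.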
The JSON response that we retrieved from the API call is passed in as a parameter {json} to the Cypher statement, where we alias it to the handier identifier data. Then we use the means described above to extract the relevant information from the data collection of questions, treating each one as q.
For each question we access the direct attributes but also related information like the owner or contained collections like tags or answers which we deconstruct in turn.
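A statement along these lines could look like the sketch below, held in a Python string so it can be handed to a driver or HTTP call. It is an illustration under assumptions, not necessarily the exact statement used for this post: the labels (Question, User, Tag, Answer), the relationship types (ASKED, TAGGED, ANSWERS, PROVIDED) and the JSON field names are all hypothetical choices.

```python
# Sketch of a Cypher import statement. Labels, relationship types and
# JSON field names are assumptions chosen for illustration.
# {json} is the parameter syntax this article uses; newer Neo4j
# versions write the same parameter as $json.
IMPORT_QUERY = """
WITH {json} AS data
UNWIND data.items AS q
MERGE (question:Question {id: q.question_id})
  ON CREATE SET question.title = q.title
MERGE (owner:User {id: q.owner.user_id})
  ON CREATE SET owner.display_name = q.owner.display_name
MERGE (owner)-[:ASKED]->(question)
FOREACH (tagName IN q.tags |
  MERGE (tag:Tag {name: tagName})
  MERGE (question)-[:TAGGED]->(tag))
FOREACH (a IN q.answers |
  MERGE (answer:Answer {id: a.answer_id})
  MERGE (answer)-[:ANSWERS]->(question)
  MERGE (answerer:User {id: a.owner.user_id})
  MERGE (answerer)-[:PROVIDED]->(answer))
"""

if __name__ == "__main__":
    print(IMPORT_QUERY.strip())
```

Using MERGE rather than CREATE for users and tags means a user who asks several questions, or a tag shared by many of them, ends up as a single node instead of one copy per question.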

Calling Cypher with the JSON parameters

To pass in the JSON to Cypher we have to programmatically call the Cypher endpoint of the Neo4j server, which can be done via one of the many drivers for Neo4j or manually by POSTing the necessary payload to Neo4j. We can also call the Java API.
So without further ado here are our examples for a selection of different languages, drivers and APIs:
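As one illustration of the manual route, here is a minimal Python sketch using only the standard library. It assumes a local Neo4j server that exposes the transactional HTTP endpoint at the URL shown; the names NEO4J_URL, build_request and run are hypothetical helpers introduced for this example. Serializing the whole payload with json.dumps takes care of escaping quotes and newlines inside both the statement and the JSON parameter.

```python
import json
import urllib.request

# Assumed endpoint: the transactional Cypher HTTP endpoint of a local
# Neo4j server. Adjust host, port and path to your installation.
NEO4J_URL = "http://localhost:7474/db/data/transaction/commit"


def build_request(query, api_response, url=NEO4J_URL):
    """Wrap a Cypher statement and the parsed JSON document in one POST request."""
    payload = {
        "statements": [
            {"statement": query, "parameters": {"json": api_response}}
        ]
    }
    # json.dumps escapes quotes and newlines, so the body is valid JSON
    # even when the query spans several lines.
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    # A server with authentication enabled would also need an
    # Authorization header here.
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run(request):
    """Send the request and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    query = "WITH {json} AS data\nUNWIND data.items AS q\nRETURN q.title"
    sample = {"items": [{"title": 'He said "hi"'}]}
    request = build_request(query, sample)
    print(request.data.decode("utf-8"))
```

Only build_request runs without a server; run is the step that actually contacts Neo4j.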

Conclusion

So as you can see, even with LOAD JSON not being part of the language, it’s easy enough to retrieve JSON data from an API endpoint and deconstruct and insert it into Neo4j by just using plain Cypher.
Accessing web APIs is a simple task in all stacks and languages, and JSON as a transport format is ubiquitous.
Fortunately, Cypher’s lesser-known ability to deconstruct complex JSON documents lets us quickly turn them into a clean graph structure with rich relationships and no duplicated information.
I encourage you to try it with your favorite web APIs. Send us your example, with the graph model, the Cypher import query and 2-3 use-case queries that reveal some interesting insights into the data you ingested, to content@neotechnology.com.

Author

Michael Hunger, Developer Relations

Michael Hunger has been passionate about software development for a very long time. For the last few years he has been working on the open source Neo4j graph database filling many roles. As caretaker of the Neo4j community and ecosystem he especially loves to work with graph-related projects, users and ...

9 Comments

I tried to get the Bash example working with Neo4j 2.2.3 Community Edition on my MacBook Pro, but it does not work.

I get the following error from Neo4j: {"results":[],"errors":[{"code":"Neo.ClientError.Request.InvalidFormat","message":"Unable to deserialize request: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value\n at [Source: HttpInputOverHTTP@190a6a94; line: 1, column: 51]"}]}

I trimmed the JSON down to only 5 posts, since with 100 I was getting an error that it was too big.

Any idea?

I am trying to do something else, and this example is a perfect match for what I need, so I first need to get it working before moving on to my actual work.

In my own work with the same concept (JSON injected as a parameter into a Cypher query) I get: {"errors": [{"code": "Neo.ClientError.Request.InvalidFormat", "message": "Unable to deserialize request: Unexpected character ('e' (code 101)): was expecting comma to separate OBJECT entries\n at [Source: HttpInputOverHTTP@6781a645; line: 1, column: 104]"}], "results": []} where 'e' is just the first character inside my JSON document.

There were 3 issues:
1. Newlines in the file containing the Cypher queries were causing problems, so I ended up with one gigantic line with spaces between commands. I guess the Cypher parser expects this on the other side.
2. Every " in the JSON data needs to be escaped, so I guess some data clean-up is needed before this JSON is used in Bash. I use my own JSON file, which I cleaned up as a prerequisite.
3. I removed the \" around $JSON_DATA in POST_DATA, since the data inside $JSON_DATA starts with { and needs no \" around it.

I am trying to compile the Java example but get an error on the following statement:
HttpClient http = HttpClients.createMinimal();

After searching the web I found that the variable type could be CloseableHttpClient instead of HttpClient,
but even with this change the imports and jars I supplied in the CLASSPATH do not seem to be sufficient to resolve the HttpClients.createMinimal() function.

I added more imports, but the code still does not compile:
import org.apache.http.impl.client.*;
import javax.net.*;

Hi,
I am trying to adjust the tutorial to the newer py2neo version. It seems the syntax has changed significantly and now the following line does not work anymore:
neo4j.CypherQuery(graph, query).run(json=json)

I have tried this:
results = graph.cypher.run(query,json=json)

However, it didn’t work and produced more errors. Any suggestions on how to change the code to adjust to the new syntax?

The main contribution of this demo project is showing the auth header. The complete query and API link are also shown, so that you can just run the script without any additional copy-pasting or guesswork 🙂

Hi, I was hoping you could help me. I have searched everywhere but have not found a way to traverse through the JSON file so that I can extract the values under “300xxxxx” (such as uid and pubdate) so that I can then load the key/value pairs in my graph.

As you can see (for reasons I cannot control) this dataset is constructed in a redundant way. Do you know how I can basically “ignore” the 30073633 and 30050740 values and just extract the key/value pairs inside that object or, alternatively, load it as a variable property name/key onto my graph?