The basic question is: how do we read an entire graph from a Neo4j store into a
NetworkX graph? A second question is: how do we extract subgraphs with Cypher
and recreate them in NetworkX, potentially saving memory?

Using a naive query to read all relationships

This approach is based on the cypher-ipython module. It uses a simple query
like the following to obtain all the data:

MATCH (n) OPTIONAL MATCH (n)-[r]->() RETURN n, r

The result can be read into a graph using the following code. Note that the
rows may duplicate both relationships and nodes, but this is harmless: nodes and
edges are keyed by their Neo4j IDs, so adding the same one twice does not create
a second copy.
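As a minimal sketch of such a reader (not necessarily the original code), the function below walks rows of the form (n, r) and keys everything by Neo4j ID. The name rs2graph_naive is hypothetical, and the attribute names (.id, .labels, .properties, .start, .end, .type) are assumed to match the driver objects used by rs2graph_v3 later in this document; other driver versions may expose different names.

```python
import networkx


def rs2graph_naive(rs):
    """Build a MultiDiGraph from rows that each hold a node 'n' and a relationship 'r'."""
    graph = networkx.MultiDiGraph()
    for record in rs:
        node = record['n']
        node_attrs = dict(node.properties)
        node_attrs['labels'] = list(node.labels)
        # Adding an ID that is already present only updates its attributes,
        # so repeated rows for the same node are harmless.
        graph.add_node(node.id, **node_attrs)

        rel = record['r']
        if rel is None:
            # OPTIONAL MATCH yields a null r for nodes with no outgoing relationship.
            continue
        edge_attrs = dict(rel.properties)
        edge_attrs['type'] = rel.type
        # Using the relationship ID as the edge key keeps parallel edges distinct
        # and makes a repeated relationship overwrite itself instead of duplicating.
        graph.add_edge(rel.start, rel.end, key=rel.id, **edge_attrs)
    return graph
```

Because add_edge creates missing endpoints on the fly, a target node may briefly exist without attributes until its own row is processed.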

This query is rather inelegant, though: the result set is essentially
'denormalized', with each node repeated once for every outgoing relationship it
has.

Using aggregation functions

Luckily there's another, more SQL-ish way to do it: COLLECT the relationships of
each node into a list. Each row then holds a distinct node together with the
complete set of relationships for that node, much like combining ARRAY_AGG()
with GROUP BY in PostgreSQL. This seems much cleaner to me.
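A rough sketch of this aggregated approach follows. The query text, the constant AGGREGATED_QUERY, the column names n and rels, and the function name rs2graph_collected are all assumptions made for illustration; the driver attribute names are assumed to match those used by rs2graph_v3 in this document.

```python
import networkx

# Assumed query: one row per node, with its outgoing relationships gathered
# into a list. collect() skips nulls, so isolated nodes get an empty list.
AGGREGATED_QUERY = (
    "MATCH (n) "
    "OPTIONAL MATCH (n)-[r]->() "
    "RETURN n, collect(r) AS rels"
)


def rs2graph_collected(rs):
    """Build a MultiDiGraph from rows holding a node 'n' and a list 'rels'."""
    graph = networkx.MultiDiGraph()
    for record in rs:
        node = record['n']
        node_attrs = dict(node.properties)
        node_attrs['labels'] = list(node.labels)
        graph.add_node(node.id, **node_attrs)
        for rel in record['rels']:
            edge_attrs = dict(rel.properties)
            edge_attrs['type'] = rel.type
            graph.add_edge(rel.start, rel.end, key=rel.id, **edge_attrs)
    return graph
```

Each node now appears in exactly one row, so its properties are transferred once rather than once per relationship.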

Trying to extend to handle subgraphs

When a relationship type defines subtrees (typed :PRECEDES in this case), we
can attempt to materialize in memory the subgraph reachable from a given root.
In the query below, the Token node whose content is nonesuch is taken as the
root.
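One way such a query might look is sketched here. This is an assumed reconstruction, not the exact query: only the Token label, the content value nonesuch, the :PRECEDES type and the column names n2 and rels come from this document, while the constant name SUBGRAPH_QUERY and the shape of the pattern are guesses.

```python
# Assumed query sketch. The variable-length pattern walks :PRECEDES from the
# root (zero hops included, so the root itself is returned), and the second
# MATCH is undirected, so it also picks up relationships that start outside
# the selected subgraph.
SUBGRAPH_QUERY = """
MATCH (root:Token {content: 'nonesuch'})-[:PRECEDES*0..]->(n2)
OPTIONAL MATCH (n2)-[r]-()
RETURN n2, collect(DISTINCT r) AS rels
"""
```

Since the relationships are matched in both directions, a reader for this result set has to cope with edges whose other endpoint was never returned as a row.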

import networkx


# This version has to materialize the entire result set up front in order
# to check for dangling references. This may cause memory problems with
# large result sets.
def rs2graph_v3(rs):
    graph = networkx.MultiDiGraph()
    materialized_result_set = list(rs)

    # Collect the IDs of every node in the result set, validating as we go.
    node_id_set = set()
    for record in materialized_result_set:
        if record['n2'] is None:
            raise ValueError('every row should have a node')
        node_id_set.add(record['n2'].id)

    for record in materialized_result_set:
        node = record['n2']
        print("adding node")
        nx_properties = {}
        nx_properties.update(node.properties)
        nx_properties['labels'] = list(node.labels)
        graph.add_node(node.id, **nx_properties)

        relationship_list = record['rels']
        for relationship in relationship_list:
            print("adding edge")
            # Bear in mind that when we ask for all relationships on a node,
            # we may find a node that PRECEDES the current node, i.e. a
            # relationship with an endpoint outside the subgraph returned by
            # this query. Only keep edges whose endpoints are both known.
            if (relationship.start in node_id_set
                    and relationship.end in node_id_set):
                graph.add_edge(
                    relationship.start, relationship.end,
                    key=relationship.type,
                    **relationship.properties
                )
            else:
                print("ignoring dangling relationship [no need to worry]")
    return graph
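
If holding the whole result set in a list turns out to be a problem, one possible alternative (a sketch under the same assumptions about the record layout, with the hypothetical name rs2graph_streaming) is to add every edge as it arrives and prune afterwards. NetworkX creates missing endpoints implicitly, so any node that never appeared as a row of its own must lie outside the subgraph and can be removed at the end.

```python
import networkx


def rs2graph_streaming(rs):
    """Single-pass variant: add all edges first, then drop dangling nodes."""
    graph = networkx.MultiDiGraph()
    seen_node_ids = set()
    for record in rs:
        node = record['n2']
        if node is None:
            raise ValueError('every row should have a node')
        nx_properties = dict(node.properties)
        nx_properties['labels'] = list(node.labels)
        graph.add_node(node.id, **nx_properties)
        seen_node_ids.add(node.id)
        for relationship in record['rels']:
            # May implicitly create an endpoint that is outside the subgraph.
            graph.add_edge(
                relationship.start, relationship.end,
                key=relationship.type,
                **relationship.properties
            )
    # Nodes that only exist because an edge mentioned them are dangling.
    # Removing a node also removes every edge attached to it.
    dangling = [n for n in graph.nodes if n not in seen_node_ids]
    graph.remove_nodes_from(dangling)
    return graph
```

The trade-off is that dangling edges live in the graph until the final pruning step, but the driver's record objects no longer need to be kept around, and only a set of integer IDs is held alongside the graph.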