
Friday, 21 February 2014

As you probably know, importing data into Neo4j can be a bit tricky, in spite of the wonderful tools that we have these days. I blogged about this last year; if you are looking for some guidance, please start there.

It turns out that, to get the most out of your import efforts, there are a few settings that you should be aware of and tweak, depending on your specific environment. Your machine's memory is of paramount importance, and your dataset also determines some of the optimization characteristics discussed below.

Essentially, there are three parameters to tweak:

the Java heap size

the memory mapping of the Neo4j store files

the Neo4j cache configuration

The following tables give an overview of the settings that you can add to your Neo4j installation or tooling to optimize data import performance. Let me start with the settings for the Batch Importer:

Settings: Batch Importer

Memory mapping settings

Important Note: for the Batch Importer, memory mapping settings are PART OF the heap settings: memory-mapped files consume part of the heap. That’s why you should give the Batch Importer as much memory as possible as heap, while leaving 1-4GB to the operating system.

Try to memory map all of the node store, and as much of the relationship store files as possible.
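As a rough sketch of what this could look like in batch.properties, assume (hypothetically) that neostore.nodestore.db ends up at around 140 MB on disk and neostore.relationshipstore.db at around 3.3 GB, and that the importer runs with a heap large enough to hold these mappings. The key names are the Neo4j 1.x/2.0-era mapped_memory settings, which are not spelled out in the text above; the sizes are illustrative assumptions rather than recommendations, so base yours on the actual or expected size of your store files.

```properties
# Hypothetical sizing: ~140 MB node store, ~3.3 GB relationship store.
# Map the whole node store, with a little headroom:
neostore.nodestore.db.mapped_memory=150M
# Map as much of the relationship store as the heap allows:
neostore.relationshipstore.db.mapped_memory=3400M
# Property stores take what is left of the budget (assumed values):
neostore.propertystore.db.mapped_memory=1000M
neostore.propertystore.db.strings.mapped_memory=500M
```

With these numbers the mappings alone add up to roughly 5 GB, so the heap given to the Batch Importer would need to be comfortably larger than that.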

Cache settings

For bulk update/import operations, the cache should be disabled: you only write, so no node or relationship objects are loaded.

Edit: /path/to/importer/batch.properties

cache_type=none

Next, let's explore the same settings for imports into a running Neo4j server, for example using neo4j-shell-tools:

Settings: Neo4j Server / neo4j-shell-tools

Heap size

Edit: /path/to/neo4j/conf/neo4j-wrapper.conf

# Initial Java Heap Size (in MB)

wrapper.java.initmemory=4096

# Maximum Java Heap Size (in MB)

wrapper.java.maxmemory=4096

Memory mapping settings

Important Note: for the Neo4j server (and neo4j-shell-tools running against a server), memory mapping settings are SEPARATE FROM the heap settings: your heap allocation comes in addition to the memory mapping allocation. Usually you use between 4 and 8GB as heap. The remainder of your RAM is used for memory mapping.

Note that for users who run Neo4j on Windows, there is a significant difference: there, memory mapping is part of the heap, and the principle explained in the Batch Importer section should be followed.

Try to memory map all of the node store, and as much of the relationship store files as possible.

Edit: /path/to/neo4j/conf/neo4j.properties

The settings to add to this file are identical to the ones mentioned for the Batch Importer.
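To make the split between heap and memory mapping concrete, here is a minimal sketch for a hypothetical 16 GB Linux machine running with the 4096 MB heap shown earlier. It assumes about 2 GB is left to the operating system, that the node store is smaller than 500 MB, and that the relationship store is larger than the memory available for it. The key names are the Neo4j 1.x/2.0-era mapped_memory settings; all sizes are assumptions for illustration only.

```properties
# Hypothetical budget: 16 GB RAM - 4 GB heap - ~2 GB for the OS = ~10 GB for memory mapping.
# Whole node store (assumed to be under 500 MB on disk):
neostore.nodestore.db.mapped_memory=500M
# As much of the relationship store as the remaining budget allows:
neostore.relationshipstore.db.mapped_memory=6000M
# Property stores (assumed split of what is left):
neostore.propertystore.db.mapped_memory=2500M
neostore.propertystore.db.strings.mapped_memory=1000M
# Total: 500M + 6000M + 2500M + 1000M = 10000M, allocated outside the 4096 MB heap.
```

On Windows the same 10000M would have to fit inside the heap instead, so the heap setting in neo4j-wrapper.conf would need to grow accordingly.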

Cache settings

Because you create relationships by looking up and updating nodes, the cache should be kept active on a running Neo4j server that you are loading data into. Here the Community and Enterprise editions of Neo4j differ: the Enterprise edition has a better cache that is not present in Community, the “High Performance Cache”. Therefore, for bulk update/import operations, you should edit the neo4j.properties file in the conf directory of your Neo4j installation:

Edit: /path/to/neo4j/conf/neo4j.properties

# Setting for Community Edition:

cache_type=weak

# Setting for Enterprise Edition:

cache_type=hpc

I hope this was a good overview of the different settings that you should keep in mind and tweak, and of where to tweak them, in your specific environment.