Pages

Sunday, 2 March 2014

Food networks, Countries, Diets, Health - and LOAD CSV

Last weekend I took my kids to the awesome Antwerp Zoo. We have season's tickets, and go there regularly - but for some
reason it had been a while since all of us had gone together. While visiting
the penguins, my daughter points to this picture

and shouts: "LOOK DADDY, A NEO4J DATABASE!".
I am not kidding you - true story. And she was, of course, right: it was the
"web of the sea", a predator-prey
network of how sealife interacts with eachother.

So that got me browsing the web for a while,
looking for other examples of such networks. And before long I found a dataset
that really triggered my interest: on "Follow the data" I found this
article that mentioned a google
spreadsheet with some really
interesting stuff. It basically has a lot of information about Countries, their
dietary habits, and their health statistics. Excellent. I can make a graph out
of that.

Neo4j 2.1.MO1 – native loading of CSV files in cypher

At the same time one of my colleagues pinged me
about a new milestone beta release of neo4j: version 2.1.MO1. This is the first
milestone release after the ground-breaking 2.0 release that came out end of
last year – and it is looking like a very interesting one. One of the key new
features in 2.1 is going to be a set of features that will allow us to Import
Data more easily. A pet pieve of mine, as you know.

I read through the manual pages, and thought it
would be easy enough to use. So I spend some time getting the spreadsheet
mentioned above into the right format for import, and took it for a spin.

In the zip-file
over here, you can download a couple of files that allow you to do it all
yourself. But for now, let me take you through it.

Importing data with LOAD CSV

The process of importing data was really, really
easy now. All you need to do is

tell Neo4j what to do (load the csv),

assign a variable to the set (csvimport in this case) and then

use the column names of the set as parameters for your cypher
statement.

The result was there instantaneously:

One thing that I did want to do then, was to use
labels to provide structure to the graph, and use it for indexing:

With that Import I have my Countries and my Food
Categories imported, so now I would want to add some relationships. I chose a model
like the one outlined below: a country uses different food categories, at a
different rate (kcalories used per day).So first we import the relationships between the
countries and the food categories used in that country:As you can see, the relationships hold the
values of the kilo-calories that that country uses of this specific food
category.That was quick!

So now we can do some querying. Let’s see what
are the food categories that Belgium and the Netherlands have in common, and
that have a significant part of the diet:

When we limit the query to only the food categories that are used for more than 500 kcal per day, we get:These are the categories that apply:

Animal Products

Vegetal Products

Cereals - Excluding Beer (strange!)

Wheat

Then, I decided
to use LOAD CSV one last time to add some health data that was also in the original dataset: the life expectancy data
of the countries in the dataset. This data contains two interesting data
elements that I imported:

The Life Expectancy At Birth (LEAB)

The HEalthy Life Expectancy At Birth (HELEAB)

I decided to import both of these, in a specific
way. You may have been able to tell from the model picture above, but I created
an in-graph Life Expectancy Index. By importing 100 Life Expectancies (1-100
years of age) as separate nodes, and then connecting the countries to these
nodes as I used LOAD CSV. I used two different types of relationships for the
LEAB and the HELEAB.

The following import was easy using LOAD CSV:

So then we could actually revisit the queries
above, but include these interesting health stats about life expectancies:

The result shows how Belgium and the Netherlands
have identical LEABs, but different HELEABs – interesting.

I am sure there are a bunch of other interesting
queries in this dataset, but for now I think I have satisfied my curiosity –
and learned about an awesome new Import tool – LOAD CSV.