Neo4j Blog

Nodes are people, too

Today we are releasing Neo4j 2.0.0-M01, the first milestone of the Neo4j 2.0 series, which we expect to be generally available (GA) in the next couple of months. This release is significant in that it is the first time since the inception of Neo4j thirteen years ago that we are making a change to the property graph model. Specifically, we will be adding a new construct: labels.

We’ve completed a first cut at a significant addition to the data model, and are opening the code up now for early comment. Consider this milestone to be an experimental release, intended to solicit input. We look forward to hearing how you’d like to use these new features, and can’t wait to hear what you think.

It’s a What?

Let’s say you created a node for a person named Joe. Joe is not just any node: he is a person. Therefore you would probably want to designate the node for Joe as being a “Person”. If you’ve worked with Neo4j before, chances are that you’ve done this by adding a property called “type” with value “Person”, as follows:
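In Cypher, that convention might be sketched like this. The property keys `name` and `type` simply follow the description above; nothing about them is special to Neo4j, and the database treats `type` like any other property.

```cypher
// Pre-labels convention: the node's "type" is just another property.
CREATE (joe {name: "Joe", type: "Person"})
RETURN joe;
```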

This is useful, because now you can differentiate Joe from things in your graph that are quite different, such as “household goods” nodes and “geo location” nodes. Rightly so, these things should receive very different treatment.

Now let’s say you also want to give Joe a party affiliation: Left-Wing, Right-Wing, or the moderate Middle-Wing. While you could do this with a property as well, you may decide that you want to easily find all people of a given party affiliation. Knowing that Joe is “Middle-Wing”, you might decide to break the parties into nodes, and then associate Joe with his party, as below:
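A rough sketch of that party-node model in Cypher follows. The relationship type `AFFILIATED_WITH` and the `type: "Party"` property are made up for illustration; any names would do.

```cypher
// Hypothetical model: one node per party, one relationship per member.
CREATE (party {name: "Middle-Wing", type: "Party"})
CREATE (joe {name: "Joe", type: "Person"})
CREATE (joe)-[:AFFILIATED_WITH]->(party)
RETURN joe, party;

// Finding everyone in the party means walking the relationships into that one node.
MATCH (member)-[:AFFILIATED_WITH]->(party)
WHERE party.name = "Middle-Wing"
RETURN member;
```

Every member adds one more relationship to the same party node, which is how such a node grows dense.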

One thing you’d now naturally want the graph to do is automatically index the “Person” nodes (and no other nodes) according to the unique identifier for “Person”. (Let’s oversimplify and say this is “name”.) If you’re using Cypher, this is a challenge today. In fact it’s not possible at all, because Neo4j doesn’t inherently know anything about “Person” being different from geo locations. If you want to index “name”, you end up doing it for everything in the graph, which mixes concerns. Geo location names aren’t the same as person names, any more than a city is like a person. As for the “Middle-Wing” node, it ends up becoming extremely dense, cluttering the graph with lots of connections whose sole purpose is to designate nodes as belonging to a group.

We’ve been looking at better ways to do this. The ideal solution would help to make one’s graph more understandable, as well as to make Cypher more powerful, by allowing it to home in on nodes (as well as to index them) according to what they are.

2.0 therefore introduces a means of grouping or categorizing nodes. Provisionally we are calling this construct a “Label”. The term “Label” speaks to its generic use, and to the fact that nodes can have multiple labels. One of the many uses of labels–and perhaps the most intuitive one at first–is to provide “hooks” in the graph that you can associate with your application’s type system. Because the facility isn’t itself explicitly hierarchical (it’s just literally a tag, of which you can have zero to many per node), they’re being called labels.

Labels

A graph is a graph because it has relationships in the data. In a Property Graph, a relationship always has a type, describing how two nodes are related. Labels expand on that idea, describing how entire sets of nodes are related. This is a grouping mechanism for nodes. How does it work? Very simple: in the example above, rather than adding a “Type” property and connecting Joe to a Party node, you would add two labels: one for “Person”, and one for “Middle-Wing”, just like so:
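In Cypher, that might be sketched as follows. The hyphen in “Middle-Wing” means the label has to be wrapped in backticks here; a label such as MiddleWing would not need them. The `name` property is carried over from the earlier example.

```cypher
// One node, two labels, no "type" property and no party node.
CREATE (joe:Person:`Middle-Wing` {name: "Joe"})
RETURN labels(joe);

// Find every person who also carries the party label.
MATCH (p:Person:`Middle-Wing`)
RETURN p.name;
```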

This opens up quite a few possibilities, and probably stirs up a lot of ideas in your head. Rather than color your thinking about how to use labels, let’s look at an example using different color sets.

Color me happy

Let’s say we have an arbitrary domain of loosely related stuff, within which we at least know that things can be red, green, or blue. We could just add a “color” property to each node, or relate them to a value node for each color. But because we want to always work within this group, we’ll use labels to identify members of the sets.

First, create something red:

CREATE a node with a Label

CREATE (thing:Red {uid: "TK-421", make: 191860})
RETURN thing;

To find the thing we just created, we can search within just the Red nodes, then return the labels:

Find the Labels on a node

MATCH (thing:Red)
WHERE thing.uid = "TK-421"
RETURN labels(thing);

Why labels, plural? Because nodes can have multiple labels. Let’s say that “TK-421” also belongs to the blue set. Add a blue label like this:

Add a Label to a node

MATCH (thing:Red)
WHERE thing.uid = "TK-421"
SET thing :Blue;
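To check the result, one option is to match on both labels at once. This is only a sketch: it assumes the statement above has already run, and that listing two labels in a pattern matches nodes carrying both.

```cypher
// Only nodes that carry BOTH labels match this pattern.
MATCH (thing:Red:Blue)
WHERE thing.uid = "TK-421"
RETURN thing.uid, labels(thing);
```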

The benefits of intentional labeling

While some Danes may be nervous about labels, much good comes from their use. Applying a label to a set of nodes makes your intention obvious — “these nodes are accessed frequently and thought of as a group.” The database itself can gain benefit from having your intention be explicit, because it can now do things with this information.

For starters, Neo4j can create indexes that will improve the performance when looking for nodes within the set. (Note the new Cypher syntax for index creation!):

CREATE INDEXES to speed up finding Red and Blue nodes

CREATE INDEX ON :Red(uid);

CREATE INDEX ON :Blue(uid);
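Nothing changes in how you write the query afterwards. The sketch below assumes that the planner picks up the `:Red(uid)` index on its own for an equality lookup, so no START clause or index name is mentioned. The exact behavior in this milestone may differ.

```cypher
// Same query as before; with the index in place the lookup on uid
// should no longer need to scan every Red node.
MATCH (thing:Red)
WHERE thing.uid = "TK-421"
RETURN thing;
```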

Create a second labeled node and a relationship

CREATE (other_thing:Blue {uid: "TURK-182", make: 181663})
WITH other_thing
MATCH (thing:Red)
WHERE thing.uid = "TK-421"
CREATE (thing)-[:HONORS]->(other_thing)
RETURN thing, other_thing;

There is much more fun to be had. Details are, as always, in the Neo4j Manual. Again, this simple change can have profound impact. As we’re exploring the possibilities and tuning the language and APIs, we’d love for you to play around with labels. Let us know how you want to use them, by providing feedback on the Google Group. (That way other people can see your feedback and respond with their own opinions and observations.)

One more thing…

Just in CASE

Cypher has a new CASE expression for mapping inputs to result values: a cousin to similar constructs found in every common programming language.

In its simple form, CASE uses a direct comparison of a property for picking the result value from the first matching WHEN:

MATCH (r:Red) RETURN CASE r.uid
  WHEN "TK-421" THEN "Why aren't you at your post?"
  WHEN "TURK-182" THEN "the work of one man"
  ELSE "…"
END

In the general form, each WHEN uses an arbitrary predicate for picking the result:

MATCH (r:Red) RETURN CASE
  WHEN r.make > 180000 THEN "reddish"
  WHEN r.make < 180000 THEN "purplish"
  ELSE "simply red"
END

Summary

Enjoy this preview milestone! Use the Neo4j Google Group to tell the Neo4j team and other members of the Neo4j community what you think. There are a few other improvements baked into this release as well, including to the shell, that we’ll cover in upcoming blog posts. And of course you’ll be seeing more in upcoming milestones of Neo4j 2.0. Meanwhile, we have upgraded a preview of the online console so you can test the new features; it now features the Matrix graph enhanced with labels.

One final note: if you are planning to go into production soon, we strongly recommend developing against 1.9, which we expect to be going GA in the next couple weeks (look for an RC this week).

Update – 2.0.0-M02 introduces Remote Transactions

The latest 2.0 milestone introduces a new HTTP endpoint for managing multiple Cypher statements within a single transaction. Just create the transaction with the first batch of statements. You’ll receive a URL to which additional requests can be submitted, and for committing or rolling back the transaction. See the Neo4j manual for all the details.

Enjoy, from the Neo4j Team!

19 Comments

Thanks Philip – looks very interesting.
Can I use labels for aggregating bits of graphs? I'd like to search for all 'Red' nodes, all 'Blue' nodes and all the links between the two node types.
Could you provide a cypher query which does this? I'm interested to find out what happens when nodes are both Red and Blue.
Also – could you link to

Hey Joe,
You can do graph-global queries using labels. Your example would look like:
MATCH (reddish:Red)-[related]-(bluish:Blue)
RETURN reddish, related, bluish;
This would return all the pairs of Red node which have any relationship to Blue nodes. If the same node is Red and Blue, it can appear on either side, and even by itself if it has a self-relationship

Hi guys! Awesome pace of progress!
But I'm not sure I fully see the benefit of the Type-Labels vs. the usual properties with an index. After all, Spring-Data has a @Indexed annotation that allows creation of separate indexes by "type". Seems very similar.
Is the point of Type-Labels to bring this functionality into the core of Neo, and thus make it more

The index added using the labels is not working for me.
Could you please tell me if its the right way to use it?
CREATE INDEX ON :nodes(id);
No errors here.
then I try to get a node using
start n=node:nodes(id="bryan.roberts")
return n;
I am getting an error,
Index `nodes` does not exist
Any

Looks very interesting indeed. I remember Stefan Armbruster mentioning this in the tutorial back in the Netherlands. I am curious though about the possibilities :
Since labels are basically categories, can you create a category and then some subcategories? For instance Employee and then SalesEmployee and DevelopmentEmployee?
And can you add multiple categories to someone,

I'm sorry, I didn't realise that the API was already available. I found out that you can in fact add multiple labels to a single node which is very useful for me. So ignore that question.
I couldn't find something though about "sublabels". ATM, I'm able to create a CategoryNode who has a specific relation to each of his SubCategoryNodes. I index all these

Pieter-Jan Van Aeken:
So you're thinking about performance of doing an index lookup, looping through that result vs. getting the relationships of a node directly, right?
That is very much up to the index in use. We've given much thought about the index provider API for labels&indexing so that it's as easy as possible to plug in new ones. The default for now is

@Mattias Persson
You're right. I'm interested in the performance difference between retrieving all nodes of a specific category by traversing the relations of that category node, vs doing an index lookup based on the label of that node where the label is an indication of that nodes category.
I'm still somewhat worried though. The only nodes I need to index atm

@Pieter
Labels would just present one more option for modeling your domain. Of course, picking the best approach really depends on the questions you want to ask.
What kind of queries do you need for things of a particular sub-category? Do you need to scan through all of them, or find a particular one (or few)?

I'd like to be able to fill up tables with an entire category or subcategory. So if I have three categories, Person, Employee and Employer where the latter 2 are a subcategory of the first, I want to be able to create a Person table, a Employee table and an Employer Table.
In that table I want either all nodes, or a subset if I'm using pagination. And I want to be able to loop

Labels looks like a very good way to map Classes in your application to Nodes in the graph. Also, with inheritance, one could label the node with the base class name and the subclass name. This is a very welcome addition to neo4j.

Hi, maybe i do not understand it fully yet but imho a "label" is still a (multivalue) "property" of a node.. what is the exact differentiator why a "property" like we know right now cannot be used for exactly the same purpose as the addition of a Label to the datamodel (making it more complex) ?? maybe through some syntax extension of some commands etc etc..