GraphAware Blog - Cypher

We have already blogged about the fulltext search available in Neo4j 3.5. The list of available analyzers covers many languages and fits various use cases. However, once you expose the search to real users, they will start pointing out edge cases and complaining that the search is not Google-like. Speakers of languages that use accents in their written form quite often leave the accents out. There are various reasons for this. The most common are historical (different character encodings used to cause problems, and users find it hard to change their habits) and the use of a different default keyboard layout (e.g. en_US); switching the layout just for...

The Cypher query planner is quite advanced and mature, and you can mostly rely on it to pick the best plan for your query. However, there are rare cases, or bugs, that might leave you looking for ways to influence that plan. This article demonstrates practical usage of an index hint. Note that all queries were tested against Neo4j Enterprise 3.5.8. The graph model: This is the relevant portion of the graph model, sufficient to demonstrate the issue. Simple enough: we have many tweets, and tweets have keywords. Our graph has two indexes, one on the value of the Keyword, and the...

There is one common performance issue our clients run into when trying their first Cypher queries on a dataset in Neo4j. When writing a query, be sure that it doesn’t match any cycles, or you can experience unpleasant surprises. Assume the following sample graph and simple query:
CREATE (a:Node {name: "A"}), (b:Node {name: "B"}), (c:Node {name: "C"}), (a)-[:TO {name: "1"}]->(b), (a)-[:TO {name: "2"}]->(b), (a)-[:TO {name: "3"}]->(b), (b)-[:TO {name: "4"}]->(c)
MATCH p=({name: "A"})-[*..10]-({name: "C"}) RETURN p
The query returns 9 paths, instead of 3 as you might have guessed! The additional 6 paths have length 4 with node pattern A-B-A-B-C; note the repeated nodes A...
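The path count in the excerpt above can be checked with a small sketch outside Neo4j. It assumes Cypher's rule that a relationship may be traversed at most once within a single matched path, while nodes may repeat. The `find_paths` helper and the `RELATIONSHIPS` list are hypothetical names introduced for illustration only; this is not how Neo4j evaluates the query internally.

```python
from collections import defaultdict

# Relationships from the sample graph: (name, start node, end node).
RELATIONSHIPS = [
    ("1", "A", "B"),
    ("2", "A", "B"),
    ("3", "A", "B"),
    ("4", "B", "C"),
]


def find_paths(start, end, max_length=10):
    """Enumerate undirected paths, using each relationship at most once per path."""
    # Adjacency list that ignores direction, as the pattern -[*..10]- does.
    neighbours = defaultdict(list)
    for name, source, target in RELATIONSHIPS:
        neighbours[source].append((name, target))
        neighbours[target].append((name, source))

    paths = []

    def walk(node, nodes, used):
        # Record every non-empty path that currently ends on the target node.
        if node == end and used:
            paths.append(("-".join(nodes), tuple(used)))
        if len(used) == max_length:
            return
        for name, other in neighbours[node]:
            if name not in used:  # a relationship may appear only once in a path
                walk(other, nodes + [other], used + [name])

    walk(start, [start], [])
    return paths


paths = find_paths("A", "C")
short_paths = [p for p in paths if len(p[1]) == 2]
long_paths = [p for p in paths if len(p[1]) == 4]
print(len(paths), len(short_paths), len(long_paths))  # prints "9 3 6"
```

The six longer paths come from the 3 × 2 × 1 ways of ordering the three parallel A-B relationships before taking the single B-C relationship.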

In this blog post we will go over the Full Text Search capabilities available in the latest major release of Neo4j. Contrary to our usual blogs, the content will rather focus on the underlying search engine used by Neo4j, that is, Apache Lucene in version 5.5.5. What exactly is Search? Search is an interaction between a user and a search engine. The user has an information need at hand and attempts to satisfy it by providing a search with adequate constraints. The search engine uses those constraints to collect matching results and return them to the user. What is a Search Engine? A search...

A book tells us a story, but for a computer it is a wall of text. How can we use graphs and NLP to help our machines make more sense of a story? Our example comes from the A Song of Ice and Fire books, aka Game of Thrones. We converted the e-books (epub) to text files and used a small Python program to split them into chapters, paragraphs, and sentences. So a book turned into this model. GraphAware NLP: The GraphAware NLP Framework is a project that integrates NLP processing capabilities available in several software packages like Stanford NLP and OpenNLP, existing data sources,...

In our previous blog post, we introduced the concept of Graph Aided Search. It refers to a personalised user experience during search where the results are customised for each user based on information gathered about them (likes, friends, clicks, buying history, etc.). This information is stored in a graph database and processed using machine learning and/or graph analysis algorithms. A simple example is the LinkedIn search functionality. If we typed “Michal” in the text input, it would obviously return people whose name matches and order them by full text relevancy with some fuzziness. Lucene-based search engines such as Elasticsearch and Solr offer impressive performance...

Iterating over large numbers of nodes using Cypher is quite a common use case in Neo4j. Typically, the reason for doing this is that we want to perform some kind of operation for each one of these nodes. In this blog post, we will use one million TestNodes and try to iterate over them in order to index their contents into a freshly created Elasticsearch index. There are three approaches we can take, two of which are quite common, but the most performant technique is largely unknown. First Technique: SKIP and LIMIT. Using SKIP and LIMIT is the first approach that comes to mind,...

Recently, Neo Technology announced the 2.3.0-RC1 release of their Neo4j graph database. One of the key new features is Triadic Selection, built into Cypher’s Cost Based Planner. In this blog post, we will explore Triadic Selection in detail and demonstrate how significantly it can speed up recommendations computed in Neo4j. What is Triadic Selection? A Bit of Theory: Triadic Closure. Networks or graphs can rarely be considered static structures. On the contrary, they often seem to be ever-evolving objects. Any social network, for example, is often the most dynamic of graphs: at any moment, new relationships are created between existing nodes, other relationships vanish, new nodes...

In this blog post, we’ll demonstrate how to use variable length relationships (sometimes called “variable length paths”) in Cypher using examples. We will also see when zero length relationships can be useful. Introduction: Let’s start with the basics. For the sake of the blog post, our use case will be users that know other users. Users write blog posts modeled as linked lists. You can generate an example graph with the following link to a predefined Graphgen graph, or use this Neo4j Console if you want to execute the queries whilst reading the blog post. Basic Relationships Matching: Let’s start with a basic query that will find a...

Last weekend, I came across a tweet announcing that Wikimedia released the dataset of the page clickstreams for February 2015. I found it interesting to download this dataset and see how people arrive on Neo4j’s Wikipedia page. The data is quite simple; we have page entities that relate to other pages. A page can either be a Wikipedia page, or a non-Wikipedia page such as Google. Relationships can represent a user click from a Wikipedia page to another page, or a user searching on Google or Wikipedia. The number of times an event occurs is also provided in the dataset. Importing the Dataset: You...

A common question when planning and designing your Neo4j Graph Database is how to handle “flagged” entities. This could include users that are active, blog posts that are published, news articles that have been read, etc. Introduction: In the SQL world, you would typically create a boolean|tinyint column; in Neo4j, the same can be achieved in the following two ways: a flagged indexed property, or a dedicated label. Having faced this design dilemma a number of times, we would like to share our experience with the two presented possibilities and some Cypher query optimizations that will help you take full advantage of the graph...

With MERGE set to replace CREATE UNIQUE at some point, the behavior of MERGE can sometimes be tricky to understand. MERGE: Here’s a summary of what MERGE does. It ensures that a pattern exists in the graph by creating it if it does not exist already. It will not use partially existing patterns: it will attempt to match the entire pattern and create the entire pattern if missing. When unique constraints are defined, MERGE expects to find at most one node that matches the pattern. It also allows you to define what should happen based on whether data was created or matched. The key...