The blog post explains a really interesting approach by Kavita Ganesan, which uses a graph representation of the sentences in review content to extract the most significant statements about a product.

Each word of a sentence is represented by a shared node in the graph. Word order is reflected by relationships pointing to the next word, and each relationship carries the sentence id and the position of the leading word.

Just by looking at the graph structure, you can see that the most significant statements (positive or negative) are repeated across many reviews.
Differences in wording or inserted filler words affect the graph structure only minimally, while the parts where sentences overlap reinforce it.

I always joked that you could create this graph representation without programming just by writing a simple Cypher statement, but I actually never tried.

Until now. To be honest, I’m impressed by how easy it was to write down the essence and then extend the statement until it covered a large number of inputs.

The essence of creating the graph can be formulated as: “Each word of the sentence is represented by a shared node in the graph with order of words being reflected by relationships pointing to the next word”.
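A minimal sketch of that essence, assuming a single hard-coded sentence that is split on spaces without any cleanup. The `Word` label, the `name` property and the `NEXT` relationship type match the statements used further down.

```cypher
// Split one sentence into words and link each word to its successor
WITH "Great device, but the calls drop too frequently." AS text
WITH split(text, " ") AS words
// idx runs over every word except the last one
UNWIND range(0, size(words)-2) AS idx
// MERGE creates a word node only if it does not exist yet, so words are shared
MERGE (w1:Word {name: words[idx]})
MERGE (w2:Word {name: words[idx+1]})
MERGE (w1)-[:NEXT]->(w2)
```

Because punctuation and case are left untouched here, "device," and "device" would end up as different nodes, which is why the text gets normalized in the next step.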

I want to filter out stop words

Filter the words after splitting and trimming by checking them against a collection of stop words with `IN`.

WITH "Great device, but the calls drop too frequently." AS text
// lower-case the text and strip full stops and commas
WITH replace(replace(toLower(text), ".", ""), ",", "") AS normalized
WITH [w IN split(normalized, " ") | trim(w)] AS words
// drop stop words
WITH [w IN words WHERE NOT w IN ["the", "an", "on"]] AS words
// link each remaining word to its successor
UNWIND range(0, size(words)-2) AS idx
MERGE (w1:Word {name: words[idx]})
MERGE (w2:Word {name: words[idx+1]})
MERGE (w1)-[:NEXT]->(w2)
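The sentence id and word position described at the start could be stored on the relationships roughly like this. This is only a sketch: `sid` is a hypothetical id picked by hand, and the property names `sid` and `idx` are assumptions, not taken from the original approach.

```cypher
// sid is a hypothetical sentence id chosen for this example
WITH 1 AS sid, "Great device, but the calls drop too frequently." AS text
WITH sid, replace(replace(toLower(text), ".", ""), ",", "") AS normalized
WITH sid, [w IN split(normalized, " ") | trim(w)] AS words
UNWIND range(0, size(words)-2) AS idx
MERGE (w1:Word {name: words[idx]})
MERGE (w2:Word {name: words[idx+1]})
// CREATE instead of MERGE, so every sentence adds its own relationship
CREATE (w1)-[:NEXT {sid: sid, idx: idx}]->(w2);

// Word pairs that follow each other most often across all sentences
MATCH (a:Word)-[r:NEXT]->(b:Word)
RETURN a.name, b.name, count(r) AS freq
ORDER BY freq DESC LIMIT 10;
```

With one relationship per sentence, overlapping parts of different sentences show up as several parallel `NEXT` relationships between the same two words, which is what the second statement counts. The two statements need to be run separately or in a client that accepts multiple statements separated by semicolons.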

Cleanup

// Removes every node and relationship in the database
MATCH (n) DETACH DELETE n

I want to load the text from a file

LOAD CSV loads each row as an array of strings (when not used with a header row), using the provided field terminator (a comma by default).
If we choose a full stop as the field terminator, it actually splits on sentence ends (mostly).
So we can just unwind each row into its cells (text fragments) and then treat each of those as we treated a piece of text before.
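Put together, the loading step could look like the sketch below. The file name `one-ring.txt` and its location in Neo4j's import directory are assumptions for illustration, so replace the URL with wherever your text actually lives.

```cypher
// Assumption: the text was saved as one-ring.txt in Neo4j's import directory
LOAD CSV FROM "file:///one-ring.txt" AS row FIELDTERMINATOR "."
// each cell of a row is one text fragment
UNWIND row AS text
// skip empty fragments, e.g. after a trailing full stop
WITH text WHERE text IS NOT NULL AND trim(text) <> ""
WITH replace(toLower(text), ",", "") AS normalized
// split into words, dropping empty strings caused by double spaces
WITH [w IN split(normalized, " ") WHERE trim(w) <> "" | trim(w)] AS words
UNWIND range(0, size(words)-2) AS idx
MERGE (w1:Word {name: words[idx]})
MERGE (w2:Word {name: words[idx+1]})
MERGE (w1)-[:NEXT]->(w2)
```

Note that LOAD CSV still reads the file line by line, so a sentence that spans several lines arrives as several fragments. For a text like the one below, where most lines carry no full stop, each line is effectively treated as its own piece of text.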

Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them
In the Land of Mordor where the Shadows lie.


This is freaking awesome. I’m working on a huge translation project planning about 6-7 languages. What are the possibilities for translation? Or foreign language learning programming?
The implications are enormous.