To do that query on Neo4j, you would need to store in memory on one machine the entire Twitter social graph, all the people who tweeted every URL ever tweeted on Twitter, and then do the computation on a single thread. Neo4j can't handle that scale.

The reach computation on Storm does everything in parallel (across however many machines you need to scale the computation) and gets data using distributed key/value databases (Riak, in our case).

We used Neo4j over a year ago, and it was pretty unstable when we used it. The database files were getting corrupted pretty frequently (a few times a week), so it just didn't work out for us. Ultimately it was for a small feature, so rather than continue to struggle with Neo4j we just reimplemented the feature using Sphinx. Like I said, that was a long time ago and Neo4j may have gotten a lot better since then.