One of the aspects of my job that I love is the week-long proof-of-concept boot camps. I (or one of my team members) come onsite to work with your team to build out a POC in just one week. They all vary somewhat, but I try to stick to a formula that works for me. I spend the first day with the whole team ironing out the model. This is the trickiest part to get right, because if the model is right, the queries fall right into place. If the model has to be changed significantly on, say, Day 3, then a ton of work has to be redone or at least heavily modified. The goal for the end of day one is to have something that looks like the following:

This is the model I will be using for this POC, at least at this point; there may be some changes along the way. There are a lot of discussions and alternatives hidden in that model. If you want to see an example of a model emerging, take a look at my modeling airline flights blog post. Why did I choose to split the POST relationships by date, but not the FOLLOWS relationships? To answer this question, it is important to look at both the data we have and the questions we are going to ask.

We are lucky that we already more or less know how Twitter works, so we are not approaching a brand-new application. We can say with some certainty that optimizing for reading the timeline quickly is more important than getting a list of recent followers quickly.

We have two types of users on Twitter: the average user, who follows and is followed by a few thousand users at most, and the celebrities, who have millions of followers. For both types of users, getting their timeline is very important. We can traverse the graph from the User to the Users they FOLLOW by a single relationship type, and then out to that day's Posts. If we had split the FOLLOWS relationships, we'd have to work a little harder every single time we search for all the people a user FOLLOWs, and that work would get worse as time progressed.
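Splitting the POST relationships by date comes down to deriving a relationship type name from the post's date. Here is a minimal sketch of that idea; the `POSTED_ON_yyyy_MM_dd` naming convention is my assumption for illustration, not necessarily the exact convention used in the POC:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DatedRelationshipType {
    // Per-day relationship type names, e.g. POSTED_ON_2020_04_01.
    // The exact naming convention here is an assumption for illustration.
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy_MM_dd");

    static String postedOn(LocalDate day) {
        return "POSTED_ON_" + day.format(DAY);
    }

    public static void main(String[] args) {
        System.out.println(postedOn(LocalDate.of(2020, 4, 1)));
    }
}
```

With per-day types, reaching a user's posts for a given day means expanding a single relationship type, instead of walking every post the user has ever made and filtering by a date property.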

Having a single FOLLOWS relationship type also enables us to use getDegree for both incoming and outgoing directions. The celebrities on Twitter are less concerned with who follows them, but very concerned with the number of followers as a measure of their popularity. One thing we can do to appease the occasional celebrity curiosity about who follows them is to keep track of the last few followers in an array and grab them directly instead of following the FOLLOWS relationships. We will consider that an optimization and exclude it from this POC for now.
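For the curious, that deferred optimization could look something like a bounded, newest-first list. This is a stand-in sketch using a plain deque; in the real model the usernames would live in an array property on the celebrity's node, and the class and method names here are mine:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RecentFollowers {
    // Keep only the last N follower usernames so "who just followed me?"
    // never has to walk millions of FOLLOWS relationships.
    private final int capacity;
    private final Deque<String> recent = new ArrayDeque<>();

    RecentFollowers(int capacity) { this.capacity = capacity; }

    void follow(String username) {
        if (recent.size() == capacity) {
            recent.removeLast();       // drop the oldest remembered follower
        }
        recent.addFirst(username);     // newest follower goes first
    }

    List<String> latest() { return new ArrayList<>(recent); }

    public static void main(String[] args) {
        RecentFollowers r = new RecentFollowers(3);
        r.follow("alice"); r.follow("bob"); r.follow("carol"); r.follow("dave");
        System.out.println(r.latest()); // newest first, oldest dropped
    }
}
```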

Once we have our model, what I like to do next is build an API. In a typical POC boot camp, we may be asked to solve a single very complex or time-sensitive traversal, or we may be asked to help build a Minimum Viable Product. In that short time, the most we can really do is about 20 HTTP endpoints. I like the HTTP API approach because it allows Neo4j to come in as a service rather than just a database: a service that plugs right into any modern architecture using standard HTTP to communicate between services. It doesn't matter if the clients are in Java, .NET, Ruby, or any other language; they all have robust, mature HTTP libraries. For this POC, the HTTP API I came up with looks like this:

I also like the HTTP API approach because if, for some reason, Neo4j isn't the right fit, at least the customer has an API to show for it, which they can try to implement in some other technology. I mentioned in Part I that all the source code is already available. So, while I may cut and paste chunks to explain what I did, you can also follow the complete source for clarity. Keep in mind that some of it may change as we go further into this series.
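To make the "service over standard HTTP" point concrete, here is a JDK-only sketch of a trivial JSON endpoint using `com.sun.net.httpserver`. This is not the framework the POC uses; it only illustrates that any client with an HTTP library can talk to such a service, and the `/v1/health` path is my invention:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class HealthEndpoint {
    // Start a tiny HTTP service on an ephemeral port that answers
    // GET /v1/health with a JSON body. Any language's HTTP client can call it.
    public static HttpServer start() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/v1/health", exchange -> {
            byte[] body = "{\"status\":\"up\"}".getBytes();
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start();
        System.out.println("listening on port " + server.getAddress().getPort());
        server.stop(0);
    }
}
```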

Coveralls makes sure I actually write tests. Like many of us who came from the Ruby on Rails community, I owe any testing I do in great part to Gregg Pollack and Jason Seifer. I just wanted to say thank you and pay my respects.

Let’s get started with our first class, Users. The application is going to want to authenticate users, so let’s write our createUser method. This will be a POST action with a JSON payload that we need to validate. Assuming it is good, we will check that the username and email are not already taken. Next, we will create the node and assign it the properties from our JSON payload, adding an MD5 hash property of their lowercased email address. I know MD5 is broken as a cryptographic hash function, but it is what Gravatar uses to provide image services for our users.
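The Gravatar hash step is plain standard-library code. A sketch, assuming we follow Gravatar's documented convention of hashing the trimmed, lowercased email address (the class and method names are mine):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GravatarHash {
    // Gravatar's convention: MD5 of the trimmed, lowercased email address.
    static String md5Hash(String email) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(
                    email.trim().toLowerCase().getBytes(StandardCharsets.UTF_8));
            // Pad to 32 hex characters in case the digest has leading zero bytes.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e); // never happens on a standard JVM
        }
    }

    public static void main(String[] args) {
        // Example email from Gravatar's own documentation.
        System.out.println(md5Hash("MyEmailAddress@example.com "));
    }
}
```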

Nothing too crazy here. Getting our user is also pretty simple. The findUser method used below either finds the user by their username or throws an exception. Once the user is found, we just want all their properties sent back, including their password.
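The find-or-throw shape of findUser can be sketched with a plain map standing in for the Neo4j lookup by the username property. The map, the exception type, and the property names here are my choices for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class Users {
    // Stand-in for the graph: in the real code this is a Neo4j lookup
    // of a User node by its "username" property.
    static final Map<String, Map<String, Object>> store = new HashMap<>();

    // Find the user or throw; callers never have to null-check.
    static Map<String, Object> findUser(String username) {
        Map<String, Object> user = store.get(username);
        if (user == null) {
            throw new IllegalArgumentException("User not found: " + username);
        }
        return user;
    }

    public static void main(String[] args) {
        store.put("maxdemarzi", Map.of("username", "maxdemarzi"));
        System.out.println(findUser("maxdemarzi"));
    }
}
```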

We’ll want a different method for the public information about a user; that is where getProfile comes in. In our getProfile method, we follow the exact same steps as getUser, but instead of returning all properties, we remove EMAIL and PASSWORD and compute some statistics about the user's relationships. Those Node::getDegree operations are very fast because Neo4j pre-computes and stores the answer as relationships are added, instead of counting them every time. The only tricky one is “posts,” since those are “Dated Relationship Types,” but we can use the degrees of the other relationship types to calculate the right answer.
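The “posts” arithmetic boils down to subtraction: the total outgoing degree, minus the degrees of the fixed relationship types, leaves exactly the dated POST relationships. A sketch over plain counts; the particular set of fixed types shown (FOLLOWS, LIKES, MUTES) is my assumption:

```java
public class ProfileStats {
    // Dated POSTED_ON_* types can't be counted with one getDegree call,
    // but every outgoing relationship that is NOT one of the fixed types
    // must be a post, so subtracting the known degrees gives the answer.
    static long postCount(long totalOutgoing, long follows, long likes, long mutes) {
        return totalOutgoing - follows - likes - mutes;
    }

    public static void main(String[] args) {
        // e.g. 100 outgoing relationships, of which 50 FOLLOWS, 20 LIKES, 5 MUTES
        System.out.println(postCount(100, 50, 20, 5));
    }
}
```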

Part of our CreateUserTest can be seen below. We resolve our URI and send a JSON payload as our input, making sure our actual results match the expected outcome. The hash property is the only surprise, as it is added when the User is created. We will not be storing passwords in plain text. Our front-end application will send us hashed strings that look more like $2a$10$8znG1WqeXOy9t/5aet62LuHnO5tv.dxtn3T2uAoA5T5gIxlOovnKW, so don’t worry. But I am getting ahead of myself.