Relationships in BigTable, or “How to put extra effort into being lazy”

Executive Summary: We’ve had to re-think some of the relationships between our objects with BigTable and, in some cases, reverse them.

One thing you can never accuse the hillbilly of is proper use of prepositions. Another thing of which you can never accuse the hillbilly is that he is not lazy. (Nor can you not accuse his use of double negatives of not being interesting.)

NHibernate has made me lazy about loading things. It’s also made me lazy about querying things. But in the BigTable world of AppEngine, I’ve had to actually think about both of these topics.

In our domain, a StaffMember has a collection of exactly seven DayAvailability objects, each one representing his or her availability on a given day of the week. I’d show you a nice little class diagram but I’m still trying to maintain a base level of laziness here. And this blog post is increasingly not helping.

Because it’s BigTable (which may be a NoSQL database if I had the gumption to look up what that exactly means rather than making assumptions based on the words involved), the collection of seven DayAvailability objects is stored with the staff member. I.e. I load the StaffMember, I get the DayAvailability objects.

This is fine and dandy for our staff edit screen where we want to display the availability of the staff member. It’s also okay for our “add appointment” screen where we want to make sure that the appointment time falls within the staff’s availability.

Now let’s add a new scenario: A customer wants to book a massage at 2pm on Saturday. We open the book appointment screen and notice there are no slots available. The customer says “When’s the earliest after 2 I can get in?” We click on the handy new “find next available” link…

There are a couple of ways to implement this. Let’s try a naive approach where the system advances every half hour and checks to see if anyone is available that can perform the service. There are a few factors to consider but one of those factors is which staff members are available at 2:30.

In order to perform this query, we need to load each staff member and look through his or her availability. It would be much easier to do a search like “Find an availability interval that includes 2:30 and return the staff member for it.”
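For the curious, here is roughly what that naive approach looks like. It is a minimal Python sketch, not our actual code: plain in-memory lists stand in for the datastore, and the field and function names (`is_available`, `find_next_available`, the 48-step cap) are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta


@dataclass
class DayAvailability:
    weekday: int  # 0 = Monday ... 6 = Sunday, matching datetime.weekday()
    start: time
    end: time


@dataclass
class StaffMember:
    name: str
    # Availability is embedded in the staff member (seven entries in the real model).
    availability: list = field(default_factory=list)


def is_available(staff, moment):
    """True if any of the staff member's intervals covers the given moment."""
    return any(
        day.weekday == moment.weekday() and day.start <= moment.time() < day.end
        for day in staff.availability
    )


def find_next_available(all_staff, start, step=timedelta(minutes=30), max_steps=48):
    """Advance in half-hour steps until at least one staff member is free."""
    moment = start
    for _ in range(max_steps):
        # Every staff member has to be loaded and inspected at every step.
        free = [s.name for s in all_staff if is_available(s, moment)]
        if free:
            return moment, free
        moment += step
    return None


staff = [
    StaffMember("Ada", [DayAvailability(5, time(9), time(14))]),
    StaffMember("Bo", [DayAvailability(5, time(15), time(18))]),
]
# 5 June 2010 was a Saturday; the customer wants something from 2pm onward.
result = find_next_available(staff, datetime(2010, 6, 5, 14, 0))
```

The thing to notice is the inner loop: the search can only get at availability by going through every staff member first.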

Because of issues like these, we often have to rethink the relationship between objects. StaffMembers no longer carry a collection of DayAvailability objects. Rather, the DayAvailability object has a reference back to the staff member so that we can perform queries like the one above.
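A rough sketch of the reversed relationship, again in Python with an in-memory list standing in for the datastore. The names (`staff_key`, `staff_keys_available_at`) are assumptions for illustration, and depending on the datastore, part of this filter may have to happen in memory rather than in the query itself.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class StaffMember:
    key: str
    name: str  # basic information only; no embedded availability


@dataclass(frozen=True)
class DayAvailability:
    staff_key: str  # reference back to the owning StaffMember
    weekday: int    # 0 = Monday ... 6 = Sunday
    start: time
    end: time


# Stand-in for the stored DayAvailability entities.
AVAILABILITIES = [
    DayAvailability("staff-1", 5, time(9), time(14)),
    DayAvailability("staff-2", 5, time(14), time(18)),
    DayAvailability("staff-3", 4, time(9), time(17)),
]


def staff_keys_available_at(moment):
    """Find availability intervals that include `moment`; return their staff keys."""
    return sorted(
        a.staff_key
        for a in AVAILABILITIES
        if a.weekday == moment.weekday() and a.start <= moment.time() < a.end
    )


# Saturday 5 June 2010 at 2:30pm
keys = staff_keys_available_at(datetime(2010, 6, 5, 14, 30))
```

No staff members get loaded at all until we know which ones we want.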

This affects the staff edit page because now we need an extra query. When we load the data for the page, we first get the StaffMember, then we query for all DayAvailability objects that refer to that StaffMember.

Basically, if you look closely, we’ve implemented a poor man’s lazy loading. The StaffMember now holds only the basic information. When we need its availability, we query for it.
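In sketch form, the staff edit page load ends up looking something like this. It is Python with dictionaries and lists pretending to be the datastore, and every name in it (`get_staff`, `get_availability_for`, `load_staff_edit_page`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaffMember:
    key: str
    name: str


@dataclass(frozen=True)
class DayAvailability:
    staff_key: str  # reference back to the StaffMember
    weekday: int    # 0 = Monday ... 6 = Sunday


# Stand-ins for the two kinds of stored entities.
STAFF = {"staff-1": StaffMember("staff-1", "Ada")}
AVAILABILITIES = [DayAvailability("staff-1", d) for d in range(7)] + [
    DayAvailability("staff-2", 0)
]


def get_staff(key):
    """Load just the basic staff information."""
    return STAFF[key]


def get_availability_for(staff_key):
    """Query for the DayAvailability objects that refer to this staff member."""
    found = [a for a in AVAILABILITIES if a.staff_key == staff_key]
    return sorted(found, key=lambda a: a.weekday)


def load_staff_edit_page(key):
    staff = get_staff(key)            # first query: the StaffMember itself
    days = get_availability_for(key)  # second query: only because this page needs it
    return {"staff": staff, "availability": days}


page = load_staff_edit_page("staff-1")
```

Screens that don't care about availability simply call `get_staff` and skip the second query.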

After coming to terms with this, I decided it was actually a good thing. In many cases, we don’t need the staff member’s availability. So we don’t need to bandy it about like a wounded badger.

Each option is going to have strengths and weaknesses. My point’s not really about supporting a model that could be switched (easily) from one technology to the other; that’s a pipe dream. It’s more that the responsibility of searching for a customer can, in and of itself, be separate from the underlying technology you use to abstract your object model from persistence.

For example, if you’re using an RDBMS, then leverage the power of queries: create a stored procedure to determine an available timeslot. It just returns the timeslot datetimes/IDs and staff IDs. The ORM takes those and either eager- or lazy-loads them to display. A NoSQL solution might hand off to a Lucene index or something; that isn’t my barrel of fish to be shooting at.

But even this has trade-offs. Business logic is now in more than one spot. You can’t just add a unit test to your existing test suite to test C#/Java/etc. because the logic to find a timeslot is now in TSQL/PSQL against physical data. Oh well, mark it up for an integration test.

You haven’t seen pain until you’ve looked at larger Linq2SQL “solutions”. The road to Hell is paved one stone at a time; technologies like this replace stones with 1km long sections of pre-paved goodness to get you to your destination oh-so-much-quicker.

Wow, I’ve posted dozens of comments and this is the first time it’s asked me for (the) “clock”…

remi bourgarel

So you’re saying that in a command/patterns architecture, the UI and your DTOs are dependent? Or at least influence each other (maybe more in the way UI -> DTOs).

Waiting for your blog posts

http://codebetter.com/members/kylebaley/default.aspx Kyle Baley

That’s an excellent point and has been a half-written blog post for a few months now.

Technically, the change was made to accommodate a new business rule: find next available appointment. I described it in terms of the UI because that helps me visualize things better.

That’s all well and good for letting me sleep at night but we make changes to the domain model all the time to accommodate UI changes. And I’m fine with that because I think it’s a consequence of our architecture, which is based exclusively around commands/patterns. Calling it a domain model is my first problem because really, it’s just a bunch of DTOs. All the real work happens in the commands.

Should finish up that blog post, methinks

remi bourgarel

I thought that we shouldn’t design our business classes after the UI but after the business specifications. And here you changed your business layer and your data access layer because you changed a user interface.

http://codebetter.com/members/kylebaley/default.aspx Kyle Baley

That could be. It’s also a core use case for having different data stores that are all eventually consistent.

In any case, the choice of BigTable certainly wasn’t so we could jump on the NoSQL bandwagon. Data access ranks just below security in my list of “things I’d rather someone else dealt with”.

http://codebetter.com/members/kylebaley/default.aspx Kyle Baley

I think that’s a higher-up issue. Probably not obvious from my description, but we already have a timeslot finder. This post was about how we implemented it (sort of), first with the naive approach of looking through every staff member, then by breaking out the availability into first class objects.

That said, I think there are fundamental differences between a SQL data store and a NoSQL (or at least a BigTable) data store that will dictate how you write your data access. If you use an ORM that supports lazy loading inherently, I think you’d design your object model differently than if you had to do it manually. Especially with a Google Web Toolkit app that makes heavy use of commands for all of its operations.

http://www.infovark.com/ Dean Thrasher

Why, I do believe you’ve rediscovered the joys of database normalization! Those crazy NoSql kids with their document databases might say otherwise, but as soon as you need to start slicing the same data from a different angle, you’ve arrived at the core use case for a good ol’ fashioned relational store.

Steve Py

When you buy a new hammer, everything looks like a nail.

What happened to OOP? Just because you use an ORM doesn’t mean you have to do everything through said ORM, and doesn’t mean you cannot use a query.

I am a timeslot finder! Give me a start time and I will return you the first 5 available timeslots and the ID of the staff members each belongs to.
I am an SQLServer timeslot finder, he is the NoSQL timeslot finder, which one do you want to use? You like this one? Give that ID to the ORM.