Friday, August 08, 2008

LINQ has revolutionised the way we do data access. Being able to fluently describe queries in C# means that you never have to write a single line of SQL again. Of course LINQ isn't the only game in town. NHibernate has a rich API for describing queries as do most mature ORM tools. But to be a player in the .NET ORM game you simply have to provide a LINQ IQueryable API. It's been really nice to see the NHibernate-to-LINQ project take off and apparently LLBLGen Pro has an excellent LINQ implementation too.

Now that we can write our queries in C# it should mean that we can have completely DRY business logic. No more duplicate rules, one set in SQL, the other in the domain classes. But there's a problem: LINQ doesn't understand IL. If you write a query that includes a property or method, LINQ-to-SQL can't turn the logic encapsulated by it into a SQL statement.

To illustrate the problem take this simple schema for an order:

Let's use the LINQ-to-SQL designer to create some classes:

Now lets create a 'Total' property for the order that calculates the total by summing the order lines' quantities times their product's price.

LINQ-to-SQL doesn't know anything about the Total property, so it does as much as it can. It loads the Order. When the Total property executes, OrderLines is evaluated which causes the order lines to be loaded with a single select statement. Next each Product property of each OrderLine is evaluated in turn causing each Product to be selected individually. So we've had five SQL statements executed and the entire Order object graph loaded into memory just to find out the order total. Yes of course we could add data load options to eagerly load the entire object graph with one query, but we would still end up with the entire object graph in memory. If all we wanted was the order total this is very inefficient.

Now, if we construct a query where we explicitly ask for the sum of order line quantities times product prices, like this:

One SQL statement has been created that returns a scalar value for the total. Much better. But now we've got duplicate business logic. We have definition of the order total calculation in the Total property of Order and another in the our query.

So what's the solution?

What we need is a way of creating our business logic in a single place that we can use in both our domain properties and in our queries. This brings me to two guys who have done some excellent work in trying to solve this problem: Fredrik Kalseth and Luke Marshall. I'm going to show you Luke's solution which is detailed in thisseries of blogposts.

It's based on the specification pattern. If you've not come across this before, Ian Cooper has a great description here. The idea with specifications is that you factor out your domain business logic into small composable classes. You can then test small bits of business logic in isolation and then compose them to create more complex rules; because we all know that rules rely on rules :)

The neat trick is to implement the specification as a lambda expression that can be executed against in-memory object graphs or inserted into an expression tree to be compiled into SQL.

Here's our Total property as a specification, or as Luke calls it, QueryProperty.

We factored out the Total calculation into a specification called TotalProperty which passes the rule into the constructor of the QueryProperty base class. We also have a static instance of the TotalProperty specification. This is simply for performance reasons and acts a specification cache. Then in the Total property getter we ask the specification to calculate its value for the current instance.

Note that the Total property is decorated with a QueryPropertyAttribute. This is so that the custom query provider can recognise that this property also supplies a lambda expression via its specification, which is the type specified in the attribute constructor. This is the main weakness of this approach because there's an obvious error waiting to happen. The type passed in the QueryPropertyAttribute has to match the type of the specification. It's also very invasive since we have various bits of the framework (QueryProperty, QueryPropertyAttribute) surfacing in our domain code.

These days simply everyone has a generic repository and Luke is no different. His repository chains a custom query provider before the LINQ-to-SQL query provider that knows how to insert the specification expressions into the expression tree. We can use the repository like this:

Note how the LINQ expression is exactly the same as one we ran above which caused five select statements to be executed and the entire Order object graph to be loaded into memory. When we run this new test we get this SQL:

I was thinking about it a couple of days ago.That probably sounds far fetched but what about a mechanism where when a LINQ query cannot be entirely converted into SQL that compiled code portions are sent to SQL server for execution there ? I know it's stupid, there are probably more situations where this isn't feasible than situations where it is but wouldn't make a compelling reason to have CLR from within SQL Server?

Code Rant

Notepad, thoughts out loud, learning in public, misunderstandings, mistakes. undiluted opinions. I'm Mike Hadlow, an itinerant developer. I live (and try to work in) Brighton on the south coast of England. Please don't mistake me for an expert in anything. I love technology and programming, but make no claims to be any good at it. Much of what you read here may be poorly thought out, wrong, or just plain dangerous.