Introduction

Have you ever wanted to execute simple SQL SELECT statements against in-memory collections and lists? Well, I have, and this article presents a mechanism to do that utilizing LINQ and runtime generated and compiled Lambda Expressions. The result is support for runtime query evaluations like:

var result = source.Query<Person, Tuple<string, double>>(
"SELECT Address, Avg(Age) FROM this GROUP BY Address");
var result2 = source.Query<Person, Family>(
"SELECT Address, Avg(Age) AS AverageAge FROM this GROUP BY Address");
var result3 = source.Query<Person>("SELECT * FROM this ORDER BY age");
var result4 = source.Query<Person, string>("SELECT DISTINCT address FROM this")

I'm a big fan of LINQ, especially LINQ to Objects. Code that might be tens of lines long using a foreach iteration and evaluation can often be shortened to 1 or 2 lines. Anything that reduces code lines is a big plus for maintainability and quality. As I've used LINQ more and more in both production and hobby code, I've found that while the Enumerable extension methods are pretty easy to follow and work with, I continue to trip on the "almost SQL" inline LINQ C# keywords. Perhaps I've been at this too long, but my fingers just won't start a query statement with any other word than SELECT.

My personal short comings aside, I've found that LINQ has a more practical limitation, namely those "almost SQL" statements are still tightly coupled to an application's static structure. I was a big fan of the VARIANT back in the day, and have always thought that IDispatchEx never really got a chance to show its real potential outside of Microsoft apps before COM went out of style. Dynamic typing has a lot of advantages especially in increasingly large, complex, and distributed systems. Perhaps, I should switch to Python, but C# pays the bills.

Luckily, Microsoft has been adding dynamic typing features to .NET and C#. C#/.NET 4.0 is adding some interesting features: the F# Tuple type has been made part of the BCL; C# gets a dynamic keyword and access to the Dynamic Language Runtime, and the BCL gets an ExpandoObject allowing even statically typed languages like VB.NET and C# to take on some features of a Duck Typed language. The combination of dynamic and static typing within C# may be a powerful new addition, or it might end up being a Frankenstein with the worst features of both approaches. It will be interesting to see how all of this plays out over time.

But I digress. The real reason for this article is that I've always wanted to write a runtime evaluation engine. .NET 3.5 (with the addition of System.Linq.Expressions) and the dynamic typing features of C# 4.0 have provided the right set of tools. So I gave it a whirl.

var result = source.Query<Person, dynamic>("SELECT Address, Avg(Age) AS AverageAge
FROM this GROUP BY Address HAVING AverageAge > 40")

Why would you want such a thing after Microsoft went to all that trouble creating Linq? Well most of the time you don't. Linq is awesome for the majority of applications where the data model is static and the view the application needs of it changes only slowly over time. It's for those cases where the data model is dynamic or the apps view of the data cannot be defined at compile time that I started poking around with this idea.

Background

The basis for this article begins with a previous installment: Taking LINQ to SQL in the Other Direction. That article describes the basic parsing and evaluation infrastructure used. Much of the underlying code is the same (or at least started it out the same) especially in the area of SQL parsing using the GOLD parser. A limitation in that previous code was that the data being evaluated had to be in the form of an IDictionary<string, object> and that was also how the data was returned to the caller.

It also owes some inspiration to the Dynamic LINQ example that Microsoft published as part of the VS2008 Samples collection.

Also, the this entry in the join chain portion of the FROM clause is merely a placeholder. It refers to the same this that is passed to the extension methods or the IEnumerable<T> passed to the Evaluate methods. In a future update, I'd like to add support for joins across multiple collections, but at the moment, that capability isn't present.

Points of Interest

The most challenging part of this was creating result selectors. SQL return types are polymorphic depending on the query. A SELECT query can return a subset of the input data without a type transformation, with queries such as SELECT * FROM this WHERE Age = 40. It can return a subset of the fields from the input data type: SELECT name, age FROM this. It can return a single value in situations like SELECT Avg(Age) FROM this. Or queries can return completely transformed types that are aggregations of the input data: SELECT name, Avg(age) FROM this GROUP BY name.

The type parameters passed to the Query methods indicate both the type contained in the enumerable and the type to create and return.

Single Properties

For selecting single properties from the source type, a lambda is created that returns the value of that property from each source object. This doesn't require the creation and initialization of new objects:

Multiple Properties

Returning multiple properties requires creating new instances of TResult and populating them with the result data. There are three approaches to doing this.

In most cases, it is expected that the return type has read/write properties for each of the fields in the SELECT statement. In this case, if the source property name is not the same as the result property, the AS keyword can be used to map the two. In the example below, the Person class has a property named Address, while OtherPerson has Location:

As a side note: all property names are evaluated in a case insensitive fashion. If TSource or TResult has Property and property, exceptions will be thrown.

A special selector is created for the Tuple type which has a constructor that takes an argument for each constituent property. The order of fields in the Select statement must match the order of arguments in the constructor declaration.

Dynamic and Expando

Another special selector is created when dynamic is specified as TResult. In this case, you will always get a collection of ExpandoObjects back. ExpandoObject implements IDictionary<string, object> to store the set of dynamically assigned properties, and this interface is used to populate the return objects. It is via this mechanism that this API goes from statically to dynamically typed. Something I noticed about the ExpandoObject is that its property names are case sensitive. I don't know if that's good or bad, but for some reason, I expected them not to be, since it would seem to mesh more with a dynamically typed environment.

Grouping

By far the most challenging selector was GROUP BY. This involves not only a type transformation between TSource and TResult but the calculation of properties on the return type as opposed to only assignment. It took me a while to wrap my head around how to do this without calculating and caching some intermediate state for each aggregate. In the end, I created a type, GroupByCall, to cache the delegate for each aggregate at compile time for later invocation during evaluation.

Joining

The updated code also supports simple inner joins between to IEnumerables. Given that the outer enumerable needed to be identified in the SQL test I invented a "that" keyword. Perhaps a little hokey but it satisfied my curiosity that this could be done.

Conclusion

The attached unit tests contain plenty of examples for different combinations of the above, but most of the basic syntax of SQL SELECT should work. I've had a lot of fun with this code thus far, and am planning on updating it with additional functionality. Hopefully, you find it useful or at the very least interesting.

History

Initial upload - 11/11/2009.

Added joins, bug fixes, misc other stuff - 5/8/2013

Fixed some bugs with nullable type comparisons - 5/25/2013

Added ability to use ExpandoObject as enumerable source and join target - 8/2/2013

In your example what is the type of db? It looks like it might be an ADO connection or somesuch.

The query methods in the article all operate on IEnumerable<T> objects. So the code isn't meant to interface directly with a database (there are plenty of tools for that) but rather to refine, subquery, or transform collections of objects (which may or may not have been ultimately retrieved from a relational or any other data store).