Playing with Linq grouping: GroupByMany?

Linq is a great technology for managing data directly from your .NET language.

One of its features is grouping. Many people understand grouping as it is defined in SQL, and Linq implements grouping in much the same way. Let's discover this syntax and see how to make consecutive groupings easier.

Then we will show how to use WPF HierarchicalDataTemplate to expose the results in just a few lines.

Let's assume we have a collection of customers (db.Customers in the queries shown here). You just have to use 'group by' to define groups among your data.

var q =
    from c in db.Customers
    group c by c.Country;

q then becomes an enumeration of groups (IQueryable<IGrouping<string, Customer>>). Each item of this enumeration defines a group (IGrouping<string, Customer>).

As we can see in its definition, IGrouping is just adding a few things:

- the key of the group (country in our sample).
- the items grouped by this common key. To retrieve these items, you have to browse the group, which is itself an enumeration.

// Summary:
//     Represents a collection of objects that have a common key.
//
// Type parameters:
//   TKey:
//     The type of the key of the System.Linq.IGrouping<TKey,TElement>.
//
//   TElement:
//     The type of the values in the System.Linq.IGrouping<TKey,TElement>.
public interface IGrouping<TKey, TElement> : IEnumerable<TElement>, IEnumerable
{
    // Summary:
    //     Gets the key of the System.Linq.IGrouping<TKey,TElement>.
    //
    // Returns:
    //     The key of the System.Linq.IGrouping<TKey,TElement>.
    TKey Key { get; }
}

Most of the time, we use groups to compute aggregates such as a sum or a count. To do this with Linq, you just build a new Linq query on top of the first group query.
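As a minimal sketch of this two-step form, here is a Linq to Objects version that runs without a database. The Customer class, the sample data, and the CountByCountry helper are assumptions made for illustration; they are not the code from the original sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Northwind Customer entity.
public class Customer
{
    public string Name { get; set; }
    public string Country { get; set; }
    public string City { get; set; }
}

public static class GroupAggregateSample
{
    // Hypothetical in-memory data used instead of db.Customers.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "A", Country = "France", City = "Paris" },
            new Customer { Name = "B", Country = "France", City = "Lyon" },
            new Customer { Name = "C", Country = "Germany", City = "Berlin" },
        };
    }

    public static Dictionary<string, int> CountByCountry(IEnumerable<Customer> customers)
    {
        // First query: an enumeration of groups keyed by country.
        var groups =
            from c in customers
            group c by c.Country;

        // Second query, built on the first: one aggregate row per group.
        var q =
            from g in groups
            select new { Country = g.Key, Count = g.Count() };

        return q.ToDictionary(row => row.Country, row => row.Count);
    }

    public static void Main()
    {
        foreach (var pair in CountByCountry(SampleCustomers()))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```

The second query only ever sees 'g', which is why the aggregate is computed from the group and not from an individual customer.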

To simplify this syntax, you can use the 'into' keyword, which makes the nested query disappear. Keep the first syntax in mind, though: it makes it more obvious why 'c' is not reachable after the group statement. As in SQL, once the data is grouped, you can only select properties of the group.
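The 'into' form could look like the following sketch, again written against an in-memory list so it runs on its own. Customer, the sample data, and the CountByCountry helper are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Northwind Customer entity.
public class Customer
{
    public string Name { get; set; }
    public string Country { get; set; }
    public string City { get; set; }
}

public static class IntoSample
{
    // Hypothetical in-memory data used instead of db.Customers.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "A", Country = "France", City = "Paris" },
            new Customer { Name = "B", Country = "France", City = "Lyon" },
            new Customer { Name = "C", Country = "Germany", City = "Berlin" },
        };
    }

    public static Dictionary<string, int> CountByCountry(IEnumerable<Customer> customers)
    {
        var q =
            from c in customers
            group c by c.Country into g
            // From here on only 'g' is in scope; 'c' no longer exists.
            select new { Country = g.Key, Count = g.Count() };

        return q.ToDictionary(row => row.Country, row => row.Count);
    }

    public static void Main()
    {
        foreach (var pair in CountByCountry(SampleCustomers()))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```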

Now, let's try to create child groups inside this query. The goal is simple: I would like to group customers by country, then by city inside each country group. This is much the same scenario as building a pivot table in Excel.

We can write it 'manually' by nesting a second Linq query in the result of our first query:

var q =
    from c in db.Customers
    group c by c.Country into g
    select new {
        g.Key,
        Count = g.Count(),
        SubGroups = from c in g
                    group c by c.City into g2
                    select g2 };

The result is a tree: the first level holds the country groups, and each country group has a SubGroups property that holds the city groups for that country.

Writing this becomes less and less readable as the number of child groups grows. Moreover, it's quite hard to factor this code, since each new level means inserting another query inside the last projection. I wanted to make this scenario simpler and more generic. Here is the idea.

The first thing I did was to create a fixed type to define a group. This gives me a returnable type (anonymous types are not), so I can isolate my code in a method. Moreover, since I will call my method recursively, I actually had no choice! Another reason: a fixed, non-generic GroupResult type makes WPF data binding easier (xaml does not support generic types).
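A plausible shape for such a type is sketched here. Only the name GroupResult and the SubGroups property come from the text; the Key, Count and Items members and the ToString override are assumptions made for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Non-generic group node: easy to return from a method and to name in xaml.
public class GroupResult
{
    // Key is typed as object so that any selector result fits (assumption).
    public object Key { get; set; }

    // Number of items in this group (assumption).
    public int Count { get; set; }

    // The grouped items themselves (assumption).
    public IEnumerable Items { get; set; }

    // Child groups, or null at the last grouping level.
    public IEnumerable<GroupResult> SubGroups { get; set; }

    public override string ToString()
    {
        return string.Format("{0} ({1})", Key, Count);
    }
}

public static class GroupResultDemo
{
    public static void Main()
    {
        var paris = new GroupResult { Key = "Paris", Count = 2, Items = new[] { "A", "B" } };
        var france = new GroupResult
        {
            Key = "France",
            Count = 2,
            Items = new[] { "A", "B" },
            SubGroups = new[] { paris }
        };
        Console.WriteLine(france); // prints "France (2)"
    }
}
```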

OK, now for the main work. The GroupByMany method extends IEnumerable<T> just like GroupBy does, but you can pass any number of group selectors (params Func<TElement, TKey>[] groupSelectors).

If the number of group selectors is zero, the method returns null. This is also what stops the recursion when there are multiple selectors.

If there is at least one group selector, I isolate the first one and build a simple Linq GroupBy query with it. Each returned item is a GroupResult, and I call GroupByMany recursively on the items of the group (g) to fill the SubGroups property. That recursive call receives the remaining, unused group selectors, a list that eventually becomes empty.
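The steps above can be sketched as follows. This is a reconstruction from the description, not the original listing: the selectors are typed as Func<TElement, object> here (an assumption, chosen so that selectors returning different key types can be mixed), and Customer, the sample data, and the GroupResult members other than SubGroups are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Northwind Customer entity.
public class Customer
{
    public string Name { get; set; }
    public string Country { get; set; }
    public string City { get; set; }
}

// Assumed shape of the fixed group type described in the text.
public class GroupResult
{
    public object Key { get; set; }
    public int Count { get; set; }
    public IEnumerable Items { get; set; }
    public IEnumerable<GroupResult> SubGroups { get; set; }
}

public static class GroupingExtensions
{
    public static IEnumerable<GroupResult> GroupByMany<TElement>(
        this IEnumerable<TElement> elements,
        params Func<TElement, object>[] groupSelectors)
    {
        // No selector left: return null, which also ends the recursion.
        if (groupSelectors.Length == 0)
            return null;

        // Isolate the first selector; the rest is passed down one level.
        var selector = groupSelectors.First();
        var remaining = groupSelectors.Skip(1).ToArray();

        return elements.GroupBy(selector).Select(g => new GroupResult
        {
            Key = g.Key,
            Count = g.Count(),
            Items = g,
            SubGroups = g.GroupByMany(remaining)
        });
    }
}

public static class GroupByManyDemo
{
    // Hypothetical in-memory data used instead of db.Customers.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "A", Country = "France", City = "Paris" },
            new Customer { Name = "B", Country = "France", City = "Lyon" },
            new Customer { Name = "C", Country = "Germany", City = "Berlin" },
        };
    }

    public static void Main()
    {
        var result = SampleCustomers().GroupByMany(c => c.Country, c => c.City);
        foreach (var country in result)
        {
            Console.WriteLine("{0} ({1})", country.Key, country.Count);
            foreach (var city in country.SubGroups)
                Console.WriteLine("  {0} ({1})", city.Key, city.Count);
        }
    }
}
```

Note that the groups at the last level have a null SubGroups property, so code that walks the tree by hand has to check for it.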

The calling code is short and easy to read, and the GroupByMany method is generic enough to be reused in many other cases.

Last step: let's display the result. WPF has a feature that I love: the ability to associate a template with a data type. Usually, a template is stored in the resources under a key, and controls reference the template by that key. With the 'DataType' syntax and no key, the template is automatically applied by any content control whose content type matches the DataType of the template.

The HierarchicalDataTemplate is a special template that allows you to define a collection of children (ItemsSource property) in addition to the regular DataTemplate definition.

Some hierarchical controls like the TreeView are using this template to build their structure recursively. So we have nothing more to do to display our multiple groupby results than connecting them to the treeview:
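A minimal xaml sketch of such a template is shown here. It assumes a GroupResult class with Key, Count and SubGroups properties, a 'local' xmlns prefix mapped to the namespace that contains it, and a TreeView whose ItemsSource is set from code-behind to the result of GroupByMany; the control name groupTree is hypothetical.

```xml
<Window.Resources>
    <!-- No x:Key: the template applies to every GroupResult item. -->
    <HierarchicalDataTemplate DataType="{x:Type local:GroupResult}"
                              ItemsSource="{Binding SubGroups}">
        <StackPanel Orientation="Horizontal">
            <TextBlock Text="{Binding Key}" />
            <TextBlock Text=" (" />
            <TextBlock Text="{Binding Count}" />
            <TextBlock Text=")" />
        </StackPanel>
    </HierarchicalDataTemplate>
</Window.Resources>

<!-- groupTree.ItemsSource is assumed to be assigned in code-behind. -->
<TreeView x:Name="groupTree" />
```

Because the child groups are GroupResult items too, the same template is applied again at every level of the tree.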

In addition to the treeview, I have added a simple ListView to display the customers belonging to the current selected group.

I'll let you estimate the size of the code if you had to write the same program using Windows Forms and ADO.NET...

The source code attached is for Visual Studio 2008. Even though this sample uses Linq to Objects grouping, I have used a local Northwind database to populate my collection. You just need to modify the connection string in the app.config file to run the sample (or use any of your data sources of course).

I ran the linq query posted above using RTM linq to sql, but it results in a LOT of queries. This is logical: the main GROUP BY query ran on the DB can’t contain ‘City’ in the projection list unless it’s in the group by clause as well. So this isn’t fetched. The nested group by requires ‘City’ so this is fetched PER CUSTOMER from the db.

Instead, you should group the customers on Country AND City using an anonymous type. Though the nested group by then becomes more difficult at first glance.
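Grouping on a composite key could look like the sketch below. It runs on an in-memory list, so it only illustrates the query shape; whether Linq to Sql turns it into a single GROUP BY has not been verified here. Customer, the sample data, and the CountByCountryAndCity helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Northwind Customer entity.
public class Customer
{
    public string Name { get; set; }
    public string Country { get; set; }
    public string City { get; set; }
}

public static class CompositeKeySample
{
    // Hypothetical in-memory data used instead of db.Customers.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "A", Country = "France", City = "Paris" },
            new Customer { Name = "B", Country = "France", City = "Paris" },
            new Customer { Name = "C", Country = "France", City = "Lyon" },
            new Customer { Name = "D", Country = "Germany", City = "Berlin" },
        };
    }

    public static Dictionary<string, Dictionary<string, int>> CountByCountryAndCity(
        IEnumerable<Customer> customers)
    {
        // One flat grouping on the composite key (Country, City).
        var flat =
            from c in customers
            group c by new { c.Country, c.City } into g
            select new { g.Key.Country, g.Key.City, Count = g.Count() };

        // The country/city hierarchy is then rebuilt in memory from the flat rows.
        return flat
            .GroupBy(row => row.Country)
            .ToDictionary(
                byCountry => byCountry.Key,
                byCountry => byCountry.ToDictionary(row => row.City, row => row.Count));
    }

    public static void Main()
    {
        foreach (var country in CountByCountryAndCity(SampleCustomers()))
            foreach (var city in country.Value)
                Console.WriteLine("{0} / {1}: {2}", country.Key, city.Key, city.Value);
    }
}
```

The trade-off is the one the comment points out: the query itself is flat, and the nesting has to be reconstructed afterwards.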

Linq to Sql can’t merge parent/child resultsets inside a projection at runtime, so you won’t get this efficiently using Linq to Sql in its current state.

Wow, this is a very timely and informative post. I work on a product that was built pre-Linq that has a GetTree method that builds a hierarchy of TreeGroup objects that look very similar to your GroupResult class. I was wondering this past weekend how I could accomplish the same thing in Linq and your post showed me exactly what I need to do. Thanks!

Can anyone convert this to VB? I’m having trouble converting the lambda function correctly (I know VB uses the Function keyword, but I’m not sure how to use it in the context of this function). Any ideas?

This works fine and gives the expected results. However, if I replace the g.Sum(….) with funcQuantity (which of course is identical) the query fails with a NotSupportedException – "Unsupported overload used for query operator ‘Sum’."

You must understand that Linq to Sql analyzes the code (an expression tree) to translate it into a SQL query. So Linq to Sql only recognizes what Linq defines (actually, it recognizes a subset of the Linq methods). Your query fails here because funcQuantity cannot be translated into SQL by Linq to Sql.
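One way to see the difference is to inspect the expression tree directly, without any database. In this sketch the SumArgumentKind helper is hypothetical, and funcQuantity is only a stand-in for the delegate from the question: an inline lambda shows up in the tree as a lambda node that a provider can analyze, while a delegate variable shows up as a plain reference to an opaque value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // Returns the node type of the selector argument passed to Sum.
    public static ExpressionType SumArgumentKind(
        Expression<Func<IEnumerable<int>, int>> query)
    {
        var call = (MethodCallExpression)query.Body;
        // Sum is an extension method: argument 0 is the source, argument 1 the selector.
        return call.Arguments[1].NodeType;
    }

    public static void Main()
    {
        // Stand-in for the delegate variable from the question.
        Func<int, int> funcQuantity = x => x * 2;

        // Inline lambda: the selector is part of the tree and can be analyzed.
        Console.WriteLine(SumArgumentKind(g => g.Sum(x => x * 2)));  // Lambda

        // Delegate variable: the tree only holds a reference to a compiled method.
        Console.WriteLine(SumArgumentKind(g => g.Sum(funcQuantity))); // MemberAccess
    }
}
```

Both calls are identical for Linq to Objects, which simply invokes the delegate; the difference only matters to a provider that has to read the tree.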

Depending on what you’re using this for, it can be quite useful to strongly type your results. Keep in mind that this only works for a single Key type (just as the solution above does). It only takes a few small changes to do so: (hopefully this is displayed well…)
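The commenter's listing did not survive, so the following is only a guess at what a strongly typed variant might look like. GroupResult<TKey, TElement>, its members, Customer, and the sample data are all assumptions made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Northwind Customer entity.
public class Customer
{
    public string Name { get; set; }
    public string Country { get; set; }
    public string City { get; set; }
}

// Assumed strongly typed counterpart of GroupResult.
public class GroupResult<TKey, TElement>
{
    public TKey Key { get; set; }
    public int Count { get; set; }
    public IEnumerable<TElement> Items { get; set; }
    public IEnumerable<GroupResult<TKey, TElement>> SubGroups { get; set; }
}

public static class TypedGroupingExtensions
{
    public static IEnumerable<GroupResult<TKey, TElement>> GroupByMany<TElement, TKey>(
        this IEnumerable<TElement> elements,
        params Func<TElement, TKey>[] groupSelectors)
    {
        // No selector left: return null, which also ends the recursion.
        if (groupSelectors.Length == 0)
            return null;

        var selector = groupSelectors[0];
        var remaining = groupSelectors.Skip(1).ToArray();

        return elements.GroupBy(selector).Select(g => new GroupResult<TKey, TElement>
        {
            Key = g.Key,
            Count = g.Count(),
            Items = g,
            SubGroups = g.GroupByMany(remaining)
        });
    }
}

public static class TypedGroupByManyDemo
{
    // Hypothetical in-memory data.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "A", Country = "France", City = "Paris" },
            new Customer { Name = "B", Country = "France", City = "Lyon" },
            new Customer { Name = "C", Country = "Germany", City = "Berlin" },
        };
    }

    public static void Main()
    {
        // Every selector must return the same type (string here), since
        // a single TKey is inferred for all of them.
        foreach (var country in SampleCustomers().GroupByMany(c => c.Country, c => c.City))
        {
            // Key is a string and Items are Customers: no casts needed.
            Console.WriteLine("{0}: {1}", country.Key.ToUpper(),
                string.Join(", ", country.Items.Select(c => c.Name).ToArray()));
        }
    }
}
```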

After re-reading a little of your post, I see you mentioned that xaml does not support generic types. I suppose my previous code probably won’t work for anyone using this for WPF, but for others it might be useful.

Yes, xaml does not support generic types. Of course, WPF does! What you cannot do is use the generic type name inside the xaml syntax.

Regarding your interesting proposition, it seems that there is a problem. In your solution TKey is inferred from ‘params Func<TElement, TKey>[] groupSelectors’. This means that all the group definitions are returning the same type. So you could write:

q.GroupByMany(c => c.Country, c => c.City) because Country and City are both strings but you could not write

Wow Mitsu, Thank you so much for your post. I’ve found both to be very enlightening and informative. As I learn more about LINQ and Lambda expressions, articles like these really showcase the power behind these new features and what is now possible. Thanks again.