LINQ to Tree - A Generic Technique for Querying Tree-like Structures

This article presents a generic approach to applying LINQ queries to tree-like structures. Using T4 templates for code generation, LINQ to VisualTree (WPF), LINQ to WinForms, and LINQ to FileSystem APIs are constructed.

This article looks at how LINQ can be applied to tree-like data, using LINQ to XML as an example. A generic approach to querying trees is constructed, which allows any tree structure to be made LINQ 'compatible' with the implementation of a simple adapter class. This idea is extended further with T4 templates for code generation, with examples of LINQ to VisualTree (WPF / Silverlight), LINQ to WinForms, and LINQ to Filesystem presented.

One of my favorite features of the C# programming language is LINQ. As programmers, so much of what we do is write code to filter, manipulate, and transform data. LINQ brings with it a whole host of functionality that allows you to perform these tasks in a much more expressive and succinct way. It has clearly played a big part in the development of the language itself: extension methods and lambda expressions arrived alongside it in C# 3.0, and it leans heavily on the yield keyword introduced in C# 2.0.

When I started learning WPF, I thought it might be useful to apply LINQ to the visual tree in order to query the state of the user interface (UI). For those of you not familiar with WPF, the visual tree is the hierarchical structure of elements, borders, grids, etc. that are used to build the UI. These are typically defined in XAML, i.e., XML, which is itself a hierarchical structure. The problem is, LINQ depends on the IEnumerable interface, which describes a flat, one-dimensional sequence of items. How can this be used to query a hierarchical structure like the WPF visual tree?

After a quick Google search, I stumbled upon Peter McGrattan's LINQ to Visual Tree implementation, which simply flattens the tree into a one-dimensional structure. Elsewhere, I also found a more generic implementation by David Jade which can be used to flatten any tree in order to apply LINQ queries. However, in flattening the tree, you lose information that might be relevant to your query, such as which nodes are parents, children, or siblings of one another. Neither was the solution I was looking for!

Anyhow, I forgot about this problem for a while until more recently when I started a new project where we were working with a lot of data in XML, and naturally, we turned to LINQ to XML...

This section gives a very brief introduction to the LINQ to XML API and its relationship to XPath. If you are already a LINQ to XML guru, feel free to skip it and get onto the juicy bits!

The LINQ to XML API provides methods for querying XML documents which are loaded into memory in the form of an XDocument (or X-Dom for short). The in-memory XML document consists of a tree-like structure of nodes. At each node, there are LINQ methods available for searching the tree in order to find children, siblings, or nodes with some other relationship to the current (context) node that match various criteria.

A simple example is given below, based on the following XML instance document:

The following query will find all 'product' elements with a colour child element with a value 'Red':
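The original listings are not reproduced in this text, so the following is only a minimal sketch of the kind of document and query being described. The order document, its 'name' attribute, and the RedProductsQuery class are assumptions invented for illustration; only the LINQ to XML calls themselves (XDocument.Parse, Descendants, Elements) are the real API.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RedProductsQuery
{
    // Hypothetical order document, invented for illustration only.
    public const string SampleXml = @"
<orders>
  <order id='1'>
    <product name='Chair'><colour>Red</colour></product>
    <product name='Table'><colour>Blue</colour></product>
  </order>
  <order id='2'>
    <product name='Lamp'><colour>Red</colour></product>
  </order>
</orders>";

    // Returns the names of all products that have a colour child with value 'Red'.
    public static string[] FindRedProductNames(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        return doc.Descendants("order")      // defined on XDocument / XContainer
                  .Descendants("product")    // defined on IEnumerable<XElement>
                  .Where(p => p.Elements("colour").Any(c => c.Value == "Red"))
                  .Select(p => (string)p.Attribute("name"))
                  .ToArray();
    }
}
```

Calling RedProductsQuery.FindRedProductNames(RedProductsQuery.SampleXml) on this sample document yields "Chair" and "Lamp", in document order.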

The nodes matched by each part of the query are highlighted in the XML document above.

The first part of the query starts at the root of the document, invoking the XDocument.Descendants method to find any elements with the name order which are descendants (direct or otherwise) within the tree.

The second part of the query matches all the product descendants of order elements; however, this is where things start to get interesting. Whilst the first Descendants method in the above query is defined on XDocument, the second Descendants method that follows it is not. This method is defined on IEnumerable<XElement>, and its result is produced by invoking the Descendants method on each of the XElements within the collection returned by the preceding Descendants query.

LINQ to XML defines a corresponding 'IEnumerable' extension method for each of the regular query methods on a single XElement (or XDocument). It is these methods that give LINQ to XML its real power, allowing the queries to 'spider' throughout the document.

The final part of the query filters the product elements. For each product, it looks at the colour elements that are its direct children and keeps those products where the colour has a value of Red.

The LINQ to XML API is powerful and expressive, but how were these query methods, which match sibling, ancestor, and descendant elements, selected? For this, the LINQ to XML developers borrowed from an existing technology for querying XML: XPath. The equivalent XPath query to the one given above in LINQ is:

//order//product[colour='Red']

XPath allows you to create a query by assembling a number of expressions and predicates, each one acting on the node-set which is the result of applying the preceding expressions. This is analogous to the LINQ to XML execution described above.
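To make the analogy concrete, here is a small sketch that runs the XPath expression and a LINQ to XML query over the same document and checks that they select the same elements. XPathSelectElements is the extension method from System.Xml.XPath; the XPathComparison class and the shape of the LINQ query are assumptions for illustration, and the comparison is only expected to hold for simple documents without nested order elements (where the LINQ version could return duplicates).

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathComparison
{
    // Returns true when the XPath query and the LINQ to XML query
    // select the same elements, in the same order.
    public static bool SameResults(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        var viaXPath = doc.XPathSelectElements("//order//product[colour='Red']")
                          .ToList();

        var viaLinq = doc.Descendants("order")
                         .Descendants("product")
                         .Where(p => p.Elements("colour").Any(c => c.Value == "Red"))
                         .ToList();

        // Both queries return the XElement instances held by the document,
        // so the default reference comparison is sufficient here.
        return viaXPath.SequenceEqual(viaLinq);
    }
}
```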

With XPath, at a specific node in the XML document, you have the option to query a number of different 'axes', each one defining a set of nodes relating to the current context node. This allows you to define queries that traverse the document in a number of different directions. The axes that are common to both LINQ and XPath are summarised below:

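LINQ to XML           XPath                 Illustration
--------------------  --------------------  ------------
Descendants           descendant (//)       [diagram]
Ancestors             ancestor              [diagram]
Elements              child (/)             [diagram]
ElementsBeforeSelf    preceding-sibling     [diagram]
ElementsAfterSelf     following-sibling     [diagram]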

Note: In the above diagrams, the node highlighted in red is the current context. The nodes highlighted in green are the ones that are members of the given axis. Also, XPath has 'or-self' variants of the descendant and ancestor axes (descendant-or-self and ancestor-or-self), which include the context node; these are present in LINQ to XML as the DescendantsAndSelf and AncestorsAndSelf methods.

LINQ to XML borrows the axes defined by XPath for querying the XML tree. Why not use these same axes for querying other tree-like structures?

The problem is, there is no common interface for defining a tree structure; for example, the WPF visual tree and filesystem APIs have very different interfaces. A common solution to this problem is to apply the Gang of Four Adapter Pattern, where you define the interface that you really want, then 'adapt' the existing classes so that they implement this interface, typically by 'wrapping' them in another class.

In order to traverse a tree structure, at each node, all you need are methods to navigate to the parent or children. The following generic interface provides methods for traversing a tree of objects of type 'T':

/// <summary>
/// Defines an adapter that must be implemented in order to use the LinqToTree
/// extension methods
/// </summary>
/// <typeparam name="T"></typeparam>
public interface ILinqToTree<T>
{
    /// <summary>
    /// Obtains all the children of the Item.
    /// </summary>
    /// <returns></returns>
    IEnumerable<ILinqToTree<T>> Children();

    /// <summary>
    /// The parent of the Item.
    /// </summary>
    ILinqToTree<T> Parent { get; }

    /// <summary>
    /// The item being adapted.
    /// </summary>
    T Item { get; }
}

This interface then allows us to create extension methods for navigating the different axes for a specific node of type 'T'.
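The original listing of these extension methods is not reproduced here. As a rough sketch of how they could be written against the interface, the following shows three of the axes. The TreeAxes class name, the depth-first ordering, and the NamedNode test adapter are my assumptions and not necessarily what the downloadable code does.

```csharp
using System.Collections.Generic;

public interface ILinqToTree<T>
{
    IEnumerable<ILinqToTree<T>> Children();
    ILinqToTree<T> Parent { get; }
    T Item { get; }
}

public static class TreeAxes
{
    // Descendant axis: every node below the context node, depth first.
    public static IEnumerable<ILinqToTree<T>> Descendants<T>(this ILinqToTree<T> node)
    {
        foreach (var child in node.Children())
        {
            yield return child;
            foreach (var descendant in child.Descendants())
                yield return descendant;
        }
    }

    // Ancestor axis: the parent, then its parent, and so on up to the root.
    public static IEnumerable<ILinqToTree<T>> Ancestors<T>(this ILinqToTree<T> node)
    {
        for (var parent = node.Parent; parent != null; parent = parent.Parent)
            yield return parent;
    }

    // Child axis: the direct children only.
    public static IEnumerable<ILinqToTree<T>> Elements<T>(this ILinqToTree<T> node)
    {
        return node.Children();
    }
}

// Hypothetical in-memory node, used only to exercise the methods above.
public class NamedNode : ILinqToTree<string>
{
    private readonly List<NamedNode> _children = new List<NamedNode>();

    public NamedNode(string name) { Item = name; }

    public string Item { get; private set; }
    public ILinqToTree<string> Parent { get; private set; }
    public IEnumerable<ILinqToTree<string>> Children() { return _children; }

    // Adds a child and returns this node so that calls can be chained.
    public NamedNode Add(NamedNode child)
    {
        child.Parent = this;
        _children.Add(child);
        return this;
    }
}
```

The sibling axes (ElementsBeforeSelf, ElementsAfterSelf) can be built the same way, by walking the Children of the Parent and taking the nodes before or after the context node.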

(I have omitted the 'AndSelf' implementations; these are quite trivial; if you are interested, download the sample code for this article.)

These methods allow us to navigate from one node to the nodes within its related axes. However, as we saw in the LINQ to XML section above, it is the IEnumerable extension methods that provide the real power of LINQ to XML, allowing your query to 'spider' throughout the document.

The IEnumerable equivalents of the above axes methods are given below:

In the above code, you can clearly see the relationship between the methods that act on a single node and their IEnumerable equivalent which performs the same function on a collection of elements. In the above code, they are all implemented via the private DrillDown method, which applies a given function to each of the items within a collection, yielding the results.
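Since the listing itself is missing from this text, here is a minimal sketch of the pattern just described: each IEnumerable method delegates to a private DrillDown helper, which applies the single-node method to every item and yields the combined results. Only the DrillDown name comes from the article; the exact signatures, the TreeSequenceAxes class, and the TextNode test adapter are assumptions.

```csharp
using System;
using System.Collections.Generic;

public interface ILinqToTree<T>
{
    IEnumerable<ILinqToTree<T>> Children();
    ILinqToTree<T> Parent { get; }
    T Item { get; }
}

public static class TreeSequenceAxes
{
    // Single-node axes.
    public static IEnumerable<ILinqToTree<T>> Descendants<T>(this ILinqToTree<T> node)
    {
        foreach (var child in node.Children())
        {
            yield return child;
            foreach (var descendant in child.Descendants())
                yield return descendant;
        }
    }

    public static IEnumerable<ILinqToTree<T>> Elements<T>(this ILinqToTree<T> node)
    {
        return node.Children();
    }

    // IEnumerable equivalents: the same axis, applied to every node in a collection.
    public static IEnumerable<ILinqToTree<T>> Descendants<T>(
        this IEnumerable<ILinqToTree<T>> items)
    {
        return items.DrillDown(i => i.Descendants());
    }

    public static IEnumerable<ILinqToTree<T>> Elements<T>(
        this IEnumerable<ILinqToTree<T>> items)
    {
        return items.DrillDown(i => i.Elements());
    }

    // Applies the given function to each item and yields all of the results.
    private static IEnumerable<ILinqToTree<T>> DrillDown<T>(
        this IEnumerable<ILinqToTree<T>> items,
        Func<ILinqToTree<T>, IEnumerable<ILinqToTree<T>>> function)
    {
        foreach (var item in items)
            foreach (var result in function(item))
                yield return result;
    }
}

// Hypothetical in-memory node, used only to exercise the methods above.
public class TextNode : ILinqToTree<string>
{
    private readonly List<TextNode> _children = new List<TextNode>();
    public TextNode(string text) { Item = text; }
    public string Item { get; private set; }
    public ILinqToTree<string> Parent { get; private set; }
    public IEnumerable<ILinqToTree<string>> Children() { return _children; }
    public TextNode Add(TextNode child) { child.Parent = this; _children.Add(child); return this; }
}
```

With these in place, a query such as root.Elements().Descendants() reads as "the descendants of each child of root". DrillDown is functionally the same as LINQ's SelectMany, which could be used instead.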

An implementation of the ILinqToTree interface can be created to wrap any tree structure, allowing it to be queried using this API. To see how the ILinqToTree interface is used in practice, we will create an adapter that allows us to query the WPF visual tree:
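The adapter listing is missing from this text. What follows is a sketch of how such an adapter could look, using the VisualTreeHelper methods GetChildrenCount, GetChild, and GetParent. The field name and the decision to create a new adapter for every child and parent are my assumptions. Note that VisualTreeHelper expects a Visual or Visual3D, so this sketch assumes the adapted objects are part of the visual tree.

```csharp
using System.Collections.Generic;
using System.Windows;
using System.Windows.Media;

public interface ILinqToTree<T>
{
    IEnumerable<ILinqToTree<T>> Children();
    ILinqToTree<T> Parent { get; }
    T Item { get; }
}

// Adapts a DependencyObject so that the visual tree can be queried.
public class VisualTreeAdapter : ILinqToTree<DependencyObject>
{
    private readonly DependencyObject _item;

    public VisualTreeAdapter(DependencyObject item)
    {
        _item = item;
    }

    // Wraps each visual child of the adapted item in a new adapter.
    public IEnumerable<ILinqToTree<DependencyObject>> Children()
    {
        int childrenCount = VisualTreeHelper.GetChildrenCount(_item);
        for (int i = 0; i < childrenCount; i++)
        {
            yield return new VisualTreeAdapter(VisualTreeHelper.GetChild(_item, i));
        }
    }

    // Wraps the visual parent, or returns null at the root of the tree.
    public ILinqToTree<DependencyObject> Parent
    {
        get
        {
            DependencyObject parent = VisualTreeHelper.GetParent(_item);
            return parent == null ? null : new VisualTreeAdapter(parent);
        }
    }

    public DependencyObject Item
    {
        get { return _item; }
    }
}
```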

The visual tree is composed of DependencyObject instances, and their relationships can be determined from the VisualTreeHelper. With the above adapter, we can now apply LINQ queries to our visual tree. The following is a simple example, where a Window with the following XAML markup is queried:

The following examples demonstrate a few queries executed against the visual tree. Each example is given in both LINQ query syntax and extension method (or fluent) syntax:

// get all the TextBox's which have a Grid as direct parent
// (Parent is itself an adapter, so we test the Item that it wraps)
var itemsFluent = new VisualTreeAdapter(this).Descendants()
                    .Where(i => i.Parent.Item is Grid)
                    .Where(i => i.Item is TextBox)
                    .Select(i => i.Item);

var itemsQuery = from v in new VisualTreeAdapter(this).Descendants()
                 where v.Parent.Item is Grid && v.Item is TextBox
                 select v.Item;

// get all the StackPanels that are within another StackPanel visual tree
var items2Fluent = new VisualTreeAdapter(this).Descendants()
                    .Where(i => i.Item is StackPanel)
                    .Descendants()
                    .Where(i => i.Item is StackPanel)
                    .Select(i => i.Item);

var items2Query = from i in
                      (from v in new VisualTreeAdapter(this).Descendants()
                       where v.Item is StackPanel
                       select v).Descendants()
                  where i.Item is StackPanel
                  select i.Item;

Each of these queries follows a similar pattern. First, the root of the visual tree is wrapped in the VisualTreeAdapter (which implements ILinqToTree); this is followed by the query itself; finally, a Select clause is used to unwrap each of the items that satisfies our query criteria.

In my opinion, queries against tree structures are usually more readable in fluent / extension method syntax; the sub-selects required in query syntax are needlessly verbose. However, there are a few times when query syntax, or a mixture, provides the most readable query, as we will see later.

Whilst this mechanism works quite well, there are a few not-so-nice features. Firstly, the addition of our adapter means that we are continually wrapping and un-wrapping the nodes of our tree. A second issue is a little more subtle. It would be nice to add extension methods for the common tasks of filtering an axis by type; for example, allowing us to find the descendants of type 'TextBox'. The following should fit the bill:

The above method does indeed produce the desired outcome, filtering the descendant nodes for those with a type given by K. However, while the existing method has a single type parameter that can be inferred by the compiler, the method above has two type parameters, and inference no longer takes place. This means that we not only have to specify the type we are searching for, but also the type being queried, as illustrated below:
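The two listings referred to here are not reproduced in this text, so the following sketch reconstructs the idea rather than the original code. It shows a Descendants method with two type parameters and a call site where both must be supplied. The TypedAxes and TypedAxesDemo classes and the ObjectNode test adapter are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public interface ILinqToTree<T>
{
    IEnumerable<ILinqToTree<T>> Children();
    ILinqToTree<T> Parent { get; }
    T Item { get; }
}

public static class TypedAxes
{
    // One type parameter: T is inferred from the receiver at the call site.
    public static IEnumerable<ILinqToTree<T>> Descendants<T>(this ILinqToTree<T> node)
    {
        foreach (var child in node.Children())
        {
            yield return child;
            foreach (var descendant in child.Descendants())
                yield return descendant;
        }
    }

    // Two type parameters: T is the node type of the tree, K the type searched for.
    // K appears nowhere in the parameter list, so it cannot be inferred, and
    // C# does not allow supplying only some of the type arguments.
    public static IEnumerable<ILinqToTree<T>> Descendants<T, K>(this ILinqToTree<T> node)
    {
        return node.Descendants().Where(n => n.Item is K);
    }
}

// Hypothetical in-memory node holding arbitrary objects.
public class ObjectNode : ILinqToTree<object>
{
    private readonly List<ObjectNode> _children = new List<ObjectNode>();
    public ObjectNode(object item) { Item = item; }
    public object Item { get; private set; }
    public ILinqToTree<object> Parent { get; private set; }
    public IEnumerable<ILinqToTree<object>> Children() { return _children; }
    public ObjectNode Add(ObjectNode child) { child.Parent = this; _children.Add(child); return this; }
}

public static class TypedAxesDemo
{
    public static int CountStrings(ILinqToTree<object> root)
    {
        // root.Descendants<string>() would not compile here: both the
        // node type (object) and the search type (string) must be given.
        return root.Descendants<object, string>().Count();
    }
}
```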

In order to remove the need to wrap / unwrap the nodes of our tree, the extension methods need to be defined on the node type itself. The problem is, if we define extension methods on our specific node type, this means that the LINQ to Tree API has to be manually constructed for each node / tree type. The solution I found to this problem was to 'internalise' the use of the adapter, generating the LINQ to Tree extension methods via T4 templates.

T4 templates are built into Visual Studio, and provide a mechanism for generating code files via C# and a simple template markup. For an introduction to authoring T4 templates, try this article on CodeProject. The solution works as follows: there is a single T4 template file that contains our simplified ILinqToTree interface (there is no need for the Item property required for unwrapping, or the factory method), and the LINQ to Tree extension methods, plus their IEnumerable equivalents. This code is within a T4 template method that takes parameters which detail the type which the API is being generated for (e.g., DependencyObject, DirectoryInfo) and the class which implements the ILinqToTree interface for this type.

Here is the template (Note: I have only included the descendants axis in this code snippet; the other axes and their 'AndSelf' counterparts are all in the code associated with this article):

The code within the template is very similar to that given in the previous sections. The main difference is that the use of the ILinqToTree adapter is now scoped within each extension method, so that the LINQ to Tree API operates directly on the type itself. This removes the need for the generic type parameters on the 'regular' methods, allowing us to use a single type parameter for searching the collection for a specific type.
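To illustrate the 'internalised' adapter without the template machinery, here is a hand-written sketch for a made-up node type. Widget, Panel, Label, and WidgetAdapter are hypothetical stand-ins for a real tree type such as DependencyObject or Control, and the simplified interface shape (children and parent returned as T, no Item property) is my assumption based on the description above. It is not the output of the actual template.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical tree node types, invented for this example.
public class Widget
{
    private readonly List<Widget> _children = new List<Widget>();
    public Widget Parent { get; private set; }
    public IEnumerable<Widget> Children { get { return _children; } }
    public Widget Add(Widget child) { child.Parent = this; _children.Add(child); return this; }
}
public class Panel : Widget { }
public class Label : Widget { }

// Simplified adapter interface (assumed shape): nothing needs unwrapping.
public interface ILinqToTree<T>
{
    IEnumerable<T> Children();
    T Parent { get; }
}

public class WidgetAdapter : ILinqToTree<Widget>
{
    private readonly Widget _item;
    public WidgetAdapter(Widget item) { _item = item; }
    public IEnumerable<Widget> Children() { return _item.Children; }
    public Widget Parent { get { return _item.Parent; } }
}

public static class WidgetTreeExtensions
{
    // The adapter is created and used inside the method; callers only see Widget.
    public static IEnumerable<Widget> Descendants(this Widget item)
    {
        ILinqToTree<Widget> adapter = new WidgetAdapter(item);
        foreach (Widget child in adapter.Children())
        {
            yield return child;
            foreach (Widget descendant in child.Descendants())
                yield return descendant;
        }
    }

    // A single type parameter is now enough to filter the axis by type.
    public static IEnumerable<T> Descendants<T>(this Widget item)
    {
        return item.Descendants().OfType<T>();
    }

    // IEnumerable equivalents.
    public static IEnumerable<Widget> Descendants(this IEnumerable<Widget> items)
    {
        return items.SelectMany(i => i.Descendants());
    }

    public static IEnumerable<T> Descendants<T>(this IEnumerable<Widget> items)
    {
        return items.SelectMany(i => i.Descendants<T>());
    }
}
```

A query such as root.Descendants<Panel>().Descendants<Panel>() then works directly on the node type, with no wrapping before the query and no Select to unwrap afterwards.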

From these few lines of code, the T4 template generates a ~400 line API that allows you to query the tree making use of all the XPath axes described earlier. For example, the generated Descendants method looks like this:

I am not going to reproduce the whole LINQ to Visual Tree API in this article; the code itself is really not that interesting! (You can find the full API in the code attached to this article.) What is more interesting is the fact that we now have a generic LINQ API for tree structures that can be tailored to a specific tree type in just a few lines of code. Before we explore its application to some other tree-like structures, let's revisit the queries that we applied earlier to the visual tree. With the new API, the queries are much more succinct:

// get all the TextBox's which have a Grid as direct parent
var itemsFluent = this.Descendants<TextBox>()
                      .Where(i => i.Ancestors().FirstOrDefault() is Grid);

var itemsQuery = from v in this.Descendants<TextBox>()
                 where v.Ancestors().FirstOrDefault() is Grid
                 select v;

// get all the StackPanels that are within another StackPanel visual tree
var items2Fluent = this.Descendants<StackPanel>()
                       .Descendants<StackPanel>();

var items2Query = from i in
                      (from v in this.Descendants<StackPanel>()
                       select v).Descendants<StackPanel>()
                  select i;

Another commonly encountered tree structure is the hierarchical control structure within Windows Forms applications. This structure is not always that obvious to WinForms developers because it does not have the same significance that the visual tree structure has in WPF / Silverlight, where property inheritance is supported by the tree structure. However, you can see that the UI is indeed a tree, by switching on the Document Outline view:

Creating our LINQ to Windows Forms API is again simply a matter of creating our ILinqToTree adapter for a Control:

I have also added another extension method to DirectoryInfo to adapt the 'GetFiles' method which returns an array into something more LINQ friendly. For those of you familiar with XPath, files in this context can be seen as analogous to attributes in an XML document, and form another axis which can be queried.

Personally, I have a vast number of Visual Studio projects littered around my hard disk, the fruits of many ideas, some good and some bad, but all somehow lacking in organisation. I tend to keep the better ones in Subversion (SVN). The following query finds all the projects which I have added to SVN:

The first where clause in the query finds any directories that contain a '.svn' subdirectory. The second where clause matches those which do not have a direct parent which itself contains a '.svn' directory (when you add a project to SVN, all folders will have a .svn folder added to them).
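The query listing is missing from this text. A sketch of how the two where clauses might be written is shown below, using hand-written Descendants and Elements extension methods on DirectoryInfo rather than the T4-generated API. The class name, the HasSvnFolder helper, and the FindSvnProjectRoots method are hypothetical, and the sketch makes no attempt to handle directories that cannot be read.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DirectoryTreeExtensions
{
    // Descendant axis for the file system: all subdirectories, depth first.
    public static IEnumerable<DirectoryInfo> Descendants(this DirectoryInfo dir)
    {
        foreach (DirectoryInfo child in dir.GetDirectories())
        {
            yield return child;
            foreach (DirectoryInfo descendant in child.Descendants())
                yield return descendant;
        }
    }

    // Child axis: the direct subdirectories.
    public static IEnumerable<DirectoryInfo> Elements(this DirectoryInfo dir)
    {
        return dir.GetDirectories();
    }

    // Hypothetical helper: true if the directory has a '.svn' child directory.
    public static bool HasSvnFolder(DirectoryInfo dir)
    {
        return dir.Elements().Any(d => d.Name == ".svn");
    }

    // Finds the top-most directories under root that are under SVN control.
    public static List<DirectoryInfo> FindSvnProjectRoots(DirectoryInfo root)
    {
        var query = from dir in root.Descendants()
                    where HasSvnFolder(dir)
                    where dir.Parent == null || !HasSvnFolder(dir.Parent)
                    select dir;

        return query.ToList();
    }
}
```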

Here's another fun example that provides an interesting mix of query and fluent syntax:

The first part of the query, the 'in' clause, finds all bin directories and selects all of their descendant directories; the 'let' clause assigns the number of XML files in each directory to a variable 'xmlCount'; the where clause selects those where the count is greater than zero; finally, the select clause creates an anonymous type containing the directory and the number of XML files. The output looks something like this:
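Again, the original query is not reproduced in this text, so here is a sketch of a query with the shape described above. The Descendants and Files extension methods are hand-written stand-ins for the generated API, and the DirectoryQueries class and XmlCounts method are hypothetical. Because anonymous types cannot be returned from a method, the results are converted to key/value pairs at the end.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DirectoryQueries
{
    public static IEnumerable<DirectoryInfo> Descendants(this DirectoryInfo dir)
    {
        foreach (DirectoryInfo child in dir.GetDirectories())
        {
            yield return child;
            foreach (DirectoryInfo descendant in child.Descendants())
                yield return descendant;
        }
    }

    public static IEnumerable<DirectoryInfo> Descendants(this IEnumerable<DirectoryInfo> dirs)
    {
        return dirs.SelectMany(d => d.Descendants());
    }

    // Adapts GetFiles, which returns an array, into an axis-style method.
    public static IEnumerable<FileInfo> Files(this DirectoryInfo dir)
    {
        return dir.GetFiles();
    }

    // Directory name and XML file count for each directory below a 'bin' folder.
    public static List<KeyValuePair<string, int>> XmlCounts(DirectoryInfo root)
    {
        var query = from dir in root.Descendants().Where(d => d.Name == "bin").Descendants()
                    let xmlCount = dir.Files().Count(f =>
                        string.Equals(f.Extension, ".xml", StringComparison.OrdinalIgnoreCase))
                    where xmlCount > 0
                    select new { dir, xmlCount };

        return query.Select(r => new KeyValuePair<string, int>(r.dir.Name, r.xmlCount))
                    .ToList();
    }
}
```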

This same query could, of course, be applied to any of the LINQ to Tree APIs generated in this article.

In conclusion, this article has demonstrated how to produce a LINQ API for any tree-like structure by the implementation of a simple adapter interface and the use of T4 templates for code generation. Hopefully, you have found this article interesting, and have perhaps learnt something new about the LINQ to XML APIs, or XPath axes.

If you manage to apply this technique to some other tree structure that I have not covered in this article, I would love to hear about it. Please leave a comment below.

Enjoy!

History

3rd March 2010

Article updated, ILinqToTree interface modified based on feedback from William Kempf

I am also a Technical Evangelist at Scott Logic, a provider of bespoke financial software and consultancy for the retail and investment banking, stockbroking, asset management and hedge fund communities.

Very well thought out. Eminently useful. After I read this article a year or two ago, I rolled my own implementation with a number of additional interfaces that describe various tree forms and their idiosyncrasies (rooted trees, binary trees, ternary trees, n-ary trees, etc.). I can't tell you how often I use this extended tree library and the extension methods I wrote, which were inspired by your article. If you stare at a problem domain or data set hard enough, very often you will see some sort of tree structure to which your problem can be almost instantly refactored. This has resulted in amazing productivity gains for me, so thanks for your insights.

I was happy to see your excellent article. I posted a related topic two years ago: Non-intrusive Tree & Graph Types[^] which I just updated. Your article takes the subject to the next level. Thank you for posting it!

I have a few more comments after giving your article a more thorough read. Once again, great job and thanks for sharing. I wish I found this article earlier, but I was pointed this way just five days ago. (I should always be on the lookout, I know.)

You make a great point of the general applicability of Linq-to-XML's "axis" methods, which you clearly explain. Our goal, I believe, is reusable tree operations. But it's impossible to properly accomplish this goal without a standard ITreeNode<T> type, or whatever you want to name it, and its consistent usage throughout the framework. DependencyObject, for example, needs to implement the interface directly. This is something that only Microsoft can provide.

Apart from the ideal, what we're stuck with is code generation, which the second part of your article tackles. But if we're going to generate code, why not generate the most efficient code possible? What I'm suggesting is... ditch the adapter pattern, ditch the wrappers. Even ditch the ILinqToTree interface -- it's part of the ideal which is not where we're at. As for the Parent/Children code you now have in the adapters, I suggest that it instead be input to the code generation process. That is, the resulting axis method implementations should directly include it instead of instantiating many wrappers.

I came up with an alternative approach that doesn't involve duplicating the tree operations' implementations. Nor does it instantiate wrappers for individual tree nodes. I blogged about this solution here[^].

That is a nice idea, using a Func delegate to determine the nodes related to the current context. I can see how the use of an appropriate 'selector' would allow you to navigate any axis.

However ... this gives great flexibility at the sacrifice of ease of use. What I like about my Linq-to-Tree API(s) is that they use an existing API pattern from Linq-to-XML (which is itself based on the existing concept of XPath axis) which will hopefully make it intuitive and easy to use.

Personally, I think the use of Func<...> in an API can be very confusing. I am a pretty experienced Linq user, however I still have to think quite hard when using methods like this ...

Just wondering if there would be a way to extend this concept to include:

a) support for multiple parents for a node? This could be for example in the modeling of web artifacts where one node (say a *.js file) is used by more than one parent (web page).

b) a means to persist (save / retrieve) the data from a database?

Or if you know a C# library that does provide support for managing an open/flexible graph of nodes with relationships between them, and save/retrieving them from a SQL database, then I'd love to hear about it.

a) This would be an interesting challenge! Currently each node has a single parent, so the 'Parent' axis is very simple, just returning an ordered list of parents ending with the root node. If each node had multiple parents, the parent axis would be tree-like as well! In this case I would add a corresponding 'parent' axis for each of the ones defined by XPath. For example, Descendants => DescendantParents, Elements => ElementParents, etc ...

b) I would not include this in the Linq-to-Tree API itself. However, you could probably create a persistence API that operates on the tree.

The persistence API I most often use is nHibernate, it might do what you are looking for.

I suppose I'm grappling with whether to start out with an in-memory collection-type structure for what I'm doing and then add persistence, or to start with a persistence model (e.g. DataRow from ADO.NET) and extend this to have the GetParents-type methods.

Actually I thought overnight, one cool thing would be to write a provider for Linq to EnvDTE CodeItem tree and to refactor Daniel Vaughan's T4 metadata generator (http://danielvaughan.orpius.com/post/MetaGen-A-project-metadata-generator-for-Visual-Studio-using-T4.aspx)
In general I think program structures are tree-like, and your Linq-to-Tree API would allow for really concise manipulation that can be used in a lot of T4 templates.

A few months ago (before I wrote this article) I was inspired by Daniel's blog post you referenced above to use T4 templates + EnvDTE to generate partial classes from class metadata in the form of attributes:

Within that blog post I included a very basic Linq to EnvDTE implementation.

You are right, I could probably make a much better queryable API using this Linq-to-Tree API. The only problem is, the EnvDTE interface is horrible! It is one of the worst tree-like APIs I have come across!

If I have a few hours spare, I might blog a Linq-to-EnvDTE implementation.

I see. That's pretty powerful.
You are right though, EnvDTE is the worst...
An implementation of Linq to ExpressionTree could also have its use.
Basically the GetChildren method would be a big switch statement that returns the proper children according to the Node type..

If one didn't want to go the T4 route, and also didn't want to create a new wrapper object for an element every time it is part of a query, there is a convenient caching solution. You could create an attached DP that stores an element's wrapper object. Set that attached DP on the element being wrapped to its wrapper, and next time you need to get a wrapper for that element just grab it from the element's DP.

I really would encourage everyone to use the T4 generated version, it has a much better API (see response below to Josh Fischer). I have posted the Linq to Visual Tree API to my blog, so that you can just cut'n'paste the ~400 lines of code:

If I understand things correctly, you are using T4 to generate 'concrete' versions of your tree adapter, as opposed to using generics as you did in the first half of the article. What exact advantages does this have? I'm a little confused on what you mean by "In order to remove the need to wrap / unwrap the nodes of our tree..." What do you mean by "wrap"? I'm sure it's something simple, but I'm not seeing it and I don't have time right now to re-read the article!

If I understand things correctly, you are using T4 to generate 'concrete' versions of your tree adapter;

Yes, that is correct.

Josh Fischer wrote:

I'm a little confused on what you mean by "In order to remove the need to wrap / unwrap the nodes of our tree..." What do you mean by "wrap"

With the non-T4 generated API, to traverse a tree you first have to wrap it in an adapter that implements the ILinqToTree interface. The query then executes against the adapter type. This also means that the query returns instances of the adapter type, when you are probably more interested in the objects that it adapts, i.e. the items in the tree itself.

Another benefit is the removal of the type parameter from our extension methods, which means we can add methods with a type parameter that is used to check for nodes of a given type, further simplifying the query to this:

Thanks Daniel, glad you liked it. I was thinking of adding something similar, but if you try to implement it in practice, I think you will find that it does not reduce your effort. The T4 generation of Linq APIs relies on the adapter type being passed to the template function, so a concrete, non generic type is required.

Using generics was the first thing I thought of too after reading the article. In fact, at one point, I assumed it was generic'ized. It certainly 'feels like' your suggestion should work, but if Colin says no, then I believe him.

Hi Tony .. thanks for the feedback. A quick google reveals that quite a few people have come up with their own system. What I like about my solution is that it uses XPath axes, a well established approach to querying trees.