LINQ to Tree - A Generic Technique for Querying Tree-like Structures

This article presents a generic approach to applying LINQ queries to tree-like structures. Using T4 templates for code generation, LINQ to VisualTree (WPF), LINQ to WinForms, and LINQ to FileSystem APIs are constructed.

This article looks at how LINQ can be applied to tree-like data, using LINQ to XML as an example. A generic approach to querying trees is constructed, which allows any tree structure to be made LINQ 'compatible' with the implementation of a simple adapter class. This idea is extended further with T4 templates for code generation, with examples of LINQ to VisualTree (WPF / Silverlight), LINQ to WinForms, and LINQ to Filesystem presented.

One of my favorite features of the C# programming language is LINQ. As programmers, so much of what we do is write code to filter, manipulate, and transform data. LINQ brings with it a whole host of functionality that allows you to perform these tasks using a much more expressive and succinct language. It has clearly played a big part in the development of the language itself, bringing with it Extension Methods, yield, and Lambda Expressions.

When I started learning WPF, I thought it might be useful to apply LINQ to the visual tree in order to query the state of the user interface (UI). For those of you not familiar with WPF, the visual tree is the hierarchical structure of elements, borders, grids, etc. that are used to build the UI. These are typically defined in XAML, i.e., XML, which is itself a hierarchical structure. The problem is, LINQ depends on the IEnumerable interface, which describes a flat, one-dimensional sequence of items. How can this be used to query a hierarchical structure like the WPF visual tree?

After a quick Google search, I stumbled upon Peter McGrattan's LINQ to Visual Tree implementation, which simply flattens the tree into a one-dimensional structure. Elsewhere, I also found a more generic implementation by David Jade which can be used to flatten any tree in order to apply LINQ queries. However, in flattening the tree, you lose useful information that might be relevant to your query. Neither was the solution I was looking for!

Anyhow, I forgot about this problem for a while until more recently when I started a new project where we were working with a lot of data in XML, and naturally, we turned to LINQ to XML...

This section gives a very brief introduction to the LINQ to XML API and its relationship to XPath. If you are already a LINQ to XML guru, feel free to skip it and get onto the juicy bits!

The LINQ to XML API provides methods for querying XML documents which are loaded into memory in the form of an XDocument (or X-Dom for short). The in-memory XML document consists of a tree-like structure of nodes. At each node, there are LINQ methods available for searching the tree in order to find children, siblings, or nodes with some other relationship to the current node (context) that matches the various criteria.

A simple example is given below, based on the following XML instance document:

The following query will find all 'product' elements with a colour child element with a value 'Red':
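As a minimal sketch of such a query: the XML here is a hypothetical stand-in for the instance document (the store element, the name elements and the product names are made up; only order, product and colour come from the text), and the class and method names are likewise assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RedProductsExample
{
    // Hypothetical sample data; only the element names order, product
    // and colour are taken from the article text.
    public const string SampleXml = @"
<store>
  <order id='1'>
    <product><name>Kettle</name><colour>Red</colour></product>
    <product><name>Toaster</name><colour>Blue</colour></product>
  </order>
  <order id='2'>
    <product><name>Mug</name><colour>Red</colour></product>
  </order>
</store>";

    public static List<XElement> FindRedProducts(XDocument doc)
    {
        return doc.Descendants("order")        // defined on XDocument (via XContainer)
                  .Descendants("product")      // extension method on a sequence of elements
                  .Where(p => p.Elements("colour")
                               .Any(c => c.Value == "Red"))
                  .ToList();
    }
}
```

Calling RedProductsExample.FindRedProducts(XDocument.Parse(RedProductsExample.SampleXml)) on this sample data yields the Kettle and Mug products.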

The nodes matched by each part of the query are highlighted in the XML document above.

The first part of the query starts at the root of the document, invoking the XDocument.Descendants method to find any elements with the name order which are descendants (direct or otherwise) within the tree.

The second part of the query matches all the product descendants of order elements; however, this is where things start to get interesting. Whilst the first Descendants method in the above query is defined on XDocument, the second Descendants method that follows it is not. This method is an extension method defined on IEnumerable<XElement>, and its result is that of invoking the Descendants method on each of the XElements within the collection returned by the preceding Descendants query.

LINQ to XML defines a corresponding 'IEnumerable' extension method for each of the regular query methods on a single XElement (or XDocument). It is these methods that give LINQ to XML its real power, allowing the queries to 'spider' throughout the document.

The final part of the query filters the product elements. It looks at the colour elements that are direct children of each product, and keeps only those products which have a colour with a value of Red.

The LINQ to XML API is powerful and expressive, but how were these query methods, which match sibling, ancestor, and descendant elements, selected? For this, the LINQ to XML developers borrowed from an existing technology for querying XML: XPath. The equivalent XPath query to the one given above in LINQ is:

//order//product[colour='Red']

XPath allows you to create a query by assembling a number of expressions and predicates, each one acting on the node-set which is the result of applying the preceding expressions. This is analogous to the LINQ to XML execution described above.
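To make the correspondence concrete, the sketch below runs the XPath expression and the LINQ to XML query against the same document. XPathSelectElements is the extension method from System.Xml.XPath; the class and method names are assumptions made for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathEquivalenceExample
{
    // Evaluates the XPath expression from the article against the document.
    public static List<XElement> ViaXPath(XDocument doc)
    {
        return doc.XPathSelectElements("//order//product[colour='Red']").ToList();
    }

    // The same query expressed with the LINQ to XML axis methods.
    public static List<XElement> ViaLinq(XDocument doc)
    {
        return doc.Descendants("order")
                  .Descendants("product")
                  .Where(p => p.Elements("colour").Any(c => c.Value == "Red"))
                  .ToList();
    }
}
```

One difference worth knowing about: an XPath node-set never contains the same node twice, whereas the LINQ version simply concatenates the descendants of each order, so if order elements were nested inside one another, the LINQ query could return a product more than once.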

With XPath, at a specific node in the XML document, you have the option to query a number of different 'axes', each one defining a set of nodes relating to the current context node. This allows you to define queries that traverse the document in a number of different directions. The axes that are common to both LINQ and XPath are summarised below:

LINQ to XML           XPath
-----------           -----
Descendants           descendant (//)
Ancestors             ancestor
Elements              child (/)
ElementsBeforeSelf    preceding-sibling
ElementsAfterSelf     following-sibling

[Illustration column: one tree diagram per axis]

Note: In the above diagrams, the node highlighted in red is the current context. The nodes highlighted in green are the ones that are members of the given axis. Also, XPath has 'or-self' variants of the descendant and ancestor axes (descendant-or-self and ancestor-or-self), which include the context node; these are present in LINQ to XML as the DescendantsAndSelf and AncestorsAndSelf methods.

LINQ to XML borrows the axes defined by XPath for querying the XML tree. Why not use these same axes for querying other tree-like structures?

The problem is, there is no common interface for defining a tree structure; for example, the WPF visual tree and filesystem APIs have very different interfaces. A common solution to this problem is to apply the Gang of Four Adapter Pattern, where you define the interface that you really want, then 'adapt' the existing classes so that they implement this interface, typically by 'wrapping' them in another class.

In order to traverse a tree structure, at each node, all you need are methods to navigate to the parent or children. The following generic interface provides methods for traversing a tree of objects of type 'T':

/// <summary>
/// Defines an adapter that must be implemented in order to use the LinqToTree
/// extension methods
/// </summary>
/// <typeparam name="T"></typeparam>
public interface ILinqToTree<T>
{
    /// <summary>
    /// Obtains all the children of the Item.
    /// </summary>
    /// <returns></returns>
    IEnumerable<ILinqToTree<T>> Children();
    /// <summary>
    /// The parent of the Item.
    /// </summary>
    ILinqToTree<T> Parent { get; }
    /// <summary>
    /// The item being adapted.
    /// </summary>
    T Item { get; }
}

This interface then allows us to create extension methods for navigating the different axes for a specific node of type 'T'.
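A sketch of how these single-node axis methods might be written is given here. The method names follow the LINQ to XML axes listed earlier; the bodies are an assumption about one reasonable implementation, and SampleNode is a hypothetical in-memory tree included only so that the code can be run.

```csharp
using System.Collections.Generic;
using System.Linq;

public interface ILinqToTree<T>
{
    IEnumerable<ILinqToTree<T>> Children();
    ILinqToTree<T> Parent { get; }
    T Item { get; }
}

public static class TreeExtensions
{
    // Every node below the adapter, depth-first, parents before their children.
    public static IEnumerable<ILinqToTree<T>> Descendants<T>(this ILinqToTree<T> adapter)
    {
        foreach (var child in adapter.Children())
        {
            yield return child;
            foreach (var descendant in child.Descendants())
                yield return descendant;
        }
    }

    // The chain of parents, nearest first, ending at the root.
    public static IEnumerable<ILinqToTree<T>> Ancestors<T>(this ILinqToTree<T> adapter)
    {
        for (var parent = adapter.Parent; parent != null; parent = parent.Parent)
            yield return parent;
    }

    // The direct children.
    public static IEnumerable<ILinqToTree<T>> Elements<T>(this ILinqToTree<T> adapter)
    {
        return adapter.Children();
    }

    // Siblings that come before this node. Items are compared rather than
    // adapters, since an adapter may create fresh wrappers on each Children() call.
    public static IEnumerable<ILinqToTree<T>> ElementsBeforeSelf<T>(this ILinqToTree<T> adapter)
    {
        if (adapter.Parent == null) return Enumerable.Empty<ILinqToTree<T>>();
        return adapter.Parent.Children()
                      .TakeWhile(c => !object.Equals(c.Item, adapter.Item));
    }

    // Siblings that come after this node.
    public static IEnumerable<ILinqToTree<T>> ElementsAfterSelf<T>(this ILinqToTree<T> adapter)
    {
        if (adapter.Parent == null) return Enumerable.Empty<ILinqToTree<T>>();
        return adapter.Parent.Children()
                      .SkipWhile(c => !object.Equals(c.Item, adapter.Item))
                      .Skip(1);
    }
}

// Hypothetical in-memory node, present only so the sketch can be exercised.
public class SampleNode : ILinqToTree<string>
{
    private readonly List<SampleNode> children = new List<SampleNode>();

    public SampleNode(string name, params SampleNode[] kids)
    {
        Item = name;
        foreach (var kid in kids) { kid.Parent = this; children.Add(kid); }
    }

    public string Item { get; private set; }
    public ILinqToTree<string> Parent { get; private set; }
    public IEnumerable<ILinqToTree<string>> Children() { return children; }
}
```

The sibling axes assume that sibling items are distinguishable by equality; a tree that can hold equal items side by side would need a different way of locating the context node.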

(I have omitted the 'AndSelf' implementations; these are quite trivial; if you are interested, download the sample code for this article.)

These methods allow us to navigate from one node to the nodes within its related axes. However, as we saw in the LINQ to XML section above, it is the IEnumerable extension methods that provide the real power of LINQ to XML, allowing your query to 'spider' throughout the document.

The IEnumerable equivalents of the above axes methods are given below:
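The sketch below shows the pattern for two of the axes. DrillDown is the helper name used in the text; its signature and body here are assumptions, and the single-node methods and the hypothetical SampleNode are repeated only so that the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public interface ILinqToTree<T>
{
    IEnumerable<ILinqToTree<T>> Children();
    ILinqToTree<T> Parent { get; }
    T Item { get; }
}

public static class TreeSequenceExtensions
{
    // Single-node axes, repeated here so the sketch is self-contained.
    public static IEnumerable<ILinqToTree<T>> Elements<T>(this ILinqToTree<T> adapter)
    {
        return adapter.Children();
    }

    public static IEnumerable<ILinqToTree<T>> Descendants<T>(this ILinqToTree<T> adapter)
    {
        foreach (var child in adapter.Children())
        {
            yield return child;
            foreach (var descendant in child.Descendants())
                yield return descendant;
        }
    }

    // Applies 'function' to every item in the collection, yielding each result.
    private static IEnumerable<ILinqToTree<T>> DrillDown<T>(
        this IEnumerable<ILinqToTree<T>> items,
        Func<ILinqToTree<T>, IEnumerable<ILinqToTree<T>>> function)
    {
        foreach (var item in items)
            foreach (var result in function(item))
                yield return result;
    }

    // IEnumerable equivalents: the same axis, applied to each node in turn.
    public static IEnumerable<ILinqToTree<T>> Elements<T>(this IEnumerable<ILinqToTree<T>> items)
    {
        return items.DrillDown(i => i.Elements());
    }

    public static IEnumerable<ILinqToTree<T>> Descendants<T>(this IEnumerable<ILinqToTree<T>> items)
    {
        return items.DrillDown(i => i.Descendants());
    }
}

// Hypothetical in-memory node, present only so the sketch can be exercised.
public class SampleNode : ILinqToTree<string>
{
    private readonly List<SampleNode> children = new List<SampleNode>();

    public SampleNode(string name, params SampleNode[] kids)
    {
        Item = name;
        foreach (var kid in kids) { kid.Parent = this; children.Add(kid); }
    }

    public string Item { get; private set; }
    public ILinqToTree<string> Parent { get; private set; }
    public IEnumerable<ILinqToTree<string>> Children() { return children; }
}
```

With both overloads in place, a call such as root.Elements().Descendants() chains naturally: the first call binds to the single-node method and the second to the collection method. DrillDown is equivalent to LINQ's SelectMany, which could be used in its place.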

In the above code, you can clearly see the relationship between the methods that act on a single node and their IEnumerable equivalent which performs the same function on a collection of elements. In the above code, they are all implemented via the private DrillDown method, which applies a given function to each of the items within a collection, yielding the results.

An implementation of the ILinqToTree interface can be created to wrap any tree structure, allowing it to be queried using this API. To see how the ILinqToTree interface is used in practice, we will create an adapter that allows us to query the WPF visual tree:

The visual tree is composed of DependencyObject instances, and their relationships can be determined from the VisualTreeHelper. With the above adapter, we can now apply LINQ queries to our visual tree. The following is a simple example, where a Window with the following XAML markup is queried:

The following examples demonstrate a few queries executed against the visual tree. Each example is given in both LINQ query syntax and extension method (or fluent) syntax:

// get all the TextBoxes which have a Grid as direct parent
// (Parent is itself an adapter, so it is unwrapped via Item before the type test)
var itemsFluent = new VisualTreeAdapter(this).Descendants()
                      .Where(i => i.Parent.Item is Grid)
                      .Where(i => i.Item is TextBox)
                      .Select(i => i.Item);

var itemsQuery = from v in new VisualTreeAdapter(this).Descendants()
                 where v.Parent.Item is Grid && v.Item is TextBox
                 select v.Item;

// get all the StackPanels that are within another StackPanel's visual tree
var items2Fluent = new VisualTreeAdapter(this).Descendants()
                       .Where(i => i.Item is StackPanel)
                       .Descendants()
                       .Where(i => i.Item is StackPanel)
                       .Select(i => i.Item);

var items2Query = from i in
                      (from v in new VisualTreeAdapter(this).Descendants()
                       where v.Item is StackPanel
                       select v).Descendants()
                  where i.Item is StackPanel
                  select i.Item;

Each of these queries follows a similar pattern. First, the root of the visual tree is wrapped in the VisualTreeAdapter (which implements ILinqToTree); this is followed by the query itself; finally, a Select clause is used to unwrap each of the items that satisfies our query criteria.

In my opinion, queries against tree structures are usually more readable in fluent / extension method syntax; the sub-selects required in query syntax are needlessly verbose. However, there are a few times when query syntax, or a mixture, provides the most readable query, as we will see later.

Whilst this mechanism works quite well, there are a few not-so-nice features. Firstly, the addition of our adapter means that we are continually wrapping and un-wrapping the nodes of our tree. A second issue is a little more subtle. It would be nice to add extension methods for the common tasks of filtering an axis by type; for example, allowing us to find the descendants of type 'TextBox'. The following should fit the bill:

The above method does indeed produce the desired outcome, filtering the descendant nodes for those with a type given by K. However, while the existing method has a single type parameter that can be inferred by the compiler, the method above has two type parameters, and inference no longer takes place. This means that we not only have to specify the type we are searching for, but also the type being queried, as illustrated below:
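A sketch of such a method and its call site follows. The two-parameter Descendants<T, K> signature is an assumption based on the description above, and ObjectNode is a hypothetical tree of plain objects used only to make the example runnable.

```csharp
using System.Collections.Generic;
using System.Linq;

public interface ILinqToTree<T>
{
    IEnumerable<ILinqToTree<T>> Children();
    ILinqToTree<T> Parent { get; }
    T Item { get; }
}

public static class TypedTreeExtensions
{
    // One type parameter: T can be inferred from the argument.
    public static IEnumerable<ILinqToTree<T>> Descendants<T>(this ILinqToTree<T> adapter)
    {
        foreach (var child in adapter.Children())
        {
            yield return child;
            foreach (var descendant in child.Descendants())
                yield return descendant;
        }
    }

    // Two type parameters: T is the node type of the tree, K the type searched for.
    // K appears nowhere in the parameter list, so it can never be inferred.
    public static IEnumerable<ILinqToTree<T>> Descendants<T, K>(this ILinqToTree<T> adapter)
    {
        return adapter.Descendants().Where(i => i.Item is K);
    }
}

// Hypothetical tree of arbitrary objects, present only to exercise the method.
public class ObjectNode : ILinqToTree<object>
{
    private readonly List<ObjectNode> children = new List<ObjectNode>();

    public ObjectNode(object item, params ObjectNode[] kids)
    {
        Item = item;
        foreach (var kid in kids) { kid.Parent = this; children.Add(kid); }
    }

    public object Item { get; private set; }
    public ILinqToTree<object> Parent { get; private set; }
    public IEnumerable<ILinqToTree<object>> Children() { return children; }
}

public static class TypedQueryDemo
{
    public static List<string> FindStrings(ILinqToTree<object> root)
    {
        // Both type arguments must be written out. root.Descendants<string>()
        // does not compile: C# infers either all type arguments or none.
        return root.Descendants<object, string>()
                   .Select(i => (string)i.Item)
                   .ToList();
    }
}
```

For the visual tree, the equivalent call would have to name both DependencyObject and TextBox, which is exactly the noise the generated API described next is designed to remove.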

In order to remove the need to wrap / unwrap the nodes of our tree, the extension methods need to be defined on the node type itself. The problem is, if we define extension methods on our specific node type, this means that the LINQ to Tree API has to be manually constructed for each node / tree type. The solution I found to this problem was to 'internalise' the use of the adapter, generating the LINQ to Tree extension methods via T4 templates.

T4 templates are built into Visual Studio, and provide a mechanism for generating code files via C# and a simple template markup. For an introduction to authoring T4 templates, try this article on CodeProject. The solution works as follows: there is a single T4 template file that contains our simplified ILinqToTree interface (there is no need for the Item property required for unwrapping, or the factory method), and the LINQ to Tree extension methods, plus their IEnumerable equivalents. This code is within a T4 template method that takes parameters which detail the type for which the API is being generated (e.g., DependencyObject, DirectoryInfo) and the class which implements the ILinqToTree interface for this type.

Here is the template (Note: I have only included the descendants axis in this code snippet; the other axes and their 'AndSelf' counterparts are all in the code associated with this article):

The code within the template is very similar to that given in the previous sections. The main difference is that the use of the ILinqToTree adapter is now scoped within each extension method, so that the LINQ to Tree API operates directly on the type itself. This removes the need for the generic type parameters on the 'regular' methods, allowing us to use a single type parameter for searching the collection for a specific type.

From these few lines of code, the T4 template generates a ~400 line API that allows you to query the tree making use of all the XPath axes described earlier. For example, the generated Descendants method looks like this:

I am not going to reproduce the whole LINQ to Visual Tree API in this article; the code itself is really not that interesting! (You can find the full API in the code attached to this article.) What is more interesting is the fact that we now have a generic LINQ API for tree structures that can be tailored to a specific tree type in just a few lines of code. Before we explore its application to some other tree-like structures, let's revisit the queries that we applied earlier to the visual tree. With the new API, the queries are much more succinct:

// get all the TextBoxes which have a Grid as direct parent
var itemsFluent = this.Descendants<TextBox>()
                      .Where(i => i.Ancestors().FirstOrDefault() is Grid);

var itemsQuery = from v in this.Descendants<TextBox>()
                 where v.Ancestors().FirstOrDefault() is Grid
                 select v;

// get all the StackPanels that are within another StackPanel's visual tree
var items2Fluent = this.Descendants<StackPanel>()
                       .Descendants<StackPanel>();

var items2Query = from i in
                      (from v in this.Descendants<StackPanel>()
                       select v).Descendants<StackPanel>()
                  select i;

Another commonly encountered tree structure is the hierarchical control structure within Windows Forms applications. This structure is not always that obvious to WinForms developers because it does not have the same significance that the visual tree structure has in WPF / Silverlight, where property inheritance is supported by the tree structure. However, you can see that the UI is indeed a tree, by switching on the Document Outline view:

Creating our LINQ to Windows Forms API is again simply a matter of creating our ILinqToTree adapter for a Control:

I have also added another extension method to DirectoryInfo to adapt the 'GetFiles' method which returns an array into something more LINQ friendly. For those of you familiar with XPath, files in this context can be seen as analogous to attributes in an XML document, and form another axis which can be queried.
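To give a feel for the resulting filesystem API, here is a hand-written sketch of a few of the axes defined directly on DirectoryInfo, together with a Files method. In the article's approach these methods are generated from the T4 template via an adapter; the versions below, including the name Files, are assumptions that skip the adapter for brevity.

```csharp
using System.Collections.Generic;
using System.IO;

public static class DirectoryInfoTreeExtensions
{
    // Direct subdirectories (the child axis).
    public static IEnumerable<DirectoryInfo> Elements(this DirectoryInfo dir)
    {
        return dir.GetDirectories();
    }

    // All subdirectories at any depth, each parent before its own children.
    public static IEnumerable<DirectoryInfo> Descendants(this DirectoryInfo dir)
    {
        foreach (var child in dir.Elements())
        {
            yield return child;
            foreach (var descendant in child.Descendants())
                yield return descendant;
        }
    }

    // Parent directories, nearest first, up to the root of the volume.
    public static IEnumerable<DirectoryInfo> Ancestors(this DirectoryInfo dir)
    {
        for (var parent = dir.Parent; parent != null; parent = parent.Parent)
            yield return parent;
    }

    // Adapts GetFiles, which returns an array, into a sequence that can be
    // queried like any other axis.
    public static IEnumerable<FileInfo> Files(this DirectoryInfo dir)
    {
        return dir.GetFiles();
    }
}
```

A query such as new DirectoryInfo(path).Descendants().Where(d => d.Files().Any()) then reads much like its LINQ to XML counterpart. Note that this sketch makes no attempt to handle directories the current user is not permitted to read.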

Personally, I have a vast number of Visual Studio projects littered around my hard disk, the fruits of many ideas, some good and some bad, but all somehow lacking in organisation. I tend to keep the better ones in Subversion (SVN). The following query finds all the projects which I have added to SVN:

The first where clause in the query finds any directories that contain a '.svn' subdirectory. The second where clause matches those which do not have a direct parent which itself contains a '.svn' directory (when you add a project to SVN, all folders will have a .svn folder added to them).
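A query matching that description might look like the sketch below. The Elements and Descendants methods are minimal hand-written stand-ins for the generated API, and SvnProjectFinder and FindSvnRoots are names invented for this illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SvnProjectFinder
{
    // Minimal stand-ins for the generated child and descendant axes.
    private static IEnumerable<DirectoryInfo> Elements(this DirectoryInfo dir)
    {
        return dir.GetDirectories();
    }

    private static IEnumerable<DirectoryInfo> Descendants(this DirectoryInfo dir)
    {
        foreach (var child in dir.Elements())
        {
            yield return child;
            foreach (var descendant in child.Descendants())
                yield return descendant;
        }
    }

    // Finds the top-most directories under 'root' that are under SVN control.
    public static IEnumerable<DirectoryInfo> FindSvnRoots(DirectoryInfo root)
    {
        return from dir in root.Descendants()
               // the directory itself contains a .svn subdirectory ...
               where dir.Elements().Any(d => d.Name == ".svn")
               // ... but its direct parent does not
               where !dir.Parent.Elements().Any(d => d.Name == ".svn")
               select dir;
    }
}
```

Every directory returned by Descendants has a parent, so dir.Parent is never null inside the query.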

Here's another fun example that provides an interesting mix of query and fluent syntax:

The first part of the query, the 'in' clause, finds all bin directories and selects all of their descendant directories; the 'let' clause assigns the number of XML files in each directory to a variable 'xmlCount'; the where clause selects those where the count is greater than zero; finally, the select clause creates an anonymous type containing the directory and the number of XML files. The output looks something like this:
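The query described above could be sketched as follows. As before, the Descendants methods are hand-written stand-ins for the generated API, and the class name, method name and KeyValuePair return type are assumptions chosen so that the result can be returned from a method (an anonymous type cannot be).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BinXmlReport
{
    // Stand-in for the generated single-node descendant axis.
    private static IEnumerable<DirectoryInfo> Descendants(this DirectoryInfo dir)
    {
        foreach (var child in dir.GetDirectories())
        {
            yield return child;
            foreach (var descendant in child.Descendants())
                yield return descendant;
        }
    }

    // Stand-in for its IEnumerable equivalent.
    private static IEnumerable<DirectoryInfo> Descendants(this IEnumerable<DirectoryInfo> dirs)
    {
        return dirs.SelectMany(d => d.Descendants());
    }

    // Pairs each directory below a 'bin' directory with its XML file count,
    // keeping only directories that contain at least one XML file.
    public static List<KeyValuePair<string, int>> CountXmlFiles(DirectoryInfo root)
    {
        var query = from dir in root.Descendants()          // fluent syntax inside the 'in' clause
                                    .Where(d => d.Name == "bin")
                                    .Descendants()
                    let xmlCount = dir.GetFiles().Count(f => f.Extension == ".xml")
                    where xmlCount > 0
                    select new { Directory = dir.FullName, XmlCount = xmlCount };

        return query.Select(r => new KeyValuePair<string, int>(r.Directory, r.XmlCount))
                    .ToList();
    }
}
```

Because only the descendants of each bin directory are selected, XML files sitting directly inside a bin directory are not counted.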

This same query could, of course, be applied to any of the LINQ to Tree APIs generated in this article.

In conclusion, this article has demonstrated how to produce a LINQ API for any tree-like structure by the implementation of a simple adapter interface and the use of T4 templates for code generation. Hopefully, you have found this article interesting, and have perhaps learnt something new about the LINQ to XML APIs, or XPath axes.

If you manage to apply this technique to some other tree structure that I have not covered in this article, I would love to hear about it; please leave a comment below.

Enjoy!

History

3rd March 2010

Article updated, ILinqToTree interface modified based on feedback from William Kempf

I am also a Technical Evangelist at Scott Logic, a provider of bespoke financial software and consultancy for the retail and investment banking, stockbroking, asset management and hedge fund communities.