
Recursive Directory Search in C#

By Matthew Wilson, November 18, 2009

A flexible library for directory scanning that provides more capabilities than the standard .NET directory functions.

Special Search Functions

As UNIX programmers will know, the stat() system call provides status information about a given path, in the form of the struct stat type. The recls core C API provides the function Recls_Stat(), which provides status information about a given path, in the form of the recls_info_t type (a multi-attribute type analogous to IEntry). Several recls mappings provide a stat()/Stat() method that returns a file entry object, or null/nil if no such entry exists. I have found this a handy tool over the years, particularly when working in Python and Ruby, and I wanted to continue to offer it for .NET users, as FileSearcher.Stat(). This method returns an instance implementing IEntry that represents the filesystem entry if it exists and can be accessed, returns null if the entry does not exist, and throws an exception if the entry cannot be accessed. (In other words, System.IO.FileNotFoundException and System.IO.DirectoryNotFoundException are caught, and null returned.)
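The null-or-entry-or-throw contract can be mimicked with the standard library alone. What follows is a minimal sketch of that contract, not the recls implementation: StatSketch and StatOrNull are hypothetical names, and the helper returns a System.IO.FileSystemInfo rather than an IEntry.

```csharp
using System;
using System.IO;

// Hypothetical helper that illustrates the contract described above; not part of recls.
public static class StatSketch
{
    // Returns a FileInfo or DirectoryInfo for an existing path, or null if the path
    // does not exist. Any other failure (for example, access denied) propagates.
    public static FileSystemInfo StatOrNull(string path)
    {
        try
        {
            // File.GetAttributes throws FileNotFoundException or
            // DirectoryNotFoundException when the path cannot be found.
            FileAttributes attributes = File.GetAttributes(path);

            if ((attributes & FileAttributes.Directory) != 0)
            {
                return new DirectoryInfo(path);
            }
            return new FileInfo(path);
        }
        catch (FileNotFoundException)
        {
            return null;
        }
        catch (DirectoryNotFoundException)
        {
            return null;
        }
    }
}
```

A caller can then test the result against null instead of wrapping every lookup in its own try/catch.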

The other function set, FileSearcher.CalculateDirectorySize(), does exactly what it says on the tin: it calculates the size of a directory, as the sum of the sizes of all files in that directory or in any of its sub-directories (up to a given depth). Since this is an expensive operation, I chose not to have directory size automatically calculated during a Search()-based enumeration. But it's a useful thing to have available, as in the following example, which displays the sizes of all immediate subdirectories of the current directory:
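A rough equivalent, written against System.IO alone rather than the recls API, might look like the following sketch. DirectorySizeSketch, Calculate, and PrintSubdirectorySizes are hypothetical names, and the depth convention (0 means only the files directly inside the directory) is an assumption made for illustration.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a depth-limited directory size calculation; not the recls API.
public static class DirectorySizeSketch
{
    // Sums the sizes of the files in 'directory', descending at most 'depth'
    // levels into sub-directories (0 = only the files directly inside).
    public static long Calculate(string directory, int depth)
    {
        long total = 0;

        foreach (string file in Directory.GetFiles(directory))
        {
            total += new FileInfo(file).Length;
        }

        if (depth > 0)
        {
            foreach (string subdirectory in Directory.GetDirectories(directory))
            {
                total += Calculate(subdirectory, depth - 1);
            }
        }

        return total;
    }

    // Prints the total size of each immediate sub-directory of the current directory.
    public static void PrintSubdirectorySizes()
    {
        foreach (string subdirectory in Directory.GetDirectories(Directory.GetCurrentDirectory()))
        {
            Console.WriteLine("{0}\t{1}", Calculate(subdirectory, int.MaxValue), subdirectory);
        }
    }
}
```

The sketch makes no attempt to handle inaccessible directories or cyclic links, so any exception raised by the underlying calls propagates to the caller.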

Listing 9: PathUtil class interface.

Each of these represents some functionality essential to the proper workings of recls's searching that either is not available in the CLR's path manipulation facilities or corrects a defective alternative there:

DeriveRelativePath(), CanonicalizePath(), and GetDrive() do not have CLR equivalents

GetAbsolutePath() corrects drive-only UNC paths, such as "\\server\share", by appending a slash, in the same way that System.IO.Path.GetFullPath() does for drive-only volume paths, such as "C:"

PathUtil.GetDirectoryPath() yields the directory path -- a recls notion of encapsulating drive (for operating systems that have the concept of a drive) + directory -- and corrects the (in my opinion) defective behaviour of System.IO.Path.GetDirectoryName(), which returns the empty string when given a root path such as "C:\" or "\\server\share\"

PathUtil.GetFile() yields the file component -- file name + extension -- of a path and works correctly with UNC paths such as "\\server\share" (for which System.IO.Path.GetFileName() returns "share"!)

Extension Methods

With C# 3 comes the ability to enhance the (apparent) operations available on existing types by the use of Extension Methods [8, 9]. I've taken advantage of this for recls 100% .NET by adding the ForEach, Select, and Where methods, as shown in Listing 10. We'll see an example of how these are used (with LINQ [8, 9]) shortly.

Listing 10: Search Extensions.

In C++ terms, this is akin to a partial template specialization, because the extension methods are defined only for IEnumerable<IEntry>.
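To make the comparison concrete, here is a minimal sketch of an extension method that exists only for sequences of entries. It is not the content of Listing 10: the IEntry interface is reduced to a single assumed member (Path), and FakeEntry and SearchExtensionsSketch are hypothetical names used so the example can stand alone.

```csharp
using System;
using System.Collections.Generic;

namespace ExtensionSketch
{
    // Stand-in for the recls entry interface; only one illustrative member is declared.
    public interface IEntry
    {
        string Path { get; }
    }

    // Trivial in-memory implementation, so the sketch runs without touching the filesystem.
    public sealed class FakeEntry : IEntry
    {
        public FakeEntry(string path) { Path = path; }
        public string Path { get; private set; }
    }

    public static class SearchExtensionsSketch
    {
        // Callable as entries.ForEach(...) only when the static type converts to
        // IEnumerable<IEntry>; an IEnumerable<string> gets no such method from here.
        public static void ForEach(this IEnumerable<IEntry> entries, Action<IEntry> action)
        {
            foreach (IEntry entry in entries)
            {
                action(entry);
            }
        }
    }
}
```

Because the first parameter is the closed type IEnumerable<IEntry> rather than IEnumerable<T>, the method behaves like an operation "specialised" for entries. Note that List<T> already has an instance method named ForEach, which takes precedence over any extension method when the receiver is statically a List<T>.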

Predicates or Functions?

There was one interesting twist here, with implementing Where. Since it requires a predicate -- a decision function that returns a Boolean value -- I defined it in terms of System.Predicate, which is a delegate defined as follows:
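For reference, the framework declares System.Predicate as a generic delegate that takes one argument and returns bool (later framework versions add an "in" variance modifier to T). The sketch below repeats that declaration in a comment and shows that Predicate<T> and Func<T, bool> are distinct delegate types even though their signatures match. PredicateSketch, IsEven, and AsPredicate are hypothetical names.

```csharp
using System;

public static class PredicateSketch
{
    // Declared in the framework as:
    //     namespace System { public delegate bool Predicate<T>(T obj); }

    // A method whose signature matches Predicate<int> (and also Func<int, bool>).
    public static bool IsEven(int n)
    {
        return n % 2 == 0;
    }

    // There is no implicit conversion between the two delegate types, so a
    // Func<int, bool> is wrapped by building a new delegate over its Invoke method.
    public static Predicate<int> AsPredicate(Func<int, bool> function)
    {
        return new Predicate<int>(function.Invoke);
    }
}
```

A method group such as PredicateSketch.IsEven, or a lambda expression, converts to either delegate type, which is what leaves room for the overload ambiguity discussed next.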

Listing 11: Use of Extension Methods with Predicate(s).

However, if we add in a "using System.Linq;" statement to the WhereDemo namespace, we get a compile error (with some namespace qualifications removed for clarity):

error CS0121: The call is ambiguous between the following methods or properties: 'System.Linq.Enumerable.Where<Recls.IEntry>(IEnumerable<IEntry>, System.Func<IEntry,bool>)' and 'Recls.SearchExtensions.Where(IEnumerable<IEntry>, System.Predicate<IEntry>)'

What appears to be happening here is that the compiler resolves the lambda expression (e) => e.IsReadOnly (or the equivalent anonymous delegate expression, also shown) to System.Func<IEntry, bool>, rather than System.Predicate<IEntry>. System.Func is declared as follows:

namespace System
{
    public delegate TResult Func<T, TResult>(T arg);
}

Consequently, the two possible Where (extension) functions each have one precisely matching argument and one possibly matching argument, hence the ambiguity. This is why I had to implement the recls Where extension in terms of System.Func<IEntry, bool>, giving two precisely matching arguments, and removing the ambiguity. Obviously, if the C# team ever decide to change the compiler to interpret one-parameter Boolean-returning anonymous delegates / lambda expressions as System.Predicate<>, any such "partial specialisations" will be broken. I'm guessing that will never happen, so we just need to get used to using System.Func<T, bool>, even though a predicate makes more sense.
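The effect of that change can be sketched in a self-contained form. This is an illustration under stated assumptions, not the recls source: the ReclsWhereSketch and WhereDemoSketch namespaces, the cut-down IEntry interface, FakeEntry, and the Calls counter are all hypothetical. With both Where methods in scope and taking identical parameter types, C# overload resolution prefers the non-generic method over the generic Enumerable.Where<T>, so the call compiles and binds to the entry-specific version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;          // brings Enumerable.Where<T> into scope
using ReclsWhereSketch;     // brings the entry-specific Where into scope

namespace ReclsWhereSketch
{
    // Stand-in for the recls entry interface, reduced to the member used here.
    public interface IEntry
    {
        bool IsReadOnly { get; }
    }

    public sealed class FakeEntry : IEntry
    {
        public FakeEntry(bool isReadOnly) { IsReadOnly = isReadOnly; }
        public bool IsReadOnly { get; private set; }
    }

    public static class SearchExtensions
    {
        // Counts invocations, purely to show which overload the compiler bound.
        public static int Calls;

        public static IEnumerable<IEntry> Where(this IEnumerable<IEntry> entries, Func<IEntry, bool> predicate)
        {
            Calls++;
            return Filter(entries, predicate);
        }

        private static IEnumerable<IEntry> Filter(IEnumerable<IEntry> entries, Func<IEntry, bool> predicate)
        {
            foreach (IEntry entry in entries)
            {
                if (predicate(entry))
                {
                    yield return entry;
                }
            }
        }
    }
}

namespace WhereDemoSketch
{
    public static class Demo
    {
        public static int CountReadOnly(IEnumerable<IEntry> entries)
        {
            // Both Where methods apply with identical parameter types; the
            // non-generic SearchExtensions.Where is chosen as the better match.
            return entries.Where(e => e.IsReadOnly).Count();
        }
    }
}
```

Changing the second parameter of the sketch's Where back to System.Predicate<IEntry> should reproduce the CS0121 error quoted above.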
