
Recursive Directory Search in C#

By Matthew Wilson, November 18, 2009

A flexible library for directory scanning that provides more capabilities than the standard .NET directory functions.

Specifying Search Criteria

The change from instance to class methods has probably the greatest impact on usage, so let's look at a pair of examples illustrating the old and the new. To search for all the font files in the Windows directory and its subdirectories with the old recls 1.x .NET mapping we'd write:
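The code fragments themselves are not reproduced in this extract; the following sketch illustrates the two shapes, where the recls 1.x names (a constructible `Search` class and the `RECLS_FLAG` enumeration's member spelling) are assumptions inferred from the surrounding discussion rather than verified against the 1.x mapping:

```csharp
// recls 1.x .NET mapping: instance-based, with recursion requested
// explicitly. (Type and flag names are assumptions, per the lead-in.)
Search search = new Search(@"C:\Windows", "*.fon|*.ttf", RECLS_FLAG.RECURSIVE);
foreach (IEntry entry in search)
{
    Console.WriteLine(entry.Path);
}

// recls 100% .NET: a static method on a static class; recursive by default.
foreach (IEntry entry in FileSearcher.Search(@"C:\Windows", "*.fon|*.ttf"))
{
    Console.WriteLine(entry.Path);
}
```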

With such a simple example the differences are not huge, which is a good thing. Nonetheless, two of the most significant changes are illustrated:

Recursive search is now the default, so the RECLS_FLAG.RECURSIVE flag is not needed (or available)

Search sequences are now obtained via static methods on a static class

A search is conducted from a specified directory, in which all entries (files or directories) that match the given pattern(s) and correspond to the given search options, up to a given depth, are retrieved. A directory may be absolute or relative, but must exist. If null (or the empty string) is specified, the current directory is assumed. A pattern may be a file (or directory) name, or may use wildcards, as in "*.ttf". Furthermore, the library supports multi-part patterns, allowing discovery of entries matching different wildcards within the same string, as in "*.fon|*.ttf". A null argument for the patterns parameter is interpreted to mean the "everything" pattern for the given platform (i.e. "*" on UNIX, and "*.*" on Windows).
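Expressed as code, and assuming only the FileSearcher.Search() method and IEntry type named elsewhere in the article (the variable names are illustrative), those rules look like:

```csharp
// Multi-part patterns: all font files under the Windows directory.
IEnumerable<IEntry> fonts = FileSearcher.Search(@"C:\Windows", "*.fon|*.ttf");

// A null directory means the current directory; null patterns mean the
// platform's "everything" pattern ("*" on UNIX, "*.*" on Windows).
IEnumerable<IEntry> everything = FileSearcher.Search(null, null);

// A pattern may also be a plain file (or directory) name.
IEnumerable<IEntry> readmes = FileSearcher.Search(null, "readme.txt");
```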

The search options can select Files, Directories, or both; the absence of both is interpreted as Files. Other options allow for tailoring the search policy, as follows:

Searching of Hidden and/or System files, neither of which is normally listed

Ignoring access-denied conditions, which would otherwise cause the search to be terminated, by specifying IgnoreInaccessibleNodes; this option is ignored if an exception handler is specified. This is often necessary when specifying Hidden and/or System.

Marking of the Path, SearchRelativePath and File properties of entries that are directories (via MarkDirectories) with a trailing slash
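Combining these, a search for hidden and/or system files that must not abort on access-denied conditions might look like the following sketch (the SearchOptions enumeration name and the overload shape are assumptions; the member names are those used above):

```csharp
// Find *.sys files, including hidden/system entries, skipping any
// directories the caller is not permitted to read.
var entries = FileSearcher.Search(@"C:\Windows", "*.sys",
    SearchOptions.Files | SearchOptions.Hidden | SearchOptions.System |
    SearchOptions.IgnoreInaccessibleNodes);
```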

Listing 4: Exception Handler Interface.
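A plausible shape for the interface, reconstructed from the description that follows (the method name OnException and its parameter list are assumptions; the enumeration members are those discussed in the text):

```csharp
// Reconstruction; not the verbatim listing.
public enum ExceptionHandlerResult
{
    PropagateException,
    ConsumeExceptionAndContinue,
}

public interface IExceptionHandler
{
    ExceptionHandlerResult OnException(string path, Exception x);
}
```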

Returning PropagateException causes the exception to be rethrown, causing the search to be cancelled and the caller to receive the exception. Returning ConsumeExceptionAndContinue consumes the exception (perhaps after logging the condition) and continues the search, skipping the offending directory. Naturally, the purpose of this callback is not to allow users to attempt to suppress unrecoverable conditions, and the library does not invoke the callback in some such cases. Unfortunately, because the .NET exception hierarchy is such an abject mess, discriminating between logical errors, practically unrecoverable conditions, and recoverable runtime conditions is not a simple task, and it is likely that the set of exceptions made suppressible in this regard will change in future implementations. Users are expected to consume only specific expected exceptions; for instance, System.IO.DirectoryNotFoundException, rather than doing anything as unwise as consuming System.Exception.
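By way of example, a handler that consumes only an expected condition might be written as follows (the OnException signature is an assumption; only the enumeration members and IExceptionHandler are named in the text):

```csharp
class SkipMissingDirectories : IExceptionHandler
{
    public ExceptionHandlerResult OnException(string path, Exception x)
    {
        if (x is System.IO.DirectoryNotFoundException)
        {
            // Log the condition and skip the offending directory.
            Console.Error.WriteLine("skipping {0}: {1}", path, x.Message);
            return ExceptionHandlerResult.ConsumeExceptionAndContinue;
        }

        // Anything else is rethrown, cancelling the search.
        return ExceptionHandlerResult.PropagateException;
    }
}
```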

Finally, processing a large directory tree with highly-specific pattern(s) can lead to a user experience with discernible pauses, due to filesystem latencies. Consequently, a progress callback mechanism is also provided, in the form of the IProgressHandler interface (see Listing 5), which allows callers to be notified as each new (sub-)directory is searched, perhaps to log the directory traversal to a console, status bar, etc. It also affords the opportunity to apply search policy on a per-location basis, via return of a control code from the ProgressHandlerResult enumeration: CancelDirectory causes the given directory and all its sub-directories to be excluded from the search; CancelSearch causes exclusion of all remaining directories, even those at a higher level in the tree.

Listing 5: Progress Handler Interface.
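A plausible reconstruction from the description above (the method name OnProgress, its parameters, the Continue member, and the ProgressHandlerResult spelling are assumptions):

```csharp
public enum ProgressHandlerResult
{
    Continue,           // proceed normally
    CancelDirectory,    // skip this directory and all its sub-directories
    CancelSearch,       // abandon all remaining directories
}

public interface IProgressHandler
{
    ProgressHandlerResult OnProgress(string directory, int depth);
}
```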

Given this richness in search specification -- directory, patterns, depth, options, exception-handler, progress-handler -- there is clearly a conflict between flexibility and discoverability [4, 5] in the possible overloads for the file search functions. Ignoring parameter ordering and ambiguities caused by some parameters (directory and patterns) sharing the same type, there are 64 possible combinations of the six parameters. If we add in parameter ordering, it becomes 121. To get a handle on the problem consider the case for just three parameters, directory (dir), patterns (ptns) and depth, with and without parameter ordering considerations. (In both lists, type ambiguities are marked with a <-X->.)

Without considering parameter ordering, we have eight combinations, of which six are viable:

You can imagine the complexity when permuting all six parameters! Clearly we need to make some judicious cuts. Since none of the parameters obviate the need for any of the others, the obvious must-have overload is one in which all six are present. The order is somewhat moot, but I'd suggest it should be either of the following:

Let's leave that decision for the moment while we consider the other options.

Another obvious decision is that we can throw out all permutations that don't have both directory and patterns parameters (and in that order). The utility of being able to specify only directory (to search for all files) or only patterns (to search in the current directory), rather than simply specifying null in place of the omitted argument, is vanishingly small. Not to mention the detraction from discoverability. So we can treat them as a mandatory unit. Furthermore, I think we can also stipulate that they'll always come first in the parameter list.

A further simplification that I felt was justified was that, as "advanced" options, we could treat the two handler arguments as a pair. The cost is a slight extra effort in specifying null for whichever is not needed, at the gain of reducing the overload set.

Given that it can make sense to specify depth independently of options, and both of them independently of progress+exception, we can now cut the list down to eight (which ignores parameter ordering for the moment):

Another concern is that, so far, we've discussed the handlers in terms of the interfaces IExceptionHandler and IProgressHandler. But C# allows a different callback construct, the delegate, which is particularly useful with the advent of C# 2 and (even more so) with C# 3. recls 100% .NET defines two delegates, for handling exceptions and progress (see Listing 6).

Listing 6: Handler Delegates.
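The listing here is likewise a reconstruction rather than verbatim: two delegates mirroring the handler interfaces, with the delegate names and signatures assumed from the discussion:

```csharp
// Delegate counterparts to IExceptionHandler and IProgressHandler.
public delegate ExceptionHandlerResult OnExceptionDelegate(string path, Exception x);

public delegate ProgressHandlerResult OnProgressDelegate(string directory, int depth);
```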

So, even if we thought eight was a survivable number of overloads (which is in doubt), providing for the delegate forms (which are highly convenient, as we'll see later on) would push this out to a minimum of 12. Unequivocally, this is too much choice, one of the enemies of discoverability.

Consequently, some hard decisions had to be made, and I made the necessary (and somewhat arbitrary) decisions to give the following overload set (where {D} designates a delegate form, as opposed to an interface form):

Although six may still feel like a lot, the fact that C# discriminates between int and enumeration types makes it pretty easy to live with without ambiguity. We couldn't do the same in C++.

There's one final refinement to the overloading. Even though a search involving a progress and/or exception handler may not usually require depth, I chose to keep the parameter ordering consistent (i.e. depth follows options), as this is an established principle of interface design [4, 6, 7]. It also fits in better with Visual Studio's Intellisense: when scrolling through the list of overloads the additional parameters appear naturally at the end of the list, rather than jumping around confusingly. For these reasons, I decided to remove the third overload -- (directory, patterns, depth) -- giving a final five. You may demur, but I think these five overloads represent an appropriate balance between flexibility and discoverability.

Listing 7: FileSearcher class interface.
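An abbreviated sketch of the class's shape, as described in the discussion that follows; the exact parameter lists, the UnrestrictedDepth value, and the handler overloads are assumptions, and bodies are elided:

```csharp
public static class FileSearcher
{
    public const int UnrestrictedDepth = int.MaxValue;  // actual value is an assumption

    // Mechanism-agnostic searches; these forward to DepthFirst equivalents.
    public static IEnumerable<IEntry> Search(string directory, string patterns)
        => DepthFirst.Search(directory, patterns);
    // ... four further overloads, adding options, depth, and handlers ...

    public static class DepthFirst
    {
        public static IEnumerable<IEntry> Search(string directory, string patterns)
            => throw new NotImplementedException();
        // ... four further overloads ...
    }

    public static class BreadthFirst
    {
        public static IEnumerable<IEntry> Search(string directory, string patterns)
            => throw new NotImplementedException();
        // ... four further overloads ...
    }
}
```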

The first thing to note is that there are 15 search functions, in three groups of five, representing depth-first, breadth-first, and mechanism-agnostic search; each returns an enumerable type implementing the IEnumerable<IEntry> interface. It would certainly have been possible to include BreadthFirst and DepthFirst flags in the SearchOptions enumeration, but it's unlikely that a choice between depth-first and breadth-first search is one you will need to make at runtime, and expressing design-time choices at runtime is best avoided because it detracts from discoverability.

Rather than define all 15 methods (5 x Search(), 5 x BreadthFirstSearch(), 5 x DepthFirstSearch()) in the same class, they are segregated into three groups of five, with the algorithm-specific overloads associated with the nested (static) classes BreadthFirst and DepthFirst. Obviously this transgresses the accepted wisdom that class names be nouns, but in this case it's acceptable because it engenders transparency of client code in the form of human-readable statements, as in:
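For instance (assuming the two-parameter overload used throughout the article):

```csharp
// Reads naturally: "file searcher, breadth first, search ..."
foreach (IEntry entry in FileSearcher.BreadthFirst.Search(@"C:\Windows", "*.ttf"))
{
    Console.WriteLine(entry.Path);
}
```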

This is a technique that will be familiar to many .NET programmers. The provision of (static) Write()/WriteLine() methods of the Console class offers a syntactic convenience over calling them via its Out (static) property. Similarly, the notionally algorithm-agnostic FileSearcher.Search() methods simply call corresponding FileSearcher.DepthFirst.Search() methods. (This is consistent with recls 1.x, where depth-first was the only algorithm.)

We've now discussed the nuances of all parameters, with the exception of depth. There are two special values for depth: FileSearcher.UnrestrictedDepth and 0. The former places no restrictions on depth. The latter causes the search to be non-recursive, i.e. it searches only in the specified directory.
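Assuming an overload taking options and depth (the overload shape is an assumption), the two special values are used as follows:

```csharp
// depth == 0: non-recursive; only C:\Windows itself is searched.
var topLevelOnly = FileSearcher.Search(@"C:\Windows", "*.ttf", SearchOptions.Files, 0);

// UnrestrictedDepth: no restriction; the whole tree is searched.
var wholeTree = FileSearcher.Search(@"C:\Windows", "*.ttf", SearchOptions.Files,
                                    FileSearcher.UnrestrictedDepth);
```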
