
Recursive Directory Search in C#

By Matthew Wilson, November 18, 2009

A flexible library for directory scanning that provides more capabilities than the standard .NET directory functions.

Several years ago I wrote the column Positive Integration for C/C++ Users Journal and, later, Dr. Dobb's Journal, which discussed the issues involved in adapting C/C++ libraries to other languages. The main exemplar project was recls ("recursive ls") [1], a platform-independent recursive filesystem search library written in C and C++ with a C API. The column examined its adaptation to numerous languages (including Ch, C#/.NET (via P/Invoke), D, Java, Python, and Ruby), covering the development of the library from versions 1.0 through 1.6. Since that time, the library has continued to evolve, and now stands at version 1.8. A new C/C++ version, 1.9, will be released in the coming weeks.

I have long planned to rework the library implementation. The two main changes will be a substantial refactoring of the source files and packaging for the core library and the C++ layer, and a rewrite of some or all of the language mappings as full "100%" implementations. This article describes the first of these rewrites, a 100% C# implementation of recls for .NET. For clarity, in this article I'll refer to the original stream of work as recls 1.x and to the new .NET library as recls 100% .NET.

The reasons for these changes are:

The core library has grown to a level of complexity such that I no longer find it easy to make changes.

I wanted to introduce diagnostic logging to the core library; this is included in recls 1.9

I wanted to ease the burden of deployment. For example, with the .NET mapping in versions up to 1.8, the recls.dll exporting the core C API (for access via P/Invoke) must be manually packaged along with the C# API in recls.NET.dll. Automated tools (such as Visual Studio) do not automatically copy it to working areas. And organisational security policies may prohibit use of assemblies that call into "unmanaged" code.

I wanted to take advantage of language features introduced over the last five years. As we'll see shortly, aspects of C# 3 allow cleaner client code for non-trivial search use cases.

I wanted to implement two long asked-for features: breadth-first search, and search-depth limiting. recls 1.x provides only depth-first search, and always does a full-depth search.
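To make the last point concrete, here is a minimal sketch of breadth-first search with a depth limit. It is not the recls 100% .NET implementation: the class name BreadthFirstSketch and its EnumerateFiles method are hypothetical, and the code relies only on Queue<T>, Directory.GetFiles, and Directory.GetDirectories from the .NET class library. It assumes every directory under the root is readable and does no error handling.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical illustration only; not part of recls.
public static class BreadthFirstSketch
{
    // Yields matching files level by level: all files in the root first,
    // then those in its immediate subdirectories, and so on.
    // maxDepth == 0 searches the root only; 1 adds its immediate children.
    public static IEnumerable<string> EnumerateFiles(string root, string pattern, int maxDepth)
    {
        // Each queue entry pairs a directory with its depth below the root.
        var pending = new Queue<KeyValuePair<string, int>>();
        pending.Enqueue(new KeyValuePair<string, int>(root, 0));

        while (pending.Count > 0)
        {
            KeyValuePair<string, int> current = pending.Dequeue();
            string directory = current.Key;
            int depth = current.Value;

            foreach (string file in Directory.GetFiles(directory, pattern))
            {
                yield return file;
            }

            // Queue subdirectories only while the depth limit allows it.
            if (depth < maxDepth)
            {
                foreach (string sub in Directory.GetDirectories(directory))
                {
                    pending.Enqueue(new KeyValuePair<string, int>(sub, depth + 1));
                }
            }
        }
    }
}
```

Replacing the queue with a stack would give a depth-first traversal with the same depth limit, which is the only structural difference between the two strategies in this sketch.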

Despite being written entirely in C#, the implementation of recls 100% .NET is larger than can be fully covered here. So I intend to focus on the interesting design points, language features, and the differences in functionality between recls 1.x and recls 100% .NET.

API Differences

The first difference is a cosmetic one. To placate FxCop [2], and also to clearly distinguish the new recls .NET API from the old for anyone who wishes to port their code to it, I changed the old recls namespace to Recls.

Similarly, the RECLS_FLAG enumeration is now SearchOptions (see Listing 1), and its enumerators are Files rather than FILES, Directories rather than DIRECTORIES, and so on. There are also fewer enumerators. Notably absent, compared with the original [3], are RECURSIVE, LINKS, DEVICES, NO_FOLLOW_LINKS, DIRECTORY_PARTS, DETAILS_LATER, PASSIVE_FTP, and ALLOW_REPARSE_DIRS. The changes reflect the intended increase in portability and improvements to the discoverability and transparency [4, 5] of the new API, based on user feedback.

Listing 1: The SearchOptions enumeration.

The FileEntry class is gone, replaced by the IEntry interface (see Listing 2). The FtpSearch class goes entirely, as the first version of recls 100% .NET does not support FTP search. The DirectoryParts class is no longer externally visible; the DirectoryParts getter-property now returns (an instance implementing) the interface IDirectoryParts; see Listing 3. The FileSearch class goes, and search is now provided by the (static) FileSearcher class.

Listing 2: The IEntry interface.

Listing 3: The IDirectoryParts interface.

IEntry vs. FileEntry

Table 1 compares the public interfaces of the old FileEntry class and recls 100% .NET's IEntry interface. The differences, highlighted in bold, involve changes to both syntax and semantics, and result from lessons learned by users of recls 1.x.

Table 1: Mappings Between Old and New Entry class/interface Methods and Properties.

Drive changed from a character to a string to simplify handling of UNC-based paths: users can now deal with a single property, rather than a drive-letter character in one property and a (UNC) drive string in another. The spellings of UNCDrive and IsUNC changed to follow .NET idiom. The Size property changed from ulong to long to be CLS-compliant (so that it can be used from VB.NET and other .NET languages that don't support unsigned integral types). IsLink and ShortFile had to be dropped because the library must be implemented entirely in terms of CLR facilities, without resorting to P/Invoke. The Attributes property was added so that recls can stay relevant as the CLR evolves the set of file attributes available to managed programmers.

There are also some semantic changes. The form of the file extension has changed, and now includes the dot, so "abc.net" will have an extension of ".net", rather than "net" as was the case with recls 1.x. Since this is a breaking change, I've removed the previous name, FileExt, and given it a new name FileExtension. (This also fits better with the .NET way of doing things, which is to avoid unnecessary contractions in names.)

It's useful to be able to append the extension to another file name without having to pollute client code with logic to determine whether or not to insert the dot. Now, all of the following combinations will reproduce the full path (and, to be useful, may be used in combination with other strings to build correctly-formed new paths):
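The same dot-inclusive convention is used by System.IO.Path in the .NET class library, so the idea can be sketched without the recls API. The following is an illustration only: the class ExtensionDemo and its methods Rebuild and WithSuffix are hypothetical names, and nothing here calls IEntry or FileExtension.

```csharp
using System;
using System.IO;

// Hypothetical illustration only; uses System.IO.Path, not recls.
public static class ExtensionDemo
{
    // Splits a file name into stem and extension, then joins them again.
    // Because the extension carries its own dot (or is empty when there
    // is none), plain concatenation restores the original name.
    public static string Rebuild(string fileName)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName);
        string extension = Path.GetExtension(fileName);
        return stem + extension;
    }

    // Builds a new name by inserting a suffix before the extension,
    // again with no special-case logic for the dot.
    public static string WithSuffix(string fileName, string suffix)
    {
        return Path.GetFileNameWithoutExtension(fileName)
             + suffix
             + Path.GetExtension(fileName);
    }
}
```

With this sketch, Rebuild("abc.net") gives back "abc.net" and WithSuffix("abc.net", "-copy") gives "abc-copy.net". A name with no extension, such as "README", also round-trips unchanged, because the extension is then the empty string.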
