Introduction

Using the foreach loop is easy. It is so straightforward and natural that I rarely use any other loop construct. In fact, when it's time to use the for loop, I have to concentrate and recall the syntax.

I have to deal with a lot of tree traversals. The obvious choice is to have a foreach loop to go over all elements of a root, making a recursive call for each node that has children. This works out great until one notices that there are about twenty functions that have a foreach loop at their core together with a bunch of statements that operate on the same data type.

Using flags in order to exploit common pieces of code did not work out well. The functions simply became unreadable and much more error-prone. Something radical had to be done about this problem.

This article does not contain directly reusable code. Rather, it presents an idea on how to separate an algorithm that iterates over a custom collection from the actions performed on each item of that collection. The example code works with DirectoryInfo and FileInfo objects, but at no point does it modify files. The file system is used simply because it is a tree structure that is available on every computer. For simplicity, there is no error handling and there are no security checks.

Separating algorithm from data

Staring at a bunch of functions that differ ever so slightly did have a good side effect after all. There are two things to note: first, the algorithm is always a foreach loop with a recursive call. Second, we are working with objects of the same data type, or a very limited number of data types.

The iteration algorithm does not care what happens to each object that it visits. Its only responsibility is to visit each and every item of a collection. Then who will do the useful work on the visited items? A callback function, of course! We already know the data type we are working with: it is the one named in the foreach statement.

Let's use the file system as an example of a tree structure that we want to iterate over. For example, a function that displays all file system objects under a certain root directory could look as follows:
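A minimal sketch of such a function, assuming the output goes to Debug.WriteLine and using the name DisplayAllElements that appears later in this article. The class name Example is a placeholder.

```csharp
using System.Diagnostics;
using System.IO;

public static class Example
{
    // Visits every entry under root (depth-first) and writes its full path.
    // Iteration and output are mixed together in one function.
    public static void DisplayAllElements(DirectoryInfo root)
    {
        foreach (FileSystemInfo entry in root.GetFileSystemInfos())
        {
            if (entry is DirectoryInfo dir)
            {
                Debug.WriteLine("Dir:  " + dir.FullName);
                DisplayAllElements(dir); // recursive call for a node with children
            }
            else if (entry is FileInfo file)
            {
                Debug.WriteLine("File: " + file.FullName);
            }
        }
    }
}
```

Every variation of this function repeats the same loop and recursion; only the two output statements change.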

Let's modify it so that the iteration code is completely separate from the output code. We are using the file system as an example, so we will be dealing with DirectoryInfo and FileInfo instances. Declare callback types that take DirectoryInfo and FileInfo objects as parameters:
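One way the callback types and the modified function could look. DirectoryInfoHandler is the name used later in this article; FileInfoHandler, DisplayDirInfo, DisplayFileInfo and the class name Example are assumptions made for this sketch.

```csharp
using System.Diagnostics;
using System.IO;

// Callback types: one per kind of file system object.
public delegate void DirectoryInfoHandler(DirectoryInfo dir);
public delegate void FileInfoHandler(FileInfo file);

public static class Example
{
    // The output code now lives in two small callback functions.
    static void DisplayDirInfo(DirectoryInfo dir)
    {
        Debug.WriteLine("Dir:  " + dir.FullName);
    }

    static void DisplayFileInfo(FileInfo file)
    {
        Debug.WriteLine("File: " + file.FullName);
    }

    // Same traversal as before, but the loop body only calls delegates.
    public static void DisplayAllElements(DirectoryInfo root)
    {
        DirectoryInfoHandler onDir = DisplayDirInfo;
        FileInfoHandler onFile = DisplayFileInfo;

        foreach (FileSystemInfo entry in root.GetFileSystemInfos())
        {
            if (entry is DirectoryInfo dir)
            {
                onDir(dir);
                DisplayAllElements(dir);
            }
            else if (entry is FileInfo file)
            {
                onFile(file);
            }
        }
    }
}
```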

Improving design of iteration function

Though we are not displaying directory and file information inside the foreach loop anymore, there is no real difference between the new and the original DisplayAllElements functions. Calling a function callback was a step in the right direction. Let's improve it even more.

In my opinion, the iteration algorithm should be stateless. To maintain state, it would need some knowledge about the contents of the nodes it traverses. This complicates the algorithm and makes it more difficult to verify its correctness. My approach is to keep it very simple. A static function will do.

IterateFileSystemInfo is almost identical to the modified DisplayAllElements. The only difference, apart from the callbacks' names, is that the callbacks are provided as parameters and checked for null before being called. This allows some flexibility. For example, we may not wish to display directory names. In this case we can pass null for the DirectoryInfoHandler parameter. All subdirectories and files will still be visited, though directory entries will not be "processed".
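A sketch of what IterateFileSystemInfo could look like under that description. The parameter order and the FileInfoHandler delegate name are assumptions.

```csharp
using System.IO;

public delegate void DirectoryInfoHandler(DirectoryInfo dir);
public delegate void FileInfoHandler(FileInfo file);

public static class StaticIterators
{
    // Stateless depth-first traversal. Either handler may be null,
    // in which case that kind of entry is visited but not processed.
    public static void IterateFileSystemInfo(
        DirectoryInfo root,
        DirectoryInfoHandler dirHandler,
        FileInfoHandler fileHandler)
    {
        foreach (FileSystemInfo entry in root.GetFileSystemInfos())
        {
            if (entry is DirectoryInfo dir)
            {
                if (dirHandler != null)
                {
                    dirHandler(dir);
                }
                // Recurse even when no directory handler was supplied.
                IterateFileSystemInfo(dir, dirHandler, fileHandler);
            }
            else if (entry is FileInfo file)
            {
                if (fileHandler != null)
                {
                    fileHandler(file);
                }
            }
        }
    }
}
```

Passing null as the second argument, for instance StaticIterators.IterateFileSystemInfo(root, null, handler), still walks into every subdirectory but reports files only.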

The StaticIterators class could have several functions that traverse a collection in different ways. The algorithm above is depth-first. One could also include functions that perform a breadth-first traversal.
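As an illustration, a breadth-first variant might be sketched as follows. The function name IterateBreadthFirst and the use of a queue are choices made for this example, not something taken from the article's source code.

```csharp
using System.Collections.Generic;
using System.IO;

public delegate void DirectoryInfoHandler(DirectoryInfo dir);
public delegate void FileInfoHandler(FileInfo file);

public static class StaticIterators
{
    // Visits all entries of one level before any entry of a deeper level.
    public static void IterateBreadthFirst(
        DirectoryInfo root,
        DirectoryInfoHandler dirHandler,
        FileInfoHandler fileHandler)
    {
        // Directories whose contents have not been listed yet.
        var pending = new Queue<DirectoryInfo>();
        pending.Enqueue(root);

        while (pending.Count > 0)
        {
            DirectoryInfo current = pending.Dequeue();
            foreach (FileSystemInfo entry in current.GetFileSystemInfos())
            {
                if (entry is DirectoryInfo dir)
                {
                    if (dirHandler != null)
                    {
                        dirHandler(dir);
                    }
                    pending.Enqueue(dir); // contents are listed later
                }
                else if (entry is FileInfo file)
                {
                    if (fileHandler != null)
                    {
                        fileHandler(file);
                    }
                }
            }
        }
    }
}
```

The callbacks are unchanged; only the order in which they fire differs from the recursive version.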

Wrapping callback functions

The real work on collection members is done by the callbacks, which are provided to the iteration algorithm. We could simply instantiate several delegates and call IterateFileSystemInfo with them. However, the design can be improved a lot if the callbacks are wrapped in separate classes. This allows all object-oriented design techniques to be applied to the classes containing the callbacks. For example, the Decorator pattern could be chosen for the processor class design. Passing parameters to callbacks can be done the same way as in "Passing parameters to predicates".

How can this new class be used with the IterateFileSystemInfo tree traversal algorithm? We need a new instance of the DisplayAllProcessor class. The delegate properties of that instance are used as input parameters for the tree traversal algorithm. As the algorithm executes, it fires the MyDisplayDirInfo and MyDisplayFileInfo callbacks of the DisplayAllProcessor instance. The way to use it is as follows:
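A sketch of the processor class and its use. The names DisplayAllProcessor, MyDisplayDirInfo and MyDisplayFileInfo come from the text above; the property names DirHandler and FileHandler, the FileInfoHandler delegate and the Example class are assumptions.

```csharp
using System.Diagnostics;
using System.IO;

public delegate void DirectoryInfoHandler(DirectoryInfo dir);
public delegate void FileInfoHandler(FileInfo file);

public static class StaticIterators
{
    public static void IterateFileSystemInfo(
        DirectoryInfo root, DirectoryInfoHandler dirHandler, FileInfoHandler fileHandler)
    {
        foreach (FileSystemInfo entry in root.GetFileSystemInfos())
        {
            if (entry is DirectoryInfo dir)
            {
                if (dirHandler != null) dirHandler(dir);
                IterateFileSystemInfo(dir, dirHandler, fileHandler);
            }
            else if (entry is FileInfo file)
            {
                if (fileHandler != null) fileHandler(file);
            }
        }
    }
}

// Wraps the two callbacks in a class and exposes them as delegate properties.
public class DisplayAllProcessor
{
    public DirectoryInfoHandler DirHandler { get { return MyDisplayDirInfo; } }
    public FileInfoHandler FileHandler { get { return MyDisplayFileInfo; } }

    private void MyDisplayDirInfo(DirectoryInfo dir)
    {
        Debug.WriteLine("Dir:  " + dir.FullName);
    }

    private void MyDisplayFileInfo(FileInfo file)
    {
        Debug.WriteLine("File: " + file.FullName);
    }
}

public static class Example
{
    public static void DisplayAllElements(DirectoryInfo root)
    {
        var processor = new DisplayAllProcessor();
        StaticIterators.IterateFileSystemInfo(
            root, processor.DirHandler, processor.FileHandler);
    }
}
```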

Inheritance of wrapper classes

My test program for this article is a WinForms application. It displays the output of all callbacks in two columns of a ListView. This immediately demonstrates a benefit of wrapping callbacks in a class: storage for the results that the UI needs is built into a base class:

Instead of calling Debug.WriteLine(), all processors add display items to the list in the base class. When iteration runs to completion, the DisplayItems property contains all the data that the UI needs. The new DisplayAllElements looks as follows, where ShowOutput simply fills the output ListView.
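The arrangement could be sketched like this. The base class name ProcessorBase, the AddItem helper, the row type (a two-element string array) and the property names DirHandler and FileHandler are assumptions; ShowOutput here prints to the console as a stand-in for filling a ListView.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public delegate void DirectoryInfoHandler(DirectoryInfo dir);
public delegate void FileInfoHandler(FileInfo file);

public static class StaticIterators
{
    public static void IterateFileSystemInfo(
        DirectoryInfo root, DirectoryInfoHandler dirHandler, FileInfoHandler fileHandler)
    {
        foreach (FileSystemInfo entry in root.GetFileSystemInfos())
        {
            if (entry is DirectoryInfo dir)
            {
                if (dirHandler != null) dirHandler(dir);
                IterateFileSystemInfo(dir, dirHandler, fileHandler);
            }
            else if (entry is FileInfo file)
            {
                if (fileHandler != null) fileHandler(file);
            }
        }
    }
}

// Hypothetical base class: collects two-column rows for the UI.
public abstract class ProcessorBase
{
    private readonly List<string[]> items = new List<string[]>();

    // Rows gathered so far; normally read once iteration has finished.
    public virtual List<string[]> DisplayItems { get { return items; } }

    protected void AddItem(string first, string second)
    {
        items.Add(new[] { first, second });
    }
}

public class DisplayAllProcessor : ProcessorBase
{
    public DirectoryInfoHandler DirHandler { get { return MyDisplayDirInfo; } }
    public FileInfoHandler FileHandler { get { return MyDisplayFileInfo; } }

    private void MyDisplayDirInfo(DirectoryInfo dir) { AddItem("Directory", dir.FullName); }
    private void MyDisplayFileInfo(FileInfo file) { AddItem("File", file.FullName); }
}

public static class Example
{
    // Stand-in for the WinForms ShowOutput: one line per row, tab-separated.
    static void ShowOutput(List<string[]> rows)
    {
        foreach (string[] row in rows)
        {
            Console.WriteLine(row[0] + "\t" + row[1]);
        }
    }

    public static void DisplayAllElements(DirectoryInfo root)
    {
        var processor = new DisplayAllProcessor();
        StaticIterators.IterateFileSystemInfo(
            root, processor.DirHandler, processor.FileHandler);
        ShowOutput(processor.DisplayItems);
    }
}
```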

The infrastructure has been set up. Let's look at a couple of examples of how it can be used.

Maintaining state

State will be maintained by "processors", or instances of wrapper classes. Let's find the maximum and the average file sizes. To do so, we need to keep track of three things: the maximum file size seen so far, the combined size of all files seen, and the number of files seen. These will be member variables of our class.

Results are needed only once iteration has run to completion. We know that the DisplayItems property is normally accessed after the iteration algorithm has finished its job. Therefore, the statistics can be calculated in the DisplayItems property, which takes some calculations out of the iteration.
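A sketch of such a processor, assuming a FileHandler delegate property and two-element string arrays as display rows. The member names maxSize, totalSize and fileCount are illustrative; the row labels and the one-decimal formatting are also choices made for this example.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public delegate void DirectoryInfoHandler(DirectoryInfo dir);
public delegate void FileInfoHandler(FileInfo file);

public static class StaticIterators
{
    public static void IterateFileSystemInfo(
        DirectoryInfo root, DirectoryInfoHandler dirHandler, FileInfoHandler fileHandler)
    {
        foreach (FileSystemInfo entry in root.GetFileSystemInfos())
        {
            if (entry is DirectoryInfo dir)
            {
                if (dirHandler != null) dirHandler(dir);
                IterateFileSystemInfo(dir, dirHandler, fileHandler);
            }
            else if (entry is FileInfo file)
            {
                if (fileHandler != null) fileHandler(file);
            }
        }
    }
}

public class FindAvgAndMaxProcessor
{
    // State kept between callbacks.
    private long maxSize;   // largest file seen so far, in bytes
    private long totalSize; // combined size of all files seen
    private long fileCount; // number of files seen

    public FileInfoHandler FileHandler { get { return ProcessFile; } }

    // The callback only updates counters; no division happens here.
    private void ProcessFile(FileInfo file)
    {
        if (file.Length > maxSize) maxSize = file.Length;
        totalSize += file.Length;
        fileCount++;
    }

    // Statistics are computed on demand, after iteration has finished.
    public List<string[]> DisplayItems
    {
        get
        {
            double average = fileCount == 0 ? 0.0 : (double)totalSize / fileCount;
            return new List<string[]>
            {
                new[] { "Maximum file size", maxSize.ToString(CultureInfo.InvariantCulture) },
                new[] { "Average file size", average.ToString("F1", CultureInfo.InvariantCulture) }
            };
        }
    }
}
```

Directories carry no size of their own here, so the traversal would be called with null for the directory handler.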

The FindAvgAndMaxProcessor class is used the same way as previous processor classes. The only difference is that instead of a list of visited files and directories it displays two rows of statistics.

Providing parameters for callbacks

Suppose we want to find the maximum and average sizes of .EXE and .XML files in a given directory. A processor that takes the file extension as a parameter is in order. Since the callback functions are wrapped in a class, it is simply a matter of adding a public property. The new processor class is so tiny that I will include it here in its entirety.

We let the base class keep the count. The child class controls which files are processed by comparing their extension to a member variable.

The FindAvgAndMaxByExtProcessor class allows selecting different sets of items to be processed. If a class allows that, the state data has to be taken care of when a different subset of items is selected. When setting a new file extension to match in the example above, we have to reset all counters to 0. Otherwise the example would produce combined statistics for both .EXE and .XML files, which is not what was intended.

The FindAvgAndMaxByExt function demonstrates how the FindAvgAndMaxByExtProcessor class can be used.
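A sketch of both the processor and the function that uses it. The class and function names come from the text; the members Extension, Reset, MaxSize and AverageSize, the virtual ProcessFile method and the case-insensitive comparison are assumptions made for this example.

```csharp
using System;
using System.IO;

public delegate void DirectoryInfoHandler(DirectoryInfo dir);
public delegate void FileInfoHandler(FileInfo file);

public static class StaticIterators
{
    public static void IterateFileSystemInfo(
        DirectoryInfo root, DirectoryInfoHandler dirHandler, FileInfoHandler fileHandler)
    {
        foreach (FileSystemInfo entry in root.GetFileSystemInfos())
        {
            if (entry is DirectoryInfo dir)
            {
                if (dirHandler != null) dirHandler(dir);
                IterateFileSystemInfo(dir, dirHandler, fileHandler);
            }
            else if (entry is FileInfo file)
            {
                if (fileHandler != null) fileHandler(file);
            }
        }
    }
}

// Base class keeps the counters.
public class FindAvgAndMaxProcessor
{
    private long maxSize, totalSize, fileCount;

    public FileInfoHandler FileHandler { get { return ProcessFile; } }
    public long MaxSize { get { return maxSize; } }
    public double AverageSize
    {
        get { return fileCount == 0 ? 0.0 : (double)totalSize / fileCount; }
    }

    protected virtual void ProcessFile(FileInfo file)
    {
        if (file.Length > maxSize) maxSize = file.Length;
        totalSize += file.Length;
        fileCount++;
    }

    protected void Reset() { maxSize = 0; totalSize = 0; fileCount = 0; }
}

// Child class decides which files reach the base class.
public class FindAvgAndMaxByExtProcessor : FindAvgAndMaxProcessor
{
    private string extension = "";

    public string Extension
    {
        get { return extension; }
        // Selecting a different subset invalidates the old statistics.
        set { extension = value; Reset(); }
    }

    protected override void ProcessFile(FileInfo file)
    {
        if (string.Equals(file.Extension, extension, StringComparison.OrdinalIgnoreCase))
        {
            base.ProcessFile(file);
        }
    }
}

public static class Example
{
    public static void FindAvgAndMaxByExt(DirectoryInfo root)
    {
        var processor = new FindAvgAndMaxByExtProcessor();

        processor.Extension = ".exe";
        StaticIterators.IterateFileSystemInfo(root, null, processor.FileHandler);
        Console.WriteLine(".exe: max {0}, average {1}", processor.MaxSize, processor.AverageSize);

        processor.Extension = ".xml"; // counters are reset by the setter
        StaticIterators.IterateFileSystemInfo(root, null, processor.FileHandler);
        Console.WriteLine(".xml: max {0}, average {1}", processor.MaxSize, processor.AverageSize);
    }
}
```

Without the Reset call in the setter, the second pass would report figures for both extensions combined.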

Using the code

To run an example from the source code, select a directory, select the test case, and then click the Run Test button.

Conclusion

The collection iteration algorithm was decoupled from actions performed on each collection item. The algorithm function is a very simple static function, whose correctness is easy to verify. Once that was done, not a single looping instruction or recursive call was needed to execute test cases. All "processor" classes are also very simple. Some are literally a couple of lines long.

The iteration algorithm is stateless. It's simple and it does its job. A good reason to have this function in its own class would be to implement the IEnumerable interface, for example. I may do this just for the fun of it.
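For the curious, one possible shape of an IEnumerable-based version is sketched below. The class name FileSystemTree and the helper Walk are hypothetical; this is an illustration of the idea, not code from the article's download.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Exposes the depth-first traversal as a sequence that foreach can consume.
public class FileSystemTree : IEnumerable<FileSystemInfo>
{
    private readonly DirectoryInfo root;

    public FileSystemTree(DirectoryInfo root)
    {
        this.root = root;
    }

    public IEnumerator<FileSystemInfo> GetEnumerator()
    {
        return Walk(root).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    // Yields each entry, then the contents of that entry if it is a directory.
    private static IEnumerable<FileSystemInfo> Walk(DirectoryInfo dir)
    {
        foreach (FileSystemInfo entry in dir.GetFileSystemInfos())
        {
            yield return entry;

            if (entry is DirectoryInfo sub)
            {
                foreach (FileSystemInfo child in Walk(sub))
                {
                    yield return child;
                }
            }
        }
    }
}
```

With this in place, a caller could write foreach (FileSystemInfo entry in new FileSystemTree(root)) and apply a processor's callback to each entry.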

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.