STL & Generic Programming: Writing Your Own Iterators

To our delight and edification, Thomas continues his iteration on iterators. This time we get help defining specialized iterators for our own containers

August 2001/STL & Generic Programming

Good News

Come all ye STL programmers and gather round me, for I have delightful tidings for thee. Most of you will probably agree with me when I say that the minimal reading list for a beginning C++ programmer consists of one introductory text and Scott Meyers two books Effective C++ and More Effective C++. These books were written before the STL appeared on the radar of professional C++ programming. The good news is that Scott Meyers has written a third book called Effective STL. This new book, which should be in bookstores by the time you read this, offers the same kind of advice as his first two books, except that its focus is entirely on the STL. Since the STL is rapidly being accepted as an integral part of the C++ language, you really dont want to be caught with Scott Meyers third book missing from your bookshelf. As with his first two books, Effective STL is not an introductory text. You need to first learn about the STL from an introduction, such as [1], [2], [3], or [4]. Thatll tell you about the tools in the STL toolbox. Effective STL will tell you how to use these tools effectively and without causing damage.

Another book that needs to be mentioned in this context is Herb Sutters excellent Exceptional C++[5], which, among other things, contains advice on how to use the STL. The word is out that Herb Sutter has also written a follow-up, called More Exceptional C++. As of this writing, I have not had a chance to look at it, but there is no doubt in my mind that it will be another classic with much valuable advice regarding the STL.

Needless to say, reading Scott Meyers new book made me wonder if there is even still a need for a column such as this one. Why not just wrap up my pathetic little column and tell everybody to read two books instead, an introduction to the STL and Effective STL? I hope youll agree with me, especially after reading this column, that there are a few things left for me to say. If you will allow me a somewhat personal remark at this point, Ill say that I am fortunate enough to work with a team that decided very early on to bank on the STL as an integral part of C++. We now have a large code base (and a commercially very successful product, I might add) that is virtually free of pre-STL legacies. Moreover, using the STL has inspired us to use generic-programming techniques extensively. I do believe that in the process I have learned a number of things worth sharing with you.

Back to the Measurements Example

This is the third column in a row on the subject of iterators. In my April 2001 column, I talked about the iterators that the STL containers expose. In the June 2001 column, I explained some more advanced aspects of iterators, such as iterator categories and iterator traits. In that column, I also mentioned a real-life example that demonstrates what the creative use of iterators can do for you. The example was a piece of software for evaluating measurements taken on the pipes, tanks, valves, and the like of a chemical plant. Among other things, I mentioned a class that holds a sequence of measurements taken at one particular location at different points in time. Such a class will hold all kinds of information concerning the location of the measurements, and it will hold a collection of objects each of which represents one measurement. In its simplest form, this collection would be a vector of pairs, each of which holds a double and a date. For example, the class would contain the following private member declaration:

It is of course possible that there is more information associated with each individual measurement so that, instead of a pair of a double and a date, youd have to use a more complex object. For now, let us assume that we can get away with a pair or a similarly simple struct, which, just like std::pair, exposes its content through public data members. One of the main points in my last column was that sequentially structured data should always be exposed and passed around in the form of iterators. Therefore, in the present situation, our class should expose iterators to the collection of measurements as follows:

Rather obviously, this setup exposes all available information about the measurements to the client. For example, if there are two or more measurements present, the value of the second measurement can be retrieved as

(begin() + 1)->first

Still, from an STL point of view, this solution is very unsatisfactory. Suppose you wish to do something with just the sequence of measurement values, disregarding the dates. For example, you may wish to calculate some statistical information about the measurements, such as the standard deviation. The STL does not have a standard deviation algorithm, but if you are working on software that calculates and uses statistics, then your own toolbox of algorithms would most certainly contain an STL-style algorithm for the calculation of the standard deviation. That algorithm will take as its arguments a pair of iterators that represents the sequence of numbers whose standard deviation is to be calculated. Or you may want to display the values of the measurements in some tabular or graphical format that illustrates their statistical behavior. If your display module is written with the STL and generic programming principles in mind, then again, you will need a pair of iterators that represents the sequence of numbers to be displayed. In either case, youll want an iterator that, upon dereferencing, will return a const reference to the value of the measurement in question. But we dont have that. What we have are iterators that, upon dereferencing, return a reference to a pair whose first component is the value we need.

There are several possible solutions to this problem. One would be to change the data structure of our class and store the measurement values and their respective dates in parallel vectors. Certainly not a very appealing solution, especially in view of the fact that now you may have the opposite problem, where you want an iterator that dereferences to a pair that contains both the value and the date of the measurement. Another solution would be to require that all functions that accept a pair of iterators must alternatively accept a pair of iterators and a functional, with the understanding that the functional must make it possible to customize the behavior of the respective function to deal with the kind of problem were looking at. This solution is in keeping with the spirit of the STL: as a rule, STL algorithms allow this kind of customization. Suppose, for example, that you wanted to find the first occurrence of a certain value val in the collection of measurements. To do that, you could write

where FirstEquals is a functional that could for example be defined
as shown in Listing 1.

Although this solution goes quite a long way, it is not my favorite one. First, the requirement that functions accepting pairs of iterators must be customizable via a functional in a suitable way is not really a clearly stated universal requirement in the STL. You may find yourself working with algorithms and modules written by other people where this requirement is not satisfied. Second, the proliferation of functionals that results from this solution tends to clutter the code and make it harder to read and maintain. Third, writing the appropriate functionals often opens a whole ugly can of worms having to do with storing objects, by value or by reference, inside temporary function objects, something that should be avoided if possible. Fourth, and most importantly, what were doing here is making the client of the class deal with a problem that is nothing but an implementation detail. The class is intended, among other things, for use by clients who are interested just in the sequence of numbers, disregarding the dates. These clients can reasonably demand to be given access to this sequence of numbers through a pair of iterators that point to numbers and nothing else.

A Generic Iterator to Member

All this being said, we find ourselves confronted with the task of writing our own iterator class. It must be an iterator that runs over an std::vector of std::pairs, but upon dereferencing, returns a reference to a particular component of the pair that it currently points to, rather than to the pair itself. The first question we must answer is how general our iterator class should be. Such a decision is of course ultimately a matter of taste. It is, in a manner of speaking, a tradeoff between usability and reusability: a class that is too generic is often difficult for the client to understand and use, and a class that is not generic enough cannot be reused much.

In this particular case, it seems to me that a good level of genericity is to go for an iterator class that takes an existing iterator that dereferences to a struct or class with public data members and turns it into an iterator that dereferences to a particular public data member of the struct or class. In our example, it would turn an iterator that points to an std::pair<double, Date> into an iterator that points to the first component of the pair, a double.

Listing 2 shows such an iterator class, which I have
called iterator_to_member, with a few obvious function definitions omitted.
The code in Listing 2 cannot be used with Microsoft
Visual C++ and the version of the STL that comes with it, the root of the problem
being the fact that the compiler does not support partial template specialization.
This painful subject is discussed in more detail below. As indicated in the
comments, the template arguments to the class are _It, the type of the
original iterator; _T, the struct or class type that the original iterator
points to; and _R, the type of the member of _T that our new iterator
will point to. In the example that we started out with, these types will be
specialized as

The iterator_to_member class has two private data members: m_it, the original iterator that points to the entire struct or class of type _T, and m_memptr, a pointer to the member of _T that the new iterator will point to. Recall that in C++, a pointer to member is nothing but an offset that indicates the relative position of the respective member within the class layout. The one-liner operator* shows you how iterator_to_member uses that information:

return (*m_it).*m_memptr;

The expression in parentheses dereferences the contained iterator m_it. The result is a reference to the _T object that m_it currently points to. Using the pointer to member m_memptr, we obtain a reference to the desired member of the _T object. The operators operator-> and operator[] are implemented analogously. Notice that the underlying iterator type _It may not have an operator[]. In that case, the line

return m_it[_N].*m_memptr;

in the implementation of iterator_to_member::operator[] will not compile. But thats just fine, because member functions of class templates will not be compiled unless theyre called. The only downside to this is that you cannot force an explicit template instantiation on an iterator_to_member in the case where the underlying iterator type _It does not implement operator[]. That is because when an explicit template instantiation takes place, the compiler will attempt to compile all member functions.

All operators that have to do with the position of the iterator, such as operator++, are simply forwarded to the contained iterator. Again, there are some operators here that may not compile for certain values of the template parameter _It, but that is not a problem except when it comes to explicit instantiation.

Looking at the implementations of the various operators, you can see that our iterator_to_member is really just a thin wrapper around an iterator that dereferences to a struct or class with public data members. All operations having to do with the position of the iterator are simply forwarded to the wrapped iterator. All operations that have to do with dereferencing the iterator are first passed on to the wrapped iterator to obtain the object currently pointed to, and then the desired data member is extracted.

What remains to be discussed are the typedefs at the top of the class
definition of iterator_to_member. In my last column, I mentioned the
struct std::iterator_traits as a way to obtain the iterator category
of an iterator. The full definition of the STLs iterator_traits
is shown in Listing 3. As you can see, iterator_traits
is a struct that contains just typedefs for types that are associated
with the iterator type _It. As I mentioned in my last column, the definition
as shown in Listing 3 looks as if the whole iterator_traits
is totally gratuitous. When Im looking at some iterator type, such as

std::list<int>::const_iterator

and I want to know what that iterators value_type is, why would I write

std::iterator_traits<std::list<int>::const_iterator>::value_type;

and not simply

std::list<int>::const_iterator::value_type

when, as you can see from Listing 3, the former is
simply a typedef for the latter? The main reason for using the indirection
through the struct iterator_traits is that some iterators are basic types,
namely, plain old pointers, and these cannot provide their own associated types.
std::vector iterators, for example, are often implemented as plain old
pointers, and in these implementations, an expression such as

std::vector<int>::const_iterator::value_type

does not even make sense. For all iterators that do not or cannot provide their
own types in this way, we can provide a specialization or partial specialization
of iterator_traits to affix the associated types to the iterator.
For example, the STL deals with all iterators that are plain old pointers in
one fell swoop by providing the partial specialization of iterator_traits
that is shown in Listing 4. This is the reason that
the STL implementation that comes with Microsoft VC++ 6.0 cannot provide iterator
traits as stipulated by the Standard, because the compiler does not support
partial template specialization.

The keyword typename in Listing 3 is required
because the types that are used here are whats called dependent
types, that is, types that depend on template parameters. Whenever you
use a dependent type, you have to precede it with the keyword typename.
The reason for this is that it is the only way to enable the compiler to check
templatized code for syntactic correctness before instantiation. typename
tells the compiler the identifier that follows names a type, rather than a data
member or member function. Unfortunately, the current situation with different
compilers is confusing: some compilers correctly require the use of the typename
keyword, others dont allow it, and still others let you get away with
both.

In our iterator_to_member class, we want the iterator category and the difference type to be the same as that of the underlying iterator type, simply because everything that has to do with positioning the iterator is forwarded to the contained iterator. The value_type, i.e., the type of the object pointed to by the iterator_to_member, is the template argument _R, and pointers and references to that type are _R* and _R&, respectively.

Finally, a class with this many template arguments should always provide a wrapper around the constructor that is modeled after STL functions such as make_pair. As I mentioned in my last column, the point of these make functions is that, oftentimes, you find yourself constructing an object of a certain type and then immediately passing it to some function as an argument, without the need to explicitly assign it to a variable, as in

foo(std::pair<int, int>(42, 43));

In a case like that, you can use std::make_pair to save yourself the trouble of spelling out the template arguments, letting the compiler deduce them instead:

foo(std::make_pair(42, 43));

This is what the function make_iterator_to_member at the end of Listing
2 is all about.

Applying iterator_to_member

Suppose you have an std::vector of pairs of doubles, and you want to calculate the standard deviation of the first components of all pairs in the vector. Assuming that you have a standard deviation algorithm, heres how you can do this now:

This kind of use of the iterator to member is convenient in many situations, but the real beauty of the whole concept of writing iterators like this emerges when we go back to our example of the measurement class. What were going to do is expose an iterator to the first component of the pairs in the collection. Since this happens via a typedef, the client never even knows that theres something other than a plain old iterator in an STL container class involved. There is one issue that we need to address before we can write the appropriate typedef into the measurement class. Usually, when you write your own iterators, you have to write two versions, a const and a non-const one. In Matt Austerns article [6] on writing iterators, youll find a very elegant trick that saves you from having to write a lot of duplicate code. As it happens so often in generic programming, the trick uses partial template specialization, and therefore, it wont work with compilers that do not support partial template specialization.

With our iterator to member, were lucky enough to be able to come up
with const iterators by specifying the template arguments appropriately.
If we use a const iterator type for the first template argument (the
type of the contained iterator) and a const type for the third template
argument (the type of the member that we want to point to), then the resulting
iterator to member is precisely the const iterator that we want. Listing
5 shows all the typedefs and member functions that our measurements
class must have so that it exposes to the client both the sequence of pairs

that it really contains and the virtual sequence consisting of the
first components of these pairs.

Microsoft Visual C++ and iterator_traits

As we have seen, the whole idea of iterator traits, like many other nifty techniques
in generic programming, cannot really work under Microsofts Visual C++.
The main problem is that the STL defines a partial specialization of struct
iterator_traits that covers all plain old pointers, as shown in Listing
4. If the STL implementation that comes with Visual C++ would just omit
that partial specialization, but provide the general struct iterator_traits
shown in Listing 3, then the whole thing would be a
mere inconvenience: for all iterators in your program that happen to be plain
old pointers (in particular, all vector iterators), you could write a specialization
of iterator_traits, and all would be well. Unfortunately, however, the
struct iterator_traits in the STL that comes with Visual C++ is different
from the one that the Standard requires: it is missing the pointer and reference
types, and the difference_type is called distance_type. Therefore,
this struct is of very limited use, to say the least.

As far as the iterator category is concerned, Microsofts STL also offers a workaround that stems from one of the earliest versions of the original STL. Recall that the typical use of the iterator category is to make algorithms behave differently for different kinds of iterators. Many algorithms look like this:

Here, alg_internal is a function that is overloaded for different types of the third argument. As you can see, all thats needed here is a default-constructed temporary object of the iterators category type. Microsofts STL has a function _Iter_cat, which, when applied to an iterator, returns just such an object. Therefore, algorithms in their STL implementation often look like this:

This is how you should write algorithms that need to distinguish between iterators of different categories if youre working with Visual C++. The problem with the _Iter_cat gimmick is that it does not lend itself easily to be extended to custom iterators such as our iterator_to_member.

Microsofts website makes no statement about if and when they intend to support partial template specialization. I have heard it through the grapevine that partial template specialization is not considered a priority by Microsoft and will not be supported in the next version of Visual C++.

I have also heard that in the upcoming version of Visual C++, Microsoft plans to improve the situation slightly by implementing all STL iterators as proper classes. So be sure to never make use of the fact that std::vector iterators are currently typedefed as plain pointers, and hope that nobody on your team has done so in the past.

Conclusion

A few well-crafted generic iterator classes can greatly enhance the benefit that you reap from using the STL. For more on this subject, you may want to check out [7] and [8].

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!