The Filterator

My name is Ahmed Charles and I currently work on Windows Error Reporting. I believe that this is the first time that someone not on the VC team has written a post for this blog, but I hope you will find it useful anyway. I’d like to thank Stephan T. Lavavej for the idea and valuable feedback while writing it. And we both owe the idea for the iterator to Boost.

A common question from programmers with an intermediate amount of experience using the STL is, “How do I write an STL iterator?” Writing an STL iterator is not especially difficult – it just requires some care. STL iterators must satisfy a number of requirements (given by section 24.2 of the (soon to be) International Standard for C++, draft at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf), and the code to do so takes roughly 150 editor lines. Figuring out the code from the requirements can be overwhelming, but once you see the code, it’s easy.

Therefore, I’ve written an example STL iterator, imaginatively called Filterator (in homage to the Mallocator), whose purpose in life is to wrap an existing iterator and filter its elements based on a supplied predicate. I’ve carefully implemented all of the checks that would be required in real production code. And I’ve exhaustively commented the various parts of the iterator to aid in determining which portions should be changed when implementing other iterators. Hopefully, this should demystify the implementation of STL iterators:
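As a rough illustration of the idea only, here is a much-reduced sketch. It is not the Filterator listing itself: `mini_filterator`, `is_even` and `evens_of` are hypothetical names, the sketch is forward-only, it assumes the wrapped iterator is at least a forward iterator, and it omits the converting constructor, decrement, and most of the checks mentioned above.

```cpp
#include <cassert>
#include <iterator>
#include <memory>
#include <vector>

// mini_filterator is a hypothetical, forward-only reduction of the idea.
// Assumption: Iterator is at least a forward iterator.
template <class Iterator, class Predicate>
class mini_filterator {
public:
    // The five nested typedefs that std::iterator_traits looks for.
    typedef std::forward_iterator_tag iterator_category;
    typedef typename std::iterator_traits<Iterator>::value_type value_type;
    typedef typename std::iterator_traits<Iterator>::difference_type
        difference_type;
    typedef typename std::iterator_traits<Iterator>::pointer pointer;
    typedef typename std::iterator_traits<Iterator>::reference reference;

    // Value-initialize every member so that a default-constructed
    // object never holds indeterminate values.
    mini_filterator() : cur_(), end_(), pred_() {}

    // Eagerly advance to the first element that satisfies pred.
    mini_filterator(Iterator first, Iterator last, Predicate pred)
        : cur_(first), end_(last), pred_(pred) {
        skip_bad();
    }

    reference operator*() const {
        assert(cur_ != end_);  // dereferencing the end position is a bug
        return *cur_;
    }

    pointer operator->() const { return std::addressof(**this); }

    mini_filterator& operator++() {
        assert(cur_ != end_);  // incrementing past the end is a bug
        ++cur_;
        skip_bad();
        return *this;
    }

    mini_filterator operator++(int) {
        mini_filterator old(*this);
        ++*this;
        return old;
    }

    friend bool operator==(const mini_filterator& a,
                           const mini_filterator& b) {
        return a.cur_ == b.cur_;
    }

    friend bool operator!=(const mini_filterator& a,
                           const mini_filterator& b) {
        return !(a == b);
    }

private:
    // Restore the invariant: cur_ is end_ or refers to a good element.
    void skip_bad() {
        while (cur_ != end_ && !pred_(*cur_)) {
            ++cur_;
        }
    }

    Iterator cur_;
    Iterator end_;
    Predicate pred_;
};

// Hypothetical predicate and helper used only for this illustration.
struct is_even {
    bool operator()(int n) const { return n % 2 == 0; }
};

// Copies the even elements of v by iterating a filtered range.
std::vector<int> evens_of(const std::vector<int>& v) {
    typedef mini_filterator<std::vector<int>::const_iterator, is_even> fi;
    fi first(v.begin(), v.end(), is_even());
    fi last(v.end(), v.end(), is_even());
    return std::vector<int>(first, last);
}
```

Both ends of the range are built the same way: the "end" filterator simply wraps the underlying end position twice, so there is nothing for it to skip.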

First bug report! The default constructor will store uninitialized values if the 'base' iterator is a pointer. You should explicitly value-initialize each member in the constructor's initializer list to avoid undefined behaviour. Otherwise, this is a pretty neat article, showing off not just iterators but also how some of the other 0x language and library facilities come together – nice!
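A minimal sketch of the suggested fix, assuming a wrapper with the same kind of members; `value_initialized_wrapper` and its member names are hypothetical and are not taken from the article's code:

```cpp
#include <cassert>

// Hypothetical wrapper with the same kind of members as a filtering
// iterator: two positions and a predicate.
template <class Iterator, class Predicate>
class value_initialized_wrapper {
public:
    // Empty parentheses value-initialize each member: raw pointers and
    // function pointers become null instead of holding garbage.
    value_initialized_wrapper() : base_(), end_(), pred_() {}

    Iterator base() const { return base_; }
    Iterator end() const { return end_; }
    Predicate predicate() const { return pred_; }

private:
    Iterator base_;
    Iterator end_;
    Predicate pred_;
};

// Checks the guarantee for a pointer "iterator" and a function-pointer
// predicate, the case where omitting the initializers would matter.
void check_default_constructed_members() {
    value_initialized_wrapper<const int*, bool (*)(int)> w;
    assert(w.base() == nullptr);
    assert(w.end() == nullptr);
    assert(w.predicate() == nullptr);
}
```

Had the constructor been written as `value_initialized_wrapper() {}`, the three members would be default-initialized instead, and reading them for these pointer types would be undefined behaviour.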

Max, Mark R. – I rewrapped Ahmed's code to 60 columns. It should be readable now. I also restored the example's output, which I had mistakenly dropped on the floor during my initial attempt at rewrapping.

@Ben L: The STL doesn't currently have this functionality. I believe that this iterator meets all of the standard's (C++0x) requirements other than complexity. That one is a bit more ambiguous since the standard doesn't specify which sequence the constant amortized complexity refers to. If it's the underlying sequence, this iterator is fine, but if it's the reduced (filtered) sequence, then we don't meet the requirement.

@AlisdairM: Thanks for the bug. I was debating with STL (Stephan T. Lavavej) about whether having value-initialization would matter and we figured it was best to make the extra guarantee. And one of my goals was to show off the new C++0x features.

@mikeb Good point, that would not be ideal behavior for many situations. Since it's possible that no items might be valid, I would rather the constructor just hold a flag indicating whether it has ever been validated against the predicate. This might add overhead to the other methods. Any takers on how to keep this lazy until usage?

Why would you construct a filterator if you aren't going to use it? And if you are going to use it, you're going to have to start with the first good element, and that requires sliding the underlying iterator forward until you find it. If all elements are bad, it's going to take you linear time to discover that fact.

Note that after the filterator's constructor establishes its invariant, copying that filterator around won't invoke the predicate.

I see no advantage to implementing laziness here, and two disadvantages (decreased performance and increased code complexity). The only possible advantage of laziness that I can imagine is if the elements of the sequence are changing between filterator construction and initial use. (Not in concurrent terms, which would be horrible crashtrocity, but in sequential terms.) In that case, an eager implementation and a lazy implementation could observe different elements as being the first good elements.

As a library developer, my professional term for that is "squirrely". I would have no problem explaining to users that with an eager implementation, construction and incrementing/decrementing are what scan for good elements. (With a lazy implementation, initial dereferencing and incrementing/decrementing would scan for good elements, and that's actually more confusing.)

Think of initially constructing a filter-iterator pair. How do I know if the range is empty? Normally I would compare the two iterators, but if the first iterator has not yet tried filtering, this will (typically) return false. OK, now that I know the range is (apparently) not empty, can I safely dereference the first iterator? In order for this iterator to work from a client perspective, it must either seek the first non-filtered value on construction, or support an initial 'not yet filtered' state and check that as a precondition for many members. I think that find-first on construction is the simpler, safer behaviour.
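The emptiness question can be sketched without a full iterator class. The helpers below are hypothetical stand-ins: `eager_range_is_empty` mimics comparing a begin/end pair whose begin position was moved to the first good element at construction, and `unfiltered_range_is_empty` mimics the same comparison when no filtering has happened yet.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical predicate for the illustration.
bool is_even_value(int n) { return n % 2 == 0; }

// Eager construction: the begin position has already been advanced to
// the first element satisfying pred, so begin == end means "empty".
template <class Iterator, class Predicate>
bool eager_range_is_empty(Iterator first, Iterator last, Predicate pred) {
    Iterator begin_pos = std::find_if(first, last, pred);
    return begin_pos == last;
}

// No filtering yet: the comparison only looks at the underlying range.
template <class Iterator>
bool unfiltered_range_is_empty(Iterator first, Iterator last) {
    return first == last;
}

// A sequence in which every element is rejected by the predicate.
void check_all_bad_sequence() {
    const std::vector<int> odds = {1, 3, 5};
    // Eagerly filtered, the range correctly compares as empty.
    assert(eager_range_is_empty(odds.begin(), odds.end(), is_even_value));
    // Unfiltered, it looks non-empty although nothing can be dereferenced.
    assert(!unfiltered_range_is_empty(odds.begin(), odds.end()));
}
```

With the eager version, a client that sees a non-empty range can dereference the first position safely, which is the property the comment above asks for.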

@Marcus Lindblom: It is much easier to write with Boost.Iterators. But not everyone can use the Boost C++ Libraries for various reasons, including legal ones. It is also a matter of learning. If you want to understand how to write an iterator using Boost.Iterators, their documentation is plenty sufficient and this blog post wouldn't be useful at all.

I liked this blog entry. I think that the examples were very helpful. I just want to point out that CEND is not portable for all compilers. Some compilers (such as gcc) will not compile this code. However, great post in any case.

> I just want to point out that CEND is not portable for all compilers. Some compilers (such as gcc) will not compile this code.

1. This is a Microsoft blog, so you shouldn't be terribly surprised when it features MS-specific code.

2. This code is not MS-specific. With VC10 RTM and GCC 4.5.0:

    C:\Temp>type meow.cpp
    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <ostream>
    #include <vector>
    using namespace std;

    int main() {
        vector<int> v(7);

        iota(v.begin(), v.end(), 1);

        for_each(v.begin(), v.end(), [](int& n) { n *= 11; });

        for (auto i = v.cbegin(); i != v.cend(); ++i) {
            cout << *i << " ";
        }
        cout << endl;
    }

    C:\Temp>cl /EHsc /nologo /W4 meow.cpp /Femeow_vc.exe
    meow.cpp

    C:\Temp>meow_vc
    11 22 33 44 55 66 77

    C:\Temp>g++ -Wall -Wextra -std=c++0x meow.cpp -o meow_gcc.exe

    C:\Temp>meow_gcc
    11 22 33 44 55 66 77

The difference is that GCC compiles in C++98/03 mode by default, and -std=c++0x must be used to request C++0x mode. In contrast, VC10 RTM (quite sensibly, if you ask me) always compiles in C++0x mode, and in fact there is no mode, because it doesn't permit C++0x's features to be disabled (except for /Zc:auto- which absolutely nobody should use – the only reason we added it was because we had exhaustive C++98/03 conformance tests that used old auto, and adding the option was easier than updating the old tests).

(Ahmed's code won't compile in GCC 4.5.0 because it uses nullptr and std::addressof(). Otherwise, it does. I've corrected the code by adding "template <class OtherIterator, class OtherPredicate>" to "friend class filterator;" and I've added a use of the converting constructor to main().)
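For readers wondering why the friend declaration needs its own template header, here is a minimal sketch under simplifying assumptions: `wrapped_iterator` is a hypothetical class with a single template parameter (the real one has two), but the friend declaration and converting constructor have the same shape.

```cpp
#include <cassert>
#include <iterator>

// Hypothetical single-parameter wrapper used only for this illustration.
template <class Iterator>
class wrapped_iterator {
public:
    wrapped_iterator() : it_() {}
    explicit wrapped_iterator(Iterator it) : it_(it) {}

    // Converting constructor, e.g. wrapped_iterator<int*> to
    // wrapped_iterator<const int*>. It reads a private member of a
    // different specialization, which is why the friend is needed.
    template <class OtherIterator>
    wrapped_iterator(const wrapped_iterator<OtherIterator>& other)
        : it_(other.it_) {}

    typename std::iterator_traits<Iterator>::reference operator*() const {
        return *it_;
    }

private:
    // The template header makes every specialization a friend. A plain
    // "friend class wrapped_iterator;" would name only this one.
    template <class OtherIterator>
    friend class wrapped_iterator;

    Iterator it_;
};

// Builds a mutable wrapper, converts it to a const one, and reads
// through the converted copy.
int read_through_const_view(int& x) {
    wrapped_iterator<int*> mutable_it(&x);
    wrapped_iterator<const int*> const_it(mutable_it);  // converts
    assert(&*const_it == &x);  // both refer to the same object
    return *const_it;
}
```

Because the conversion relies on `Iterator` being constructible from `OtherIterator`, the reverse direction (const to mutable) fails to compile, which is the desired behaviour.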

Whether or not one can use Boost, the point of an article like this is to learn about the techniques. While it might have been nice to have the article point out that Boost has facilities that make writing your own iterators easier, that's been done now (in the comments).

As far as being paranoid about licenses… I've heard of places that are quite paranoid about taking on 3rd party code regardless of license. They apparently believe that unless they've paid money for the 3rd party code they're opening themselves up to a world of liability or something. Then again, since I haven't lived this, these could be stories with no real truth behind them.

Every place I've worked has had a preference that open source code not be used if possible, but was always willing to carefully accept the use of open source code under certain licenses.

Yes, a CTO can be paranoid. I used to work for one. It was not his fault; the stockholders had hired specific lawyers.

Regarding the legal status of Boost: everything is possible after the Eolas "patent" was confirmed as valid and enforceable in the USA. And it will only get worse. If anyone is paranoid, it is USPTO, DMCA and whatever American lawmakers are going to make the next law. These people worship $$ and despise reason and common sense.

Ahmed:

Thanks for the post, very interesting and instructive. I have been having a C++-based query language at the back of my head (as opposed to Linq which requires CLR and compiler support), and I view it as a step in this direction. I am, however, a little surprised that you did not use std::iterator as a base class; that is the standard way to do these things.

And, hoping that repeating things makes them true: it is not appropriate to require that iterators used in algorithms should be default constructible, and it is possible to implement the standard algorithms so that they do not rely on this requirement. The resulting code will be better, believe me.