It's becoming increasingly obvious that D needs iterators. While opApply
is far better for some kinds of iteration (such as recursively
traversing a directory), iterators are more efficient in some cases, and
allow for things that opApply makes difficult.
So here are the design goals:
1) Works with dynamic and static arrays
2) Doesn't need to work with associative arrays
3) Should be possible to achieve "pointer efficiency" with it
4) Needs to be able to restrict lvalue access (what C++ does with const
iterators)
5) Needs to work seamlessly with foreach
6) Iterators need to be copyable
So here's one design:
.being property returns an iterator that starts at the beginning
.end returns an iterator that is at the end
Overloading * doesn't work with D. But, instead:
Overload opIndex for rvalue access
Overload opIndexAssign for lvalue access
Overloading opIndex also will work for random access
foreach loops will not be able to have a 'key' parameter.
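The proposed operations map closely onto a pointer-like iterator. As a rough C++ analogue (the `ArrayIter` type and helper below are my own sketch, not from the proposal):

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the proposed iterator shape: copyable (goal 6), a single
// pointer so it has "pointer efficiency" (goal 3), advanced with +=,
// and read through indexing rather than a unary * overload.
struct ArrayIter {
    const char* p;

    char operator[](std::ptrdiff_t i) const { return p[i]; }  // rvalue access
    ArrayIter& operator+=(std::ptrdiff_t n) { p += n; return *this; }
    bool operator!=(const ArrayIter& o) const { return p != o.p; }
};

// Usage in the style the proposal's foreach would lower to.
inline int count_chars(ArrayIter begin, ArrayIter end) {
    int n = 0;
    for (ArrayIter i = begin; i != end; i += 1)
        ++n;                     // i[0] reads the current element
    return n;
}
```

Omitting lvalue access (an opIndexAssign analogue) is what would make this the "const iterator" of goal 4.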

Make that .begin
Also, overloading += will be used to advance the iterator.

that would give something like this?
import std.stdio;
void main()
{
char[] s = "hello";
// += for advancing an iterator, why not ++?
for (iterator i = s.begin; i != s.end; i += 1)
writefln(s[i]);
// ??? i is an iterator or a char ???
foreach(i; s)
// extend writefln to take iterators ?
// but that would mean that it must be possible
// to determine a value from an iterator
writefln(i);
}

I didn't read the discussion, but wasn't there a topic about
foreach_reverse? Should that not cover s.revIter()?
Secondly, why must one give the first iterator in the foreach statement?
I'd say that the iterable should be given there?
roel

I didn't read the discussion, but wasn't there a topic about
foreach_reverse? Should that not cover s.revIter()?

How? Say I do this:
foreach_reverse( e; s.begin() )
I've passed an iterator pointing to the beginning of a sequence to an
algorithm that wants to iterate backwards, so best case the iterator is
bidirectional and we will visit exactly one element (the theoretical
"last" in this case) before exiting the loop.

Secondly, why must one give the first iterator in the foreach statement?
I'd say that the iterable should be given there?

Something like that, yes. You need a unified object that knows when
it's reached the end of the sequence. So assuming C++ style iterators
you'd have to wrap them in an object that takes care of the comparisons
and looping to work with foreach.
Sean
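
A sketch of the kind of wrapper Sean describes, in C++ for concreteness (the `Range` name and its methods are my own invention): it bundles the begin/end pair into one object that knows when the sequence is exhausted, which is the shape foreach needs.

```cpp
#include <cassert>

// Hypothetical wrapper: bundles a C++-style begin/end pair so a
// foreach-style loop driver only needs one object plus a "done" test.
template <typename It>
struct Range {
    It cur, last;
    bool done() const { return !(cur != last); }
    auto value() const { return cur[0]; }        // current element
    void advance() { cur += 1; }
};

// A foreach-style driver written against the wrapper, not the raw pair.
template <typename It, typename F>
void for_each_range(Range<It> r, F body) {
    for (; !r.done(); r.advance())
        body(r.value());
}
```

For a plain array, `Range<int*>{a, a + n}` already works, since pointers support `[0]`, `+=`, and `!=`.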

Make that .begin
Also, overloading += will be used to advance the iterator.

that would give something like this?
import std.stdio;
void main()
{
char[] s = "hello";
// += for advancing an iterator, why not ++?
for (iterator i = s.begin; i != s.end; i += 1)
writefln(s[i]);
// ??? i is an iterator or a char ???
foreach(i; s)
// extend writefln to take iterators ?
// but that would mean that it must be possible
// to determine a value from an iterator
writefln(i);
}

Why would I use
> for (iterator i = s.begin; i != s.end; i += 1)
> writefln(s[i]);
instead of:
> for (int i = 0; i < s.length; i++)
> writefln(s[i]);
?

The point is that you could replace char[] with SomeCollection and the
code should still work. But the same mechanism should also work on the
built-in types.
--bb
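
Bill's point can be made concrete with a single template written against the iteration protocol (the names below are mine, not from the thread). Any type whose iterators support `!=` and `++` works unchanged, raw pointers included.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One algorithm, written once against the begin/end protocol.  It works
// for std::string, std::vector, and plain pointer ranges alike, which is
// the property the proposed D design wants for char[] and SomeCollection.
template <typename It>
int count_elems(It first, It last) {
    int n = 0;
    for (; first != last; ++first)
        ++n;
    return n;
}
```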

It's becoming increasingly obvious that D needs iterators. While opApply
is far better for some kinds of iteration (such as recursively
traversing a directory), iterators are more efficient in some cases, and
allow for things that opApply makes difficult.
So here are the design goals:
1) Works with dynamic and static arrays
2) Doesn't need to work with associative arrays
3) Should be possible to achieve "pointer efficiency" with it
4) Needs to be able to restrict lvalue access (what C++ does with const
iterators)
5) Needs to work seamlessly with foreach
6) Iterators need to be copyable
So here's one design:
.begin property returns an iterator that starts at the beginning
.end returns an iterator that is at the end
Overloading * doesn't work with D. But, instead:
Overload opIndex for rvalue access
Overload opIndexAssign for lvalue access
Overloading opIndex also will work for random access
foreach loops will not be able to have a 'key' parameter.

Would this mechanism work for, say, arrays of structs (not struct*[]'s) so
that we could avoid the overhead of either a) copying the struct and/or
b) the extra level of indirection in the loop? (In other words, could the
iterator provide a reference to each element by default).
Thanks,
- Dave

I should add - will native arrays of structs (say) provide this by default
w/o having to add specialized operators?
Thanks.

It could just be the count. Say I'm trying to linearize a non-linear
datastructure into a pre-existing array:
foreach(int i, val v; hierarchical_thing.iterator())
{
linearized[i] = v;
}
Also having the count there would make it easier to do something like
iterating lock step in funky ways over different objects:
foreach(int i, val v; thing.randomized_iterator()) {
other_thing[i] = v;
}
Obviously you could count things yourself if you had to.
But I find for instance in Python that I use 'enumerate()' quite often.
enumerate() takes an iterator and returns an iterator that spits out
(count,value) tuples.
--bb
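
Bill's enumerate() can be sketched as an adaptor that pairs a running count with each element. The `Enumerate` type below is hypothetical, not a real library API:

```cpp
#include <cassert>
#include <utility>

// Sketch of a Python-style enumerate(): wraps any pointer-like iterator
// and hands out (count, value) pairs, so the caller gets the index free.
template <typename It>
struct Enumerate {
    It it;
    int count = 0;

    auto next() {                        // yield (count, value), then advance
        auto pair = std::make_pair(count, *it);
        ++it;
        ++count;
        return pair;
    }
};
```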

It's becoming increasingly obvious that D needs iterators. While opApply
is far better for some kinds of iteration (such as recursively
traversing a directory), iterators are more efficient in some cases, and
allow for things that opApply makes difficult.
So here are the design goals:
1) Works with dynamic and static arrays
2) Doesn't need to work with associative arrays
3) Should be possible to achieve "pointer efficiency" with it
4) Needs to be able to restrict lvalue access (what C++ does with const
iterators)
5) Needs to work seamlessly with foreach
6) Iterators need to be copyable
So here's one design:
.begin property returns an iterator that starts at the beginning
.end returns an iterator that is at the end

Is it worth discussing whether the pointer-style C++ iterators are the
best approach? They certainly are if random access iterators are a
requirement, but if we're just implementing forward and reverse
iterators, the all-in-one Java approach may be better. In that vein, I
assume also that reverse iteration would work with reverse iterators
and an .rbegin type method instead of .begin and foreach_reverse?
Sean

Is it worth discussing whether the pointer-style C++ iterators are the
best approach? They certainly are if random access iterators are a
requirement, but if we're just implementing forward and reverse
iterators, the all-in-one Java approach may be better. In that vein,
I assume also that reverse iteration would work with reverse
iterators and an .rbegin type method instead of .begin and
foreach_reverse?

Hmm, or it could be done starting with .end and going to .begin.

Yup. About the only potentially weird thing there is that, assuming C++
style iterators, it would require reverse iterators and forward
iterators to be comparable to one another. And, I suppose, for .end to
return a reverse style iterator even for sequences that don't support
reverse iteration?
Sean

It's becoming increasingly obvious that D needs iterators. While opApply
is far better for some kinds of iteration (such as recursively
traversing a directory), iterators are more efficient in some cases, and
allow for things that opApply makes difficult.
So here are the design goals:
1) Works with dynamic and static arrays

How about pointers (unknown length arrays)?
int *some_numbers;
Is it required that a pointer be a valid iterator?
I.e. should it be possible to have this template function:
print_all(some_numbers, some_numbers+theLength);
Or will that case require a wrapper object that contains theLength?
auto iter = PointerIterator(some_numbers, theLength);
print_all(iter);

2) Doesn't need to work with associative arrays

Why not? Do you mean it doesn't need to because you can just iterate
over keys or values?

3) Should be possible to achieve "pointer efficiency" with it
4) Needs to be able to restrict lvalue access (what C++ does with const
iterators)
5) Needs to work seamlessly with foreach
6) Iterators need to be copyable

So here's one design:
.begin property returns an iterator that starts at the beginning
.end returns an iterator that is at the end

That's like C++'s way. Iterator is basically a generalized pointer.
The other proposal is more like Java/Python/C#'s way, where the iterator
is like a pointer that knows its own limits.
The Java/Python/C# way would definitely work in D. I'm not sure about
the C++ way working for D because, in practice, references and
dereferencing get used pretty heavily there. On the other hand
Java/Python/C# were designed without real templates.
One of the major design goals for C++ was that for a basic array a
regular pointer should be a valid iterator. Thus the iterator cannot
know where its own end() is, because a pointer does not have such
knowledge. But aside from cute demos on SGI's STL web page where
algorithms are applied to plain arrays using pointers, I've never seen
that sort of thing used in the real world. You almost always use a
std::vector or std::list or something besides a raw array. So pointer
equivalence seems to be pretty useless to me. Even more so with D. In
the rare case you need to work with raw pointers, you can always make an
iterator wrapper.
On the plus side for C++ style:
* iterators are lightweight (just one size_t/pointer in most cases)
* algorithms and methods all take begin/end pairs making things uniform.
Minuses
* Iterators are dumb and error prone and easily go out of bounds
* algorithms all require begin/end pairs making things cumbersome
I think the minuses outweigh the pluses.
But maybe it's useful to take one concrete example: A generic mergesort
or quicksort algorithm.
In C++ recursive calls just pass the limits of the range to the children
mergesort(begin, end)
{
...
mergesort(begin, begin + (end - begin)/2);
mergesort(begin + (end - begin)/2, end);
}
It's very efficient.
In the iterator-is-object approach I'm not sure how that works.
If all you've got is .next() or .hasNext() I'm not sure what you're
supposed to do to create an iterator over half the range represented by
the original iterator.
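
Spelled out, the recursive split needs `begin + (end - begin)/2` as the midpoint. A compilable C++ sketch over plain pointers, using the standard library's merge step:

```cpp
#include <algorithm>
#include <cassert>

// Recursive mergesort over a [begin, end) pointer pair.  Each child call
// gets half the range; the midpoint is begin plus half the distance.
void mergesort(int* begin, int* end) {
    if (end - begin < 2)
        return;                                   // 0 or 1 elements: sorted
    int* mid = begin + (end - begin) / 2;
    mergesort(begin, mid);
    mergesort(mid, end);
    std::inplace_merge(begin, mid, end);          // merge two sorted halves
}
```

This is exactly the operation a `.next()`/`.hasNext()`-only iterator cannot express: there is no way to name "the middle" of the remaining sequence.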

It's becoming increasingly obvious that D needs iterators. While
opApply is far better for some kinds of iteration (such as
recursively traversing a directory), iterators are more efficient in
some cases, and allow for things that opApply makes difficult.
So here are the design goals:
1) Works with dynamic and static arrays

How about pointers (unknown length arrays)?
int *some_numbers;
Is it required that a pointer be a valid iterator?
I.e. should it be possible to have this template function:
print_all(some_numbers, some_numbers+theLength);
Or will that case require a wrapper object that contains theLength?
auto iter = PointerIterator(some_numbers, theLength);
print_all(iter);

It's simpler than that. Pointers can be converted to arrays:
auto array = some_numbers[0 .. theLength];
and then you can get the iterator.

2) Doesn't need to work with associative arrays

Why not? Do you mean it doesn't need to because you can just iterate
over keys or values?

Because that falls into my big complaint with iterators - there's no way
to efficiently 'linearize' a binary tree. Might as well just iterate
over the keys or values.

That's like C++'s way. Iterator is basically a generalized pointer.

Yes.

The other proposal is more like Java/Python/C#'s way, where the iterator
is like a pointer that knows its own limits.
The Java/Python/C# way would definitely work in D. I'm not sure about
the C++ way working for D because, in practice, references and
dereferencing get used pretty heavily there. On the other hand
Java/Python/C# were designed without real templates.

I think the C++ like way will be a lot more efficient, and I think it
will work. Reference types are not needed.

One of the major design goals for C++ was that for a basic array a
regular pointer should be a valid iterator. Thus the iterator cannot
know where its own end() is, because a pointer does not have such
knowledge. But aside from cute demos on SGI's STL web page where
algorithms are applied to plain arrays using pointers, I've never seen
that sort of thing used in the real world. You almost always use a
std::vector or std::list or something besides a raw array. So pointer
equivalence seems to be pretty useless to me. Even more so with D. In
the rare case you need to work with raw pointers, you can always make an
iterator wrapper.

I think it'll work fine, since pointers can be easily converted to
dynamic arrays (see above).

On the plus side for C++ style:
* iterators are lightweight (just one size_t/pointer in most cases)
* algorithms and methods all take begin/end pairs making things uniform.
Minuses
* Iterators are dumb and error prone and easily go out of bounds
* algorithms all require begin/end pairs making things cumbersome
I think the minuses outweigh the pluses.
But maybe it's useful to take one concrete example: A generic mergesort
or quicksort algorithm.
In C++ recursive calls just pass the limits of the range to the children
mergesort(begin, end)
{
...
mergesort(begin, begin + (end - begin)/2);
mergesort(begin + (end - begin)/2, end);
}
It's very efficient.
In the iterator-is-object approach I'm not sure how that works.
If all you've got is .next() or .hasNext() I'm not sure what you're
supposed to do to create an iterator over half the range represented by
the original iterator.

Iterators can be any type, not just objects, that support [], ++, +=, and =.

It's becoming increasingly obvious that D needs iterators. While
opApply is far better for some kinds of iteration (such as
recursively traversing a directory), iterators are more efficient in
some cases, and allow for things that opApply makes difficult.
So here are the design goals:
1) Works with dynamic and static arrays

How about pointers (unknown length arrays)?
int *some_numbers;
Is it required that a pointer be a valid iterator?
I.e. should it be possible to have this template function:
print_all(some_numbers, some_numbers+theLength);
Or will that case require a wrapper object that contains theLength?
auto iter = PointerIterator(some_numbers, theLength);
print_all(iter);

It's simpler than that. Pointers can be converted to arrays:
auto array = some_numbers[0 .. theLength];
and then you can get the iterator.

Nice! Didn't know you could do that.

I think the C++ like way will be a lot more efficient, and I think it
will work. Reference types are not needed.

The C++ way really embodies two design choices:
1) iterators are defined by 'concept'
--> Means generic iteration functions all need to be templates
--> The other alternative is define them by interfaces.
--> these two aren't necessarily mutually exclusive though
2) an iterator is a pointer -- begin() and end() are kept separate.
--> The other alternative is 'an iterator is a range'
Definitely 1) will be more efficient than using interfaces,
but I don't think pointer semantics, 2), are required for 1) to work.
It may be possible for D iterators to have the speed of C++ while having
the convenience and safety of Java's iterators that know their bounds.
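
One way to sketch that middle ground (the `Span` name and methods are mine): a bounds-aware range that is still just two pointers, so generic code stays templated and fast, but the position can never run past the end it carries with it.

```cpp
#include <cassert>

// Sketch of "iterator-as-range": still only two pointers (C++-level cost),
// but the bound travels with the position (Java-level safety).
template <typename T>
struct Span {
    T* cur;
    T* end;

    bool empty() const { return cur == end; }
    T& front() const { return *cur; }
    void pop() { if (!empty()) ++cur; }   // advancing past the end is a no-op
};
```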

I think the minuses outweigh the plusses.
But maybe it's useful to take one concrete example: A generic
mergesort or quicksort algorithm.
In C++ recursive calls just pass the limits of the range to the children
mergesort(begin, end)
{
...
mergesort(begin, begin + (end - begin)/2);
mergesort(begin + (end - begin)/2, end);
}
It's very efficient.
In the iterator-is-object approach I'm not sure how that works.
If all you've got is .next() or .hasNext() I'm not sure what you're
supposed to do to create an iterator over half the range represented
by the original iterator.

Iterators can be any type, not just objects, that support [], ++, +=,
and =.

I think the iterator-as-range idea is at least worth discussing. 99% of
the time you use an iterator you also need to know where to stop. It's
pretty similar to how arrays in D keep a length, because 99% of the time
when you have an array you need to know how long it is too. So it makes
sense to keep those two bits of information together. Similarly in C++,
when you have an iterator, you almost always need the end() too. And
any time you want to pass around iterators you end up having to pass the
two bits of information around separately.
--bb

I think the iterator-as-range idea is at least worth discussing. 99% of
the time you use an iterator you also need to know where to stop. It's
pretty similar to how arrays in D keep a length, because 99% of the time
when you have an array you need to know how long it is too. So it makes
sense to keep those two bits of information together. Similarly in C++,
when you have an iterator, you almost always need the end() too. And
any time you want to pass around iterators you end up having to pass the
two bits of information around separately.

When writing custom C++ iterators, I find that end() is not ever necessary.
If end() is not used, it means that a little more smarts have to be added to
the iterator itself so that the iterator knows when to stop. In some cases
this means that the iterator needs a pointer to the collection/container
object in order to get that information.
-Craig
"Walter Bright" <newshound digitalmars.com> wrote in message
news:eioqvl$26t0$2 digitaldaemon.com...

Bill Baxter wrote:

I think the iterator-as-range idea is at least worth discussing. 99% of
the time you use an iterator you also need to know where to stop. It's
pretty similar to how arrays in D keep a length, because 99% of the time
when you have an array you need to know how long it is too. So it makes
sense to keep those two bits of information together. Similarly in C++,
when you have an iterator, you almost always need the end() too. And
any time you want to pass around iterators you end up having to pass the
two bits of information around separately.

When writing custom C++ iterators, I find that end() is not ever necessary.
If end() is not used, it means that a little more smarts have to be added to
the iterator itself so that the iterator knows when to stop. In some cases
this means that the iterator needs a pointer to the collection/container
object in order to get that information.

Also in some cases, 'end()' is really unnatural to write. E.g. with
cyclic containers.

When writing custom C++ iterators, I find that end() is not ever necessary.
If end() is not used, it means that a little more smarts have to be added to
the iterator itself so that the iterator knows when to stop. In some cases
this means that the iterator needs a pointer to the collection/container
object in order to get that information.

A benefit of having a pointer to the iterated container is of course that
boundary checks are always up to date, even if you change the container
while iterating it. Well, I'm not saying that it would be the way to do
the iterators, or that it's not the way to do them.
Btw, I have used (in C++) iterators having the following functions:
isEnd()         //does the iterator point to the end position?
                //(isNotEnd() etc provided for convenience)
                //there is also: isBegin(), isFirst(), isLast()
toBegin()       //to position -1 (which is outside the container)
toFirst()       //to position 0
toLast()        //to position end - 1
toEnd()         //the end position points outside the container
toNext()
toPrev()
get()           //returns the current item
read()          //== get() + toNext()
getNext()
getPrev()
readBackward()  //== get() + toPrev()
...
I have named iterators capable of changing the container as 'Writer's
('wr' + 'iter'). They have the following functions:
set()
write()         //== set() + toNext()
...
Ok, there are a lot of functions... but my actual point is that the
iterator has two positions, 'begin' and 'end', that point *outside* the
container. When writing (or setting) a value to 'end', the value is
appended to the end of the container (very natural operation for files).
The same way "i.toBegin(); i.set(val);" inserts 'val' to the start of the
container (very unnatural operation for files, and actually causes an
error if tried for files).

In the general case the structure to be linearized is a directed
graph. Every linearization has to at least implicitly build a
spanning tree for that graph.
A priori, in spanning trees the number of children of a node is limited
only by the number of nodes of the structure.
Therefore every linearizer must be able to handle trees of
arbitrary degree.
From tree grammars it is known that top-down attributions and bottom-up
attributions are incomparable to each other when only one run is
allowed.
Because the goal of any linearization is to compute some attribution
of the structure, every 'traverser' must assist the process outlined
above, where in the general case more than one run is needed.
Now: what is efficiency in this general case?

Overload opIndex for rvalue access
[...]
Overloading opIndex also will work for random access

Wouldn't this make iterators incompatible with e.g. linked lists? Seems
to defeat much of the purpose.
(It might be possible to implement random access for a list, but it's not
something you'd want to encourage.)

Overload opIndex for rvalue access
[...]
Overloading opIndex also will work for random access

Wouldn't this make iterators incompatible with e.g. linked lists?
Seems to defeat much of the purpose.

To work, all it has to do is support [0], and throw an exception if the
index is anything but 0.
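
As a sketch of that convention (hypothetical `ListIter` type, C++ used for concreteness): a forward-only linked-list iterator where `[0]` reads the current node and any other index throws.

```cpp
#include <cassert>
#include <stdexcept>

// Sketch of a forward-only iterator under the proposed convention:
// only [0] is meaningful; any other index throws instead of pretending
// a linked list supports random access.
struct ListIter {
    struct Node { int value; Node* next; };
    Node* cur;

    int operator[](int i) const {
        if (i != 0)
            throw std::out_of_range("forward iterator: only [0] is supported");
        return cur->value;
    }
    ListIter& operator+=(int n) {
        while (n-- > 0) cur = cur->next;   // step node by node
        return *this;
    }
    bool operator!=(const ListIter& o) const { return cur != o.cur; }
};
```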

(It might be possible to implement random access for a list, but it's not
something you'd want to encourage.)

I agree.

This is one point where there's a huge divide between C++ iterators and
Java iterators.
In C++ there is no such thing as *the* iterator type or interface. What
C++ (or STL, rather) has is collection of *iterator concepts*. What
that means is that if opIndex doesn't make sense for the collection
in question, you just leave it out. A 'concept' is like an interface
but for templates. Here are the concepts in the STL:
Trivial Iterator - http://www.sgi.com/tech/stl/trivial.html
Input Iterator - http://www.sgi.com/tech/stl/InputIterator.html
Output Iterator - http://www.sgi.com/tech/stl/OutputIterator.html
Forward Iterator - http://www.sgi.com/tech/stl/ForwardIterator.html
Bidirectional Iterator -
http://www.sgi.com/tech/stl/BidirectionalIterator.html
Random Access Iterator -
http://www.sgi.com/tech/stl/RandomAccessIterator.html
If you want your container to support forward iteration, then your
iterator can be of any type you wish, but it must support x++, ++x, and
*x operations.
The Java/C# model is instead based on predefined interfaces which has
pros and cons.
* Main pro is that functions that take generic iterators can be ordinary
functions rather than templates.
* Main con is that such functions can't generally be as efficient as raw
pointer manipulation, because you have to make a virtual method call
through the interface pointer to get each next element.
D is in a very interesting place in that it is high level enough that
the C++ model doesn't quite make sense, but low level enough that the
Java model doesn't quite make sense either.
D may be able to have the best of both worlds. Define *both* standard
concepts *and* standard interfaces that implement those concepts, and
then libraries can write either templatized algorithms for speed, or
regular function based algorithms for code-size, use with inheritance, etc.
--bb
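
The two flavors can sit side by side; a sketch (all names are mine) of the same sum written once against a virtual interface and once as a template over the concept:

```cpp
#include <cassert>

// Interface flavor: consumable by ordinary (non-template) functions,
// at the cost of one virtual call per element.
struct IntIter {
    virtual bool hasNext() = 0;
    virtual int next() = 0;
    virtual ~IntIter() {}
};

int sum_via_interface(IntIter& it) {          // a plain function, not a template
    int s = 0;
    while (it.hasNext())
        s += it.next();
    return s;
}

// Concept flavor: any type with the right shape works, and the calls
// can be inlined down to pointer arithmetic.
template <typename It>
int sum_via_concept(It first, It last) {
    int s = 0;
    for (; first != last; ++first)
        s += *first;
    return s;
}

// An array-backed implementation of the interface, for demonstration.
struct ArrayIntIter : IntIter {
    const int* p;
    const int* e;
    ArrayIntIter(const int* begin, const int* end) : p(begin), e(end) {}
    bool hasNext() override { return p != e; }
    int next() override { return *p++; }
};
```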

Overload opIndex for rvalue access
[...]
Overloading opIndex also will work for random access

Wouldn't this make iterators incompatible with e.g. linked lists?
Seems to defeat much of the purpose.

To work, all it has to do is support [0], and throw an exception if
the index is anything but 0.

(It might be possible to implement random access for a list, but it's
not something you'd want to encourage.)

I agree.

This is one point where there's a huge divide between C++ iterators and
Java iterators.
In C++ there is no such thing as *the* iterator type or interface. What
C++ (or STL, rather) has is collection of *iterator concepts*. What
that means is that if opIndex doesn't make sense for the collection
in question, you just leave it out. A 'concept' is like an interface
but for templates.

For what it's worth, C++98 style iterator traits won't work in D because
of D's simpler template lookup rules. I think we'll have to go with
something much closer to C++0x concept checking if we want to overload
algorithms for different iterator categories.

The Java/C# model is instead based on predefined interfaces which has
pros and cons.
* Main pro is that functions that take generic iterators can be ordinary
functions rather than templates.
* Main con is that such functions can't generally be as efficient as raw
pointer manipulation, because you have to make a virtual method call
through the interface pointer to get each next element.

Hrm... you could still use a Java design without the base class and have
everyone just use templates though.

D is in a very interesting place in that it is high level enough that
the C++ model doesn't quite make sense, but low level enough that the
Java model doesn't quite make sense either.
D may be able to have the best of both worlds. Define *both* standard
concepts *and* standard interfaces that implement those concepts, and
then libraries can write either templatized algorithms for speed, or
regular function based algorithms for code-size, use with inheritance, etc.

That's kind of what I was thinking, though if iterators all inherit from
a common base class then the calls will be virtual regardless, unless
the DMD optimizer gets a bit fancier.
Sean

It's becoming increasingly obvious that D needs iterators. While opApply
is far better for some kinds of iteration (such as recursively
traversing a directory), iterators are more efficient in some cases, and
allow for things that opApply makes difficult.
So here are the design goals:
1) Works with dynamic and static arrays
2) Doesn't need to work with associative arrays
3) Should be possible to achieve "pointer efficiency" with it
4) Needs to be able to restrict lvalue access (what C++ does with const
iterators)
5) Needs to work seamlessly with foreach
6) Iterators need to be copyable
So here's one design:
.begin property returns an iterator that starts at the beginning
.end returns an iterator that is at the end
Overloading * doesn't work with D. But, instead:

What are the arguments against that, again? I do not recall, and it's
worth at least reviewing why we don't have opDeref and opDerefAssign.
As you know, Walter, a major benefit of C++-style iterators is that they
are semantically as close as possible to pointers. Would [] be changed
to dereference pointers? I sure hope not. If we go this route, I would
suggest adding these unary-* overloads.

Overloading opIndex also will work for random access
foreach loops will not be able to have a 'key' parameter.

So, I would assume these iterators would work something like this:
If a class provides .begin and .end, and these both return the same
type, and that type provides the requisite iterator methods, it is
foreach-able.
If a class provides .begin, .end, and opApply, one of the two iteration
methods has to take precedence. I would hope it's opApply.
For a type T, its associated iterator type should be available via
T.iterator. This has to be standard.
An iterable type T MUST provide these methods:
I begin()
I end()
An iterator type I MUST provide the following overloads:
E opSlice()
I opPostInc()
E is the type of an element in the collection.
These are enough to describe a read-only forward iterator, which is
adequate for foreach. In other words, given a type T that meets the
requirements outlined above, the following is adequate to iterate
through it:
foreach(e; t) {
// ...
}
'e' would be of type E.
We run into a curious problem involving pre-increment, which uses
opAddAssign. Classes for which random-access is problematic should not
be required to provide this, which is essentially a random-access
method. There are two options here:
1) Only require that iterators support opPostInc. (This further removes
them from actual pointers.)
2) Introduce a convention whereby a non-random-access iterator throws an
exception if any value other than 1 is passed to its opAddAssign overload.
Extending these rules for bidirectional iterators and random-access
iterators, as well as read-write iterators, is left as an exercise to
the reader. (And not a very difficult one.)
All of these strange edge-cases and differences between pointers and
possible iterator types convince me that C++-STYLE ITERATORS ARE NOT FOR
D. D, unlike C++, does not supply enough operator overloads for their
semantics to be identical to those of pointers, which was the whole
point of C++-style iterators in the first place.
Instead, we should look to other languages for inspiration. I suggest
Java and Python, whose iterator semantics are similar. Other posters
have already made suggestions on this front. I would only suggest the
"opIter" method for getting an iterator, and that the iterator class
itself be available via T.iterator (whether it is a nested class or an
alias shouldn't matter). Classes can provide methods for returning
alternate iterators, as well. (A class providing opIter should be
iterated over as easily as an iterator itself.)
--
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org

Because D has many cases of implicit dereferencing, I thought it would
cause more trouble than it's worth.

I do not recall, and it's
worth at least reviewing why we don't have opDeref and opDerefAssign.
As you know, Walter, a major benefit of C++-style iterators is that they
are semantically as close as possible to pointers. Would [] be changed
to dereference pointers?

You've always been able to:
int* p;
p[1] = ...

I sure hope not. If we go this route, I would
suggest adding these unary-* overloads.

Overloading opIndex also will work for random access
foreach loops will not be able to have a 'key' parameter.

So, I would assume these iterators would work something like this:
If a class provides .begin and .end, and these both return the same
type, and that type provides the requisite iterator methods, it is
foreach-able.

Yes, though they will return a value, not a type.

If a class provides .begin, .end, and opApply, one of the two iteration
methods has to take precedence. I would hope it's opApply.

I was leaning towards the other way around.

For a type T, its associated iterator type should be available via
T.iterator. This has to be standard.

It's not needed, as typeof(T.begin) will do it.

An iterable type T MUST provide these methods:
I begin()
I end()

Yes.

An iterator type I MUST provide the following overloads:
E opSlice()

No. opIndex()

I opPostInc()

Probably opAddAssign()

E is the type of an element in the collection.

These are enough to describe a read-only forward iterator, which is
adequate for foreach. In other words, given a type T that meets the
requirements outlined above, the following is adequate to iterate
through it:
foreach(e; t) {
// ...
}
'e' would be of type E.

Yes.
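Not something stated in the thread, but putting the substitutions Walter names together (opIndex for the dereference, opAddAssign for the advance, plus begin/end), a minimal sketch might look like this; the Array and Iter names and the int element type are illustrative assumptions:

```d
// Sketch only: a copyable struct iterator (design goal 6) whose
// "dereference" is i[0] and whose advance is i += 1.
struct Iter
{
    int[] data;   // slice of the container's storage
    size_t pos;

    int opIndex(size_t n)    { return data[pos + n]; } // rvalue access: i[0]
    void opAddAssign(int n)  { pos += n; }             // advance: i += 1
    int opEquals(Iter other) { return pos == other.pos; }
}

struct Array
{
    int[] data;

    Iter begin() { Iter i; i.data = data; i.pos = 0;           return i; }
    Iter end()   { Iter i; i.data = data; i.pos = data.length; return i; }
}
```

With that, `for (Iter i = a.begin; i != a.end; i += 1) writefln(i[0]);` would walk the elements, and foreach could lower to the same loop.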

We run into a curious problem involving pre-increment, which uses
opAddAssign. Classes for which random-access is problematic should not
be required to provide this, which is essentially a random-access
method. There are two options here:
1) Only require that iterators support opPostInc. (This further removes
them from actual pointers.)
2) Introduce a convention whereby a non-random-access iterator throws an
exception if any value other than 1 is passed to its opAddAssign overload.

I could go either way with that.
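For concreteness, option 2 might be sketched like this (Node and ListIter are hypothetical names, not from the thread):

```d
class Node { int value; Node next; }

// A forward-only iterator: opAddAssign doubles as the advance
// operation, and any step other than 1 is refused at runtime,
// per the proposed convention for non-random-access iterators.
struct ListIter
{
    Node cur;

    void opAddAssign(int n)
    {
        if (n != 1)
            throw new Exception("ListIter is not random-access");
        cur = cur.next;
    }

    int opIndex(size_t n)
    {
        if (n != 0)
            throw new Exception("ListIter is not random-access");
        return cur.value;  // i[0] still "dereferences"
    }
}
```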

Extending these rules for bidirectional iterators and random-access
iterators, as well as read-write iterators, is left as an exercise to
the reader. (And not a very difficult one.)
All of these strange edge-cases and differences between pointers and
possible iterator types convince me that C++-STYLE ITERATORS ARE NOT FOR
D. D, unlike C++, does not supply enough operator overloads for their
semantics to be identical to those of pointers, which was the whole
point of C++-style iterators in the first place.

I think it does provide enough, in fact, it will be less wacky than in
C++ (look at the wretched iterator/const_iterator dichotomy). It'll work
with core arrays, and will not need those typedefs.

Instead, we should look to other languages for inspiration. I suggest
Java and Python, whose iterator semantics are similar. Other posters
have already made suggestions on this front. I would only suggest the
"opIter" method for getting an iterator, and that the iterator class
itself be available via T.iterator (whether it is a nested class or an
alias shouldn't matter). Classes can provide methods for returning
alternate iterators, as well. (A class providing opIter should be
iterated over as easily as an iterator itself.)

Because D has many cases of implicit dereferencing, I thought it would
cause more trouble than it's worth.

I do not recall, and it's worth at least reviewing why we don't have
opDeref and opDerefAssign.
As you know, Walter, a major benefit of C++-style iterators is that
they are semantically as close as possible to pointers. Would [] be
changed to dereference pointers?

You've always been able to:
int* p;
p[1] = ...

Wouldn't that be p[0] = ...?
Is that how you intend to use opIndex to "dereference" the iterator?
That's /terrifying/. Although it is consistent with pointers, using [0]
for everything is just asking for trouble with regards to
non-random-access iterators. And though it's probably not totally
meaningless to a random reader of the code, it comes pretty darned close.

I sure hope not. If we go this route, I would suggest adding these
unary-* overloads.

Overloading opIndex also will work for random access
foreach loops will not be able to have a 'key' parameter.

So, I would assume these iterators would work something like this:
If a class provides .begin and .end, and these both return the same
type, and that type provides the requisite iterator methods, it is
foreach-able.

Yes, though they will return a value, not a type.

s/these both return the same type/their return types are the same/

If a class provides .begin, .end, and opApply, one of the two
iteration methods has to take precedence. I would hope it's opApply.

I was leaning towards the other way around.

It is slightly easier and clearer to explicitly say:
foreach(e; t.begin) { }
than
foreach(e; &t.opApply) { }
opApply is an operator overload. .begin would just be a special method.
I would expect the operator to take precedence.

For a type T, its associated iterator type should be available via
T.iterator. This has to be standard.

It's not needed, as typeof(T.begin) will do it.

Due to the way in which D's type inference from properties works, you'll
need typeof(T.begin()). Otherwise, it will derive the type of the method
itself. (Which isn't even a valid type, as such.)
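A sketch of the point (Container and Iter are hypothetical types; going through `.init` is one way to phrase the call-form lookup):

```d
struct Iter { int dummy; /* iterator methods elided */ }

struct Container
{
    Iter begin() { Iter i; return i; }
}

// typeof(Container.begin) would refer to the method itself;
// the call form is needed to recover the iterator type:
alias typeof(Container.init.begin()) I;  // I is Iter
```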

An iterable type T MUST provide these methods:
I begin()
I end()

Yes.

An iterator type I MUST provide the following overloads:
E opSlice()

No. opIndex()

I'm referring specifically to the no-index form of opSlice (meaning i[]
would "dereference" the iterator). (A no-index opIndex doesn't work.)
This of course is inconsistent with pointers, so saying i[0] is better
on that count.

I opPostInc()

Probably opAddAssign()

Oh, there's another one:
int opEquals(I)

E is the type of an element in the collection.

These are enough to describe a read-only forward iterator, which is
adequate for foreach. In other words, given a type T that meets the
requirements outlined above, the following is adequate to iterate
through it:
foreach(e; t) {
// ...
}
'e' would be of type E.

Yes.

We run into a curious problem involving pre-increment, which uses
opAddAssign. Classes for which random-access is problematic should not
be required to provide this, which is essentially a random-access
method. There are two options here:
1) Only require that iterators support opPostInc. (This further
removes them from actual pointers.)
2) Introduce a convention whereby a non-random-access iterator throws
an exception if any value other than 1 is passed to its opAddAssign
overload.

I could go either way with that.

Neither one is particularly elegant.

Extending these rules for bidirectional iterators and random-access
iterators, as well as read-write iterators, is left as an exercise to
the reader. (And not a very difficult one.)
All of these strange edge-cases and differences between pointers and
possible iterator types convince me that C++-STYLE ITERATORS ARE NOT
FOR D. D, unlike C++, does not supply enough operator overloads for
their semantics to be identical to those of pointers, which was the
whole point of C++-style iterators in the first place.

I think it does provide enough, in fact, it will be less wacky than in
C++ (look at the wretched iterator/const_iterator dichotomy). It'll work
with core arrays, and will not need those typedefs.

Without the ability to overload the dereference operator, any attempt to
write a class that behaves like a pointer in D will be unwieldy, or at
least ugly. Admittedly, this isn't much of an issue when just using
foreach, but might be an issue with any kind of STL-like algorithms
library. (Which is what you're inviting when using C++'s iterator
semantics.)

Instead, we should look to other languages for inspiration. I suggest
Java and Python, whose iterator semantics are similar. Other posters
have already made suggestions on this front. I would only suggest the
"opIter" method for getting an iterator, and that the iterator class
itself be available via T.iterator (whether it is a nested class or an
alias shouldn't matter). Classes can provide methods for returning
alternate iterators, as well. (A class providing opIter should be
iterated over as easily as an iterator itself.)

I must have misunderstood you. I thought you were asking if [n] worked
on pointers. Yes, it does.

Is that how you intend to use opIndex to "dereference" the iterator?
That's /terrifying/. Although it is consistent with pointers, using [0]
for everything is just asking for trouble with regards to
non-random-access iterators. And though it's probably not totally
meaningless to a random reader of the code, it comes pretty darned close.

I agree it might be startling at first because it's unusual. Once one is
past that, however, is it that bad?

If a class provides .begin, .end, and opApply, one of the two
iteration methods has to take precedence. I would hope it's opApply.

I was leaning towards the other way around.

It is slightly easier and clearer to explicitly say:
foreach(e; t.begin) { }
than
foreach(e; &t.opApply) { }
opApply is an operator overload. .begin would just be a special method.
I would expect the operator to take precedence.

The:
foreach(e; t.begin) { }
would not be supported. It would be:
foreach (e; t) { }
If an aggregate had both, and iterators would be chosen by default,
overriding with:
foreach (e; &t.opApply) { }
would not require any new conventions or syntax. So I think this is the
right way to go.

For a type T, its associated iterator type should be available via
T.iterator. This has to be standard.

It's not needed, as typeof(T.begin) will do it.

Due to the way in which D's type inference from properties works, you'll
need typeof(T.begin()). Otherwise, it will derive the type of the method
itself. (Which isn't even a valid type, as such.)

Yes, you're right.

No. opIndex()

I'm referring specifically to the no-index form of opSlice (meaning i[]
would "dereference" the iterator). (A no-index opIndex doesn't work.)
This of course is inconsistent with pointers, so saying i[0] is better
on that count.

Ok, I understand now where you were coming from with the slice.

I opPostInc()

Probably opAddAssign()

Oh, there's another one:
int opEquals(I)

Yes, I'd overlooked that.

I could go either way with that.

True, but that inelegance is hidden away in the library, and the
flexibility is probably worth it.

I think it does provide enough, in fact, it will be less wacky than in
C++ (look at the wretched iterator/const_iterator dichotomy). It'll
work with core arrays, and will not need those typedefs.

Without the ability to overload the dereference operator, any attempt to
write a class that behaves like a pointer in D will be unwieldy, or at
least ugly. Admittedly, this isn't much of an issue when just using
foreach, but might be an issue with any kind of STL-like algorithms
library. (Which is what you're inviting when using C++'s iterator
semantics.)

I still think it isn't necessary, but you might be thinking of a use
case I haven't. Can you provide an example?

I think it does provide enough, in fact, it will be less wacky than
in C++ (look at the wretched iterator/const_iterator dichotomy).
It'll work with core arrays, and will not need those typedefs.

Without the ability to overload the dereference operator, any attempt
to write a class that behaves like a pointer in D will be unwieldy, or
at least ugly. Admittedly, this isn't much of an issue when just using
foreach, but might be an issue with any kind of STL-like algorithms
library. (Which is what you're inviting when using C++'s iterator
semantics.)

I still think it isn't necessary, but you might be thinking of a use
case I haven't. Can you provide an example?

The only thing that comes to mind is if you get multiple indirections.
For instance, in C++ I frequently end up creating things like vectors of
lists or lists of vectors. Or vectors of pointers. Or sets of list
iterators. Then to actually use the things I need two dereferences.
But with built-in GC, you don't really need smart pointers so often. And
with "dot-is-all-you-need" that also eliminates many cases for dereferences.
But still I'm sure you could come up with situations where to get at the
thing pointed to you'd need a string of [0]'s:
iteriteriter[0][0][0]
Actually, in some ways it's an improvement over the prefix deref
operator because it reads consistently from left to right. In C++ if
you had iterator to vector of iterator and you wanted to deref it you'd
need:
*((*iterveciter)[i])
Or maybe the inner parens are unnecessary. I can't recall, which is
another reason why using postfix for everything is better -- no question
about precedence rules.
So I'm sold on dereferencing being a postfix operation. But I would
still like it better if there were some way to get a compile-time error
if I try to do
iter[2] // Error! This iterator isn't random access!
Would introducing a special symbol be of any use? Like [*]?
iter[*]
It would just compile to opIndex(0), but it declares the intent "this is
dereferencing, not arbitrary indexing". I guess it's kind of pointless
if there's no compiler support for enforcing its use. I guess it would
only be useful if there were an opDeref.
Anyway, whatever you do with dereferencing -- I'm convinced you should
absolutely not make it a unary prefix operator. It should be postfix.
--bb

Somehow..
foreach(i; foo)
{
auto x = i[0];
i[0] = x + 1;
}
You often say that operator overloading should be used only for what it
should be used for, and taking the index of an iterator means.. what exactly?
Would it be nicer to just use a .value property instead?
foreach(i; foo)
{
auto x = i.value;
i.value = x + 1;
}
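A sketch of what such a .value-based iterator might look like (ArrayIter and the int element type are assumptions):

```d
// D's property call syntax would let i.value and i.value = x + 1
// resolve to this getter/setter pair.
struct ArrayIter
{
    int[] data;
    size_t pos;

    int value()       { return data[pos]; }  // rvalue access: i.value
    void value(int v) { data[pos] = v; }     // lvalue access: i.value = x
    void opAddAssign(int n) { pos += n; }
}
```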

Somehow..
foreach(i; foo)
{
auto x = i[0];
i[0] = x + 1;
}
You often say that operator overloading should be used only for what it
should be used for, and taking the index of an iterator means.. what exactly?
Would it be nicer to just use a .value property instead?
foreach(i; foo)
{
auto x = i.value;
i.value = x + 1;
}

The problem is if the iterator is an actual pointer, there is no .value
property for a pointer.
Another possibility is to have:
*x
rewritten to:
x[0]
if x is not a pointer.

Somehow..
foreach(i; foo)
{
auto x = i[0];
i[0] = x + 1;
}
You often say that operator overloading should be used only for what it
should be used for, and taking the index of an iterator means.. what
exactly?
Would it be nicer to just use a .value property instead?
foreach(i; foo)
{
auto x = i.value;
i.value = x + 1;
}

The problem is if the iterator is an actual pointer, there is no .value
property for a pointer.

Is there really any reason to support pointers as iterators though? C++
libraries even seem to be moving away from that towards more robust and
less error-prone iterator objects.
Sean

The problem is if the iterator is an actual pointer, there is no
.value property for a pointer.

Well, they could. It's up to Mr. Compiler Writer if a pointer has a
value property or not.

Is there really any reason to support pointers as iterators though? C++
libraries even seem to be moving away from that towards more robust and
less error-prone iterator objects.

Agreed. I don't think I've ever actually used a raw pointer with a
fancy STL algorithm. I was actually really surprised the first time I
saw it in the examples on the SGI STL page. Other than those examples,
I've never seen it in real code.
And that should go double for D, where you rarely even see pointers to
begin with.
--bb

The problem is if the iterator is an actual pointer, there is no
.value property for a pointer.

Well, they could. It's up to Mr. Compiler Writer if a pointer has a
value property or not.

Consider the following struct:
struct Foo
{
int value;
}
Foo* p;
p.value;
Is p.value the entire contents of Foo (as it would be for a proposed
.value property) or just Foo.value? "value" is a pretty common field name.

The problem is if the iterator is an actual pointer, there is no
.value property for a pointer.

Well, they could. It's up to Mr. Compiler Writer if a pointer has a
value property or not.

Consider the following struct:
struct Foo
{
int value;
}
Foo* p;
p.value;
Is p.value the entire contents of Foo (as it would be for a proposed
.value property) or just Foo.value? "value" is a pretty common field name.

It's a name conflict. They happen. I'd have trouble if I wanted to
have a sizeof member in my struct too. If you want to be sure to get
the struct's member then use p[0].value.
'Value' is too common, though. If you were going to go that route it
should definitely be something like .iterVal or some other word less
likely to conflict.
I'm not saying it's the right way to go, just that it could be made to
work if needed.
--bb

The problem is if the iterator is an actual pointer, there is no
.value property for a pointer.

Well, they could. It's up to Mr. Compiler Writer if a pointer has a
value property or not.

Consider the following struct:
struct Foo
{
int value;
}
Foo* p;
p.value;
Is p.value the entire contents of Foo (as it would be for a proposed
.value property) or just Foo.value? "value" is a pretty common field name.

Maybe that's why Ada used "'" as a property marker rather than
"."? Of course there's the Python approach...naming it __value__.
A suitable solution might be to name it "valueof". That's a
much less common field name. The problem is that this is
something used quite frequently, so one would want it to be
short. Perhaps "V"? Then one would write
struct Foo
{ int value;
}
Foo* p;
p.value;
p.V;
and have two distinct forms. (Yes, V is used occasionally as
a variable name...but pretty rarely.)

The problem is if the iterator is an actual pointer, there is no
.value property for a pointer.

Well, they could. It's up to Mr. Compiler Writer if a pointer has a
value property or not.

Is there really any reason to support pointers as iterators though?
C++ libraries even seem to be moving away from that towards more
robust and less error-prone iterator objects.

Agreed. I don't think I've ever actually used a raw pointer with a
fancy STL algorithm. I was actually really surprised the first time I
saw it in the examples on the SGI STL page. Other than those examples,
I've never seen it in real code.

I've used it a few times, but generally more when I'm throwing together
a quick test than writing actual production code. But since D arrays
aren't the same as C++ arrays and we have the option of getting a robust
object back (via a.begin() or whatever), why not take it?
Sean

Is there really any reason to support pointers as iterators though?
C++ libraries even seem to be moving away from that towards more
robust and less error-prone iterator objects.

Can you be more specific about what the problems and solutions are?

For one thing, if pointers still are supported, then all algorithms have
to be written so they can use them.
If we skip "real pointer compatibility", then we can decide on smarter
iterators.
Example:
With pointer-compatible iterators, it is impossible to traverse a tree.
With smarter iterators one could.

Is there really any reason to support pointers as iterators though?
C++ libraries even seem to be moving away from that towards more
robust and less error-prone iterator objects.

Can you be more specific about what the problems and solutions are?

A direct application of pointers as iterators couldn't be done exactly
the C++ way because C++ uses traits templates to describe the operations
an iterator supports. And D will not overload templates from different
modules, so the user could not provide overloads for his own iterators.
This isn't a huge problem however, as we could probably write a set of
generic traits templates that apply to classes/structs implemented the
'right' way rather than requiring the user to provide his own. This
would be roughly similar to how a user can derive from std::iterator<>
to add the required typedefs to his object in C++.
More generally though, I find the need to use two iterators to identify
a range to be somewhat cumbersome and annoying. And there have been
enough others who feel this way that there has been at least an informal
proposal for "all in one" iterators for C++. I haven't bothered to
check whether it was ever formalized though.
Basically, I'm wondering whether it might be easier and more D-like to
use iterators for stepping and opIndex for random access, instead of
iterators for everything. Since D has slicing, the argument for using
iterators to define the boundaries of a range of randomly accessible
elements seems kind of small to me. ie.
sort( a.begin, a.begin + 5 ); // sort the first 5 elements
or
sort( a[0 .. 5] );
I find the second to be cleaner to look at. But I'm undecided whether
we'd lose any power or flexibility by not bothering with random access
iterators.
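A rough sketch of the slice-based style (the insertion sort is just filler to make the signature concrete; this is not code from the thread):

```d
// Algorithms take a slice; a[0 .. 5] replaces the iterator pair
// (a.begin, a.begin + 5).
void sort(int[] a)
{
    // simple in-place insertion sort over the slice
    for (size_t i = 1; i < a.length; i++)
        for (size_t j = i; j > 0 && a[j - 1] > a[j]; j--)
        {
            int tmp = a[j];
            a[j] = a[j - 1];
            a[j - 1] = tmp;
        }
}
```

Since slices share storage with the underlying array, sort(a[0 .. 5]) sorts the first five elements of a in place.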
Sean

Since D has slicing, the argument for using
iterators to define the boundaries of a range of randomly accessible
elements seems kind of small to me. ie.
sort( a.begin, a.begin + 5 ); // sort the first 5 elements
or
sort( a[0 .. 5] );
I find the second to be cleaner to look at. But I'm undecided whether
we'd lose any power or flexibility by not bothering with random access
iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
That definitely sounds like a good area to explore.

Since D has slicing, the argument for using iterators to define the
boundaries of a range of randomly accessible elements seems kind of
small to me. ie.
sort( a.begin, a.begin + 5 ); // sort the first 5 elements
or
sort( a[0 .. 5] );
I find the second to be cleaner to look at. But I'm undecided whether
we'd lose any power or flexibility by not bothering with random access
iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
That definitely sounds like a good area to explore.

I went ahead and wrote everything up during my commute this morning.
It's posted separately as "Iterator straw-man." And I apologize for it
being a bit longer than expected :-)
Sean

Since D has slicing, the argument for using iterators to define the
boundaries of a range of randomly accessible elements seems kind of
small to me. ie.
sort( a.begin, a.begin + 5 ); // sort the first 5 elements
or
sort( a[0 .. 5] );
I find the second to be cleaner to look at. But I'm undecided whether
we'd lose any power or flexibility by not bothering with random access
iterators.

I agree completely. As someone with a background in Java, C#, and Python
(among others), but without significant C++ experience, my concept of an
iterator is: "an object with methods for getting the current element,
determining whether a 'next' element exists, and for advancing the
cursor to the next element".
The C++ iterators look very weird (and very very wrong) to me.
Search and sort routines should use opIndex, opIndexAssign, and opSlice.
If an object doesn't implement opIndex or opIndexAssign, it's not
sortable using generic sorting algorithms. Tough luck.
Iteration, though, is completely distinct from sorting and searching,
and should use a different mechanism.
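The iterator concept described here might be sketched in D as follows (ArrayIterator is a hypothetical name):

```d
// Java/Python-style iterator: a current element, a test for a
// 'next' element, and an advance -- nothing else.
class ArrayIterator
{
    private int[] data;
    private size_t pos;

    this(int[] d) { data = d; }

    bool hasNext() { return pos < data.length; }  // does a next element exist?
    int next()     { return data[pos++]; }        // current element, then advance
}
```

Usage would be simply `while (it.hasNext()) writefln(it.next());`.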
Walter Bright wrote:

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
That definitely sounds like a good area to explore.

Perhaps an iterator always steps through an array (or an object
implementing a List or Iterable interface), and in order for an object
to be foreach-able, you'll need to first transform an object into an
array (or construct a List or an Iterable wrapping the object).
Arrays could still get their own special-case implementation of foreach,
avoiding interfaces for maximum speed. And, if a class supports opIndex,
then you could avoid making a virtual method call through the Iterable
interface.
But I think it's also desirable, in many many cases, to implement an
Iterator class, without supporting opIndex. The only methods in the
iterator would be hasNext() and next().
Foreach should support real arrays, classes with opIndex, and Java-style
iterators. Searching and sorting should only use opIndex, opIndexAssign,
opCmp, and opEquals. Iterators should only be used for iteration.
Anyhoo, those are my two cents.
--Benji Smith

Since D has slicing, the argument for using iterators to define the
boundaries of a range of randomly accessible elements seems kind of
small to me. ie.
sort( a.begin, a.begin + 5 ); // sort the first 5 elements
or
sort( a[0 .. 5] );
I find the second to be cleaner to look at. But I'm undecided whether
we'd lose any power or flexibility by not bothering with random access
iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
That definitely sounds like a good area to explore.

OOh, I like the direction this is heading.
Sean, I don't have time to read the straw-man right now, but that
definitely solves my recursive mergesort question.
And Benji's comment about simple iterators and ranges being distinct
things that shouldn't be lumped together also seems on target. If
iterators don't have to be ranges then it's simple to see how to support
iterators over infinite or circular containers. They just never stop
spitting out 'next()'s
But how do you handle generically pointing to an element then? Like you
have with an iterator to a node in linked list. I guess you can have
iterators with hasPrev() getPrev() (or --) type methods too.
--bb

But how do you handle generically pointing to an element then? Like you
have with an iterator to a node in linked list. I guess you can have
iterators with hasPrev() getPrev() (or --) type methods too.
--bb

Since D has slicing, the argument for using iterators to define the
boundaries of a range of randomly accessible elements seems kind of
small to me. ie.
sort( a.begin, a.begin + 5 ); // sort the first 5 elements
or
sort( a[0 .. 5] );
I find the second to be cleaner to look at. But I'm undecided whether
we'd lose any power or flexibility by not bothering with random access
iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*?
Things that need random access override opIndex and opIndexAssign.
Things which you can take a range of values from override opSlice and
opSliceAssign.
Things that require lock-step iteration override opApply, or supply a
function that returns a delegate that "runs" a foreach loop.
About the only thing this *doesn't* cover are bi-directional iterators
(say, iterating back and forth over an infinite series).
I know some people want to be able to stop iteration and resume it, but
couldn't that be solved by making opApply return a generator like what
Python does? Those things are written the same way as a current opApply
(sans all the return value from the delegate stuff), and can be advanced
manually.
I know this can be done. I wrote the coroutine module for StackThreads
a while back, and one of the *very first* things I did was implement
generic generators. The base generator class essentially just
implemented opApply switching over to the coroutine every time it needed
the next value. Here's a port of Python's range() function (well, the
one and zero argument versions :P):

Now, from that, we can implement opApply AND next/current in the base
class. Going with coroutines kills both the opApply and forward
iterator birds with one stone.
As for bi-directional iterators... I actually can't think of anywhere
I'd use them. I just keep thinking "I could just use random access to
the same end". Python's never suffered from not having them...
Anyway, just some random thoughts. Haven't had coffee yet :3
-- Daniel
--
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.
v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/

Since D has slicing, the argument for using iterators to define the
boundaries of a range of randomly accessible elements seems kind of
small to me. ie.
sort( a.begin, a.begin + 5 ); // sort the first 5 elements
or
sort( a[0 .. 5] );
I find the second to be cleaner to look at. But I'm undecided whether
we'd lose any power or flexibility by not bothering with random access
iterators.

That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*?

Close, but right now we don't have a good way to iterate over multiple
things at once.
for (i, j; i.hasNext() && j.hasNext(); i++, j++)
{
// do something with current values of i and j
}
Or, as you say, a way to stop iteration, or a generic way to return a
pointer to a particular spot in a container.
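Lock-step iteration becomes straightforward once iterators are first-class objects; a sketch with hypothetical hasNext/next iterators:

```d
import std.stdio;

class IntIter
{
    private int[] data;
    private size_t pos;
    this(int[] d) { data = d; }
    bool hasNext() { return pos < data.length; }
    int next()     { return data[pos++]; }
}

void main()
{
    auto a = new IntIter([1, 2, 3]);
    auto b = new IntIter([10, 20, 30]);
    // advance both iterators together until either runs out
    while (a.hasNext() && b.hasNext())
        writefln(a.next() + b.next());
}
```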

About the only thing this *doesn't* cover are bi-directional iterators
(say, iterating back and forth over an infinite series).

[interesting stuff about generators and D]

If generators can handle the above cases *AND* do it with code that is
simple to create and use *AND* do all that as efficiently as for loops,
then it sounds great to me. My impression was that any sort of
opApply/coroutine thing in D was not going to be so efficient.
--bb

Since D has slicing, the argument for using iterators to define the
boundaries of a range of randomly accessible elements seems kind of
small to me. ie.
sort( a.begin, a.begin + 5 ); // sort the first 5 elements
or
sort( a[0 .. 5] );
I find the second to be cleaner to look at. But I'm undecided whether
we'd lose any power or flexibility by not bothering with random access
iterators.

That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*?

Close, but right now we don't have a good way to iterate over multiple
things at once.
for (i, j; i.hasNext() && j.hasNext(); i++, j++)
{
// do something with current values of i and j
}
Or, as you say, a way to stop iteration, or a generic way to return a
pointer to a particular spot in a container.

About the only thing this *doesn't* cover are bi-directional iterators
(say, iterating back and forth over an infinite series).

[interesting stuff about generators and D]

If generators can handle the above cases *AND* do it with code that is
simple to create and use *AND* do all that as efficiently as for loops,
then it sounds great to me. My impression was that any sort of
opApply/coroutine thing in D was not going to be so efficient.
--bb

Well, as I said, once you have your sequence expressed as a coroutine,
writing hasNext() and getCurrent() is trivial. Once you have that,
iterating multiple things in lockstep is also trivial:

Or something to that effect :P
As for efficiency, I'm not sure how to go about testing that so that the
results mean anything. I'm sure my current implementation could be made
faster, but I don't know how.
Just for loose comparison, the game EVE Online has the record for the
largest number of people simultaneously online on a single server.
And its servers are built using coroutines running in *Python*.
I suppose the bottleneck (if it is one) is how fast you can do
user-space context switches. Which is basically a function call, moving
the base of the stack, and switching over stuff like the exception
handling registers. Or somesuch... Mikola Lysenko wrote StackThreads;
my attempt only mostly worked :P
-- Daniel
--
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.
v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/

I suppose the bottleneck (if it is one) is how fast you can do
user-space context switches. Which is basically a function call, moving
the base of the stack, and switching over stuff like the exception
handling registers. Or somesuch... Mikola Lysenko wrote StackThreads;
my attempt only mostly worked :P

That's pretty much it. Saving the current register data, loading the
other register data, and jumping to a new instruction address. There's
no way this can be as efficient as an integer increment instruction (for
loop iteration) but it's orders of magnitude faster than a kernel thread
context switch.
Sean

Things that need random access override opIndex and opIndexAssign.
Things which you can take a range of values from override opSlice and
opSliceAssign.
Things that require lock-step iteration override opApply, or supply a
function that returns a delegate that "runs" a foreach loop.
About the only thing this *doesn't* cover are bi-directional iterators
(say, iterating back and forth over an infinite series).

What about containers where there are multiple valid orders of
iteration? For a binary tree class, I might want to perform iteration
using depth-first, breadth-first, in-order, or reverse-order traversals.
How does the opApply/opApplyReverse solution address these needs?
And isn't a delegate-call about as computationally expensive as a
virtual method call (in the case of an Iterable interface)?
If the delegate solution is really better than an Iterable interface,
why create special cases for opApply and opApplyReverse? Why not just
say that foreach always requires an iteration delegate?
--benji

Things that need random access override opIndex and opIndexAssign.
Things which you can take a range of values from override opSlice and
opSliceAssign.
Things that require lock-step iteration override opApply, or supply a
function that returns a delegate that "runs" a foreach loop.
About the only thing this *doesn't* cover are bi-directional iterators
(say, iterating back and forth over an infinite series).

What about containers where there are multiple valid orders of
iteration? For a binary tree class, I might want to perform iteration
using depth-first, breadth-first, in-order, or reverse-order traversals.
How does the opApply/opApplyReverse solution address these needs?
And isn't a delegate-call about as computationally expensive as a
virtual method call (in the case of an Iterable interface)?

It should be roughly equivalent in cost to explicitly calling the
method, meaning if the method is virtual then you will get that cost,
and if it is final then it is like any function call. Since Iterable
methods will very likely almost always be virtual, there is at least a
chance of delegates being a little cheaper /some of the time/ -- when
they can be pointers to final methods.

If the delegate solution is really better than an Iterable interface,
why create special cases for opApply and opApplyReverse? Why not just
say that foreach always requires an iteration delegate?
--benji

Well, when foreach(;) was first added to D, I don't think anyone had
thought about delegates-as-aggregates yet. In fact, it was just added a
couple of versions back. So I suppose in a sense, opApply*Reverse is
just legacy -- though it makes it straight-forward to iterate simple,
linear-only collections.
I suppose it would be safe enough to do away with opApply*Reverse now,
but I don't think it hurts anything to leave them be, either.
-- Chris Nicholson-Sauls

Since D has slicing, the argument for using iterators to define the
boundaries of a range of randomly accessible elements seems kind of
small to me, e.g.:
sort( a.begin, a.begin + 5 ); // sort the first 5 elements
or
sort( a[0 .. 5] );
I find the second to be cleaner to look at. But I'm undecided whether
we'd lose any power or flexibility by not bothering with random access
iterators.

That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*?

Essentially. But D has no formal description of an iterator type. I
think it would be a useful convention to establish, and doing so
requires determining what types of iterators should be supported, and how.

Things that need random access override opIndex and opIndexAssign.
Things which you can take a range of values from override opSlice and
opSliceAssign.
Things that require lock-step iteration override opApply, or supply a
function that returns a delegate that "runs" a foreach loop.
About the only thing this *doesn't* cover are bi-directional iterators
(say, iterating back and forth over an infinite series).

This wouldn't be too difficult to do by adding a prev() method and
atBegin() or some such. Or maybe the Java style of hasNext(), next(),
hasPrev(), prev().
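For illustration, here is a minimal sketch of that Java-style bidirectional protocol, assuming a random-access backing sequence (Python used for brevity; BidiIter is a made-up name, and the cursor-between-elements convention mirrors Java's ListIterator):

```python
# Hypothetical sketch of the hasNext()/next()/hasPrev()/prev() protocol.
class BidiIter:
    def __init__(self, seq):
        self.seq = seq
        self.pos = 0  # cursor sits *between* elements, as in Java's ListIterator

    def hasNext(self):
        return self.pos < len(self.seq)

    def next(self):
        v = self.seq[self.pos]
        self.pos += 1
        return v

    def hasPrev(self):
        return self.pos > 0

    def prev(self):
        self.pos -= 1
        return self.seq[self.pos]

it = BidiIter([1, 2, 3])
forward = [it.next() for _ in range(3)]   # walk to the end
backward = [it.prev() for _ in range(3)]  # then walk back
```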

I know some people want to be able to stop iteration and resume it, but
couldn't that be solved by making opApply return a generator like what
Python does? Those things are written the same way as a current opApply
(sans all the return value from the delegate stuff), and can be advanced
manually.
I know this can be done. I wrote the coroutine module for StackThreads
a while back, and one of the *very first* things I did was implement
generic generators. The base generator class essentially just
implemented opApply switching over to the coroutine every time it needed
the next value. Here's a port of Python's range() function (well, the
one and zero argument versions :P):
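(The D snippet itself is not reproduced here; as a loose stand-in, this is the shape such a generator takes in Python -- my_range is a hypothetical name, and the zero-argument version counts without bound, so it is only stepped manually:)

```python
# Loose stand-in for the described range() port: one body serves both
# foreach-style iteration and manual stepping.
def my_range(*args):
    # one- and zero-argument versions, as the post describes
    stop = args[0] if args else float("inf")
    i = 0
    while i < stop:
        yield i
        i += 1

# foreach-style iteration:
collected = [x for x in my_range(5)]
# manual stepping of the same generator type:
g = my_range(3)
first = next(g)
```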

Now, from that, we can implement opApply AND next/current in the base
class. Going with coroutines kills both the opApply and forward
iterator birds with one stone.

Interesting idea--I'll admit I don't have much experience with Python.
But I think both approaches have value. Using coroutines just to step
across two sequences simultaneously seems unnecessarily complex, and not
terribly efficient compared to a simple pointer increment. But it does
offer a tremendous amount of power for more complex cases.

As for bi-directional iterators... I actually can't think of anywhere
I'd use them. I just keep thinking "I could just use random access to
the same end". Python's never suffered from not having them...

I don't think I've ever needed them either, but they can come in handy
when implementing one container in terms of another. Say an iterable
stack and queue (unlikely, I know) that are implemented via a doubly
linked-list.
Sean

Since D has slicing, the argument for using iterators to define the
boundaries of a range of randomly accessible elements seems kind of
small to me, e.g.:
sort( a.begin, a.begin + 5 ); // sort the first 5 elements
or
sort( a[0 .. 5] );
I find the second to be cleaner to look at. But I'm undecided whether
we'd lose any power or flexibility by not bothering with random access
iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*?

Essentially. But D has no formal description of an iterator type. I
think it would be a useful convention to establish, and doing so
requires determining what types of iterators should be supported, and how.

Things that need random access override opIndex and opIndexAssign.
Things which you can take a range of values from override opSlice and
opSliceAssign.
Things that require lock-step iteration override opApply, or supply a
function that returns a delegate that "runs" a foreach loop.
About the only thing this *doesn't* cover are bi-directional iterators
(say, iterating back and forth over an infinite series).

This wouldn't be too difficult to do by adding a prev() method and
atBegin() or some such. Or maybe the Java style of hasNext(), next(),
hasPrev(), prev().

I know some people want to be able to stop iteration and resume it, but
couldn't that be solved by making opApply return a generator like what
Python does? Those things are written the same way as a current opApply
(sans all the return value from the delegate stuff), and can be advanced
manually.
I know this can be done. I wrote the coroutine module for StackThreads
a while back, and one of the *very first* things I did was implement
generic generators. The base generator class essentially just
implemented opApply switching over to the coroutine every time it needed
the next value. Here's a port of Python's range() function (well, the
one and zero argument versions :P):

Now, from that, we can implement opApply AND next/current in the base
class. Going with coroutines kills both the opApply and forward
iterator birds with one stone.

Interesting idea--I'll admit I don't have much experience with Python.
But I think both approaches have value. Using coroutines just to step
across two sequences simultaneously seems unnecessarily complex, and not
terribly efficient compared to a simple pointer increment. But it does
offer a tremendous amount of power for more complex cases.

As for bi-directional iterators... I actually can't think of anywhere
I'd use them. I just keep thinking "I could just use random access to
the same end". Python's never suffered from not having them...

I don't think I've ever needed them either, but they can come in handy
when implementing one container in terms of another. Say an iterable
stack and queue (unlikely, I know) that are implemented via a doubly
linked-list.

I wrote some mesh (graph) manipulation algorithms in Python. I really
needed a linked list for that and a way to point to e.g. an edge in a
list of edges coming out of a vertex that wouldn't be invalidated by
inserting edges here and there before or after it, and I needed both to
find the previous edge and the following edge. In C++ std::list would
have been the clear choice.
Doing that with indexes into a standard Python list just isn't
practical. (I started out trying it that way because the Python FAQs
seem to say "you don't need a linked list, just use a python list".)
But with a python list, every time you insert one new item you have to
go find everyone out there that's holding onto an index into this list
and get them to update their values. Just not practical.
So sometimes bidirectional iterators are needed. Even in Python.
--bb

It's becoming increasingly obvious that D needs iterators. While opApply
is far better for some kinds of iteration (such as recursively
traversing a directory), iterators are more efficient in some cases, and
allow for things that opApply makes difficult.

What are those cases? Maybe we can find a way to fix the problem with
opApply.

It's becoming increasingly obvious that D needs iterators. While opApply
is far better for some kinds of iteration (such as recursively
traversing a directory), iterators are more efficient in some cases, and
allow for things that opApply makes difficult.

What are those cases? Maybe we can find a way to fix the problem with
opApply.

One such case is the usefulness of being able to provide an input
iterator to a parsing function, which may itself pass the iterator off
to other parsing functions.

It's becoming increasingly obvious that D needs iterators. While
opApply is far better for some kinds of iteration (such as
recursively traversing a directory), iterators are more efficient in
some cases, and allow for things that opApply makes difficult.

What are those cases? Maybe we can find a way to fix the problem with
opApply.

One such case is the usefulness of being able to provide an input
iterator to a parsing function, which may itself pass the iterator off
to other parsing functions.

Another case is the desire to iterate over multiple containers at the
same time, say as you would do in a sorted list merge operation.
opApply could handle this one if D had real coroutines (i.e. multiple
opApply's could be running simultaneously)
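As a concrete sketch of that merge case (Python for brevity; merge is a hypothetical helper, and the None sentinel assumes the inputs contain no None values):

```python
# Sorted-list merge: two iterators advanced in lockstep, which a single
# opApply cannot express without coroutine support.
def merge(a, b):
    ia, ib = iter(a), iter(b)
    out = []
    x = next(ia, None)  # None as end-of-sequence sentinel (assumption:
    y = next(ib, None)  # the inputs never contain None themselves)
    while x is not None and y is not None:
        if x <= y:
            out.append(x)
            x = next(ia, None)
        else:
            out.append(y)
            y = next(ib, None)
    # drain whichever side is left
    while x is not None:
        out.append(x)
        x = next(ia, None)
    while y is not None:
        out.append(y)
        y = next(ib, None)
    return out

merged = merge([1, 3, 5], [2, 4, 6])
```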
Another case is the basic desire to hold onto a pointer to a container
for later use. For instance if you have an STL style linked list
container where the implementation (the actual nodes) are hidden from
the user. You still want the user to be able to keep something that
functions like a pointer to a particular insertion point in the list
(and be able to use that in generic sequence manipulation algorithms).
--bb

One such case is the usefulness of being able to provide an input
iterator to a parsing function, which may itself pass the iterator off
to other parsing functions.

Some questions:
Is your data structure a graph or a tree?
What is your iterator iterating?
I ask because I'm currently investigating the visitor pattern for
traversing graph-like data structures (see the attached code for an
example).

It's becoming increasingly obvious that D needs iterators. While opApply
is far better for some kinds of iteration (such as recursively
traversing a directory), iterators are more efficient in some cases, and
allow for things that opApply makes difficult.

What are those cases? Maybe we can find a way to fix the problem with
opApply.

Mostly agreed. For example, one case I seem to find mentioned a few
times is providing depth-first versus breadth-first iteration on a tree
-- supposedly this is hard to do with foreach/opApply. While I do agree
that opApply is not optimal (because you can't tell opApply which way
you want to go), I thought this was part of the utility of the recent
delegates-as-aggregates feature? The following is just off the very top
of my head, so it might not be ideal/perfect, but I do believe it
provides this ability, using current D:
<code>
struct Tree(T, int B = 2) {
version (Tree_NoWarnings) {}
else {
static if (B == 0) {
static assert(false, "A tree with zero branches is just useless.");
}
else static if (B == 1) {
static assert(false, "A tree with only one branch may as well be an array.");
}
}
Tree*[B] children ;
T value ;
int walkDepth (int delegate(inout Tree) dlg) {
int result ;
foreach (x; children) {
if (x && (result = x.walkDepth(dlg)) != 0) {
goto end;
}
}
if ((result = dlg(*this)) != 0) {
goto end;
}
end:
return result;
}
int walkBreadth (int delegate(inout Tree) dlg) {
int result ;
Tree*[] gen = [this] ,
next ;
while (gen.length) {
foreach (x; gen) {
if ((result = dlg(*x)) != 0) {
goto end;
}
foreach (y; x.children) {
if (y) {
next ~= y;
}
}
}
gen = next.dup;
next.length = 0;
}
end:
return result;
}
}
Tree!(int, 2) binary_int_tree ;
// ...
foreach (node; &binary_int_tree.walkDepth) {
// ...
}
foreach (node; &binary_int_tree.walkBreadth) {
// ...
}
</code>
Unless I'm missing something entirely, I don't see any major utility in
this case for having a separate Iterator entity...
-- Chris Nicholson-Sauls

It's becoming increasingly obvious that D needs iterators. While
opApply is far better for some kinds of iteration (such as
recursively traversing a directory), iterators are more efficient in
some cases, and allow for things that opApply makes difficult.

What are those cases? Maybe we can find a way to fix the problem with
opApply.

Mostly agreed. For example, one case I seem to find mentioned a few
times is providing depth-first versus breadth-first iteration on a tree
-- supposedly this is hard to do with foreach/opApply.

Other way around. Tree traversal is harder to do with C++ style
iterators, easy to do with opApply. I assume your code is correct but
didn't look at it too closely.
--bb

It's becoming increasingly obvious that D needs iterators. While opApply
is far better for some kinds of iteration (such as recursively traversing
a directory), iterators are more efficient in some cases, and allow for
things that opApply makes difficult.
So here are the design goals:
1) Works with dynamic and static arrays
2) Doesn't need to work with associative arrays
3) Should be possible to achieve "pointer efficiency" with it
4) Needs to be able to restrict lvalue access (what C++ does with const
iterators)
5) Needs to work seamlessly with foreach
6) Iterators need to be copyable
So here's one design:
.begin property returns an iterator that starts at the beginning
.end returns an iterator that is at the end
Overloading * doesn't work with D. But, instead:
Overload opIndex for rvalue access
Overload opIndexAssign for lvalue access
Overloading opIndex also will work for random access
foreach loops will not be able to have a 'key' parameter.

It's not a huge problem, but passing an argument needlessly is both ugly and
slightly inefficient. Perhaps opIndex and opIndexAssign could be overloaded
without any parameters.
-Craig

It's becoming increasingly obvious that D needs iterators. While opApply
is far better for some kinds of iteration (such as recursively
traversing a directory), iterators are more efficient in some cases, and
allow for things that opApply makes difficult.
So here are the design goals:
1) Works with dynamic and static arrays
2) Doesn't need to work with associative arrays
3) Should be possible to achieve "pointer efficiency" with it
4) Needs to be able to restrict lvalue access (what C++ does with const
iterators)
5) Needs to work seamlessly with foreach
6) Iterators need to be copyable

I apologize for this long post. Writing too much has become a bad habit
of mine lately it seems. This took a while writing, and in the meantime
new posts appeared that say much of the same. If you don't bother
reading this, read Sean's posts instead, as I agree 100 % with him on
this. :)
What I suggest is that we should not try to copy the C++ "iterator ==
pointer" idiom, but instead aim for an iterator design that is
self-contained and simple.
For me, the iterator concept is simple. An iterator is an entity that
allows an iterative way of accessing data. An iterator needs:
1. a way to reference the data
2. a way to traverse the data
3. a way to signal the end of the data
(In C++, many iterators lack #3 above, which leads to the necessity of a
second iterator that just signals the end of the iterative range.)
An iterator could also be a generator.
Here are some examples (translated to D) of how iterators are
implemented in some languages other than C++:
Java
----
class Iterator(T) {
bool hasNext(); // self explanatory
T* next(); // steps and returns reference to next element
}
while(i.hasNext())
writefln(*i.next());
C# (Simplified, from David Gileadi)
-----------------------
class Iterator(T) {
bool MoveNext(); // false if no more elements
T Current(); // property getter
}
while(i.MoveNext)
writefln(i.Current);
Python
------
class Iterator(T) {
T next();
}
while(true)
try
writefln(i.next);
catch (StopIteration)
break;
Taking inspiration from the above and following Walter's design goals,
here is my suggestion:
interface Iterator(T) {
bool step(); // as C#'s MoveNext
T value(); // getter (rvalue access)
T value(T); // setter (lvalue access)
int opApply(...); // support foreach style iteration
}
An alternative is supplying a value property that returns a pointer to
the element, but that would not allow limiting lvalue access for read
only iterators and introduces a pointer, which might be bad by itself.
Usage:
// explicit iteration
while(i.step) {
writefln(i.value);
i.value = 5;
}
// implicit iteration
foreach(o; i)
writefln(o);
I will briefly explain my reasoning:
As I said, I believe the iterator should be clean and simple. Saying
that it is OK for the iterator implementations to be dirty and messy,
but that the implementations will be hidden away in libraries and only
be the headache of library writers will lead to just that - only library
writers will write iterators.
As Sean hints at, "random access iterators" are not pure iterators. The
random access part is orthogonal to the iterative part. In D, a perfect
way to implement a random access view is defining a .length property and
an opIndex method. A random access view doesn't need to be iterable and
an iterator doesn't need to provide random access. A D slice is the
perfect example of a random access view.
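A quick sketch of that "random access view" idea, translated to Python for illustration (__len__/__getitem__ standing in for .length/opIndex; ReverseView is a made-up example type, not anything from the thread):

```python
# A random access view: just a length and indexed access, with no
# iteration protocol of its own -- here, a reversed view over a list.
class ReverseView:
    def __init__(self, data):
        self.data = data

    def __len__(self):           # plays the role of .length
        return len(self.data)

    def __getitem__(self, i):    # plays the role of opIndex
        return self.data[len(self.data) - 1 - i]

v = ReverseView([1, 2, 3])
vals = [v[i] for i in range(len(v))]
```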
I see no reason to use operator overloading when the result isn't
replaceable with a pointer anyway.
Bill Baxter's points regarding iterator as a concept vs an
interface/abstract class are very valid. And as Bill says, maybe both
should be supported. One of them could always be converted to the other.
Regarding efficiency, here is one sample implementation that a compiler
should(*) be able to make just as efficient as for/foreach loops:
struct Iterator(T) {
T* ptr;
T* end;
bool step() { return ++ptr != end; }
T value() { return *ptr; }
T value(T x) { return *ptr = x; }
// mixin OpApplyMixin;
}
Iterator!(T) iter(T)(T[] x) {
Iterator!(T) ret;// = {x.ptr-1, x.ptr+x.length}; // ICE
ret.ptr = x.ptr-1;
ret.end = x.ptr+x.length;
return ret;
}
import std.stdio;
void main() {
int[] arr = [1,2,3,4,5];
auto i = arr.iter();
while(i.step) {
writefln("%s",i.value);
i.value = 5;
}
}
/Oskar

As Sean hints at, "random access iterators" are not pure iterators. The
random access part is orthogonal to the iterative part. In D, a perfect
way to implement a random access view is defining a .length property and
an opIndex method. A random access view doesn't need to be iterable and
an iterator doesn't need to provide random access. A D slice is the
perfect example of a random access view.

You two might be right.

I see no reason to use operator overloading when the result isn't
replaceable with a pointer anyway.

Right.

Bill Baxter's points regarding iterator as a concept vs an
interface/abstract class are very valid. And as Bill says, maybe both
should be supported. One of them could always be converted to the other.

It's good, but the pointer-before-the-start is problematic. This can
cause problems on some architectures. Pointing past the end is
explicitly supported, but before the beginning is not. I'd also prefer
the step() get 'stuck' at the end, instead of going past it if it gets
called again (this makes sense for input iterators). So I suggest instead:
for (auto i = arr.iter(); !i.atEnd(); i.step())
...
in other words, separate out the 'end' test from the 'step' operation.
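For illustration, that shape sketched in Python (the atEnd/step/value names follow the suggestion above; the Iter class and its backing sequence are assumptions):

```python
# The end test is separate from the step, and step() sticks at the end
# instead of running past it.
class Iter:
    def __init__(self, seq):
        self.seq = seq
        self.pos = 0

    def atEnd(self):
        return self.pos >= len(self.seq)

    def step(self):
        if not self.atEnd():  # stick at the end rather than overshoot
            self.pos += 1

    def value(self):
        return self.seq[self.pos]

out = []
i = Iter([1, 2, 3])
while not i.atEnd():
    out.append(i.value())
    i.step()
i.step()  # extra step: stays stuck at the end
```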

It's good, but the pointer-before-the-start is problematic. This can
cause problems on some architectures. Pointing past the end is
explicitly supported, but before the beginning is not. I'd also prefer
the step() get 'stuck' at the end, instead of going past it if it gets
called again (this makes sense for input iterators). So I suggest instead:
for (auto i = arr.iter(); !i.atEnd(); i.step())
...
in other words, separate out the 'end' test from the 'step' operation.

For arrays, we already have arr.ptr, what if array.end was simply added?
auto slice = array[50..100];
for(auto i = slice.ptr; i != slice.end; i++) { ... }
Then for UDT custom iterators, ptr, end and op[Pre|Post][Inc|Dec] could be
overloaded?
Just a thought..