I just implemented a singly-linked list type to illustrate the container
abstraction.
http://erdani.com/d/phobos/std_container.htmlhttp://erdani.com/d/phobos/container.d
One interesting aspect is that SList must cooperate with Take. More
details are to be found in code, but essentially SList ranges are all
sentinel-terminated, which makes them "right-bound". If you want to only
remove a few elements from the middle of a list, you need to construct a
range that spans a limited portion of the list. I encoded that by using
the Take abstraction in std.range.
Please let me know of how you find it. There are a few refinements and
ancillary functions to define, but those are pretty clear given the rest.
Andrei

I just implemented a singly-linked list type to illustrate the container
abstraction.
http://erdani.com/d/phobos/std_container.htmlhttp://erdani.com/d/phobos/container.d
One interesting aspect is that SList must cooperate with Take. More
details are to be found in code, but essentially SList ranges are all
sentinel-terminated, which makes them "right-bound". If you want to only
remove a few elements from the middle of a list, you need to construct a
range that spans a limited portion of the list. I encoded that by using
the Take abstraction in std.range.
Please let me know of how you find it. There are a few refinements and
ancillary functions to define, but those are pretty clear given the rest.

Well, one thing I don't like about it is that you cannot do O(1) insertion
or removal. insertBefore requires O(n) complexity and insertAfter
requires O(m) complexity. Removal will require O(n) complexity, even
though it doesn't exist right now.
Fast removal and insertion are key to having a linked list. I'd suggest
modifying the range so it uses a Node ** instead, which points to the
_next element of the previous node of the one currently being iterated, or
_root if it's the first node in the list.
-Steve

I just implemented a singly-linked list type to illustrate the
container abstraction.
http://erdani.com/d/phobos/std_container.htmlhttp://erdani.com/d/phobos/container.d
One interesting aspect is that SList must cooperate with Take. More
details are to be found in code, but essentially SList ranges are all
sentinel-terminated, which makes them "right-bound". If you want to
only remove a few elements from the middle of a list, you need to
construct a range that spans a limited portion of the list. I encoded
that by using the Take abstraction in std.range.
Please let me know of how you find it. There are a few refinements and
ancillary functions to define, but those are pretty clear given the
rest.

Well, one thing I don't like about it is that you cannot do O(1)
insertion or removal. insertBefore requires O(n) complexity and
insertAfter requires O(m) complexity. Removal will require O(n)
complexity, even though it doesn't exist right now.
Fast removal and insertion are key to having a linked list. I'd suggest
modifying the range so it uses a Node ** instead, which points to the
_next element of the previous node of the one currently being iterated,
or _root if it's the first node in the list.

In fact, I'd say that it would be critical to split the insert and remove
functions to slow (O(n) or greater) versions and fast( O(lg(n) or less))
functions. Some algorithms may depend on fast insertion and removal, such
as mergesort on a linked list.
-Steve

I just implemented a singly-linked list type to illustrate the
container abstraction.
http://erdani.com/d/phobos/std_container.htmlhttp://erdani.com/d/phobos/container.d
One interesting aspect is that SList must cooperate with Take. More
details are to be found in code, but essentially SList ranges are all
sentinel-terminated, which makes them "right-bound". If you want to
only remove a few elements from the middle of a list, you need to
construct a range that spans a limited portion of the list. I encoded
that by using the Take abstraction in std.range.
Please let me know of how you find it. There are a few refinements and
ancillary functions to define, but those are pretty clear given the
rest.

Well, one thing I don't like about it is that you cannot do O(1)
insertion or removal. insertBefore requires O(n) complexity and
insertAfter requires O(m) complexity. Removal will require O(n)
complexity, even though it doesn't exist right now.
Fast removal and insertion are key to having a linked list. I'd
suggest modifying the range so it uses a Node ** instead, which points
to the _next element of the previous node of the one currently being
iterated, or _root if it's the first node in the list.

In fact, I'd say that it would be critical to split the insert and
remove functions to slow (O(n) or greater) versions and fast( O(lg(n) or
less)) functions. Some algorithms may depend on fast insertion and
removal, such as mergesort on a linked list.

Actualy mergesort could not be implemented without some sort of way to
restructure the list without allocating new nodes, requiring knowledge of
the link structure. I'm not sure it could be generic... I'll think about
this a bit.
-Steve

On 05/27/2010 08:23 AM, Steven Schveighoffer wrote:
You're making a number of great points in the four posts starting with
this. Since they sort of augment one another, let me quote and answer
them all here.

Well, one thing I don't like about it is that you cannot do O(1)
insertion or removal. insertBefore requires O(n) complexity and
insertAfter requires O(m) complexity. Removal will require O(n)
complexity, even though it doesn't exist right now.
Fast removal and insertion are key to having a linked list. I'd
suggest modifying the range so it uses a Node ** instead, which
points to the _next element of the previous node of the one currently
being iterated, or _root if it's the first node in the list.
-Steve

The performance profile of SList is what one would expect for a
straightforward singly-linked list implemented with nodes, pointer to
root, and the such. You are making a good point that adding one 1-2
extra indirections would lead to improvements of certain primitives.
There have been various experiments with STL-like slist implementations
that dwell on such ideas. See e.g.
http://manpages.ubuntu.com/manpages/lucid/man3/std::forward_list.3cxx.html
with functions like before_begin() in addition to begin(), and also
insert_after() instead of insert().
It has become clear to STL implementors that adding hidden indirections
puts forward_list at a disadvantage compared to the straightforward
lists against which they compete. I want to avoid that by abiding to the
straight storage strategy and deal with its consequences.
Second post:

In fact, I'd say that it would be critical to split the insert and
remove functions to slow (O(n) or greater) versions and fast( O(lg(n)
or less)) functions. Some algorithms may depend on fast insertion
and removal, such as mergesort on a linked list.

This is a good idea, which generalizes linearRemove(). I'd suspected if
there's one place to distinguish linear from better, then there ought to
be more.
From here we have a fork in the road:
a) Tighten complexity requirements for e.g. insertBefore(Range, Stuff)
and let containers that can't implement them just not have them.
b) Define linearInsertBefore(Range, Stuff) etc.
I'm leaning towards (a) because a list can insert virtually anywhere by
only using insertAfter(Take!Range, Stuff) (which I haven't defined yet
but I will, and which also takes care of the complexity issue) and
insertFront(Stuff).
Third post:

Actualy mergesort could not be implemented without some sort of way
to restructure the list without allocating new nodes, requiring
knowledge of the link structure. I'm not sure it could be
generic... I'll think about this a bit.

You are entirely right. A mergesort that relies on link shuffling needs
to have intimate knowledge of the node structure. It would most likely
be a member of the list.
Fourth post:

I have another general question in general about these collections.
Ranges fix a very bad problem with iterators -- non-matching begin
and end iterators. For example, passing a begin from one list and
and end from another.

Right.

However, all your range functions such as remove, insertBefore, etc.
are called via the container type, using a range as an argument. But
there can be no static verification that a range actually is part of
a given container. Should there be some requirement that a container
validates that a range is part of the container before performing the
operation? This is the case for dcollections.
In your current implementation of SList, verification seems
incidental for insertBefore because you must find the previous node
from the root anyways (and that will throw on enforce if it cannot
find it). But insertAfter does not, so something like this will
compile, run, and result in strange behavior:
auto sl1 = make(SList!int, 1, 2, 3, 4, 5); // I may not have this
correct, but it's not important auto sl2 = make(SList!int, 6, 7, 8,
9, 10);
auto r1 = sl1[]; r1.popFront(); r1.popFront();
auto r2 = take(r1, 2); sl2.insertAfter(r2, 666); // actually inserts
into sl1

It's a good question.
My current view of the matter is to maximize library flexibility and
versatility. Following the adage that you can build safe on fast but not
the other way around, I'd go with the following philosophy: checking of
range membership to their collections should be made only to the extent
of ensuring memory safety. Beyond that, if efficiency is in tension with
verifiability (as is the case with your example above), leave it to the
programmer or the supra-structure to correctly pair collections with
their ranges.
Sounds good?
Andrei

I just implemented a singly-linked list type to illustrate the container
abstraction.
http://erdani.com/d/phobos/std_container.htmlhttp://erdani.com/d/phobos/container.d
One interesting aspect is that SList must cooperate with Take. More
details are to be found in code, but essentially SList ranges are all
sentinel-terminated, which makes them "right-bound". If you want to only
remove a few elements from the middle of a list, you need to construct a
range that spans a limited portion of the list. I encoded that by using
the Take abstraction in std.range.
Please let me know of how you find it. There are a few refinements and
ancillary functions to define, but those are pretty clear given the rest.

I have another general question in general about these collections.
Ranges fix a very bad problem with iterators -- non-matching begin and end
iterators. For example, passing a begin from one list and and end from
another.
However, all your range functions such as remove, insertBefore, etc. are
called via the container type, using a range as an argument. But there
can be no static verification that a range actually is part of a given
container. Should there be some requirement that a container validates
that a range is part of the container before performing the operation?
This is the case for dcollections.
In your current implementation of SList, verification seems incidental for
insertBefore because you must find the previous node from the root anyways
(and that will throw on enforce if it cannot find it). But insertAfter
does not, so something like this will compile, run, and result in strange
behavior:
auto sl1 = make(SList!int, 1, 2, 3, 4, 5); // I may not have this correct,
but it's not important
auto sl2 = make(SList!int, 6, 7, 8, 9, 10);
auto r1 = sl1[];
r1.popFront();
r1.popFront();
auto r2 = take(r1, 2);
sl2.insertAfter(r2, 666); // actually inserts into sl1
-Steve

One thing worries me a bit, but may be based on a misunderstanding
of how D's GC works. In dup, you allocate all the nodes at once as
an array and then go through them one by one setting their payload_
and next_ values to make the list.
My worry is that the array is allocated as a single block and
(presumably) GC'd as a single block. If I copy a 1000 element list
and then remove 999 elements from the front using removeFront(),
won't I end up leaking memory for the first 999 nodes?
Cheers,
Pillsy

One thing worries me a bit, but may be based on a misunderstanding
of how D's GC works. In dup, you allocate all the nodes at once as
an array and then go through them one by one setting their payload_
and next_ values to make the list.
My worry is that the array is allocated as a single block and
(presumably) GC'd as a single block. If I copy a 1000 element list
and then remove 999 elements from the front using removeFront(),
won't I end up leaking memory for the first 999 nodes?

There are indeed tradeoffs associated with allocation of nodes that way.
Technically that's not a leak, but indeed the entire block will stay put
for as long as at least one node in it it's used.
One improvement would be to put removed nodes in a static freelist to be
used by all SLists of the same type in the current thread. In that case,
remove becomes unstable.
I think I hastened a bit with that optimization, it's debatable and
detracts from the main point of the discussion.
Andrei

Another update:
http://erdani.com/d/phobos/std_container.htmlhttp://erdani.com/d/phobos/container.d
I simplified the implementation (no more array allocation etc.),
eliminated replace() and insertBefore() after convincing myself they are
not a good fit for lists, and added linearRemove(Take!Range).
Take!Range seems to be the secret recipe for an enjoyable SList experience.
I'll follow up later with an Array sample implementation. Then on the
radar are an associative container (probably a straight hash or a binary
tree) and then TightSList and TightArray which use deterministic storage
management.
I figured out a lot of stuff, and it seems to settle together quite nicely.
Andrei

I take it that Array is basically supposed to be std.container's version of
C++'s vector or Java's ArrayList? If so, I would suggest that Array is not
the best of names in that it would become very easy to confuse it with
built-in arrays when discussing them (particularly in verbal communication
where you can't see the capital A). Personally, I would have just gone with
Vector, since it's a fairly standard name for that sort of container and
wouldn't be confused with anything else. If you really want Array, that's
fine - it should be clear enough when it's in code - but I would think that
talking about it could easily be confusing enough that it wouldn't be the
best of names.
- Jonathan M Davis

wouldn't be confused with anything else. If you really want Array, that's
Personally, I would have just gone with Vector, since it's a fairly

standard name for that sort of container and wouldn't be confused with
anything else.
I take it you don't work in simulation or games, then? Don't do much linear
algebra?
While I understand the reasoning for it, I dislike the name vector for
arrays. A vector to me, is a geometric object with a length and magnitude,
not a random collection of whatevers.
--
Simen

I take it you don't work in simulation or games, then? Don't do much
linear algebra?
While I understand the reasoning for it, I dislike the name vector for
arrays. A vector to me, is a geometric object with a length and magnitude,
not a random collection of whatevers.

Vector is arguably a poor name because of that. However, it is, at this
point, a very standard name for such a container. At minimum, both C++ and
Java have a vector class, though the main one that you'd use in Java at this
point would be ArrayList rather than Vector. So, if a better name can be
found, that's great, but Vector is quite a standard name for the concept at
this point, and I do think that it works better than Array. I also don't
like the name ArrayList because I wouldn't really consider a vector to be a
list, but that's another option. I'm not all that familiar with what other
languages have used for such a class, but there are probably other pre-
existing options out there as well.
- Jonathan M Davis

I take it that Array is basically supposed to be std.container's version of
C++'s vector or Java's ArrayList? If so, I would suggest that Array is not
the best of names in that it would become very easy to confuse it with
built-in arrays when discussing them (particularly in verbal communication
where you can't see the capital A). Personally, I would have just gone with
Vector, since it's a fairly standard name for that sort of container and
wouldn't be confused with anything else. If you really want Array, that's
fine - it should be clear enough when it's in code - but I would think that
talking about it could easily be confusing enough that it wouldn't be the
best of names.
- Jonathan M Davis

I take it that Array is basically supposed to be std.container's version of
C++'s vector or Java's ArrayList? If so, I would suggest that Array is not
the best of names in that it would become very easy to confuse it with
built-in arrays when discussing them (particularly in verbal communication
where you can't see the capital A). Personally, I would have just gone with
Vector, since it's a fairly standard name for that sort of container and
wouldn't be confused with anything else. If you really want Array, that's
fine - it should be clear enough when it's in code - but I would think that
talking about it could easily be confusing enough that it wouldn't be the
best of names.
- Jonathan M Davis

Good point. Other opinions?

I always thought vector was a bad name choice in C++, I had that word
associated with what C++ calls a valarray (a physics vector).
I agree that Array is too generic for D, though, and unfortunately I don't
have a better suggestion, but you asked for other opinions ;)
BTW, what would be the point of an array/vector when you have built-in
arrays? If built-in arrays would be syntax sugar for a real library type,
like AAs, I can see as a good option using Array for that type, since
built-in arrays and the library Array would be the same thing.
--
Leandro Lucarella (AKA luca) http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
DETIENEN A PADRE, MADRE, TIOS Y ABUELOS: TODOS DEPRAVADOS
-- Crónica TV

I take it that Array is basically supposed to be std.container's version of
C++'s vector or Java's ArrayList? If so, I would suggest that Array is not
the best of names in that it would become very easy to confuse it with
built-in arrays when discussing them (particularly in verbal communication
where you can't see the capital A). Personally, I would have just gone with
Vector, since it's a fairly standard name for that sort of container and
wouldn't be confused with anything else. If you really want Array, that's
fine - it should be clear enough when it's in code - but I would think that
talking about it could easily be confusing enough that it wouldn't be the
best of names.
- Jonathan M Davis

Good point. Other opinions?

I always thought vector was a bad name choice in C++, I had that word
associated with what C++ calls a valarray (a physics vector).
I agree that Array is too generic for D, though, and unfortunately I don't
have a better suggestion, but you asked for other opinions ;)
BTW, what would be the point of an array/vector when you have built-in
arrays? If built-in arrays would be syntax sugar for a real library type,
like AAs, I can see as a good option using Array for that type, since
built-in arrays and the library Array would be the same thing.

This is a good question. The relationship between Array and T[] is simple:
* Array is a compliant container so it is a reference type
* T[] is Array's range
With built-in associative arrays the question becomes even more
interesting because the associative array _is_ already a reference type.
So Walter and I agreed to make built-in associative arrays just
compliant std.container objects, with three differences:
* AssocArray lives in object.di for various reasons
* The compiler translates the type name V[K] as AssocArray!(K, V)
* The compiler translates associative array literals into associative
array objects
Other than that... an associative array will be just one of the growing
offering of collections in std.container. And I think that's the way it
should be.
Andrei

BTW, what would be the point of an array/vector when you have built-in
arrays? If built-in arrays would be syntax sugar for a real library type,
like AAs, I can see as a good option using Array for that type, since
built-in arrays and the library Array would be the same thing.

The biggest difference is that a vector has capacity separate from its
size/length. You can efficiency insert elements at the end with a vector -
amortized constant time usually - but you can't do that with a built-in
array because it would have to reallocate every time. D's arrays are
fantastic, but they're still not quite good enough to outright replace a
vector type.
On top of that, of course, there's the issue of the API used to interact
with it, which is what we're getting with std.container, but that could
probably be gotten around in much the same way as the range stuff was with
std.array. Really, the big difference is that you can efficiently append to
a vector type while you cannot do that with an array. The array is still a
tad too low level. And since you do need that low levelness at least some of
the time (particularly in a systems language), it's not like you could
replace arrays with vectors or make them act more like vectors. Both types
have their uses.
- Jonathan M Davis

The biggest difference is that a vector has capacity separate from its
size/length. You can efficiency insert elements at the end with a vector -
amortized constant time usually - but you can't do that with a built-in
array because it would have to reallocate every time. D's arrays are
fantastic, but they're still not quite good enough to outright replace a
vector type.

I can think about the opposite too: to remove the built-in dynamic arrays and
replace them as it's being done with associative arrays, keeping only the
syntax sugar and mapping them to the Array/Vector. Is this a bad idea? (this is
only about dynamic arrays, at the moment built-in fixed-sized arrays have to be
kept).
Bye,
bearophile

BTW, what would be the point of an array/vector when you have built-in
arrays? If built-in arrays would be syntax sugar for a real library type,
like AAs, I can see as a good option using Array for that type, since
built-in arrays and the library Array would be the same thing.

The biggest difference is that a vector has capacity separate from its
size/length. You can efficiency insert elements at the end with a vector -
amortized constant time usually - but you can't do that with a built-in
array because it would have to reallocate every time. D's arrays are
fantastic, but they're still not quite good enough to outright replace a
vector type.

BTW, what would be the point of an array/vector when you have built-in
arrays? If built-in arrays would be syntax sugar for a real library
type,
like AAs, I can see as a good option using Array for that type, since
built-in arrays and the library Array would be the same thing.

The biggest difference is that a vector has capacity separate from its
size/length. You can efficiency insert elements at the end with a vector
-
amortized constant time usually - but you can't do that with a built-in
array because it would have to reallocate every time. D's arrays are
fantastic, but they're still not quite good enough to outright replace a
vector type.

This isn't exactly true. D's builtin arrays also are amortized constant
time to append, but the constant is way higher :)
I agree that an array-like container would provide features that D's
builtin arrays do not, including much better append performance.
-Steve