Extensible vector in both ends?


With vector you can push_back and pop_back elements, and you can resize it to add or remove multiple elements at the end. But you can't push_front or pop_front elements. Neither can you extend the vector at the start if you'd like to (or at least it will take much longer). That's a pity, because now I need to do exactly that. Is there some other container that supports this? Do I have to write my own container class?

With vectors, you have the erase and insert member functions, though they are more expensive than push_back and pop_back. Also consider either a list or a deque; each has efficiency advantages and disadvantages for various operations. If you are mainly concerned with efficiently adding/deleting elements at either end rather than in the middle, and you want efficient subscripting, a deque is probably best.

But don't forget the fundamental rules of the containers:
Rule #1: Use a vector.
Rule #2: If you think you need something different, use a vector.
Rule #3: If profiling proves that a vector is bad, then consider using a list or deque.

I don't understand the strong preference for a vector. It doesn't require profiling to know that a vector won't allow efficiently adding or deleting elements at the beginning. Even if it's not noticeable for a small vector, it won't scale up, not to mention that the choice of container documents what the needs of the program are.

It's pretty simple: vector has the best cache coherency. Unless copying the individual elements is expensive (std::string is one case - but less so once we get r-value references in C++0x), it takes a lot of elements to overcome the considerably greater overhead that, say, a list has. It's most noticeable when you're dealing with primitives. On a 64-bit system, for example, a list<int> needs about 5 times as much space to store a single element as a vector does. Iteration through a vector is also far faster than through a list (despite both being O(N)), because a list's nodes may well be all over the place, so cache coherency is catastrophic, and because prefetching is completely impossible: the address of the next node isn't known until the current one has been fetched.

What about a deque vs. a vector? I know a deque isn't guaranteed to have contiguous storage like a vector is, but isn't it implemented using something like a relatively small number of contiguous chunks of memory?


I wanted to make this an edit to my previous post, but you posted in the meantime. I'll respond to your question at the end.

Take, for example, a game like space invaders, where you need to store bullets. Let's look at the characteristics of access, shall we?
1) Every frame, we iterate through all bullets and move them.
2) If a bullet hits something or goes out of the screen, it is removed.
3) New bullets are added to the end.

OK, so we've got linear access, removal from the beginning and middle rather than the end (because older bullets wander to the front), and insertion at the end. Sounds like a job for a list, does it not?

Well, let's see. You need to iterate through the bullets 30 times per second to keep up the frame rate, whereas bullets are deleted far less often - depending on the number flying around, perhaps 10 per second, perhaps only 2. In other words, a slow iteration will completely swamp any gains from fast removal. And don't forget that insertion occurs exactly as often as removal, so faster insertion pays off, too.

Vector: Iteration is advancing of pointers. Computers are good at that. Elements are fetched from RAM into the cache in advance, because the predictor expects them to be accessed and knows where they are: at the next higher address range. RAM latency is perhaps the most important optimization factor, once you've got complexity out of the way. It is so important that it can easily trump, say, the difference between O(N) and O(N*logN) for any but the largest Ns.
Insertion is copying of a bit of memory. If the vector runs out of space, it means allocating a new block of memory and copying a bit more. Because a vector typically doubles its capacity on reallocation, there are at most log N reallocations, which makes push_back amortized constant time. Because in our scenario bullets come and go, there is effectively an upper limit on the number of bullets, so after an initial growing phase it really is constant time. And because the programmer can make a test run, see how large the vector grows when there's a lot going on in the game, and then place a reserve() call at the beginning, it is constant time from the word go.
Erasure is linear time, element-wise copying of memory until it has all shuffled into position. On the other hand, it is still cache-coherent, similar to iteration.

List: Iteration is linear time, but it is bouncing from pointer to pointer. Cache coherency is low. Even if some nodes are accidentally allocated next to each other, given that there are 2 pointers + heap management overhead per node, you can fit very few nodes in a cache line, so you will very often have to wait for RAM to catch up. Because the location of the next node is impossible to know until you've fetched the current one, the predictor is useless.
Insertion: Allocation of a new node, linking it to the old one, copying the data. Constant time, but the allocation makes this hopelessly slower than vector. A pool allocator can be used to somewhat reduce the overhead; it is also the only way to pre-allocate memory for a list.
Erasure: quite quick, since you discover it during iteration and it doesn't need shifting of elements. Still, unless you use a pool allocator, you've got heap management overhead on deleting. Oh, and did I mention the heap fragmentation you get if you don't use a pool allocator?

So, which is faster? My bet is on the vector, but I'm not entirely sure. That's why there's rule #3. What I do know is that the "rules" (linear vs random-access traversal, where are the insertions and erasures?) are insufficient to decide, and that a vector should always be the first choice. (Unless you really, really need the iterator invalidation guarantees of the list.)

If the vector turns out to be slower, here are three optimizations I can spontaneously think of:
1) Make sure the vector's element type is a POD. Then make sure your standard library optimizes the memory moves of erase to a memmove() call.
2) Optimize erasures; implement them yourself. As you iterate through the vector, shift the surviving elements back over the gaps. Because each element is moved only once - back by as many slots as there are deleted elements before it - you save yourself going through the vector multiple times.
3) Don't really erase elements. Instead, mark them as erased. Re-use the storage by searching for an erased bullet first. Because of the way aging of bullets works, keep the index of the last insertion and start your search from there; you'll get amortized constant-time insertions despite the linear search for a free slot, while making erasures constant-time, too.

Regarding deque: I never looked into the details of deque implementation. I don't know how it would perform in various scenarios. See rules 1, 2 and 3.

deque (usually pronounced like "deck") is a contraction of double-ended queue. Double-ended queues are a kind of sequence container; as such, their elements are ordered in a strict linear sequence.

I fully support CornedBee's reasoning as to why a vector should be preferred. I hardly ever use a std::list myself, but often use a std::vector.
The one case where I might consider a list is when the objects to go in it are very large, though.

Maybe I can use vector after all. I realised I probably won't have to extend it at the beginning, only shorten it there (a big power of ten in a bigint creates an awful lot of zeroes at the beginning). So I can keep an offset and add it to the iterator to know where to start reading in the vector. Thank you.

Just curious: a container class extensible in both directions doesn't automatically have to be slower than vector in time complexity, or even constant factor, does it? I see extensibility at the beginning as a generalization of something that has already been applied to one end in vector.

If I'm not mistaken, deque is required not to move its elements around in memory when something is added to the end or beginning, whereas a vector occasionally does so when you push_back. Hence it needs to work with chunks of the contained array that stay in their original place, and therefore can't be entirely contiguous.

I'm afraid you are mistaken. Deque has quite horrible invalidation semantics. All insertions, no matter where, invalidate all iterators (though insertions at either end do leave references to existing elements valid). Erasures in the middle invalidate all iterators. Only erasures at the beginning and end (pop_front and pop_back) keep iterators valid.

Just curious: a container class extensible in both directions doesn't automatically have to be slower than vector in time complexity, or even constant factor, does it?

It would be possible to implement a vector-like class that keeps empty space at both sides of the actual sequence. But reserving memory for such a container would have a tricky interface because you also need to decide where to reserve that memory. The semantics would be difficult to define.
Hmm ... I wonder, though, if a conforming deque could be implemented this way ...

*goes to check*

Hmm, nope. Worst-case constant time for end insertions required, amortized constant time not allowed.