An In-Depth Study of the STL Deque Container

This article presents an in-depth analysis of std::deque and offers guidance as to when to prefer using it as opposed to std::vector, by taking into consideration memory allocation and container performance.

This article takes an in-depth look at the STL deque container: the benefits it offers, and the circumstances under which you would use it instead of the vector. After reading it, you should be able to explain the fundamental differences between vector and deque with respect to container growth, performance, and memory allocation. Since deque is so similar to vector in usage and syntax, readers unfamiliar with vector may first want to refer to this article [^] on implementing and using that container.

Deque Overview

The deque, like the vector, is part of the Standard Template Library (STL). The deque, or "double-ended queue", is very similar to the vector on the surface, and can act as a drop-in replacement in many implementations. Since it is assumed that the reader already knows how to use the STL vector container effectively, the table of deque member functions and operators below is provided solely for comparison and reference.

After perusing the table above and comparing it with vector, you will notice two new member functions.

push_front() - Adds an element to the front of a deque.

pop_front() - Removes the first element of a deque.

These are called with the same syntax as push_back() and pop_back(). Therein lies the first feature that could possibly warrant the use of deque, namely, the need to add elements to both the front and back of the sequence.
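As a minimal sketch of that usage (the helper name make_line is mine, not from the article), elements can be appended at either end:

```cpp
#include <cassert>
#include <deque>
#include <string>

// Build a small deque by appending at both ends.
std::deque<std::string> make_line()
{
    std::deque<std::string> d;
    d.push_back("world");   // grows at the back, just as with vector
    d.push_front("hello");  // grows at the front -- no vector equivalent
    return d;
}
```

pop_front() then removes from the front with the same syntax as pop_back().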

You will also notice there are two member functions implemented in vector but not in deque, and, as you will see, deque doesn't need them.

capacity() - Returns the number of elements a vector can hold before it must reallocate.

reserve() - Requests that a vector's capacity be at least a specified number of elements.

Herein lies the true beginning of our study. There is a stark difference between vector and deque in how they manage their internal storage. The deque allocates memory in fixed-size chunks as it grows, with room for a set number of elements in each one. The vector, on the other hand, keeps its elements in a single contiguous block (which is not necessarily a bad thing); whenever it outgrows the current block, it allocates a new, larger one, with the buffer size growing geometrically from one reallocation to the next. This is exactly why deque has no need for capacity() or reserve(), as the following experiment sets out to demonstrate.
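The reallocation behavior can be observed directly. The sketch below (count_reallocations is a name I've introduced; the exact growth factor is implementation-defined, commonly 1.5x or 2x) counts how many times a growing vector has to move its buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many distinct capacities (i.e., reallocations) a vector
// goes through while growing to n elements, one push_back at a time.
std::size_t count_reallocations(std::size_t n)
{
    std::vector<int> v;
    std::size_t reallocs = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(0);
        if (v.capacity() != cap) {  // capacity changed => buffer was moved
            cap = v.capacity();
            ++reallocs;
        }
    }
    return reallocs;
}
```

Because the growth is geometric, even a hundred thousand insertions trigger only a handful of reallocations, but each one copies the entire buffer.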

Experiment 1

The objective of this experiment is to observe the differences in container growth between the vector and the deque. The results of this experiment will illustrate these differences in terms of physical memory allocation and application performance.

The test application for this experiment is designed to read text from a file and use each line as the element to push_back() onto the vector and the deque. In order to generate large numbers of insertions, the file may be read more than once. The class to handle the test is shown below:
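The original test class is not reproduced here, so the following is only a minimal sketch of the kind of harness described; the class and member names (ContainerTest, m_vData, m_dData) are my assumptions, not the author's code:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <fstream>
#include <string>
#include <vector>

// Sketch of a harness that reads a file line by line, `passes` times,
// appending each line to both a vector and a deque with push_back().
class ContainerTest
{
public:
    void Run(const std::string& path, int passes)
    {
        for (int p = 0; p < passes; ++p) {
            std::ifstream in(path.c_str());
            std::string line;
            while (std::getline(in, line)) {
                m_vData.push_back(line);
                m_dData.push_back(line);
            }
        }
    }

    std::size_t Count() const { return m_vData.size(); }

private:
    std::vector<std::string> m_vData;
    std::deque<std::string>  m_dData;
};
```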

The system performance was logged via Windows Task Manager, and the program was timed using Laurent Guinnard's CDuration class. The system performance graph is illustrated below:

Note the peaks in memory usage during vector allocation, and how the peaks grow larger as vector allocates increasing internal buffer storage. Note also that deque does not exhibit this behavior, and the buffer continues to grow linearly with element insertion. The jump in kernel time during deque deallocation as well as the shape of the curve as memory is reclaimed was an unexpected result at first. I would have expected the deallocation to look similar to vector. After looking into things further and conducting some more tests, I was able to come up with a hypothesis: since deque memory is not contiguous, it must be more difficult to hunt down and reclaim. We will put this hypothesis to the test later, but first let's analyze the performance aspects of this experiment.

Just how long do those memory allocations take?

Notice in the figure below that no elements were being added during the time vector was out finding more memory.

It is also of interest to notice how long each set of push_back() calls takes. This is illustrated in the figure below. Remember, each sample is 9874 strings added, with an average length of 1755.85 characters.
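The timings in this article come from Laurent Guinnard's CDuration class; the sketch below is a portable stand-in built on C++11's std::chrono (ScopedTimer and time_push_back are my names, not the original code):

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <string>

// A portable stand-in for a scope-based timer such as CDuration.
class ScopedTimer
{
public:
    ScopedTimer() : m_start(std::chrono::steady_clock::now()) {}

    double Seconds() const
    {
        std::chrono::duration<double> d =
            std::chrono::steady_clock::now() - m_start;
        return d.count();
    }

private:
    std::chrono::steady_clock::time_point m_start;
};

// Time how long it takes to push n copies of s onto a deque.
double time_push_back(std::size_t n, const std::string& s)
{
    ScopedTimer t;
    std::deque<std::string> d;
    for (std::size_t i = 0; i < n; ++i)
        d.push_back(s);
    return t.Seconds();
}
```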

Experiment 2

Objective

The objective of this experiment is to observe the benefits of calling reserve() on a vector before adding a large number of elements, and to compare these results with deque in terms of memory allocation and performance.

Description

The test description for this experiment is the same as that of Experiment 1, except that the following code was added to the test class constructor:

m_vData.reserve(1000000);

Results

The test was performed under the following conditions:

Processor: 1.8 GHz Pentium 4
Memory: 1.50 GB
OS: W2K-SP4
No. of lines in file: 9874
Avg. chars per line: 1755.85
No. of times file read: 70
Total elements inserted: 691180

The system performance was logged via Windows Task Manager, and the program was timed using Laurent Guinnard's CDuration class. The system performance graph is illustrated below:

Notice that vector no longer needs to allocate additional internal buffer storage: the single call to reserve() sets aside more than enough space for our test load of 691180 elements in one step. As for the deque deallocation hypothesis, observe the drastic growth in memory deallocation time between this test and the previous one. We will quantify this in our next experiment.

How has this improved memory allocation performance?

The following figure illustrates the number of elements added to the containers over time:

As you can see, vector is now very close to deque in performance when adding elements to the container. However, vector tends to be slightly more erratic in how long it takes to insert a given set of elements. This is illustrated in the figure below:

A statistical analysis of the variability in vector vs. deque, with respect to the time it takes to insert 9874 elements of 1755.85 average length, is summarized in the following tables:

Experiment 3

Objective

The objective of this experiment is to analyze and attempt to quantify the hypothesis that deque memory is more difficult to reclaim due to its non-contiguous nature.

Description

The test class from Experiment 1 will be utilized again in this experiment. The calling function is designed to allocate test classes of increasing size and log their performance accordingly. This implementation is as follows:

Results

This experiment was performed on the same platform as the previous two experiments, except that the number of allocations was varied from 9874 to 691180 across 70 increments. The following figure illustrates the time required to reclaim deque memory as a function of the number of elements in the deque. The deque was filled with strings with an average length of 1755.85 chars.

Although the actual time varies significantly from the trendline in several instances, the trendline fits well, with R² = 95.15%. The deviation of any given data point from the trendline is summarized in the following table:

deque Results

Mean: 0.007089269 sec
Maximum: 11.02838496 sec
Minimum: -15.25901667 sec
Std. Dev: 3.3803636 sec
6-Sigma: 20.2821816 sec

This is fairly significant when compared to the results of vector in the same scenario. The following figure shows deallocation times for vector under the same loading as deque above:

The trendline for this test fits with R² = 81.12%. This could likely be improved by running each data point several times and averaging the runs. Nonetheless, the data is sufficient to make the point in question, and the deviation of any given data point from the trendline is summarized in the following statistical parameters:

Experiment 4

Objective

The "claim to fame" for deque is the promise of constant-time insertion at either end; insertion in the middle only has to shift elements toward the nearer end, rather than everything after the insertion point as vector must. Just how does this stack up against vector::insert()? The objective of this experiment is (not surprisingly) to observe the performance characteristics of vector::insert() vs. deque::insert().

Description

There may be times when adding elements to the back of a container doesn't quite suit your needs. In that case, you may want to employ insert(). This experiment has the same form as Experiment 1, except that the test calls insert() instead of push_back().
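As a sketch of what each test loop exercises (fill_vector_front and fill_deque_front are illustrative names, not the article's code), inserting at begin() is the worst case for vector:

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Insert n elements at the front of each container. Every vector
// insert at begin() shifts all existing elements; deque only has to
// extend its front block, so the cost gap widens as n grows.
std::vector<int> fill_vector_front(int n)
{
    std::vector<int> v;
    for (int i = 0; i < n; ++i)
        v.insert(v.begin(), i);   // O(size) shift on every call
    return v;
}

std::deque<int> fill_deque_front(int n)
{
    std::deque<int> d;
    for (int i = 0; i < n; ++i)
        d.insert(d.begin(), i);   // amortized O(1) at the front
    return d;
}
```

Both loops produce the same sequence; only the cost differs.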

Results

As you can see in the following figures, the benefit of deque's inexpensive insertion is staggering when compared against vector.

Note the difference in time-scales, as 61810 elements were added to these containers.

Experiment 5

Objective

This experiment will test the performance of vector::at(), vector::operator[], deque::at() and deque::operator[]. It has been suggested that operator[] is faster than at() because it performs no bounds checking, and it has also been requested that vector be compared with deque in this regard.

Description

This test will insert 1000000 elements of type std::string with a length of 1024 characters into each container and measure how long it takes to access them all via at() and operator[]. The test will be performed 50 times for each scenario and the results presented as a statistical summary.
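The behavioral difference between the two accessors is easy to demonstrate (the helper below is mine, not part of the test program): at() bounds-checks and throws std::out_of_range, while operator[] performs no check.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <string>

// Returns true if accessing one element past the end via at()
// throws std::out_of_range, as the standard requires.
bool access_out_of_range_throws(const std::deque<std::string>& d)
{
    try {
        (void)d.at(d.size());   // one past the end -> must throw
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Indexing past the end with operator[] instead is simply undefined behavior, which is exactly why it can skip the check.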

Results

Well, perhaps surprisingly, there is very little difference in performance between vector and deque when accessing their elements, and a negligible difference between operator[] and at(). These results are summarized below:

vector::at()

Mean: 1.177088125 sec
Maximum: 1.189580000 sec
Minimum: 1.168340000 sec
Std. Dev: 0.006495193 sec
6-Sigma: 0.038971158 sec

deque::at()

Mean: 1.182364375 sec
Maximum: 1.226860000 sec
Minimum: 1.161270000 sec
Std. Dev: 0.016362148 sec
6-Sigma: 0.098172888 sec

vector::operator[]

Mean: 1.164221042 sec
Maximum: 1.192550000 sec
Minimum: 1.155690000 sec
Std. Dev: 0.007698520 sec
6-Sigma: 0.046191120 sec

deque::operator[]

Mean: 1.181507292 sec
Maximum: 1.218540000 sec
Minimum: 1.162710000 sec
Std. Dev: 0.010275712 sec
6-Sigma: 0.061654272 sec

Conclusions

In this article, we have covered several different situations where one could possibly have a need to choose between vector and deque. Let's summarize our results and see if our conclusions are in line with the standard.

When performing a large number of push_back() calls, remember to call vector::reserve().

In Experiment 1, we studied the container-growth behavior of vector and deque. We saw that since deque allocates its internal storage in blocks of pre-defined size, it can grow at a constant rate, while vector periodically reallocates and copies its entire buffer. The performance of vector in this experiment led us to consider calling vector::reserve(), which became the premise for Experiment 2: the same test, but with reserve() called up front. With that one call, vector's growth cost essentially disappeared, which is good grounds for holding on to vector as our default choice.

If you are performing many deallocations, remember that deque takes longer to reclaim memory than vector.

In Experiment 3, we explored the differences between reclaiming the contiguous and non-contiguous memory blocks of vector and deque, respectively. The results showed that vector reclaims memory in linear proportion to the number of elements, whereas deque's reclamation time grows exponentially. Moreover, vector is several orders of magnitude faster than deque at reclaiming memory. As a side note, if you perform your push_back() calls in a tight loop or sequence, there is a good chance that most of the memory deque obtains will be contiguous. I have tested this situation for fun and found the deallocation time to be close to vector's in these cases.

If you are planning to use insert(), or have a need for pop_front(), use deque.

Well, ok, vector doesn't have pop_front(), but based on the results of Experiment 4, it might as well not have insert() either. The results of Experiment 4 speak volumes about the need for deque and why it is part of the STL as a separate container class.

For element access, vector::at() wins by a nose.

After summing up the statistics of Experiment 5, I would have to say that although all the methods were close, vector::at() is the winner: it offers the best balance between the raw mean of the access times and the lowest 6-sigma value.

What's all this 6-Sigma stuff?

Although a popular buzzword in industry today, 6-Sigma actually has its roots in statistics. If you generate a Gaussian distribution (or bell curve) for your sampled data, one standard deviation from the mean (the symbol for standard deviation is the Greek letter sigma, by the way) covers 68.27% of the area under the curve. Two standard deviations, 2-sigma, cover 95.45%; three cover 99.73%; and so on out to six standard deviations, which cover all but about two parts per billion. (The "3.4 defects per million" figure usually quoted in industry comes from allowing the process mean to drift by 1.5 sigma.)
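These coverage figures follow directly from the Gaussian error function, and are a one-line check in code (coverage is my helper name, using C++11's std::erf):

```cpp
#include <cassert>
#include <cmath>

// Fraction of a Gaussian distribution lying within k standard
// deviations of the mean: erf(k / sqrt(2)).
double coverage(double k)
{
    return std::erf(k / std::sqrt(2.0));
}
```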

Final Words

I hope you have gained some insight into deque and have found this article both interesting and enlightening. Any questions or comments are certainly welcome and any discussion on vector or deque is encouraged.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Walter Storm is currently a principal software engineer doing quantitative research for a private hedge fund. Originally from Tunkhannock, PA., he has a B.S. in Aerospace Engineering from Embry-Riddle Aeronautical University[^], and an M.S. in Systems Engineering from SMU[^]. He has been professionally developing software in some form or another since January of 2001.

Nitron, Here's a link to one of Herb Sutter's 'Guru of the week' columns on the same topic.

http://www.gotw.ca/gotw/054.htm

Herb concludes:

Consider preferring deque by default in your programs, especially when the contained type is a class or struct and not a builtin type, unless the actual performance difference in your situation is known to be important or you specifically require the contained elements to be contiguous in memory (which typically means that you intend to pass the contents to a function that expects an array).

The deque<> implementation in VC7 is much faster than the one he tested, and your data shows that the situations where vector significantly outperforms deque have become very rare.

Consider preferring deque by default in your programs, especially when the contained type is a class or struct and not a builtin type, unless the actual performance difference in your situation is known to be important or you specifically require the contained elements to be contiguous in memory (which typically means that you intend to pass the contents to a function that expects an array).

Exactly. Just be sure, if you are putting it into a library or passing it off to someone else, that they refrain from accessing elements through a raw pointer, as that would make for some interesting results.

Calling resize() will change the value of the vector's size(), i.e., possibly adding unwanted elements. reserve() changes only capacity(), meaning the vector will already have capacity() worth of contiguous memory available, but no elements in it yet.
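A small sketch makes the distinction concrete (the VecState struct and function names are mine, for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct VecState
{
    std::size_t size;
    std::size_t capacity;
};

// reserve() allocates capacity but adds no elements.
VecState after_reserve(std::size_t n)
{
    std::vector<int> v;
    v.reserve(n);
    VecState s = { v.size(), v.capacity() };
    return s;
}

// resize() creates n value-initialized elements.
VecState after_resize(std::size_t n)
{
    std::vector<int> v;
    v.resize(n);
    VecState s = { v.size(), v.capacity() };
    return s;
}
```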

- Nitron

"Those that say a task is impossible shouldn't interrupt the ones who are doing it." - Chinese Proverb

Awesome article. I'd love to see more of this sort of thing, with two cases in particular:
(a) std::map versus sorted vector (or sorted deque!).
I suspect that std::map<> is so slow it should never be used.
(b) algorithm calls versus handwritten loops.
In "Effective STL", Scott Meyers argues that the calls are faster, but he doesn't present any empirical data to support this, and I've heard many rumours that on real compilers, the algorithms are slower.

Anyway, with regard to this article, I wonder how the results would change if a better memory allocator was used. For example, "Doug Lea Malloc". The performance of vector<> might improve, but I think most of the deque<> deallocation time would disappear.

Great stuff - you've convinced me that deque is the container that should be used by default!

Don Clugston wrote:Anyway, with regard to this article, I wonder how the results would change if a better memory allocator was used. For example, "Doug Lea Malloc". The performance of vector<> might improve, but I think most of the deque<> deallocation time would disappear.

Hmm... This is interesting. I'm not sure of the interface handoff between the compiled code and the OS. I think it is up to the OS to determine what memory vector or deque gets, but I can't say for certain. The whole premise for these containers in the first place is to minimize the need for malloc or even new for that matter. In this regard, I believe performance was taken into consideration in the containers' development. I'm not sure what would happen when comparing different compilers, or even different STL implementations. Heck, even different OS platforms for that matter (keeping hardware the same, obviously).

- Nitron

"Those that say a task is impossible shouldn't interrupt the ones who are doing it." - Chinese Proverb

I've had a look through the source code for vector<> in VC7.1 to try to answer this question.
It calls the default allocator, which calls operator new(count * sizeof(T)), which in turn just calls malloc(count * sizeof(T)). So how does malloc() work?
It gets large chunks of memory from Windows using the SDK functions HeapAlloc() and VirtualAlloc(), presumably in some multiple of 4kb. It divides these chunks up into small blocks and maintains a complex data structure of which of these little blocks are used.
When deleting, HeapCompact(), HeapDestroy(), and VirtualFree() are used. I can't imagine them taking much time because they are dealing with such large chunks.

So the behaviour we see in deque<> is a direct consequence of the bookkeeping used by malloc() as it manages its little blocks.
I think some broad conclusions are always valid, regardless of OS and compiler:
(1) When you use vector<>, you pay a very high price for the luxury of having contiguous elements. This is inherent in the design of vector, although you can usually avert most of this cost by using reserve().
(2) The deallocation cost of deque<> is dependent on the efficiency of the malloc() that you are using. If it really bothers you, you could write a custom allocator for it. boost::pool might be worth considering.

I was intrigued by the boost::pool comment, so I attempted to try it out. It seems that there are incompatibilities between boost::pool and VC6.0, and potentially VC7.
I found a comment at this link:
http://lists.boost.org/MailArchives/boost-users/msg03671.php
At least not without mods. See:
http://lists.boost.org/MailArchives/boost-users/msg03672.php
and http://www.stlport.com

Don Clugston wrote: I suspect that std::map<> is so slow it should never be used.

I wouldn't go that far. std::map is implemented as a red-black tree, and if you have many insertions/deletions it is much faster than a sorted vector. With std::map, the cost of insertion is logarithmic, and keeping a vector sorted is expensive.

Of course, hash_map is often the best solution, but it is not (yet) a part of the Standard Library.
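To make the comparison concrete, here is a sketch of the sorted-vector approach next to std::map (sorted_insert is an illustrative helper, not from the thread): each sorted-vector insert pays a linear shift on top of the logarithmic search, while the tree insert does not.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Keep a vector sorted: lower_bound finds the position in O(log n),
// but insert() then shifts every element after it -- O(n) per insert.
void sorted_insert(std::vector<int>& v, int key)
{
    std::vector<int>::iterator pos =
        std::lower_bound(v.begin(), v.end(), key);
    v.insert(pos, key);
}
```

A std::map insert is O(log n) with no shifting, at the cost of per-node allocation and poorer locality, which is why lookup-heavy, rarely-modified data often still favors the sorted vector.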

First: the article is very good, and very well explained and illustrated.

Second: considering the theory behind the allocation mechanisms of these collections, the results "should" be obvious. You demonstrate in practice how they really "ARE" (or aren't), and how good (or not, and when) the STL implementation that comes with MS C++ is.

I do not quite understand why the memory consumption for vector presents teeth. I would have guessed each reallocation to look like a staircase step, with a peak during the move of elements from the old block to the new block, just like:

The saw pattern *may* be due to the fact that when memory is released by a virtual memory manager, it is actually returned to the system. Remember that under virtual memory, different physical segments can be mapped into one contiguous range, so the "blanks in the middle" actually "disappear".

I think you would see your depiction if the size was smaller and the system was faster. I had the taskman on the fastest update speed and was maxing out CPU. I think it was just a system overhead phenomenon.

- Nitron

"Those that say a task is impossible shouldn't interrupt the ones who are doing it." - Chinese Proverb

This is the kind of stuff I pay to read, so why are you giving it away?

Now, seriously, this is a great article.

I recently had a very interesting experience with vector versus deque when I implemented a flood fill algorithm for the Pocket PC (see article here[^]). In this algorithm I implemented an explicit recursion stack (thanks to an idea provided by Chris Losinger), and my first approach was to use a vector. What was really interesting was that, on a memory-constrained device like the Pocket PC, the vector not only showed the disadvantages you depict in the article, it generated out-of-memory errors.

After thinking a little about this, I realized that my STL implementation was thrashing the heap due to the way vector allocates and reallocates itself. After a little research, I saw (in Josuttis's book) a proposed stack implementation using a deque. The author used essentially the same arguments as you (except for the memory reclaiming) to support the deque. After changing my code to use a deque, I noticed two things: no more out-of-memory errors, and improved performance. So, in essence, using a deque instead of a vector was the difference between a robust implementation and a merely academic one.
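It is perhaps worth noting that the standard library already agrees with this choice: std::stack is a container adaptor whose default underlying container is deque. A minimal sketch of an explicit recursion stack in that style (SeedStack and drain are my names, not the flood-fill article's code):

```cpp
#include <cassert>
#include <deque>
#include <stack>
#include <vector>

// std::stack<int> is shorthand for std::stack<int, std::deque<int> >;
// a vector-backed stack must be requested explicitly.
typedef std::stack<int> SeedStack;
typedef std::stack<int, std::vector<int> > VecStack;

// Pop every pending seed, returning how many were processed --
// the shape of an explicit recursion loop.
int drain(SeedStack& s)
{
    int n = 0;
    while (!s.empty()) {
        s.pop();
        ++n;
    }
    return n;
}
```

Switching the underlying container is a one-line typedef change, which makes it easy to compare the two under real workloads.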