Introduction

What data structure should you use? What data structure should you avoid?

Imagine that you have to use a data structure. It is nothing fancy: only storage for raw numbers, or POD data carriers, that you will work with now and then and some more later on.

These items will be inserted and removed intermittently. They will have
some internal order, or significance, and be arranged accordingly.
Sometimes insertions and removals will be at the beginning or the end, but
most of the time you will have to find the right location and then insert
or remove somewhere in between. An absolute requirement is that
inserting and removing an element is efficient, hopefully even O(1).

Now, what data structure should you use?

On-line resources

At a time like this it is good advice to double-check with books and/or on-line
resources to verify that your initial hunch is correct. In this situation you may
recollect that vectors are generally fast for accessing an item but can
be slow in modifying operations, since inserting or removing items anywhere but at the end
causes part of the contents of the array to be shifted. Of course you also
know that an
insertion when the vector is full will trigger a resize: a new array storage
is created and all the items in the vector are copied or moved to the
new storage. This seems intuitively slow.

A linked list, on the other hand, might have slow O(n) access to an item, but the
insertion/removal is basically just a switching of pointers, so these O(1) operations
are very appealing. I double-check with a few different online resources
and make my decision. Linked-list it is…

BEEEEEEEEP. ERROR. A RED GNOME JUMPS DOWN IN FRONT OF ME AND FLAGS ME DOWN.
HE
TELLS ME IN NO UNCERTAIN WAYS THAT I AM WRONG. DEAD WRONG.

Wrong? How can this be wrong? This is what
the online resources say:

www.cplusplus.com/reference/stl/list, regarding std::list: [...]
Compared to other base standard sequence containers (vector
and deque), lists perform generally
better in inserting, extracting and moving elements in any position within the
container, and therefore also in algorithms that make intensive use of these,
like sorting algorithms.

www.cplusplus.com/reference/stl/vector, regarding std::vector: [...] vectors are generally the most efficient
in time for accessing elements and to add or remove elements from the end of
the sequence. [...] For operations that involve inserting or removing elements
at positions
other than the end, they perform worse than
deques and
lists.

http://en.wikipedia.org/wiki/Sequence_container_(C%2B%2B)#List
Vectors are inefficient at removing or inserting elements
other than at the end. Such operations have O(n) (see
Big-O notation) complexity compared with O(1) for linked-lists. This is
offset by the
speed of access — access to a random element in a vector is of complexity O(1)
compared with O(n) for general linked-lists and O(log n) for link-trees.

In this example most insertions/removals will definitely not be at the end; that much
is already established. So that should mean that the linked-list
would be more efficient than the vector, right? I decide to double-check with
Wikipedia,
Wiki.Answers and
Wikibooks on Algorithms. They all seem to agree and I cannot understand what
the RED GNOME is complaining about.

I take a break to watch
Bjarne Stroustrup’s Going Native 2012 keynote. At time 01:44 in the video-cast
and slide 43, something interesting happens. I recognize what the RED GNOME
has been trying to tell me. It all falls into place. Of course. I should have known.
Duh. Locality of Reference.

Important Stuff comes now

It does not matter that the linked-list is faster than the vector at inserting/removing
an element. In the slightly larger scope that is completely irrelevant.
Before the element can be inserted or removed, the right location must be
found, and finding that location is extremely slow compared to the vector. In fact, if
a linear search is done for both the vector and the linked-list, the linear search
over the vector completely, utterly, and with no contest beats the list.

The extra shuffling and copying overhead on the vector is cheap time-wise.
It is dirt cheap and can be completely ignored compared to the huge overhead
of traversing a linked-list.

Why, you may ask? Why is the linear search so
extremely efficient for the vector compared to the supposedly oh-so-slow linked-list
linear search? Is O(n) for a linked-list not comparable to O(n) for a
vector?

In a perfect world perhaps, but in reality it is NOT! It is
here that
Wiki.Answers
and the other online resources are wrong, as their advice seems to suggest
using the linked-list whenever non-end insertions/removals are common. This is
bad advice, as they seem to completely disregard the impact of
locality
of reference.

Locality of Reference I

The linked-list has its items in disjoint areas of memory. When traversing
the list, the cache lines cannot be utilized effectively. One could say that the
linked-list is cache-line hostile,
or that the linked-list maximizes cache-line misses. The disjoint memory makes
traversing the linked-list slow, because RAM fetches are needed extensively.

A vector, on the other hand, is all about having data in adjacent memory. An insertion
or removal of an item might mean that data must be shuffled, but this is cheap for
the vector. Dirt cheap (yes, I like that term).
The vector, with its adjacent memory layout, maximizes cache
utilization and minimizes cache-line misses. This makes all the difference,
and that difference is huge, as I will show you soon.
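To make the layout difference concrete, here is a minimal sketch (my own illustration, not part of the benchmark code attached to this article) that simply prints the addresses of the elements. The vector addresses are guaranteed to be consecutive; the list node addresses typically are not.

    #include <cstdio>
    #include <list>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3, 4};
        std::list<int>   l{1, 2, 3, 4};

        // Vector elements are contiguous: each one sits sizeof(int) bytes after
        // the previous one, so a traversal walks straight through a cache line.
        for (std::size_t i = 0; i < v.size(); ++i)
            std::printf("vector[%zu] at %p\n", i, static_cast<const void*>(&v[i]));

        // List nodes are allocated one by one; their addresses usually have no
        // useful relation to each other, so a traversal keeps jumping around in memory.
        std::size_t i = 0;
        for (const int& value : l)
            std::printf("list[%zu]   at %p\n", i++, static_cast<const void*>(&value));
    }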

Let us test this by doing insertions of random integer values. We will keep the
data structure sorted, and to get to the right position we will use a linear search
for both the vector and the linked-list. Of course a linear search is silly for
the vector, but I want to show how effective the adjacent-memory vector is compared
to the disjoint-memory linked-list.
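The full benchmark code is attached to the article and available at IdeOne (linked below); the following is only a minimal sketch of the idea, with made-up function names, assuming plain ints are stored.

    #include <list>
    #include <vector>

    // Insert 'value' so that the vector stays sorted. A deliberately naive linear
    // search finds the position, mirroring what the list has to do anyway.
    void sortedInsert(std::vector<int>& v, int value) {
        auto pos = v.begin();
        while (pos != v.end() && *pos < value) ++pos;  // linear search over adjacent memory
        v.insert(pos, value);                          // shifts the tail elements
    }

    // The same operation for the list: linear search, then an O(1) node insertion.
    void sortedInsert(std::list<int>& l, int value) {
        auto pos = l.begin();
        while (pos != l.end() && *pos < value) ++pos;  // linear search, node by node
        l.insert(pos, value);                          // just pointer re-linking
    }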

The time tracking (StopWatch
that I wrote about
here) is easy with C++11. Now all that is needed is to create the
random values and keep track of the measured time for a few sets of random numbers.
We measure this from short sets of 10 numbers all the way up to 500,000. This will
give a nice perspective.
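The StopWatch class itself is described in the linked post; as a stand-in, here is a rough sketch of the measurement loop using plain std::chrono (my own reconstruction, not the attached code), reusing the sortedInsert helpers sketched above.

    #include <chrono>
    #include <cstdio>
    #include <random>
    #include <list>
    #include <vector>

    // Time 'count' sorted insertions of random integers into a container type.
    template <typename Container>
    long long timeSortedInserts(std::size_t count) {
        std::mt19937 engine(12345);                        // fixed seed: same values for both containers
        std::uniform_int_distribution<int> dist(0, 1000000);
        Container container;

        auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < count; ++i)
            sortedInsert(container, dist(engine));
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    }

    int main() {
        for (std::size_t n : {10, 100, 1000, 10000, 100000, 500000}) {
            std::printf("n=%zu  vector: %lld ms  list: %lld ms\n", n,
                        timeSortedInserts<std::vector<int>>(n),
                        timeSortedInserts<std::list<int>>(n));
        }
    }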

So 500,000 sorted insertions in a linked-list took some 1 hour and 47 minutes.
The same number of elements for the vector took 1 minute and 18 seconds. You can
test this yourself. The code is attached to this article and available at the
paste-bin and online compiler IdeOne: http://ideone.com/pi2Od

I thought it would be interesting to compare my quad-core against
a simpler and older computer. Using the
CPU-Z freeware
I found that my old dual-core Windows 7 PC has 2 L1 caches of 32 KB each
and 1 L2 cache of 2048 KB. Testing both machines with smaller sets of values shows
the cache significance clearly (disregarding differences in bus bandwidth and CPU
speed).

[Graph: x86 old dual-core PC]

[Graph: x64 quad-core Intel i5]

Devil's Advocate I

The example above is of course somewhat extreme. This integer example brings
out another bad quality of the linked-list: the linked-list needs 3 times the size of an integer
to represent one item, where the vector only needs 1. So for small data types it makes
even less sense to use a linked-list. For
larger types you would see the same behavior as shown above, but obviously the times
would differ. With larger types that are more expensive to copy, the vector
would lose a little performance, but would likely still outperform the linked-list, similar
to what I described above.

Another objection to the testing shown above is that the sets are
too large for a linked-list. The point is that even for smaller numbers the vector
will generally perform almost as well as, or better than, the linked-list. Then in time
the code will change, and as your container is expanded it will
fare better if it is a vector than if it is a linked-list.

Wikipedia being the great source of information that it is, I should also point
out that it of course also
compares linked-list to vector (i.e. dynamic array). You can read it
here for more detailed pros and cons.

Devil's Advocate II: Algorithms with merge and sort

Yes, what about them? The online resources pointed out earlier state that
merge and sort of the containers is where the linked-list will outperform the vector.
I am sorry to say that this is not the case on a computer with a modern cache architecture.
Once again these resources are giving you bad advice.

From a mathematical perspective, YES, it does make sense when calculating big-O
complexity. The problem is that the mathematical models (at least the ones I have
seen) are flawed. Computer cache, RAM and memory architecture are completely disregarded,
and only the simple mathematical complexity is considered.

The sort testing I made is available in the
google spreadsheet on the tab “sort comparison“. I used std::sort
(vector) vs std::list::sort. This might be a leap of faith since it assumes that
both sorts are at their best, and that is where the linked-list (supposedly) was
the winner (not).
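For reference, the calls being compared look roughly like this (a sketch only; the full code is linked below). std::sort requires random-access iterators, which std::list does not provide, so the list has to use its member sort.

    #include <algorithm>
    #include <list>
    #include <vector>

    void sortBoth(std::vector<int>& v, std::list<int>& l) {
        std::sort(v.begin(), v.end());  // typically an introsort over contiguous memory
        l.sort();                       // typically a merge sort that re-links nodes in disjoint memory
    }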

std::sort on the vector beats std::list::sort hands down. The complete code is
attached and available here: http://ideone.com/tLUeK

The sort testing above was not repeated for merge,
but merging two sorted structures that should still be sorted after the merge
involves traversal again, and once again the linked-list is no good.

What to do now?

“We should forget about small
efficiencies, say about 97% of the time: premature optimization is the root of all
evil” (Donald Knuth).

Meaning that “Premature Optimization” is when the software developer
thinks he is making performance improvements to the code without knowing for a fact
that they are needed or that they will greatly improve performance. The changes made
lead to a design that is not as clean as before, or that is incorrect or hard to read. The
code becomes overly complicated or obfuscated. This is clearly an anti-pattern.

On the other hand we have the inverse of the extreme Premature Optimization
anti-pattern; this is called Premature Pessimization.
Premature pessimization is when the programmer chooses techniques that are known
to have worse performance than another set of techniques. Usually this is done out
of habit, or just because.

For incrementing native types, pre- or post-increment makes little difference,
but for class types the performance difference can have an impact. So always writing value++
is a bad habit that, when applied to another set of types, actually degrades
performance. For this reason
it is better to just use pre-increments. The code is just as clear
as with post-increments, and performance will be at least as good (albeit
usually only a little better).
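A tiny illustration (my own, not from the article) of the habit in question, using a list iterator as the class type:

    #include <cstddef>
    #include <list>

    std::size_t countNodes(const std::list<int>& l) {
        std::size_t count = 0;
        // Pre-increment: the iterator is advanced in place, no temporary copy.
        for (auto it = l.cbegin(); it != l.cend(); ++it)
            ++count;
        // 'it++' would compile just as well, but each step would construct and
        // discard a copy of the iterator - a textbook premature pessimization.
        return count;
    }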

Using a linked-list in a scenario where the choice has little performance impact (say,
10 elements that are only touched rarely) is still worse than using the vector.
It is a bad habit that you should just stay away from.

Why use a badly performing container when other options are available? Options
that are just as clean and easy to work with, and have no impact on code clarity.

Stay away from Premature Optimization, but be sure not to fall
into the trap of Premature Pessimization! You should follow the best (optimal)
solution, and of course common sense,
without
trying to over-optimize and obfuscate the code and design. Sub-optimal techniques such
as the linked-list should be avoided when there are clearly better alternatives out
there.

C++ only?

Of course this could be true for other languages and containers as well. I leave
it as a fun test for whoever feels challenged to try it in C#
(List<T> vs LinkedList<T>)... [Java, Python...?]

Disclaimer & Conclusion

Contrary to what you might believe when reading this, I am no fanatical
no-linked-list-ever type of radical on the subject. In fact I can come up with
examples where a linked-list would fit in nicely.

However these examples are rare - maybe you can point some out for me?

Until then, why don't we all just stay away from the linked-list
for the time being? After all, C++ is all about performance - and the linked-list is
not!

Last, a little plea. Please, before I get flamed to pieces for this obstinate
article, why not run some sample code on ideone.com or similar, prove me wrong and
educate me. I live to learn =)

Oh, and yes, I know; thank you for
pointing it out. There are other containers out there, but this time I focused on
linked-list vs vector.


About the Author

Kjell is a driven developer with a passion for software development. His focus is on coding, boosting software development, improving development efficiency and turning bad projects into good projects.

Kjell was twice national champion in WUKO and semi-contact Karate, and now he focuses his kime towards software engineering, everyday challenges and moose hunting, when not being (almost constantly) amazed by his daughter and sons.

I vote 5 for the well written and explained article, as well as for the cool talk of Bjarne.

However, I want to emphasize that improving performance of the system over a particular user scenario is what matters.

I learned that, given a specific slowdown, there exists one and only one resource bottleneck at a given time, until fixed. And for most development, CPU is NOT the bottleneck, so micro-optimizing list over array is a waste of energy (it's what you want us to do, given your title). Energy is best spent on the bottleneck, where one hour can give you a 90% performance improvement, which might be enough for the user.

I have worked on resource-constrained devices like the FEZ Spider (http://www.ghielectronics.com/catalog/product/297) or, even worse, the Netduino (http://www.netduino.com/) with 100 KB of RAM. On these devices the bottleneck was the network, and, for the Netduino, RAM.

[modified: Ooops. Sorry, I misspelled your name. I have too many Nicholas in my family so the spelling came naturally "wrong". Fixed now]

Thank you Nicolas. Nice to hear you thought it was well written.

Nicolas Dorier wrote:

[...] so micro optimizing list over array is a waste of energy.

This is what I mean by my discussion at the end about "Premature Optimization" vs "Premature Pessimization". For pure number crunching and simple PODs (not large ones) there is rarely any reason to use the linked-list. If not using the vector, then other containers are still better. Avoiding premature pessimization is a good thing - I do not see how it is a big effort in this case to do so.

Nicolas Dorier wrote:

[...] for most development, CPU is NOT the bottleneck, so micro optimizing list over array is a waste of energy.

Very good observation Nicolas!

We are all shaped by our experiences, right? This was exactly the bottleneck the first time I encountered this. The second time I encountered it, it was part of the bottleneck. Why? Well, frequent exercising of the linked-list gave lots and lots of cache misses. In effect it was the same thing as encountering a bad case of False Sharing.

False Sharing happens all the time; it only becomes a problem when it happens at the wrong time, or when too much of it happens.

The first time we got a speed-up of a factor of 35 (RTOS); the second time the speed-up was just enough to avoid getting jitter in the graph (life sciences, GPOS).

I had to rate this article as poor. Most of the information you supplied is accurate and worth knowing, but the over-the-top hyperbole, even in the title, is simply wrong and can lead especially new programmers to reach the wrong conclusions. As you even admit in the comments here, linked lists are a better, more performant solution in some cases. Bjarne never said otherwise, and in fact in the Q&A after this talk he made that quite clear. His advice is to start with/prefer vector. If you find the vector usage to be a hotspot through profiling, and test and prove that list does better, then switch. The article reads like you listened to Bjarne's speech and immediately wrote this article without understanding what he really was saying. If that wasn't the case, then you used the hyperbole to get attention, which is maybe worse. Either way, if you'd removed the hyperbole I could have rated this article higher, but as it is I was tempted to give it a 1. I'll give it a two only because there is some information here worth understanding.

you used the hyperbole to get attention, which is maybe worse

I have changed the title, as I partly agree with what you say. I have also changed the initial explanation of scope (the introduction's first few lines). I am also in the process of updating the article to clearly state the scope, the sub-set, of when this is "true".

I don't know what is worse: that I used (initially) a partly misleading title, or that all the online resources that I found are too simplistic in their explanation and simply put the linked-list forward as the choice? Personally I think the "hyperbole" is worth it. Let's face it, how many people will read my article compared to how many people will read the simplifying-falsifying online resources I pointed out? At least if they read my article they will also read that I do not say it is this way for all situations. I even encourage the readers to come up with counter examples, which several already have done, if you would bother to read the comments, and my responses to those comments, before you down-vote me.

The damage to beginner programmers from the impact of those resources is probably worse than from my article.

But fighting "fire with fire" might be a bad choice parhaps...

Two was harsh. I understand your point, but I think it was harsh. I hope you will reconsider after the small but clarifying changes I have made. In a week I will have cleared the fog completely, I hope.

I don't think two was harsh (or I wouldn't have voted it). I dislike hyperbole, though that wouldn't warrant a two on its own. However, when that hyperbole is flat out wrong (never, NEVER, really?) it's time for a down vote, IMHO. Worse, many readers would infer that the hyperbolic stance was attributed to Bjarne Stroustrup here based solely on the wording in your article, and the reality is that Bjarne actually said the exact opposite in the linked video.

That said, yes, if you remove the hyperbole I'm likely to change my vote. After all, the real message behind what Bjarne said is well worth spreading.

How is this wrong? Read the bold. It states very clearly what it is about.

KjellKod.cc wrote:

Number crunching: Why you should never, ever, EVER use linked-list in your code again

Introduction: What data structure should you use? What data structure should you avoid? Imagine that you have to use a data structure. It is nothing fancy, only a storage for raw numbers, or POD data carriers that you will use now and then and some later on.

But fine. Do you have an alternative catchy title then? I am not impossible here, although I think I adjusted it fine without removing its edge after Luc's input.

The title isn't qualified. It said "Why you should never, ever, EVER use linked-list in your code again." You're even shouting it there. And the truth is, there are times you should use a linked list. Focusing just on your bolded bits, though, it's still not accurate. Usage patterns are what matter in the choice here, not what kind of data you're storing. You're right that the simplistic rules we all learned in school about the usage patterns of random access vs. frequent insertions don't hold true, but there are still usage patterns for which one is more appropriate than the other.

Making a performance argument while using STL is absurd. In all seriousness, if high performance is required, you will almost always be better off writing your own custom collection.

Almost any template-based linked list is particularly horrible.

Rather than ramble dogmatically, it would have been better to explore the impact of the CPU cache, paging and so forth AND various algorithms to leverage this, almost none of which will involve STL (except, perhaps, at the periphery.)

Instead, you manufactured a data scenario to fit your preconceived notion that linked lists are bad, vectors good and then threw out cliches to support your position, including quoting Knuth.

(And, BTW, memory fragmentation and allocation exceptions are two entirely different things.)

Yes, this article is a little silly and it does simplify things, maybe too much?

I do not think linked-list are bad, far from it. I certainly state this several times in the article.

The purpose of the article is to make coders aware that disjoint memory structures might come at a price compared to using a contiguous memory structure. To show this it was easiest to use std::list and std::vector in C++, and LinkedList and ArrayList in Java, since these structures are widely used and recognized.

A few common scenarios were picked, even scenarios where many coders commonly think that the linked-list would outperform the contiguous dynamic array structure.

Judging from the comments the difference in performance surprised many readers so from their reaction I am happy that I could educate them.

It was not the purpose of this article to show what data structures and algorithm were best suited for different tasks, only to make coders aware of cache efficient vs non cache efficient data structures.

It is a great suggestion to showcase various algorithms (with different data structures) and how they can have an impact. Maybe this will be material for another article.

I don't think it's silly; I think it misses the point. If you want to talk about the performance implications of the CPU caches and demand paging, do so, but once you introduce STL or any generic algorithm, you implicitly state that performance is secondary. This is compounded by creating test conditions which favor vectors. Even assuming you are willing to trade performance for writing code faster, the problem still dictates the solution.

(Incidentally, I've never seen anyone use std::list. For me personally, the reason is simple; it's a terrible class. On the other hand, terrible apparently isn't everything since I recently saw someone use a dictionary where a static array would have sufficed.)

Finally:

KjellKod.cc wrote:

I do not think linked-list are bad, far from it

"Why you should never, ever, EVER use linked-list in your code again."

Well, you both are guilty of vast oversimplifications. The original author at least explicitly stated that he did exactly that, to make a point and provoke readers to think. By using the STL you do not in any way state that performance is secondary; this is a false dichotomy. There are multiple ways to get performance, and all of them have costs. So, the STL is quite often a good way to get enough performance at an acceptable cost. Custom data structures are costly in development and maintenance, and quite often there are other much cheaper ways for performance enhancement. After all, an additional server box costs rather less than a single engineer's man-month.

BTW, I saw std::list used, and lo and behold it made perfect sense. One thing: there were random deletions, but without linear search. Second thing: the stored objects were not very small. Third: those stored objects did not have pointers to their neighbors, so std::list's own pointers were not superfluous.

My primary issue with all those discussions about vector vs list is that neither is a good choice if you're going to do linear searches and insertions (unless there are some special conditions, like a trivial amount of data).

What I didn't see in the article was the initial conditions in the test. If I know in advance the size of the array needed to contain the data, then certainly I can pre-allocate the array and manipulate it more quickly than a linked list. But if you don't know the amount of data you will be working with in advance (almost universally the case) then you will be forced to either:

1) Over-allocate the array in the hopes of being right, or
2) Reallocate and rebuild the array when it fills, leading to fragmented storage.

I must admit that my need to "number crunch" (sum? average? mean? median? mode? linear regression?) has been limited in my career. More often than not I'm relating a table to other data in storage to spare me the overhead of SQL. That, typically, involves searching possibly large numbers of entries - a task that a linked list is poor at.

One technique that I have used with reasonable efficiency to handle significant amounts of data arriving in random sequence, of unknown quantity is to accumulate the data in a linked list, then construct a sorted list of pointers to the entries once the size of the list has been determined. It saves me the problem of overallocation, reallocation and fragmentation, while allowing me to search and manipulate the data with the efficiency of an array.

Quick replies: 1) The arrays are never preallocated in the testing above, so all the overhead for reallocating and copying over the contents when the array grows is included in the graphs.

2) std::vector does this (allocate, copy, etc.) *automatically*, so your 1 and 2 are not so important to worry about; unless you are on a system where memory is very expensive (embedded), these are non-issues (see the sketch after these points).

3) Fragmented storage? Eh? That is not how std::vector, ArrayList or similar structures work. Are you thinking of some specific or hand-made data structure?
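To illustrate point 2, here is a small sketch (my own, with an arbitrary element count) showing the automatic growth: the capacity jumps by an implementation-defined factor on each reallocation, and a single reserve() call removes the reallocations entirely if the size can be estimated up front.

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v;
        std::size_t lastCapacity = 0;
        for (int i = 0; i < 1000; ++i) {
            v.push_back(i);                      // std::vector reallocates automatically when full
            if (v.capacity() != lastCapacity) {  // a reallocation just happened
                lastCapacity = v.capacity();
                std::printf("size %zu -> capacity %zu\n", v.size(), v.capacity());
            }
        }

        // If the final size is known (or can be estimated) up front, one reserve()
        // call removes all of the reallocations.
        std::vector<int> w;
        w.reserve(1000);
    }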

Robert Ludwig wrote:

One technique that I have used with reasonable efficiency to handle significant amounts of data arriving in random sequence, of unknown quantity is to accumulate the data in a linked list, then construct a sorted list of pointers to the entries once the size of the list has been determined. It saves me the problem of overallocation, reallocation and fragmentation, while allowing me to search and manipulate the data with the efficiency of an array.

I am glad to hear that it worked out great for you. The linked-list is of course a structure you can use exactly for this. I am not saying which is more efficient, one or the other: it all depends on the "search and manipulate data" content.

* This fragmentation issue sounds very strange to me. Would you care to elaborate?
* Search is normally faster on contiguous memory: the list is definitely slower.
* Sort is normally faster on contiguous memory: the list is normally slower - the algorithm used is of course a factor.

* Manipulation of data *can* be faster on a linked-list. It all depends on what type of manipulation. The search time, however, is likely to remove any chance of the linked-list being faster than the contiguous data structure.

Please note that I do not write array or vector: it could well be that a binary heap (or something else) would be more suitable.

If your operations are time critical I would suggest you try out what difference it makes to use contiguous memory structures instead of a disjoint memory structure like the linked-list.

Thank you for your clarification on the management of the storage techniques. It does make for a fairer comparison. I agree that the overhead of allocating individual nodes for each entry will be greater than allocating a block of nodes and manipulating the storage.

As for the fragmentation, consider that, even in a virtual memory environment, storage is allocated sequentially logically if not physically. Each time the array/vector has to grow, the system/program must first allocate a new array, copy the old one to the new one, then free the old one. Regardless of how you size the new allocation, you will eventually end up with a dead zone of storage which, while both free and large, is not a large enough contiguous space to contain the new array. No matter how much memory you have on your system, given a sufficiently large collection of data, the array will fail long before the linked list does.

Good. Then I know what you meant. The "fragmented storage" threw me off since I did not know if you meant RAM or fragmented within the data-structure.

OK, before I address that: the overhead of memory allocations is a common *cost* that usually can just be ignored. Another reader asked me to remove the allocation time from the linked-list, as he was sure this would show a different picture. It did not. This is not surprising, since it is the frequent cache misses and cache loading from RAM that go with traversal of disjoint memory that are so much more time consuming.

You can read his question here. And you can read my answer here [^]. As you can see if you compare the timings, there is almost no difference in time between the linked-list [LL] with and without allocations. So rest assured, performance issues from allocation overhead must be extremely rare.

Robert Ludwig wrote:

Regardless as to how you size the new allocation, you will eventually end up with a dead zone of storage which, while both free and large, is not a large enough contiguous space to contain the new array. No matter how much memory you have on your system, given a sufficiently large enough collection of data, the array will fail long before the linked list does.

True that! Depending on your system and application this could be a real issue. But unless you are on a very limited OS or embedded system it is very unlikely; more of an academic issue than a real issue. At least that is what I understood when I tested this (hours and hours without managing to get a bad_alloc due to fragmentation).

If it is an issue, or the structures are so humongous that it could become an issue (on x86/x64 we're probably talking many millions of integers), then you can use an "unrolled list" or std::deque: basically a list of dynamic arrays. You get almost all of the cache benefit of the dynamic array, with minimal risk from the fragmentation *danger*.
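As a rough sketch of what that buys you (my own illustration): std::deque keeps its elements in a sequence of fixed-size blocks, so no single huge contiguous allocation is ever required, while traversal still stays mostly within each block.

    #include <deque>

    std::deque<int> numbers;

    void addValue(int value) {
        // Amortized O(1); existing elements are never copied to a new, larger
        // allocation, so there is no need for one giant contiguous free block.
        numbers.push_back(value);
    }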

I agree with your main conclusion that the modelling of the algorithms doesn't take into account the complexities of modern hardware. What is really interesting is that there is a whole group of modern engineers who would recognise this problem - people who write 3D rendering engines. Cache misses, when you are talking about polygon vertices on hundreds of models, can have a huge impact. There are a number of tools within this specialism designed because of the lessons learnt. Obviously all your caveats are fine, and I do also like your obstinate writing style for this article.

I tried to be consistent in that, when mentioning specific library implementations like std::vector or std::list, I followed the CodeProject code tags. I did not follow the code tagging when the article described vector or linked-list in general terms.

I am sure I messed this up along the road in more than a few places. Maybe, as you commented, it is better to follow the code highlighting every time list, dynamic array or vector is mentioned?

Thank you @.dan.g. I will make sure to keep this in mind when/if I update the article.

You measure cache access, yet you tend to forget the fact that std::vector preallocates a *lot* of memory! This is perhaps compiler dependent; for Microsoft it is 50% of the vector size - that's a lot of overhead. Now for the test to be fair you should exclude memory allocation from the equation, as it is an expensive operation. The simplest way to do so is to preallocate all list elements in a second list, then simply pull nodes from it and insert them into the "main" list. I am more than certain that the timing will be *completely* different then! When the memory pool is empty you should allocate another 50% of the elements, but this timing should be excluded from the overall performance.

Another approach would be to preallocate the list *and* the vector, thus eliminating memory allocation per se from the measurement.

Yet I agree that in perfect conditions linear iteration of an array would be a fraction faster, since "add reg,const" is faster than "mov reg,[reg]", but only just.

You are the third person to comment on this or related topics, so it definitely merits a place in the article. I will put it immediately on my TODO list[^], for if/when this article is updated.

Your comment above is from a slightly different point of view. It focuses not on the linked-list in pre-allocated *contiguous* space but on the overhead of individual memory allocations vs. the block-sized allocations done for the vector. This is an important issue, as memory allocation is often brought up as a costly factor.

In the meantime let me break down your comment into a couple of sub-comments and try to answer them.

waleri wrote:

[...] for test to be fair you should exclude memory allocation from the equation, as it is an expensive operation

System, OS, and even hardware make the allocation cost vary from one machine to another. It is interesting to find out how expensive memory allocation is, especially in comparison to the cost of traversing the linked-list (disjoint memory) vs. traversing the vector (adjacent memory).

I followed your advice to use a pre-allocated list as a memory pool, grabbing the items as needed and inserting them into the *main list*. This way the time for node memory allocation is removed from the timing.

To get an appreciation of how this compares to the earlier tests, the same random values for insertion are also used for a non-pre-allocated linked-list and a std::vector. The code is available here: http://ideone.com/4hetC[^]
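The linked code shows the details; in rough outline the pre-allocated variant looks like this (a sketch with made-up names, assuming the pool is never empty):

    #include <list>

    // Insert 'value' into the sorted mainList, reusing a node from the
    // pre-allocated pool so that no allocation happens inside the timed loop.
    void sortedInsertFromPool(std::list<int>& mainList, std::list<int>& pool, int value) {
        auto node = pool.begin();      // grab any pre-allocated node (pool assumed non-empty)
        *node = value;                 // reuse its storage for the new value

        auto pos = mainList.begin();   // the usual linear search for the sorted position
        while (pos != mainList.end() && *pos < value) ++pos;

        // splice() moves the node from the pool into the main list by re-linking
        // pointers only; no memory allocation takes place.
        mainList.splice(pos, pool, node);
    }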

Pre-allocated linked-list vs linked-list vs vector. Timing is measured in microseconds (us). Testing was done on ideone.com, an Ubuntu x64 quad-core and a Windows 7 quad-core computer. See the conclusion at the end.

Conclusions: Graphs of the results are available in a google doc[^]. The linked-list and the pre-allocated linked-list are both in disjoint memory. The pre-allocated one does not *suffer* from the extra overhead of memory allocations when inserting a new node; however, using the memory pool causes one extra pointer indirection when using splice for node insertion.

Disjoint memory access is more expensive than allocations: ergo, any potential gains from the avoided memory allocations are thereby lost.

In comparison, the traversal cost of visiting node after node in disjoint memory is overwhelming compared to any potential gains (or non-gains) from using the memory pool.

However, a memory pool in *adjacent* memory will give a better result since at least some nodes can benefit from that. Cache misses will be fewer. This is not shown here, but it is shown in Bjarne Stroustrup's keynote speech (see above). This potential gain is also mitigated as the linked-list will eventually become fragmented[^].

waleri wrote:

linear iteration of an array would be a fraction faster, since "add reg,const" is faster than "mov reg,[reg]", but only just.

I hope this article and my answer above have shown that the difference is not a fraction but in fact colossal. What matters is how often RAM access is needed during traversal. Disjoint memory access, with constant cache misses, overshadows any possible gains from avoiding individual memory allocations.

This reply took some time, @waleri. Thank you for bringing it up. When/if I update the article I will make sure to add your contribution to give extra clarity on how the cost of memory allocations compares to that of memory traversal.