A look at STL/CLR performance for linear containers

The performance of STL/CLR sequence containers is compared with that of the corresponding BCL generic collection classes.

Introduction

This article is an attempt to compare the general performance of STL/CLR's
sequence containers with the .NET generic List<T> collection class. Before I
began work on the article, I strongly believed that the STL/CLR containers would
be yards faster. To my utmost surprise, I found that this was not so and that
List<T> surpassed the STL/CLR collections with ease.

How I compared performance

I wanted to keep things simple, so I used the common technique
of repeating a specific operation many times and timing the whole run. To keep the
design clean, I defined an interface that declares a RunCode method.

RunCode runs a specific piece of code as many
times as specified by iterations, and returns the time taken in milliseconds.
An abstract class implements this interface.

To profile a certain collection class, I just derive from this
abstract class and implement RunCodeFirstOp and RunCodeSecondOp. I also have a
MeasurableSingleOp class for tests that do not
involve a two-step operation.
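
The original listings are not reproduced here, so the following is only a minimal sketch of what such a harness might look like. It is written in standard C++ rather than C++/CLI so that it compiles without /clr. The names IMeasurable, MeasurableDoubleOp and CountingOp are hypothetical, as is the choice of std::chrono for timing and the decision to pass the iteration count down to each step. Only RunCode, RunCodeFirstOp and RunCodeSecondOp come from the text above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical name: the text does not say what the interface is called.
struct IMeasurable {
    virtual ~IMeasurable() = default;
    // Runs the test for `iterations` repetitions; returns elapsed milliseconds.
    virtual long long RunCode(int iterations) = 0;
};

// Hypothetical name for the abstract two-step base class.
// Assumption: each step receives the iteration count and loops itself.
class MeasurableDoubleOp : public IMeasurable {
public:
    long long RunCode(int iterations) override {
        assert(iterations >= 0);
        const auto start = std::chrono::steady_clock::now();
        RunCodeFirstOp(iterations);   // e.g. insert elements
        RunCodeSecondOp(iterations);  // e.g. remove them again
        const auto stop = std::chrono::steady_clock::now();
        return static_cast<long long>(
            std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count());
    }

protected:
    virtual void RunCodeFirstOp(int iterations) = 0;
    virtual void RunCodeSecondOp(int iterations) = 0;
};

// Trivial illustrative subclass that only counts how often each step ran.
class CountingOp : public MeasurableDoubleOp {
public:
    long long first_calls = 0;
    long long second_calls = 0;

protected:
    void RunCodeFirstOp(int iterations) override {
        for (int i = 0; i < iterations; ++i) ++first_calls;
    }
    void RunCodeSecondOp(int iterations) override {
        for (int i = 0; i < iterations; ++i) ++second_calls;
    }
};
```

A container-specific test would replace CountingOp with a class whose two steps operate on the collection being profiled. A steady (monotonic) clock is used in this sketch so that system clock adjustments cannot distort the measurement.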

STL vector vs List<T> - basic insertion/removal

Here are the implementations of the vector specific and List<T> specific classes.
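
The listings themselves are missing from this copy, so as a rough illustration only, an insertion/removal test could be sketched like this. It assumes that "insertion/removal" means appending at the end and then removing from the end, and it uses the standard std::vector so that it builds without /clr. The STL/CLR cliext::vector offers the same push_back and pop_back members, and the List<T> counterpart would use Add and RemoveAt. The function and struct names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct PushPopResult {
    long long elapsed_ms;    // time for both phases together
    std::size_t peak_size;   // size after the insertion phase
    std::size_t final_size;  // size after the removal phase
};

// Assumption: first step appends `iterations` ints, second step pops them all.
PushPopResult time_vector_push_pop(int iterations) {
    assert(iterations >= 0);
    std::vector<int> v;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        v.push_back(i);          // first operation: insertion
    }
    const std::size_t peak = v.size();
    while (!v.empty()) {
        v.pop_back();            // second operation: removal
    }
    const auto stop = std::chrono::steady_clock::now();

    PushPopResult result;
    result.elapsed_ms = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count());
    result.peak_size = peak;
    result.final_size = v.size();
    return result;
}
```

Returning the peak and final sizes alongside the time gives a cheap sanity check that both phases really ran the requested number of times.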

And here are my test results. As you can see, the BCL class (List<T>)
completely outperformed the STL/CLR vector class.

Iterations    STL/CLR (ms)    BCL (ms)
100000        15              3
500000        63              32
1000000       122             21
10000000      1311            299

Here's a graphical plot of how the two containers performed. Clearly, the BCL class's performance was quite superior to the STL/CLR vector's.
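
To put a number on the gap, the slowdown factor can be computed directly from the vector vs List<T> timings reported above (15/3, 63/32, 122/21 and 1311/299 milliseconds). The following is a small sketch of that arithmetic; the Row type and function names are hypothetical, and the figures are copied from the results rather than measured by this code.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// One row of the vector vs List<T> results (times in milliseconds).
struct Row {
    int iterations;
    double stlclr_ms;
    double bcl_ms;
};

// Timings copied from the article's first results table.
std::vector<Row> vector_vs_list_rows() {
    return {
        {100000, 15.0, 3.0},
        {500000, 63.0, 32.0},
        {1000000, 122.0, 21.0},
        {10000000, 1311.0, 299.0},
    };
}

// How many times slower STL/CLR was than the BCL class for one row.
double slowdown(const Row& r) {
    assert(r.bcl_ms > 0.0);
    return r.stlclr_ms / r.bcl_ms;
}

// Prints one line per row with the slowdown factor.
void print_slowdowns() {
    for (const Row& r : vector_vs_list_rows()) {
        std::printf("%d iterations: %.2fx\n", r.iterations, slowdown(r));
    }
}
```

The factor ranges from just under 2x to just under 6x across the four runs, so the size of the gap varies considerably from run to run even though its direction never changes.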

As you can imagine, I was quite surprised by this result. Just
for the heck of it, I thought I should also compare the standard STL vector with
the STL/CLR vector implementation. Note that I am still using managed code
(/clr) - the standard STL code is also compiled as /clr. Here are my surprising
results.

Iterations    STL/CLR (ms)    Standard STL (ms)
100000        11              39
500000        58              202
1000000       117             391
10000000      1161            3919

Based on that result, you should absolutely avoid compiling
native STL code using /clr. Merely porting to STL/CLR would help performance in
a big way. You might find that all you need is a namespace change (std
to cliext) and you may not have to change much code elsewhere. And no, I
did not conclude this merely from my test results with vector; I also compared the
standard list and the STL/CLR list containers, with the following results.

Iterations    STL/CLR (ms)    Std list (ms)
100000        33              101
500000        63              175
1000000       274             349
10000000      2969            3663

As you can see, the difference in performance is non-trivial.
Please note that we are not comparing the native performance of STL here. We are
comparing how the native implementation when compiled under /clr compares with
the CLR implementation of STL.

Here are my results. Nothing's changed in the pattern - the
BCL class is way faster here too.

Iterations    STL/CLR (ms)    BCL (ms)
100000        33              2
500000        66              13
1000000       83              26
10000000      1061            251

And here's the graph.

The BCL equivalent of a queue is the Queue<T> class - so
just to be sure we are comparing apples with apples, I went ahead and ran tests
comparing the STL/CLR deque with the BCL Queue<T>. My results and
the corresponding graph follow.

Iterations    STL/CLR (ms)    BCL (ms)
100000        12              6
500000        49              15
1000000       89              28
10000000      1044            335

The Queue<T> class seems to be marginally slower than
List<T> but is still way faster than the STL/CLR deque container.

STL vector vs List<T> - basic iteration

This time, I wanted to test the speed with which we can
iterate over a linear collection. Here are the vector and List<T> specific iteration test
implementations.
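
As with the earlier tests, the original listings are not available here. A minimal sketch of an iteration test, in standard C++ and under the assumption that "iterating" means walking the container once with an iterator and reading every element, might look like this. The names are hypothetical, and std::vector stands in for cliext::vector so the code builds without /clr.

```cpp
#include <cassert>
#include <chrono>
#include <numeric>
#include <vector>

// Hypothetical result type for this sketch.
struct IterationResult {
    long long elapsed_ms;  // time spent walking the container
    long long sum;         // sum of all elements, so the loop has an observable effect
};

IterationResult time_vector_iteration(int count) {
    assert(count >= 0);

    // Populate before starting the timer so insertion cost is not measured.
    std::vector<int> v(static_cast<std::vector<int>::size_type>(count));
    std::iota(v.begin(), v.end(), 0);  // 0, 1, 2, ..., count - 1

    const auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it) {
        sum += *it;
    }
    const auto stop = std::chrono::steady_clock::now();

    IterationResult result;
    result.elapsed_ms = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count());
    result.sum = sum;
    return result;
}
```

Filling the container before the timer starts keeps population cost out of the measurement, and returning the sum stops the compiler from discarding the loop as dead code.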

Here are the results of my test runs. The results here are
partially contaminated by the fact that the insertion code speed differences
would also come into play. But the difference in performance is large enough to
safely ignore that for now, and again LINQ works much faster on the BCL class as
compared to the STL/CLR version.

Iterations    STL/CLR (ms)    BCL (ms)
100000        18              1
500000        44              7
1000000       79              11
10000000      842             168

And here's the graph.

STL vector vs List<T> - LINQ access (Take)

This is similar to the previous one except I use Take
instead of Where.

Conclusion

One of the features that was strongly marketed before STL/CLR was released was
its performance benefits over regular .NET collections. But the .NET generic
List<T> seems to be much faster. At this stage, the only valid case I can think of
for using STL/CLR is a first-level port of existing C++
code (that heavily uses STL) to managed code.


About the Author

Nish Nishant is a Software Architect/Consultant based out of Columbus, Ohio. He has over 16 years of software industry experience in various roles including Lead Software Architect, Principal Software Engineer, and Product Manager. Nish is a recipient of the annual Microsoft Visual C++ MVP Award since 2002 (14 consecutive awards as of 2015).

Nish is an industry acknowledged expert in the Microsoft technology stack. He authored C++/CLI in Action for Manning Publications in 2005, and had previously co-authored Extending MFC Applications with the .NET Framework for Addison Wesley in 2003. In addition, he has over 140 published technology articles on CodeProject.com and another 250+ blog articles on his WordPress blog. Nish is vastly experienced in team management, mentoring teams, and directing all stages of software development.

Comments and Discussions

I think that one of the main reasons (apart from the fact that the debug version of native STL is painfully slow because of all the checks) that native is way slower is that memory is garbage collected for the BCL.

I also suspect that part of the performance problem with native STL compiled with STL/CLR might be related to managed/unmanaged transitions. Is there a way to know when such a transition occurs?

In practice, I have found that when compiling for .NET and using the debug version of STL, it is much more performant to use STL/CLR than native STL. I tend to believe that the checking of iterators is much faster in the managed case.

What I dislike about STL/CLR is that it is somewhat harder to write function objects and we cannot use lambdas, so it is not much more advantageous for code simplicity compared to the BCL when, for example, you want to use a custom sort.

I also think that native STL performance is affected by too much locking. Probably related to checked iterators.

I would also like it to be easier to make code more independent of the compilation mode. But the many differences between managed and unmanaged code seem to make it more complex to write the same code for both cases.

Now with C++/CX for Metro Style applications on Windows 8, we will have still more differences to handle.

In my opinion, the fact that most of STL was rewritten for .NET is an indication that there are some flaws in the way the C++/CLI language was defined. For example, using ^ instead of *, or the fact that a ref class always needs to be qualified with the ref modifier, severely limits the possibilities for using the power of templates.

Thanks Philippe. This article is rather dated now I'd say. BTW the comparisons are done in /clr mode, so even the native STL is compiled to IL (except where unmanaged code is absolutely required). So that will explain some of the odd results.

This is rather disturbing though. It's not as though they're a little slower. The difference is *shocking*. I mean, there's no way i'd choose the STL/CLR collections for new code, and i'd be pretty reluctant to port native C++ to managed using them either - it's bad enough that i'd already be taking a hit due to the runtime overhead, this would just be unacceptable.

I expected that at least vector<> would be comparable. I mean, how hard is it to write a reasonably fast dynamic array? But no, it's still much, much worse. Why is there a difference? I coded up a quick implementation of a dynamic array, just to demonstrate to myself that List<> wasn't doing anything tricky behind the scenes... sure enough, i got roughly the same times as List<>. So... wtf? Does the C++/CLI compiler not do inlining? Is it really building each template class and method into real, heavy managed code classes and methods?

I re-ran the tests on my home laptop and desktop and while the speed difference is still huge, it's not as dramatic as when I ran it from my office machine. I am guessing there are a lot of other factors that come into play too. But the underlying fact remains that the BCL collections are way faster than the STL/CLR ones.

Seeing that for one particular iteration, the BCL version was 10 times faster does not really indicate that in general the BCL classes are 10x faster than STL/CLR. But the fact that every test I did showed the BCL classes to be faster (by varying degrees) can be collectively taken to mean that in general it's much safer and more performant to use the BCL classes.

In fact right now I cannot see any real world usage for STL/CLR other than academic interest. To be fair to Microsoft, I believe STL/CLR was never really completed. It was always behind schedule and I think finally they just decided to release it at the stage it was at, and I don't really expect them to spend more time on it.

It would be really nice if you could update your article to include a native STL benchmark without the /clr compilation switch. I would like to know how native STL fares against STL/CLR and the BCL collections, because I suspect native STL without /clr might be slower than the BCL collections.

OT: I am reading your C++/CLI in Action for the second time. Writing any new books at the moment? It's a pleasure reading your C++/CLI book!

You should add a native code (VS2008) STL stat for reference and comparison.

I believe that wouldn't be a fair comparison. Also, I only wanted to test things from a managed context. The idea behind the article is to see whether using STL/CLR in .NET apps has any performance advantages over using the BCL collections.

Well I don't know about "fair" but I for one would find it interesting. And like Shog said, the performance difference between the two managed solutions seems *really* weird. What happens if you compare an STL list with a List?


If compiled with /clr, the STL list shows poorer performance. That may be due more to the C++/CLI compiler generating IL from non-CLI code than to an issue in the STL implementation as such. I didn't really look at the implementation in detail - I was more focused on running the tests.

Ironically, my initial idea was to write an article showing off STL/CLR's superior performance. I'd have looked such an ass had I announced that prior to submitting the article.