Introduction

Strings are so heavily used in all programming languages that we do not think about them very much. We use them simply and hope to do the right thing. Normally all goes well, but sometimes we need more performance, so we switch to StringBuilder, which is more efficient because it contains a mutable string buffer. .NET Strings are immutable, which is why a new string object is created every time we alter one (insert, append, remove, etc.).
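To make the immutability point concrete, here is a minimal sketch. The class and method names are made up for illustration; only String.Insert, StringBuilder.Insert and Object.ReferenceEquals come from the framework.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original test code.
public static class ImmutabilityDemo
{
    // True if String.Insert returned a new object and left the original untouched.
    public static bool InsertCreatesNewString()
    {
        string original = "quick fox";
        string changed = original.Insert(0, "The "); // allocates a new string
        return !object.ReferenceEquals(original, changed)
            && original == "quick fox"
            && changed == "The quick fox";
    }

    // True if StringBuilder.Insert edited the same builder instance in place.
    public static bool BuilderMutatesInPlace()
    {
        StringBuilder sb = new StringBuilder("quick fox");
        StringBuilder same = sb.Insert(0, "The "); // returns the same instance
        return object.ReferenceEquals(sb, same)
            && sb.ToString() == "The quick fox";
    }
}
```

Both methods should return true: the string version hands back a fresh object, while the builder version keeps working on the same buffer.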

That sounds reasonable, so why do we still use the .NET String class functions and not the faster StringBuilder? Because optimal performance is a tricky thing, and the first rule of the performance club is to measure it for yourself. Do not believe somebody (including me!) telling you that this or that is faster in every case. It is very difficult to predict the performance of some code in advance because so many variables influence the outcome. Looking at the generated MSIL code still does NOT tell you how fast the code will perform. If you want to see why your function is so slow or fast, you have to look at the compiled (JIT-ed) x86 assembler code to get the full picture.

Greg Young did some very nice posts about what the JITer makes of your MSIL code on your CPU. In the following article I will show you the numbers for StringBuilder vs String, which I measured with .NET 2.0 on a P4 3.0 GHz with 1 GB RAM. Every test was performed 5 million times to get a stable value.
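If you want to measure for yourself, a timing loop along the following lines is enough to get started. This is only a sketch and not the harness used for the charts; the MicroBenchmark class and the TestCase delegate are names I made up here. Stopwatch is the real System.Diagnostics class.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not the original test harness.
public static class MicroBenchmark
{
    public delegate void TestCase();

    // Runs the test once as a warm-up (so JIT compilation is not timed),
    // then times 'iterations' further calls.
    public static TimeSpan Measure(TestCase test, int iterations)
    {
        test();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            test();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

A call could look like MicroBenchmark.Measure(delegate { "fox".Insert(0, "The "); }, 5000000). Keep in mind that a loop like this also measures the delegate call itself, so compare candidates against each other rather than trusting the absolute numbers.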

Insert a String / Remove a character from one

I inserted the missing words at the beginning of the sentence "The quick brown fox jumps over the lazy dog" to find out the break-even point between String.Insert and StringBuilder.Insert. To see how the removal of characters worked, I removed one character at a time from the beginning of our test sentence in a for loop. The results are shown in the diagram below.

We see here that StringBuilder is clearly the better choice if we have to alter the string. Insert and Remove operations are nearly always faster with StringBuilder. The removal of characters is especially fast with StringBuilder, where we gain nearly a factor of two.
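The two variants being compared look roughly like this. It is a reconstruction under my own assumptions about the loop shape, not the original test code, and the class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical reconstruction of the insert/remove test cases.
public static class InsertRemoveDemo
{
    // Each String.Insert call allocates a new string.
    public static string PrependWithString(string sentence, string[] words)
    {
        string result = sentence;
        foreach (string word in words)
            result = result.Insert(0, word + " ");
        return result;
    }

    // StringBuilder.Insert shifts characters inside one buffer.
    public static string PrependWithBuilder(string sentence, string[] words)
    {
        StringBuilder sb = new StringBuilder(sentence);
        foreach (string word in words)
            sb.Insert(0, word + " ");
        return sb.ToString();
    }

    // Removes 'count' characters from the front, one at a time.
    public static string RemoveLeadingWithString(string sentence, int count)
    {
        string result = sentence;
        for (int i = 0; i < count; i++)
            result = result.Remove(0, 1);
        return result;
    }

    public static string RemoveLeadingWithBuilder(string sentence, int count)
    {
        StringBuilder sb = new StringBuilder(sentence);
        for (int i = 0; i < count; i++)
            sb.Remove(0, 1);
        return sb.ToString();
    }
}
```

Both versions of each pair produce the same text; only the number of intermediate string objects differs.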

Replace one String with another String

Things become more interesting when we replace anywhere from one to five words of our fox test sentence.

This is somewhat surprising. StringBuilder does not beat String.Replace even if we do many replaces. Our data shows what seems to be a constant overhead of about 1s that we pay if we use StringBuilder. The overhead is quite significant (30%) when we have only a few String.Replaces to do.
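For reference, the two replace variants can be sketched as follows. The method names and the parallel-array parameters are my own choice for illustration; String.Replace and StringBuilder.Replace are the framework calls under test.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the two replace strategies.
public static class ReplaceDemo
{
    // One new string per replaced word.
    public static string ReplaceWithString(string sentence, string[] oldWords, string[] newWords)
    {
        string result = sentence;
        for (int i = 0; i < oldWords.Length; i++)
            result = result.Replace(oldWords[i], newWords[i]);
        return result;
    }

    // One builder for all replacements, plus the final ToString.
    public static string ReplaceWithBuilder(string sentence, string[] oldWords, string[] newWords)
    {
        StringBuilder sb = new StringBuilder(sentence);
        for (int i = 0; i < oldWords.Length; i++)
            sb.Replace(oldWords[i], newWords[i]);
        return sb.ToString();
    }
}
```

The builder version has to pay for constructing the StringBuilder and for the final ToString, which is one plausible source of a fixed overhead on short strings.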

String.Format

I checked when StringBuilder.AppendFormat is better than String.Format, and also appended it with the "+" operator.
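The three candidates look roughly like this. The format string and the parameter names are invented for this sketch and are not taken from the original test.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the three formatting variants.
public static class FormatDemo
{
    public static string WithFormat(string name, int count)
    {
        return String.Format("{0} has {1} items", name, count);
    }

    public static string WithAppendFormat(string name, int count)
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendFormat("{0} has {1} items", name, count);
        return sb.ToString();
    }

    public static string WithPlus(string name, int count)
    {
        // The compiler turns this into a single String.Concat call.
        return name + " has " + count + " items";
    }
}
```

All three return the same text for the same input, so any difference you measure is purely the cost of the mechanism.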

And the winner for String Concatenation is ... not StringBuilder but String.Join? After taking a deep look with Reflector, I found that String.Join has the most efficient algorithm implemented: it allocates the final buffer in a first pass and then memcopies each string into the just-allocated buffer. This is simply unbeatable. StringBuilder does become better than the + operator above 7 strings, but this is not really code one would see very often.
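The two-pass idea can be sketched in plain managed code. This is only an illustration of the principle and not the framework's implementation; TwoPassConcat in particular pays for an extra copy in new string(buffer) that the real code avoids. All class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the concatenation strategies.
public static class ConcatDemo
{
    // Empty separator: String.Join then behaves like a plain concatenation.
    public static string WithJoin(string[] parts)
    {
        return String.Join(String.Empty, parts);
    }

    public static string WithBuilder(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string part in parts)
            sb.Append(part);
        return sb.ToString();
    }

    // Pass 1: sum the lengths. Pass 2: copy each string into its slot.
    public static string TwoPassConcat(string[] parts)
    {
        int total = 0;
        foreach (string part in parts)
            total += part.Length;

        char[] buffer = new char[total];
        int offset = 0;
        foreach (string part in parts)
        {
            part.CopyTo(0, buffer, offset, part.Length);
            offset += part.Length;
        }
        return new string(buffer);
    }
}
```

Because the final size is known before anything is copied, no buffer ever has to grow, which is the whole trick.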

Comparing Strings

An often underestimated topic is string comparison. To compare Unicode strings, your current locale settings have to be taken into account. Unicode characters with values greater than 65535 do not fit into the .NET Char type, which is 16 bits wide. These characters are quite common, especially in Asian countries, which complicates the matter even more (case-insensitive comparisons). The culture-aware comparison function of .NET 2.0 (I guess this is true for .NET 1.x also) is implemented in native code, which costs you a managed-to-unmanaged transition and back.

It is good that we compared the string comparison functions. A factor of 3 is really impressive and shows that localization comes with a cost which is not always negligible. Even the innocent-looking mode StringComparison.InvariantCulture goes into the same slow native function, which explains this big difference. When strings are interned, the comparison is much faster (by more than a factor of 30) because the CLR checks for reference equality first.

To tell the truth, I was surprised by this result too, and for a long time I did not know the use of this strange CompareOrdinal function. String.CompareOrdinal does nothing more than compare the strings char by char (16 bits each, remember), which is done 100% in managed code. That allows the JITer to flex its optimizing muscles, as you can see. If somebody asks you what CompareOrdinal is good for, you now know. You can (should) use this function on strings that are not visible to the outside world (users) and are therefore never localized. Only then is it safe to use. Remember: making a program fast but incorrect is easy; making it work correctly and quickly is hard. When you mainly deal with UI code, it's a good bet that you should forget this function very fast.
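A small sketch of the two comparison flavors follows. The wrapper names are hypothetical; String.CompareOrdinal and String.Compare with a StringComparison argument are the real APIs.

```csharp
using System;

// Hypothetical wrappers around the framework comparison calls.
public static class CompareDemo
{
    // Compares the raw 16-bit char values, no culture rules involved.
    public static bool OrdinalEquals(string a, string b)
    {
        return String.CompareOrdinal(a, b) == 0;
    }

    // Goes through the culture-aware comparison path.
    public static bool InvariantEquals(string a, string b)
    {
        return String.Compare(a, b, StringComparison.InvariantCulture) == 0;
    }
}
```

The difference matters for text such as "\u00E9" (a precomposed e with acute accent) versus "e\u0301" (e followed by a combining accent). OrdinalEquals reports these as different because the chars differ, while a culture-aware comparison is allowed to treat them as the same text. That is exactly why the ordinal variant is only safe for strings users never see.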

Conclusions

The following recommendations are valid for our small test strings (~30 chars) but should be applicable to bigger strings (100-500 chars) as well (measure for yourself!). I have seen many synthetic performance measurements that demonstrate the power of StringBuilder with strings that are 10KB and bigger. This is the 1% case in real world programs. Most strings will be significantly shorter. When you optimize a function and you can "feel" the construction costs of an additional object, then you have to look very carefully at whether you can afford the additional initialization costs of StringBuilder.

The shiny performance saving StringBuilder does not help in all cases and is, in some cases, slower than other functions. When you want to have good string concatenation performance I recommend strongly that you use String.Join which does an incredible job.

Points of Interest

I did not tell you more about the String.Intern function. You need to know more about string interning only if you need to save memory at the cost of some processing power.
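For the curious, here is a minimal sketch of what interning does. The class and method names are made up; String.Intern is the real call.

```csharp
using System;
using System.Text;

// Hypothetical demo of String.Intern.
public static class InternDemo
{
    public static bool InternedCopiesShareReference()
    {
        // Built at run time, so these are two separate objects with equal content.
        string a = new StringBuilder("fox").Append("es").ToString();
        string b = new StringBuilder("fox").Append("es").ToString();
        bool distinctBefore = !object.ReferenceEquals(a, b);

        // Intern returns the one pooled instance for this content.
        string internedA = String.Intern(a);
        string internedB = String.Intern(b);

        return distinctBefore && object.ReferenceEquals(internedA, internedB);
    }
}
```

The lookup in the intern pool is the processing cost; the shared instance is the memory saving.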

If you want to see a good example of how you can improve string formatting 14 times for fixed-length strings, have a look at my blog.

Did you notice that there is no String.Reverse in .NET? You would rarely need that function anyway, but Greg did put up a little contest to find the fastest String.Reverse function. The functions presented there are fast but do not work correctly with surrogate pairs (Unicode characters with a value > 65535). Making it fast and correct is not easy.
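A correct (but certainly not the fastest) reverse can be sketched with the text element enumerator from System.Globalization, which keeps surrogate pairs together. This is my own illustration and not one of the contest entries; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sketch: correctness first, speed second.
public static class ReverseDemo
{
    public static string ReverseTextElements(string input)
    {
        // Each text element is one user-visible unit, e.g. a whole surrogate pair.
        List<string> elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(input);
        while (enumerator.MoveNext())
            elements.Add(enumerator.GetTextElement());

        elements.Reverse();
        return String.Concat(elements.ToArray());
    }
}
```

A naive char-by-char reverse would swap the two halves of a surrogate pair and produce an invalid string; this version moves the pair as one unit, at the price of one small string per element.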

The test results obtained here are .NET Framework, machine and string length specific. Please do not simply look at the numbers and use this or that function without being certain that the results obtained here are applicable to your concrete problem.

History

28.7.2006 Fixed Download/Fine tuning the coloring of the charts to make it more readable.


About the Author

He is working for a multinational company which is a hardware and software vendor of medical equipment. Currently he is located in Germany and enjoys living in general. During his search for programming best practices he was awarded by Microsoft with the Patterns and Practices Champion Award. Although he finds pretty much everything interesting, he pays special attention to .NET software development, software architecture and nuclear physics. To complete the picture, he likes hiking in the mountains and collecting crystals.

The "String Concatenation" paragraph might be a bit misleading to a novice programmer. A StringBuilder is meant to improve performance of multiple subsequent appends, not to make a single Join operation.

Using a plain Join only makes sense if you have a fixed array of strings available. If not, you either need to instantiate a List or an Array and then pass it to String.Join, or use a StringBuilder, which is slightly simpler.
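To illustrate the two options when the number of strings is not known up front, here is a rough sketch. The names and the ", " separator are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of both approaches for an unknown number of items.
public static class UnknownCountDemo
{
    // Collect first, then join once at the end.
    public static string JoinFromList(IEnumerable<string> source)
    {
        List<string> items = new List<string>();
        foreach (string item in source)
            items.Add(item);
        return String.Join(", ", items.ToArray());
    }

    // Append as the items arrive.
    public static string AppendWithBuilder(IEnumerable<string> source)
    {
        StringBuilder sb = new StringBuilder();
        bool first = true;
        foreach (string item in source)
        {
            if (!first)
                sb.Append(", ");
            sb.Append(item);
            first = false;
        }
        return sb.ToString();
    }
}
```

Both return the same result. Which one is faster for your data is something you would have to measure.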

As this article shows, which approach is faster always depends on the specific problem you are trying to solve. I can only guess what your exact situation was, but I could imagine that you combined all strings with string.Join instead of StringBuilder while the exact number of strings was not known in advance, so you had a temp string which was thrown away after the next string.Join. That would lead to excessive garbage collections and poor performance.

Now that we are in the "brave new world" of Linq, anonymous methods, and lambdas, I would love to see a comparison of timings when the input was a long (4k or more) list of words that varied in length from short to very long.

thanks, Bill

"Many : not conversant with mathematical studies, imagine that because it [the Analytical Engine] is to give results in numerical notation, its processes must consequently be arithmetical, numerical, rather than algebraical and analytical. This is an error. The engine can arrange and combine numerical quantities as if they were letters or any other general symbols; and it fact it might bring out its results in algebraical notation, were provisions made accordingly." Ada, Countess Lovelace, 1844

Yes, you call String.Concat, but different overloads. Let's have a look at the code:

string Add(params string[] strings) // test function used for this chart
{
    string ret = String.Empty;
    foreach (string str in strings)
        ret += str; // slow: a new string is created on every iteration
    return ret;
}

string Concat(params string[] strings)
{
    return String.Concat(strings); // faster: many strings can be concatenated at once
}

If you look at the code you will notice that String.Concat is called with an array of strings which means it can concat all strings at once whereas the + operator in my case has to create a new string object for every concatenation. If you dig deeper you will find that the Concat version which takes an array actually calls:

Data structures are a tricky business, and you can always find another way to use some of the overloads of String/StringBuilder with different performance characteristics. But hey, that is the reason why these overloads exist.

I'm not sure you tested string.Format and StringBuilder's AppendFormat correctly. You simply tried formatting the same string with 2 arguments many, many times. Instead, shouldn't you have varied the number of arguments being passed in each time? I found that StringBuilder was faster with 3 arguments, but string.Format was faster with 1, 2, 5, 10, and 20 arguments being formatted... On average, StringBuilder was 10% slower. Also, the way you have your code, it is going to be slower the more you add because you keep doing + on the string versus calling Append on the StringBuilder class. This is really testing Concat speed, not Formatting:

If you need to ask which string overload is faster, I would recommend first getting rid of all XmlNodes in your project. These things are really slow. Then I would get a performance profiler like Ants from Redgate, or profile some key functions on your own:

Then you need to take a deep breath and check your data structures: who calls what and when, whether it is necessary at this time, and whether some results can be cached for later reuse. From experience I can tell you that you are most likely micro-optimizing at the wrong end.

One note. You say you can (should) use CompareOrdinal for performance. I would go further to say there are times when you must use it. That is, when you want the same results irrespective of regional settings.

For example, if some strings were sorted and written to a file and later a binary search was performed using the file you want the string to be compared the same way for both the sort and any later search even if the file is shipped to the other side of the world. If not then the search may fail when it shouldn't. (Using CompareOrdinal would make the sorting and searching faster too.)

Also, I agree that string comparisons for the current locale should be done if the result is visible to the user. However, it is not whether the strings are seen by the user but whether the results of the comparisons are visible to the user. If you are sorting a list of strings then presenting them to the user then you should sort using the CurrentCulture option, but if the sorting is for some internal reason then you should use Ordinal option.
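The sort-then-search scenario can be sketched like this. The class and method names are hypothetical; Array.Sort, Array.BinarySearch and the StringComparer properties are the real framework members.

```csharp
using System;

// Hypothetical sketch of culture-independent sorting and searching.
public static class OrdinalSortDemo
{
    // Sorts in place with ordinal rules, so the order is the same on every machine.
    public static void SortForStorage(string[] keys)
    {
        Array.Sort(keys, StringComparer.Ordinal);
    }

    // Must use the same comparer that was used for sorting.
    // Returns the index, or a negative number if the key is not found.
    public static int Find(string[] sortedKeys, string wanted)
    {
        return Array.BinarySearch(sortedKeys, wanted, StringComparer.Ordinal);
    }

    // For a list shown to the user, sort a copy with the current culture instead.
    public static string[] SortForDisplay(string[] keys)
    {
        string[] copy = (string[])keys.Clone();
        Array.Sort(copy, StringComparer.CurrentCulture);
        return copy;
    }
}
```

With ordinal rules, "C" sorts before "a" because uppercase letters have lower char values. That looks odd to a user, but it is stable across regional settings, which is what a binary search over a shipped file needs.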

Dunno if anyone has mentioned this but based on my tests StringBuilder.Append is faster than String.Join when you specify the capacity for the StringBuilder. I think when you use StringBuilder this way it is similar to String.Join in that they both preallocate the space necessary to hold the final strings. One reason StringBuilder might be faster is because String.Join has to sum up the lengths of the strings you pass in while StringBuilder just takes an int and uses that for the buffer size.

Anyway, the point is the performance chart is a little misleading when it concludes that String.Join is always faster than StringBuilder. This is probably true only when you can not give StringBuilder the capacity parameter. If you know the length of the final string, then you should use StringBuilder with the capacity parameter otherwise String.Join will have to do unnecessary work.
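The capacity variant described above could look like the following sketch. The names are made up, and I assume the caller already knows the final length; the StringBuilder(int capacity) constructor is the real one.

```csharp
using System;
using System.Text;

// Hypothetical sketch of a pre-sized StringBuilder.
public static class CapacityDemo
{
    // 'finalLength' is assumed to be known by the caller; if it is too small,
    // the builder simply grows as usual.
    public static string AppendWithCapacity(string[] parts, int finalLength)
    {
        StringBuilder sb = new StringBuilder(finalLength);
        foreach (string part in parts)
            sb.Append(part);
        return sb.ToString();
    }
}
```

Whether this actually beats String.Join will depend on the framework version and the strings involved, so treat it as a candidate to measure and not as a guaranteed win.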

Adding to this, StringBuilder can append a single character, e.g. StringBuilder.Append("a"), whereas String.Join needs an array. So there is surely overhead in building an array out of, say, 30000 characters before calling String.Join().

I believe you need to consider much more than speed for your testing. Especially if you ran this on your desktop, it's not really real world testing. Not taking into account CPU and memory considerations really makes your findings invalid. As memory and CPU usage go up with String manipulation, I think you will find the memory and CPU usage saved by using StringBuilder will change your result dramatically.

I know that this is a highly debatable field with many true and false arguments on both sides. This is why I did write in the disclaimer:

- Do not believe me
- Measure for yourself
- The results obtained here are valid only in the specified scenario.

These are actual measured values which were published as is. There is nothing wrong with them unless you interpret the data in the wrong way or generalize the results into regions where some basic assumptions are no longer true.
If you did find a wrong measurement (wrong loop count, false divisor, etc.) I would be glad to hear about it. Please be more specific about what exactly in the test code is wrong so I can fix it.

I have tested the link with both IE and Firefox. So far I have not found any error. Could it be that you have a previous version of this page cached in your browser's cache? Doing a forced reload of the whole page should fix this problem. Please contact me if the problem persists.