Introduction

This article introduces String.ReplaceMany, an extension method that
performs multiple string replacements at once. A single ReplaceMany
invocation is equivalent to a sequence of calls to String.Replace.
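As a rough illustration of that equivalence, the sketch below fills a small template by chaining String.Replace calls. The class name, the template, and the replacement values are hypothetical; the ReplaceMany call appears only in a comment and assumes the parameter order described in "Using the Code".

```csharp
using System;

public static class SequentialReplaceExample
{
    // Fills a template by chaining String.Replace, one call per marker.
    public static string Fill(string template)
    {
        // With the article's method, the intended one-call form would be roughly:
        //   template.ReplaceMany(new[] { "@name", "@city" }, new[] { "Ann", "Oslo" });
        // (assumed call shape, not compiled here).
        return template
            .Replace("@name", "Ann")
            .Replace("@city", "Oslo");
    }
}
```

Each chained call scans the whole string and allocates a new one, which is the cost a single combined call aims to avoid.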

Performance is the main difference between
those two methods. In the best (and predictable) scenarios, ReplaceMany
is up to 10 times faster than the .NET Framework Replace. In the worst-case
scenarios, ReplaceMany is slower for a small number
of replacement strings, but it catches up as their number grows. In most
scenarios, the performance of the introduced method is the same or better.

Note: The above code is not an example of the method's application. The snippets
are intended only to describe the functionality of the ReplaceMany method.
For actual usage scenarios, see the "Performance Tests" section. More
test results are available for download in the charts.zip archive.

The project targets .NET 4.0, but it will compile under .NET 2.0 after removing
the this keyword from the method's signature, which turns it into an ordinary static method.

Background

The idea came up while I was developing code that took a template text containing
markers of the form @variable. These markers were then replaced with actual
content using the String.Replace method. I realized that performance
had been dropping as the project grew, and I started looking for ways to optimize
the code. One thing that caught my eye was a sequence of Replace
method invocations.

Using the Code

ReplaceMany is an extension method of the String class. It takes two required
parameters and one optional parameter:

oldValues - an array of strings which should be searched for in the
processed string,

newValues - an array of replacement strings,

resultStringLength - a hint telling the method how long the result
string will probably be. This parameter should be overestimated. The idea is to
prevent buffer reallocation and thereby improve performance by avoiding unnecessary
copying. If the result string length is not easily predictable, this parameter
should be omitted.
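To make the parameter roles concrete, here is a minimal managed sketch of a method with this shape. It is not the article's implementation (which uses unsafe code); the class name, the default value of 0 for resultStringLength, and the rule that the first matching entry in array order wins at each position are assumptions made for illustration.

```csharp
using System;
using System.Text;

public static class StringReplaceManySketch
{
    // Hypothetical managed stand-in for ReplaceMany, written for clarity rather than speed.
    public static string ReplaceMany(this string source, string[] oldValues,
                                     string[] newValues, int resultStringLength = 0)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (oldValues == null) throw new ArgumentNullException("oldValues");
        if (newValues == null) throw new ArgumentNullException("newValues");
        if (oldValues.Length != newValues.Length)
            throw new ArgumentException("oldValues and newValues must have the same length.");

        // Use the caller's size hint when given; otherwise start from the source length.
        int capacity = resultStringLength > 0 ? resultStringLength : source.Length;
        var result = new StringBuilder(capacity);

        int position = 0;
        while (position < source.Length)
        {
            // Assumption: at each position, the earliest matching array entry wins.
            int matched = -1;
            for (int i = 0; i < oldValues.Length; i++)
            {
                string old = oldValues[i];
                if (!string.IsNullOrEmpty(old) &&
                    string.CompareOrdinal(source, position, old, 0, old.Length) == 0)
                {
                    matched = i;
                    break;
                }
            }

            if (matched >= 0)
            {
                // Replacement text is appended to the output and never rescanned.
                result.Append(newValues[matched]);
                position += oldValues[matched].Length;
            }
            else
            {
                result.Append(source[position]);
                position++;
            }
        }
        return result.ToString();
    }
}
```

A call such as "Hello @name".ReplaceMany(new[] { "@name" }, new[] { "Ann" }, 64) passes an overestimated length hint; leaving the third argument out falls back to the source length as the initial buffer size.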

Remarks

If one symbol is a substring of another, the operation becomes ambiguous. In
such cases, symbols are replaced in the same order as they appear in the array.
For example:

Another kind of ambiguity occurs when a replacement is equal to a later "old value", as here:

"abc".Replace("b", "c").Replace("c", "!"); // returns "a!!"

The .NET Replace chain returns "a!!", which is incorrect
in the sense of replacing multiple strings at one time, because "b"
ends up replaced with "!" instead of "c". The ReplaceMany
method behaves differently and returns "ac!".
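The cascading effect of the chained calls can be checked step by step using only the standard String.Replace. The class and method names below are hypothetical.

```csharp
using System;

public static class CascadeDemo
{
    // Returns the intermediate and final strings produced by the chained calls.
    public static string[] Steps()
    {
        string first = "abc".Replace("b", "c");   // "acc": the original "b" is now a "c"
        string second = first.Replace("c", "!");  // "a!!": both "c" characters are replaced
        return new[] { first, second };
    }
}
```

The second call cannot tell the original "c" from the one produced by the first call, which is why the chained result differs from a replacement done at one time.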

Note: this behaviour was introduced in an update to the article. The behaviour of the first
version of the method was undefined for ambiguous cases.

That is all there is to using the method. Now let's dig into the implementation.

Implementation

The implementation is a bit tricky and contains plenty of optimizations, including
unsafe code blocks and calls to native memory functions. The first version
used fixed blocks
inside a loop, but that caused an unexpected performance drop. In the current version,
strings are pinned at the beginning and freed at the method's exit point. By "pinning",
I mean telling the GC (the garbage collector, the .NET Framework memory management engine)
not to move an object around in memory while it is being operated on. This is necessary
when using pointers (unsafe code) and can be accomplished with fixed
statements or a GCHandle structure.
Characters are compared with the memcmp function,
and memory is copied with memcpy.
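Pinning through GCHandle can be illustrated without unsafe code. The sketch below is not taken from the article's implementation; the class and method names are hypothetical, and Marshal.Copy stands in for the native memcpy call.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningSketch
{
    // Copies a string's characters into a new array by reading them
    // through the address of the pinned string.
    public static char[] CopyCharsViaPinnedHandle(string text)
    {
        // While the handle is allocated as Pinned, the GC will not move the string.
        GCHandle handle = GCHandle.Alloc(text, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject(); // address of the first character
            char[] copy = new char[text.Length];
            Marshal.Copy(address, copy, 0, text.Length);
            return copy;
        }
        finally
        {
            // Always release the handle, mirroring "freed at the method's exit point".
            handle.Free();
        }
    }
}
```

Allocating the handle once and freeing it in a finally block corresponds to pinning at the beginning and releasing at the exit point, rather than re-pinning on every loop iteration.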

Performance Tests

I have done a lot of testing to ensure that I haven't implemented a method which
is actually slower than calling String.Replace sequentially. The results
are approximate, but they reflect the nature of the method. Here are
some interesting results.

The ReplaceMany method performs best when small strings are replaced
with big strings. In the test environment, I used 10-character strings
as "old values" and 100-character strings as "new values". The
variable parameter was the number of replacement strings, that is, the number of String.Replace
invocations. The results were as follows:

All results are in milliseconds and multiplied by 1000, for clarity.
The ReplaceMany
method starts to win when there are 3 replacement strings. With 100 strings, there is about a 35x
performance increase compared to the standard .NET code.

The testing code is very straightforward. The code which generates problem instances:
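A generator for the small-to-big scenario described earlier (10-character old values, 100-character new values) could look like the following sketch. It is an assumption about the general shape of such a test, not the article's own test code; every name is hypothetical, and only the String.Replace baseline is timed.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ReplaceBenchmarkSketch
{
    // Builds a random string of the given length from 26 letters starting at firstLetter.
    static string RandomString(Random random, int length, char firstLetter)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = (char)(firstLetter + random.Next(26));
        return new string(chars);
    }

    // Generates `count` replacement pairs and a text containing every old value once.
    // Old values are uppercase and new values lowercase, so a replacement
    // can never introduce another old value.
    public static void Generate(int count, int seed,
        out string text, out string[] oldValues, out string[] newValues)
    {
        var random = new Random(seed);
        oldValues = new string[count];
        newValues = new string[count];
        var builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            oldValues[i] = RandomString(random, 10, 'A');
            newValues[i] = RandomString(random, 100, 'a');
            builder.Append(oldValues[i]).Append(' ');
        }
        text = builder.ToString();
    }

    // Times the baseline: one String.Replace call per pair.
    public static TimeSpan TimeSequentialReplace(string text, string[] oldValues,
        string[] newValues, out string result)
    {
        var watch = Stopwatch.StartNew();
        result = text;
        for (int i = 0; i < oldValues.Length; i++)
            result = result.Replace(oldValues[i], newValues[i]);
        watch.Stop();
        return watch.Elapsed;
    }
}
```

A fixed seed keeps runs comparable. Timing a multi-replacement method on the same generated instance, over many repetitions, would give the second column of such a comparison.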

In the worst-case scenario, the .NET code wins. Extra characters added
to the test string ensure that there are no buffer reallocations. It seems that
.NET's Replace uses a slow algorithm for that. The difference decreases
as the number of replacement strings increases, but the ReplaceMany
method is still slower.

Summary

In conclusion, ReplaceMany is at its best when you have to replace many different
short strings with long strings. Filling a script template with data is a good example
of a practical application. Moreover, using the ReplaceMany method
seems to be safer in some scenarios where the .NET method either crashes or behaves
unstably.