Strings too slow outside WIN32?

In a recent debate I had it was said that strings in the Win64 runtime are too slow to be useful. That is, in my opinion, a gross exaggeration. It is true that the Win32 runtime library (RTL) has benefited a lot from the work of the FastCode project, usually with routines in extremely clever assembler. For all other platforms, often the routines are in plain Object Pascal, so no assembler is being used. Also, far fewer routines have been replaced by clever implementations.

One very obvious example of this is the Pos function, which searches if a certain string (I call that the Needle) can be found in a larger one (the Haystack). The Win32 implementation is in highly optimized assembler, written by Aleksandr Sharahov from the FastCode project, and licensed by CodeGear. The Win64 implementation is in plain Pascal (PUREPASCAL). But the implementation for UnicodeString is not the same, or even similar, to the implementation for AnsiString!

The implementation for UnicodeString is slower than the same routine for Win32. On my system a search in Win64 takes approx. 1.8 × the time it needs in Win32. On Win32, Pos for AnsiString is about as fast (or sometimes even slightly faster than) Pos for UnicodeString. But on Win64, Pos for AnsiString takes 2 × the time Pos for UnicodeString needs!

If you look at the sources in System.pas, you'll see that the Unicode version is slightly better optimized (searching for the first Char in the Needle first, and only checking the rest if a match was found).

For fun, I took the code for the UnicodeString implementation and converted it to work for AnsiString. It was slightly faster than System.Pos for UnicodeString, instead of 2 times as slow. I wonder why, in System.pas, the AnsiString implementation does not simply use the same code as that for UnicodeString, like I did. If I were a suspicious person, I would think it was done on purpose, to deprecate AnsiString by making it less usable.

But even that can be improved upon. I wrote three implementations of my own routine, one for AnsiString, one for UnicodeString and one for TBytes (many people have complained that TBytes lacks something like Pos and that was the reason they maintained the incredibly bad habit of using strings to store binary data — <shudder> — I wanted to take away that silly argument).

Code

Here is the code for my RVPosExA function (for what it's worth: these days, there is no difference between PosEx and Pos anymore: both have the exact same functionality and signature):

As you can see, under Win32, it simply jumps to System.Pos, as that is the fastest anyway. But on all other platforms, it searches the Haystack 4-byte-wise (if the Needle is larger than 4 elements), and if it found something, then it searches the rest using CompareMem.

Timing

Here is a slightly reformatted output of a test program (I put the WIN32 and the WIN64 columns beside each other, to save space):

The routines RVPosExA, RVPosExU and RVPosExB are my implementations for AnsiString, UnicodeString and TBytes respectively. OrgPosA is the original code for Pos for AnsiString, while PosUModForA is the original PUREPASCAL code for Pos for UnicodeString, modified for AnsiString.

As you can see, the PosUModForA routine is almost twice as fast as the rather braindead OrgPosA, and in WIN32, the RVPosEx<A/U/B> implementations are faster than the others.

I didn't check, but it is well possible that one of the plain Pascal versions of the FastCode project is faster. But for me, this implementation is a start and proof, that with a few simple optimizations string routines could be made faster. Perhaps, one day, Embarcadero will adopt more of the plain Pascal code from the FastCode project.