It seems to me that many bigger C++ libraries end up creating their own string type. In the client code you either have to use the one from the library (QString, CString, fbstring etc., I'm sure anyone can name a few) or keep converting between the standard type and the one the library uses (which most of the time involves at least one copy).

So, is there a particular misfeature or something wrong about std::string (just like auto_ptr semantics were bad)? Has it changed in C++11?

@Giorgio: That's probably because Java's hard-coded syntactic support for java.lang.String (lack of operator overloading, etc.) would make it a pain to use anything else.
–
Mechanical snail, Jun 6 '12 at 4:13

7 Answers

Most of those bigger C++ libraries were started before std::string was standardized. Others include additional features that were standardized late, or still not standardized, such as support for UTF-8 and conversion between encodings.

If those libraries were implemented today, they would probably choose to write functions and iterators that operate on std::string instances.

@KonradRudolph, it is not the locale system which is broken there (the definition of wchar_t is "wide enough for any supported character set"); systems that committed to a 16-bit wchar_t committed at the same time to not supporting Unicode. Well, the culprits are, in order: Unicode, which first guaranteed that it would never use code points needing more than 16 bits; then the systems that committed to a 16-bit wchar_t; then Unicode switching to code points that need more than 16 bits.
–
AProgrammer, Jun 6 '12 at 15:14

Actually... there are several issues with std::string, and yes it gets a bit better in C++11, but let's not get ahead of ourselves.

QString and CString are part of old libraries, so they existed before C++ was standardized (much like the SGI STL). They therefore had to create their own string class.

fbstring addresses very specific performance concerns. The Standard prescribes an interface and minimum algorithmic-complexity guarantees, but whether an implementation ends up being fast is a Quality of Implementation detail. fbstring has specific optimizations (storage-related ones, or a faster find, for example).

Other concerns that were not raised here (in no particular order):

In C++03 the storage is not required to be contiguous, which makes interoperability with C potentially difficult. C++11 fixes this.

std::string is encoding-unaware and has no special handling for UTF-8, so it's easy to store a UTF-8 string in it and inadvertently corrupt it.

The std::string interface is bloated: many methods could have been implemented as free functions, and many are duplicated to provide both an index-based interface and an iterator-based interface.

Re concern #1 -- C++03 21.3.6/1 guarantees that c_str() returns a pointer to contiguous storage, which provides for some C-interoperability. However you cannot modify the pointed-to data. Typical workarounds include using a vector<char>.
–
John Dibling, Jun 5 '12 at 14:22

@JohnDibling: Yes, and there is another limitation: it could incur a copy into newly allocated storage (the Standard does not say it shall not). Of course C++11 does not prevent copying either, but since you can simply do &s[0] it no longer matters :)
–
Matthieu M., Jun 5 '12 at 14:38

@MatthieuM.: The pointer obtained via &s[0] may not point to a NUL-terminated string (unless c_str() has been called since the last modification).
–
Ben Voigt, Jun 5 '12 at 16:01

@Matthieu: Another buffer is not allowed. "c_str() Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()]".
–
Ben Voigt, Jun 5 '12 at 17:36

What's also worth noting is that nobody in their right mind uses MFC anymore, so it's hard to argue that CString is a string class in modern C++.
–
DeadMG, Jun 5 '12 at 21:05

For the first 15 years you don't provide a string class at all, forcing every compiler on every platform and every user to create their own.

Then you make something that's confused about whether it's supposed to be a full string manipulation API or just an STL char container, with some algorithms that duplicate the ones used on a std::vector, or that behave differently.

An obvious string operation like replace() or mid() involves such a mess of iterators that you need to introduce a new 'auto' keyword to keep the statement fitting on a single page, and it leads most people to give up on the whole language.

And then you have Unicode 'support' and std::wstring, which is just arghh.....

@DeadMG - yes, and it was standardised in 1998, 15 years after it was invented and 6 years after even MSFT was using it. Yes, iterators are a useful way of making an array and a list look the same, but do you think they are an obvious way to do string manipulation?
–
Martin Beckett, Jun 5 '12 at 20:27

C with Classes was invented in 1983. Not C++. The only Standard libraries are those determined by the Standard, which, strangely enough, can only happen once you have a Standard, so the earliest possible date for any Standard library is 1998. And iterators could be considered exactly equal to indexes, but strongly typed. I'm all for the fact that iterators suck compared to ranges, but that's not really specific to std::string. The lack of a String class in 1983 does not justify having more of them now.
–
DeadMG, Jun 5 '12 at 21:04

@DeadMG People were using something called "C++" for many years prior to 1998. I wrote my first program using something called "C++" in 1985. If you want to say that this isn't "real" C++, that's fine, but prior to this, we were writing code and had to get a string class from somewhere. Once we had these legacy codebases, we couldn't exactly throw them out or rewrite from scratch when we got a standard. Now what should have happened is that there should have been a string class that came with cfront.
–
Steven Burnap, Jun 6 '12 at 1:48

@DeadMG - If nobody used a language until it had ISO certification, then no language would ever be used, since it would never get to ISO. There is no ISO standard for x86 assembler, but I'm happy to use the platform.
–
Martin Beckett, Jun 6 '12 at 20:41

Apart from the reasons posted here, there is also another one: binary compatibility. Library writers have no control over which std::string implementation you are using and whether it has the same memory layout as theirs.

std::string is a template, so its implementation is taken from your local STL headers. Now imagine that you are locally using some performance-optimised STL version, fully compatible with the standard. For example, you may have chosen to introduce a static buffer in each std::string to reduce the number of dynamic allocations and cache misses. As a result, the memory layout and/or size of your implementation differs from the library's.

If only the layout is different, some std::string member function calls on instances passed from the library to the client, or the other way around, may fail, depending on which members were shifted.

If the size is different as well, all library types having a std::string member will appear to have a different sizeof when checked in the library and in the client code. Data members following the std::string member will have shifted offsets as well, and any direct access or inline accessor called from the client will return rubbish, despite "looking OK" when debugging the library itself.

Bottom line: if the library and the client code are compiled against different std::string versions, they will link just fine, but the result may be nasty, hard-to-understand bugs. If you change your std::string implementation, all libraries exposing members from the STL have to be recompiled to match the client's std::string layout. And because programmers want their libraries to be robust, you'll rarely see std::string exposed anywhere.

To be fair, this applies to all STL types. IIRC they don't have a standardised memory layout.

You must be a *nix programmer. C++ binary compatibility is not equal on all platforms, and specifically on Windows NO classes containing data members are portable between compilers.
–
Ben Voigt, Jun 6 '12 at 12:53

Legacy. Many string libraries and classes were written PRIOR to the existence of std::string.

For compatibility with C code. The std::string library is C++, whereas there are other string libraries that work with both C and C++.

To avoid dynamic allocations. The library std::string uses dynamic allocation and may not be suitable for embedded systems, interrupt or real-time related code, or for low-level functionality.

Templates. The library std::string is based on templates. Until fairly recently a number of C++ compilers had poorly performing or even buggy template support. Unfortunately, I work in an industry that uses a lot of custom tools and one of our toolchains from a major player in the industry doesn't "officially" 100% support C++ (with buggy stuff being templates et al).

"Fairly recently" meaning "It's been a decade since even Visual Studio had pretty reasonable support for them"?
–
DeadMG, Jun 6 '12 at 20:48

@DeadMG - Visual Studio is not the only non-compliant compiler in the world. I work in video games and we are often working on custom compilers for unreleased hardware platforms (this happens every few years in the console cycles or as new hardware appears). "Fairly recently" means today: right now certain compilers don't support templates well. I can't be specific without violating NDAs, but I am currently working on a platform with custom toolchains where C++ support, especially template compliance, is considered to be "experimental".
–
Adisak, Jun 12 '12 at 21:32

It's mostly about Unicode. The Standard support for Unicode is abysmal at best, and everyone has their own Unicode needs. For example, ICU supports every Unicode feature you could ever want, behind the most disgusting automatically-generated-from-Java interface you could possibly imagine, and if you're on Unix, being stuck with UTF-16 may well not be your idea of a good time.

In addition, many people need differing levels of Unicode support: not everyone needs the complex text layout APIs and such things. So it's easy to see why numerous string classes exist: the Standard one pretty much sucks, and everybody has different needs from the new ones, with nobody managing to create a single class that offers broad Unicode support cross-platform with a pleasant interface.

In my opinion, this is mostly the fault of the C++ Committee for not correctly providing support for Unicode- in 1998 or 2003, maybe it was understandable, but not in C++11. Hopefully in C++17 they will do better.

It's because every programmer has something to prove and feels the need to create their own awesome, faster string class for their one, awesome function. It's usually a little superfluous and leads to all kinds of extra string conversions in my experience.

Were this true I'd expect to see a similar number of String implementations in languages like Java where a good implementation has been available all along.
–
Bill K, Jun 5 '12 at 17:40

@BillK the Java String is final, so you have to put new functionality elsewhere.
–
user1249, Jun 6 '12 at 1:10

And my point is, even with it being final, in 20 years I've never seen anyone write a custom string implementation (well, I did once, to try to improve string concatenation performance, but it turns out Java is MUCH smarter at string+string than you'd imagine).
–
Bill K, Jun 6 '12 at 1:46

@Bill: That might have to do with a different culture. C++ attracts those who want to understand the low-level details. Java attracts those who just want to get the job done using someone else's building blocks. (Note that this is not a statement about any specific individual choosing to use either language, but about the languages' respective design goals and culture)
–
Ben Voigt, Jun 6 '12 at 15:45