Class UString.

Unicode is the common character encoding for all strings except those limited to US-ASCII, but Unicode strings are seldom manipulated.

Most of the functionality of UString is concerned with conversion to and from other encodings, such as ISO-8859-15, KOI8-U and so on. Other functionality is intentionally kept to a minimum, to lighten the testing burden.

Two functions merit particular mention: ascii() and the equality operator. ascii() returns something that is useful for logging, but which often cannot be converted back to Unicode.

There is a fast equality operator which tests against printable ASCII, returning false for every unprintable or non-ASCII character. Very useful for comparing a UString to e.g. "seen" or ".", but nothing more.
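As a rough illustration of the rule described above, the following sketch compares a sequence of code points with a C string and fails as soon as it sees an unprintable or non-ASCII character. It is not UString's code: equalsPrintableAscii() is a hypothetical name, std::u32string stands in for UString so that the sketch compiles on its own, and the printable range 32..126 is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the fast equality operator; std::u32string
// replaces UString so the sketch needs no project headers.
bool equalsPrintableAscii( const std::u32string & u, const char * s )
{
    if ( !s )
        return false;
    std::size_t i = 0;
    while ( i < u.size() && s[i] != '\0' ) {
        char32_t c = u[i];
        // Assumption: "printable ASCII" means code points 32..126.
        if ( c < 32 || c > 126 )
            return false;
        if ( c != static_cast<unsigned char>( s[i] ) )
            return false;
        i++;
    }
    // Equal only if both strings ended at the same position.
    return i == u.size() && s[i] == '\0';
}
```

With this sketch, comparing U"seen" with "seen" succeeds, while any string containing an accented letter or a control character compares unequal to everything.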

int UString::compare( const UString & other ) const

Returns -1 if this string is lexicographically before other, 0 if they are the same, and 1 if this string is lexicographically after other.

The comparison is case sensitive - just a codepoint comparison. It does not sort the way humans expect.
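The effect of a plain codepoint comparison can be sketched as follows. compareCodepoints() is a hypothetical helper written against std::u32string rather than UString, and the rule that a prefix sorts before the longer string is the usual lexicographic convention, assumed here rather than taken from the UString source.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for UString::compare(), using std::u32string
// so that it compiles without the UString header.
int compareCodepoints( const std::u32string & a, const std::u32string & b )
{
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for ( std::size_t i = 0; i < n; i++ ) {
        if ( a[i] < b[i] )
            return -1;
        if ( a[i] > b[i] )
            return 1;
    }
    // Assumption: when one string is a prefix of the other, the
    // shorter one sorts first.
    if ( a.size() < b.size() )
        return -1;
    if ( a.size() > b.size() )
        return 1;
    return 0;
}
```

Under this rule "Zebra" sorts before "apple", because U+005A (Z) is smaller than U+0061 (a), and an accented letter such as U+00E9 sorts after "z". That is the sense in which the order differs from what humans expect.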

bool UString::contains( const UString & s ) const

Returns true if this string contains at least one instance of s.

bool UString::contains( const char c ) const

Returns true if this string contains at least one instance of c.

bool UString::contains( const char * s ) const

Returns true if this string contains at least one instance of s.

bool UString::endsWith( const UString & suffix ) const

Returns true if this string ends with suffix, and false if it does not.

bool UString::endsWith( const char * suffix ) const

Returns true if this string ends with suffix, and false if it does not. suffix must be an ASCII or 8859-1 string.

int UString::find( char c, int i ) const

Returns the position of the first occurrence of c on or after i in this string, or -1 if there is none.

int UString::find( const UString & s, int i ) const

Returns the position of the first occurrence of s on or after i in this string, or -1 if there is none.
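The "on or after i" convention and the -1 result can be sketched on top of std::u32string::find(). findFrom() is a hypothetical name, and treating a negative i as 0 is an assumption made for this sketch only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for UString::find( const UString &, int ).
int findFrom( const std::u32string & haystack,
              const std::u32string & needle, int i )
{
    // Assumption: a negative start position means "from the beginning".
    if ( i < 0 )
        i = 0;
    std::size_t pos = haystack.find( needle, static_cast<std::size_t>( i ) );
    if ( pos == std::u32string::npos )
        return -1;
    return static_cast<int>( pos );
}
```

Searching U"abcabc" for U"bc" from position 0 finds the match at 1. Starting from position 2 skips that match and finds the one at 4.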

bool UString::isAscii() const

Returns true if this string contains only tab, cr, lf and printable ASCII characters, and false if it contains one or more other characters.
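A minimal sketch of such a check, assuming that "printable" means code points 32 to 126 and using std::u32string in place of UString (isAsciiText() is a hypothetical name):

```cpp
#include <string>

// Hypothetical stand-in for UString::isAscii().
bool isAsciiText( const std::u32string & s )
{
    for ( char32_t c : s ) {
        // Assumption: printable ASCII is the range 32..126.
        bool printable = ( c >= 32 && c < 127 );
        // Tab, lf and cr are the only control characters accepted.
        bool allowedControl = ( c == 9 || c == 10 || c == 13 );
        if ( !printable && !allowedControl )
            return false;
    }
    return true;
}
```

In this sketch an empty string counts as ASCII, since it contains no offending character. Whether UString::isAscii() agrees on that point is not stated above.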

bool UString::isDigit( uint c )

Returns true if c is a digit, and false if not.

bool UString::isLetter( uint c )

Returns true if c is a letter, and false if not.

bool UString::isSpace( uint c )

Returns true if c is a unicode space character, and false if not.

UString UString::mid( uint start, uint num ) const

Returns a string containing the data starting at position start of this string, extending for num characters. num may be left out, in which case the rest of the string is returned.

If start is too large, an empty string is returned.
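The two edge cases, an omitted num and an oversized start, can be sketched with std::u32string::substr(). midOf() is a hypothetical name, and the sketch works on std::u32string rather than UString.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for UString::mid(). Leaving out num selects
// the rest of the string.
std::u32string midOf( const std::u32string & s, std::size_t start,
                      std::size_t num = std::u32string::npos )
{
    // substr() would throw for start > size(); return an empty string
    // instead, as the description above requires.
    if ( start >= s.size() )
        return std::u32string();
    // substr() clamps num to the characters actually available.
    return s.substr( start, num );
}
```

For the input U"abcdef", midOf( s, 2, 3 ) yields U"cde", midOf( s, 4 ) yields U"ef", and midOf( s, 10 ) yields an empty string.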

uint UString::number( bool * ok, uint base ) const

Returns the number encoded by this string, and sets *ok to true if that number is valid, or to false if the number is invalid. By default the number is read in base 10; if base is specified, that base is used instead. base must be at least 2 and at most 36.

If the number is invalid (e.g. negative), the return value is undefined.

If ok is a null pointer, it is not modified.
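The contract around ok and base can be sketched as follows. parseNumber() is a hypothetical helper on std::u32string, not UString's implementation. Rejecting the empty string and values that do not fit in an unsigned int, and returning 0 for invalid input, are choices made for this sketch, since the description above leaves the return value undefined in that case.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical stand-in for UString::number().
unsigned int parseNumber( const std::u32string & s, bool * ok = nullptr,
                          unsigned int base = 10 )
{
    // Assumption: an empty string or an out-of-range base is invalid.
    bool good = !s.empty() && base >= 2 && base <= 36;
    unsigned long long n = 0;
    for ( std::size_t i = 0; good && i < s.size(); i++ ) {
        char32_t c = s[i];
        unsigned int digit = 36; // too large for every legal base
        if ( c >= U'0' && c <= U'9' )
            digit = c - U'0';
        else if ( c >= U'a' && c <= U'z' )
            digit = 10 + ( c - U'a' );
        else if ( c >= U'A' && c <= U'Z' )
            digit = 10 + ( c - U'A' );
        if ( digit >= base )
            good = false; // covers '-', spaces and foreign digits
        else
            n = n * base + digit;
        // Assumption: overflow of unsigned int counts as invalid.
        if ( n > std::numeric_limits<unsigned int>::max() )
            good = false;
    }
    // A null ok pointer is simply left alone.
    if ( ok )
        *ok = good;
    // The value for invalid input is arbitrary; 0 is chosen here.
    return good ? static_cast<unsigned int>( n ) : 0;
}
```

A caller that cares about validity passes a pointer and checks it afterwards, for example parseNumber( U"ff", &ok, 16 ), which yields 255 with ok set to true. A caller that has already validated the input can omit the pointer.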

UString & UString::operator+=( const UString & other )

Appends other to this string and returns a reference to this string.

UString & UString::operator=( const UString & other )

Makes this string into an exact copy of other and returns a reference to this string.

void UString::operator delete( void * p )

void UString::reserve( uint num )

Ensures that at least num characters are available for this string. Users of UString should generally not need to call this; it is called by append() etc. as needed.

void UString::reserve2( uint num )

Equivalent to reserve(). reserve( num ) calls this function to do the heavy lifting. This function is not inline, while reserve() is, so calls to this function should be interesting with respect to memory allocation statistics.

UString UString::simplified() const

Returns a copy of this string where each run of whitespace is compressed to a single space character, and where leading and trailing whitespace is removed altogether. Most spaces are mapped to U+0020, but the Ogham space dominates and ZWNBSP recedes.
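The basic collapsing and trimming can be sketched as below. simplify() and isSpaceChar() are hypothetical names working on std::u32string. The set of whitespace characters is a small illustrative subset, and the sketch maps every run to U+0020, so it does not attempt the special handling of the Ogham space or ZWNBSP mentioned above.

```cpp
#include <string>

// Assumption: a small illustrative subset of the Unicode spaces that
// UString::isSpace() might recognise.
static bool isSpaceChar( char32_t c )
{
    return c == 0x20 || c == 0x09 || c == 0x0A || c == 0x0D || c == 0xA0;
}

// Hypothetical stand-in for UString::simplified().
std::u32string simplify( const std::u32string & s )
{
    std::u32string out;
    bool pendingSpace = false;
    for ( char32_t c : s ) {
        if ( isSpaceChar( c ) ) {
            // Remember the run, but never at the start of the output.
            pendingSpace = !out.empty();
        } else {
            // Emit one space for the whole preceding run, if any.
            if ( pendingSpace )
                out.push_back( U' ' );
            pendingSpace = false;
            out.push_back( c );
        }
    }
    // A pending space at the end is dropped: that is the trailing trim.
    return out;
}
```

Deferring the space until the next non-space character arrives handles leading and trailing whitespace without a separate trimming pass.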