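Synopsis

#include <sys/u8_textprep.h>

size_t u8_textprep_str(char *inarray, size_t *inlen, char *outarray,
    size_t *outlen, int flag, size_t unicode_version, int *errnum);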

Parameters

inarray

A pointer to a byte array containing the sequence of UTF-8 character bytes to be prepared.

inlen

As an input argument, the number of bytes in inarray to be prepared. As an output argument, the number of bytes in inarray not yet consumed.

outarray

A pointer to a byte array where the prepared UTF-8 character bytes are saved.

outlen

As an input argument, the number of bytes available at outarray for saving prepared character bytes. As an output argument, the number of bytes still available at outarray after the preparation.

flag

The preparation options, constructed by a bitwise inclusive OR of zero or more of the following values:

U8_TEXTPREP_IGNORE_NULL

Normally, u8_textprep_str() stops the preparation when it encounters a null byte, even if the value pointed to by inlen is still greater than zero.

With this option, a null byte does not stop the preparation; preparation continues until all inlen bytes of inarray have been consumed or an error occurs.

U8_TEXTPREP_IGNORE_INVALID

Normally, u8_textprep_str() stops the preparation when it encounters an illegal or incomplete character and sets errnum to the corresponding value.

When this option is set, u8_textprep_str() does not stop the preparation and instead treats such characters as requiring no preparation.

U8_TEXTPREP_TOUPPER

Map lowercase characters to uppercase characters if applicable.

U8_TEXTPREP_TOLOWER

Map uppercase characters to lowercase characters if applicable.

U8_TEXTPREP_NFD

Apply Unicode Normalization Form D.

U8_TEXTPREP_NFC

Apply Unicode Normalization Form C.

U8_TEXTPREP_NFKD

Apply Unicode Normalization Form KD.

U8_TEXTPREP_NFKC

Apply Unicode Normalization Form KC.

Only one case folding option is allowed. Only one Unicode Normalization option is allowed.

When a case folding option and a Unicode Normalization option are specified together, case folding is performed first, followed by Unicode Normalization.

If no option is specified, no processing occurs except the simple copying of bytes from input to output.

unicode_version

The version of Unicode data that should be used during UTF-8 text preparation. The following values are supported:

U8_UNICODE_320

Use Unicode 3.2.0 data during text preparation.

U8_UNICODE_500

Use Unicode 5.0.0 data during text preparation.

U8_UNICODE_LATEST

Use the latest Unicode version data available, which is currently Unicode 5.0.0.

errnum

The error value set when the preparation is not completed or fails. The following values are possible:

E2BIG

Text preparation stopped due to lack of space in the output array.

EBADF

Specified option values are conflicting and cannot be supported.

EILSEQ

Text preparation stopped due to an illegal UTF-8 character, that is, input bytes that do not form a valid UTF-8 sequence.

EINVAL

Text preparation stopped due to an incomplete UTF-8 character at the end of the input array.

ERANGE

The specified Unicode version value is not a supported version.

Description

The u8_textprep_str() function prepares the sequence of UTF-8 characters in the array
specified by inarray into a sequence of corresponding prepared UTF-8 characters in the
array specified by outarray. The inarray argument points to the first character in the
input byte array, and inlen indicates the number of bytes from there to the end of the
array to be prepared. The outarray argument points to the first available byte in the
output byte array, and outlen indicates the number of available bytes from there to the
end of the array. Unless flag includes U8_TEXTPREP_IGNORE_NULL, u8_textprep_str() stops
when it encounters a null byte in the input array, regardless of the current inlen value.

If flag does not include U8_TEXTPREP_IGNORE_INVALID and a sequence of input bytes does
not form a valid UTF-8 character, preparation stops after the previous successfully
prepared character. If flag does not include U8_TEXTPREP_IGNORE_INVALID and the input
array ends with an incomplete UTF-8 character, preparation stops after the previous
successfully prepared bytes. If the output array is not large enough to hold the entire
prepared text, preparation stops just before the input bytes that would cause the
output array to overflow. The value pointed to by inlen is decremented to reflect the
number of bytes in the input array not yet prepared. The value pointed to by outlen is
decremented to reflect the number of bytes still available in the output array.

Return Values

The u8_textprep_str() function updates the values pointed to by the inlen and outlen
arguments to reflect the extent of the preparation. When U8_TEXTPREP_IGNORE_INVALID is
specified, u8_textprep_str() returns the number of illegal or incomplete characters found
during the text preparation. When U8_TEXTPREP_IGNORE_INVALID is not specified and the text
preparation is entirely successful, the function returns 0. If the entire string in the
input array is prepared, the value pointed to by inlen is 0. If the text preparation
stops due to any of the conditions described above, the value pointed to by inlen is
non-zero and errnum is set to indicate the error. If such a condition or any other error
occurs, u8_textprep_str() returns (size_t)-1 and sets errnum to indicate the error.