FUNCTIONS

decode_utf8

Returns a decoded representation of $octets in UTF-8 encoding as a character string.

$fallback is an optional CODE reference that customizes error handling. By default, any ill-formed UTF-8 sequence, or any encoded code point which can't be interchanged, is replaced with REPLACEMENT CHARACTER (U+FFFD).

$string = $fallback->($octets, $is_usv, $position);

$fallback is invoked with three arguments: $octets, $is_usv and $position. $octets is a sequence of one or more octets containing either the maximal subpart of the ill-formed subsequence or the encoded code point which can't be interchanged. $is_usv is a boolean indicating whether or not $octets represents an encoded Unicode scalar value. $position is an unsigned integer containing the zero-based octet position at which the error occurred within the octets provided to decode_utf8(). $fallback must return a character string consisting of zero or more Unicode scalar values. Unicode scalar values consist of code points in the ranges U+0000..U+D7FF and U+E000..U+10FFFF.
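As an illustration of the callback contract described above, here is a minimal sketch of a decoding fallback. The name escape_octets is hypothetical and not part of the module. It ignores $is_usv and $position and renders each offending octet as a visible escape, which is an ASCII-only string and therefore consists solely of Unicode scalar values.

```perl
use strict;
use warnings;

# Hypothetical fallback: show each offending octet as a literal
# "\xHH" escape instead of substituting U+FFFD.
sub escape_octets {
    my ($octets, $is_usv, $position) = @_;
    return join '', map { sprintf '\\x%02X', ord($_) } split //, $octets;
}

# Called directly here to show what the fallback receives and returns.
# With the module loaded it would be passed as the second argument:
#   $string = decode_utf8($octets, \&escape_octets);
print escape_octets("\xE2\x82", 0, 3), "\n";    # prints \xE2\x82
```

Returning an empty string instead would drop the offending octets from the result.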

encode_utf8

Returns an encoded representation of $string in UTF-8 encoding as an octet string.

$fallback is an optional CODE reference that customizes error handling. By default, any code point which can't be interchanged or represented in UTF-8 encoding form is replaced with REPLACEMENT CHARACTER (U+FFFD).

$string = $fallback->($codepoint, $is_usv, $position);

$fallback is invoked with three arguments: $codepoint, $is_usv and $position. $codepoint is an unsigned integer containing the code point which can't be interchanged or represented in UTF-8 encoding form. $is_usv is a boolean indicating whether or not $codepoint is a Unicode scalar value. $position is an unsigned integer containing the zero-based character position at which the error occurred within the string provided to encode_utf8(). $fallback must return a character string consisting of zero or more Unicode scalar values. Unicode scalar values consist of code points in the ranges U+0000..U+D7FF and U+E000..U+10FFFF.
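A minimal sketch of an encoding fallback follows, under the assumption that the caller wants uninterchangeable scalar values (such as noncharacters) made visible and everything else (surrogates, code points above U+10FFFF) dropped. The name describe_codepoint is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical fallback: describe scalar values in ASCII, and
# drop code points that are not Unicode scalar values.
sub describe_codepoint {
    my ($codepoint, $is_usv, $position) = @_;
    return $is_usv ? sprintf('<U+%04X>', $codepoint) : '';
}

# Called directly to show the contract. With the module loaded:
#   $octets = encode_utf8($string, \&describe_codepoint);
print describe_codepoint(0xFFFF, 1, 0), "\n";    # prints <U+FFFF>
```

Both return values are ASCII-only, so they satisfy the requirement that the fallback return zero or more Unicode scalar values.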

valid_utf8

$boolean = valid_utf8($octets);

Returns a boolean indicating whether or not the given $octets consists of well-formed UTF-8 sequences.
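To make "well-formed" concrete, the following pure-Perl sketch encodes the well-formed UTF-8 byte sequences of the Unicode Standard (Table 3-7) as a regular expression. It is an illustration only, not how valid_utf8() is implemented, and the name looks_well_formed is hypothetical. It expects an octet string, that is, one with no character above 0xFF.

```perl
use strict;
use warnings;

# Well-formed UTF-8 byte sequences: shortest form only,
# no surrogates, nothing above U+10FFFF.
my $WELL_FORMED_UTF8 = qr{
    \A (?:
        [\x00-\x7F]
      | [\xC2-\xDF][\x80-\xBF]
      | \xE0[\xA0-\xBF][\x80-\xBF]
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
      | \xED[\x80-\x9F][\x80-\xBF]
      | \xF0[\x90-\xBF][\x80-\xBF]{2}
      | [\xF1-\xF3][\x80-\xBF]{3}
      | \xF4[\x80-\x8F][\x80-\xBF]{2}
    )* \z
}x;

sub looks_well_formed {
    my ($octets) = @_;
    return $octets =~ $WELL_FORMED_UTF8 ? 1 : 0;
}

print looks_well_formed("\xE2\x82\xAC"), "\n";    # prints 1 (U+20AC)
print looks_well_formed("\xC0\x80"), "\n";        # prints 0 (overlong form)
```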

EXPORTS

None by default. All functions can be exported using the :all tag or individually.

(W utf8, nonchar) Noncharacters are code points that are permanently reserved in the Unicode Standard for internal use. They are forbidden for use in open interchange of Unicode text data. Noncharacters consist of the values U+nFFFE and U+nFFFF (where n ranges from 0 to 10 hexadecimal, that is, planes 0 through 16) and the values U+FDD0..U+FDEF.
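The noncharacter ranges above can be tested with a small predicate. This is a minimal sketch. The name is_noncharacter is hypothetical and not provided by the module.

```perl
use strict;
use warnings;

# True for U+FDD0..U+FDEF and for the last two code points of
# each of the 17 planes (U+nFFFE and U+nFFFF).
sub is_noncharacter {
    my ($cp) = @_;
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;
    return ($cp <= 0x10FFFF && ($cp & 0xFFFE) == 0xFFFE) ? 1 : 0;
}

print is_noncharacter(0x1FFFE), "\n";    # prints 1
print is_noncharacter(0xFFFD), "\n";     # prints 0
```

Masking with 0xFFFE clears the lowest bit, so the two plane-final values are matched by a single comparison.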

Can't represent surrogate code point U+%X in position %u

(W utf8, surrogate) Surrogate code points are designated only for surrogate code units in the UTF-16 character encoding form. Surrogates consist of code points in the range U+D800 to U+DFFF.
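The surrogate range, together with the definition of Unicode scalar values given under FUNCTIONS, can be expressed as two small predicates. This is a minimal sketch. Both function names are hypothetical and not provided by the module.

```perl
use strict;
use warnings;

# True for code points in U+D800..U+DFFF.
sub is_surrogate {
    my ($cp) = @_;
    return ($cp >= 0xD800 && $cp <= 0xDFFF) ? 1 : 0;
}

# True for U+0000..U+D7FF and U+E000..U+10FFFF.
sub is_scalar_value {
    my ($cp) = @_;
    return ($cp >= 0 && $cp <= 0x10FFFF && !is_surrogate($cp)) ? 1 : 0;
}

# Show the boundaries of the surrogate block.
for my $cp (0xD7FF, 0xD800, 0xDFFF, 0xE000) {
    printf "U+%04X surrogate=%d scalar=%d\n",
        $cp, is_surrogate($cp), is_scalar_value($cp);
}
```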