It returns two strings: the first one, $processed, is the part before the last starter, and the second one, $unprocessed, is the rest of the string, beginning with the last starter. A starter is a character having a combining class of zero (see UAX #15).

Note that $processed may be empty (when $normalized contains no starter or starts with the last starter), and then $unprocessed should be equal to the entire $normalized.

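When you have a $normalized string and an $unnormalized string following it, a simple concatenation such as $normalized . normalize($form, $unnormalized) is wrong: characters at the end of $normalized may need to be reordered or composed together with characters at the start of $unnormalized, so the result is not guaranteed to be normalized.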
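A minimal sketch of the wrong and the right way to extend an already normalized string. It assumes the splitting function described here is Unicode::Normalize's splitOnLastStarter() (exported on request), that NFC is the target form, and that $normalized is already in that form. The helper cps() is hypothetical and only prints code points:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC normalize splitOnLastStarter);

# Hypothetical helper: list a string's code points (avoids "Wide character" warnings).
sub cps { return join ' ', map { sprintf('U+%04X', ord) } split(//, $_[0]) }

my $form         = 'C';            # target form assumed for this sketch
my $normalized   = NFC('A');       # already in the target form
my $unnormalized = "\x{0301}";     # COMBINING ACUTE ACCENT arriving afterwards

# Wrong: the trailing "A" never gets the chance to compose with U+0301.
my $wrong = $normalized . normalize($form, $unnormalized);

# Right: normalize again from the last starter onward.
my ($processed, $unprocessed) = splitOnLastStarter($normalized);
my $right = $processed . normalize($form, $unprocessed . $unnormalized);

print 'wrong: ', cps($wrong), "\n";   # prints "wrong: U+0041 U+0301"
print 'right: ', cps($right), "\n";   # prints "right: U+00C1"
```

Only $unprocessed, the part from the last starter onward, is normalized again together with the new input; $processed is reused unchanged.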

As the form name (for example, the first argument of normalize() or check()), one of the following must be given:

'C' or 'NFC' for Normalization Form C (UAX #15)
'D' or 'NFD' for Normalization Form D (UAX #15)
'KC' or 'NFKC' for Normalization Form KC (UAX #15)
'KD' or 'NFKD' for Normalization Form KD (UAX #15)
'FCD' for "Fast C or D" Form (UTN #5)
'FCC' for "Fast C Contiguous" (UTN #5)
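A short sketch, assuming these names are passed as the first argument of Unicode::Normalize's normalize(): each short name and its long alias select the same form. The sample is U+212B ANGSTROM SIGN, whose NFC is U+00C5 and whose NFD is U+0041 U+030A:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(normalize);

my $angstrom = "\x{212B}";    # ANGSTROM SIGN

# Each short name and its long alias should select the same form.
for my $pair (['C', 'NFC'], ['D', 'NFD'], ['KC', 'NFKC'], ['KD', 'NFKD']) {
    my ($short, $long) = @$pair;
    my $same = normalize($short, $angstrom) eq normalize($long, $angstrom);
    printf "%-2s / %-4s: %s\n", $short, $long, $same ? 'same result' : 'different';
}

my $nfc = normalize('C', $angstrom);   # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
my $nfd = normalize('D', $angstrom);   # U+0041 U+030A
```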

Note

In the cases of NFD, NFKD, and FCD, the answer must be either YES or NO. The answer MAYBE may be returned in the cases of NFC, NFKC, and FCC.

A MAYBE string should contain at least one combining character or the like. For example, COMBINING ACUTE ACCENT has the MAYBE_NFC/MAYBE_NFKC property.

Both checkNFC("A\N{COMBINING ACUTE ACCENT}") and checkNFC("B\N{COMBINING ACUTE ACCENT}") will return MAYBE. "A\N{COMBINING ACUTE ACCENT}" is not in NFC (its NFC is "\N{LATIN CAPITAL LETTER A WITH ACUTE}"), while "B\N{COMBINING ACUTE ACCENT}" is in NFC.

If you want to check exactly, compare the string with its NFC/NFKC/FCC.
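A sketch combining the quick check with the exact comparison. checkNFC() answers YES (1), NO (empty string), or MAYBE (undef); only a MAYBE needs the fallback of comparing the string with its NFC. The wrapper is_nfc() is a hypothetical helper, not part of the module:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC checkNFC);

# Hypothetical helper: exact NFC test that skips normalization when possible.
sub is_nfc {
    my ($s) = @_;
    my $quick = checkNFC($s);           # 1 = YES, '' = NO, undef = MAYBE
    return $quick ? 1 : 0 if defined $quick;
    return NFC($s) eq $s ? 1 : 0;       # MAYBE: compare with its NFC
}

for my $s ("A\x{0301}", "B\x{0301}", "abc") {
    my $quick = checkNFC($s);
    my $label = !defined($quick) ? 'MAYBE' : $quick ? 'YES' : 'NO';
    printf "%-24s quick=%-5s exact=%s\n",
        join(' ', map { sprintf('U+%04X', ord) } split(//, $s)),
        $label, is_nfc($s) ? 'in NFC' : 'not in NFC';
}
```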

If the two characters here and next (given as code points, with here preceding next) are composable (including Hangul Jamo/Syllables and Composition Exclusions), it returns the code point of the composite.

If they are not composable, it returns undef.
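A small sketch, assuming the function described here is Unicode::Normalize's getComposite(), exported on request and taking the two code points here and next. It tries an ordinary pair, a Hangul Jamo pair, and a pair that has no composite:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(getComposite);

my @pairs = (
    [0x0041, 0x0301],   # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT
    [0x1100, 0x1161],   # HANGUL CHOSEONG KIYEOK + HANGUL JUNGSEONG A
    [0x0042, 0x0301],   # LATIN CAPITAL LETTER B + COMBINING ACUTE ACCENT
);

for my $p (@pairs) {
    my ($here, $next) = @$p;
    my $composite = getComposite($here, $next);   # undef if not composable
    printf "U+%04X + U+%04X -> %s\n", $here, $next,
        defined $composite ? sprintf('U+%04X', $composite) : 'undef';
}
```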

$combining_class = getCombinClass($code_point)

It returns the combining class (as an integer) of the character.
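A sketch of the starter test used throughout this section, built on getCombinClass(): a starter is simply a character whose combining class is zero. The helper is_starter() is hypothetical:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(getCombinClass);

# Hypothetical helper: a starter has combining class zero.
sub is_starter { return getCombinClass($_[0]) == 0 }

# LATIN CAPITAL LETTER A, COMBINING ACUTE ACCENT, COMBINING DOT BELOW, COMBINING CEDILLA
for my $cp (0x0041, 0x0301, 0x0323, 0x0327) {
    printf "U+%04X ccc=%3d %s\n", $cp, getCombinClass($cp),
        is_starter($cp) ? 'starter' : 'non-starter';
}
```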

$may_be_composed_with_prev_char = isComp2nd($code_point)

It returns a boolean indicating whether the character at the specified code point may be composed with a preceding character in some composition (including Hangul compositions, but excluding Composition Exclusions and Non-Starter Decompositions).
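A quick sketch, assuming isComp2nd() is imported on request: COMBINING ACUTE ACCENT and the Hangul vowel jamo U+1161 can each be the second character of a composition, while LATIN CAPITAL LETTER A cannot:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(isComp2nd);

my %name = (
    0x0041 => 'LATIN CAPITAL LETTER A',
    0x0301 => 'COMBINING ACUTE ACCENT',
    0x1161 => 'HANGUL JUNGSEONG A',
);

for my $cp (sort { $a <=> $b } keys %name) {
    printf "U+%04X %-24s %s\n", $cp, $name{$cp},
        isComp2nd($cp) ? 'may follow in a composition' : 'never a second character';
}
```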

$is_exclusion = isExclusion($code_point)

It returns a boolean indicating whether the code point is a composition exclusion.

$is_singleton = isSingleton($code_point)

It returns a boolean indicating whether the code point is a singleton, that is, a character whose canonical decomposition consists of a single character.

$is_non_starter_decomposition = isNonStDecomp($code_point)

It returns a boolean indicating whether the code point has a Non-Starter Decomposition (a canonical decomposition that begins with a non-starter).

$is_Full_Composition_Exclusion = isComp_Ex($code_point)

It returns a boolean of the derived property Comp_Ex (Full_Composition_Exclusion). This property is the union of Composition Exclusions, Singletons, and Non-Starter Decompositions.
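A sketch checking the relationship stated above on one example of each kind (code points chosen for illustration): U+0958 DEVANAGARI LETTER QA is a composition exclusion, U+212B ANGSTROM SIGN is a singleton, and U+0344 COMBINING GREEK DIALYTIKA TONOS has a non-starter decomposition, while U+00C1 is none of these:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(isExclusion isSingleton isNonStDecomp isComp_Ex);

my %example = (
    0x00C1 => 'LATIN CAPITAL LETTER A WITH ACUTE (none of the three)',
    0x0344 => 'COMBINING GREEK DIALYTIKA TONOS (non-starter decomposition)',
    0x0958 => 'DEVANAGARI LETTER QA (composition exclusion)',
    0x212B => 'ANGSTROM SIGN (singleton)',
);

for my $cp (sort { $a <=> $b } keys %example) {
    # Full_Composition_Exclusion should equal the union of the three sets.
    my $union = isExclusion($cp) || isSingleton($cp) || isNonStDecomp($cp);
    printf "U+%04X Comp_Ex=%d union=%d  %s\n",
        $cp, isComp_Ex($cp) ? 1 : 0, $union ? 1 : 0, $example{$cp};
}
```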

$NFD_is_NO = isNFD_NO($code_point)

It returns a boolean of the derived property NFD_NO (NFD_Quick_Check=No).

$NFC_is_NO = isNFC_NO($code_point)

It returns a boolean of the derived property NFC_NO (NFC_Quick_Check=No).

$NFC_is_MAYBE = isNFC_MAYBE($code_point)

It returns a boolean of the derived property NFC_MAYBE (NFC_Quick_Check=Maybe).

$NFKD_is_NO = isNFKD_NO($code_point)

It returns a boolean of the derived property NFKD_NO (NFKD_Quick_Check=No).

$NFKC_is_NO = isNFKC_NO($code_point)

It returns a boolean of the derived property NFKC_NO (NFKC_Quick_Check=No).

$NFKC_is_MAYBE = isNFKC_MAYBE($code_point)

It returns a boolean of the derived property NFKC_MAYBE (NFKC_Quick_Check=Maybe).
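A sketch printing the six quick-check properties for a few sample code points (chosen for illustration). For example, U+FB01 LATIN SMALL LIGATURE FI has only a compatibility decomposition, so it is NFKD_NO and NFKC_NO but neither NFD_NO nor NFC_NO:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(isNFD_NO isNFC_NO isNFC_MAYBE isNFKD_NO isNFKC_NO isNFKC_MAYBE);

my @props = (
    [NFD_NO     => \&isNFD_NO],
    [NFC_NO     => \&isNFC_NO],
    [NFC_MAYBE  => \&isNFC_MAYBE],
    [NFKD_NO    => \&isNFKD_NO],
    [NFKC_NO    => \&isNFKC_NO],
    [NFKC_MAYBE => \&isNFKC_MAYBE],
);

# A, A WITH ACUTE, COMBINING ACUTE ACCENT, ANGSTROM SIGN, LATIN SMALL LIGATURE FI
for my $cp (0x0041, 0x00C1, 0x0301, 0x212B, 0xFB01) {
    # List the names of the properties that hold for this code point.
    my @true = map { $_->[0] } grep { $_->[1]->($cp) } @props;
    printf "U+%04X: %s\n", $cp, @true ? join(', ', @true) : '(none)';
}
```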

EXPORT

NFC, NFD, NFKC, NFKD: by default.

normalize and some other functions: on request.
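A sketch of the difference, relying on standard Exporter behavior (which this sketch assumes the module follows): a bare use imports the four default functions, while an explicit import list imports only what it names. The package names WithDefaults and WithList are hypothetical:

```perl
use strict;
use warnings;

package WithDefaults;
use Unicode::Normalize;                         # imports NFC, NFD, NFKC, NFKD

package WithList;
use Unicode::Normalize qw(normalize checkNFC);  # imports only the listed names

package main;

for my $pkg (qw(WithDefaults WithList)) {
    no strict 'refs';    # symbolic lookup of "${pkg}::name"
    my @have = grep { defined &{"${pkg}::$_"} } qw(NFC NFD NFKC NFKD normalize checkNFC);
    print "$pkg: @have\n";
}
```

To get the defaults together with extra functions, Exporter's :DEFAULT tag can be listed alongside them, as in use Unicode::Normalize qw(:DEFAULT normalize);.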

CAVEATS

Perl's version vs. Unicode version

Since this module refers to perl core's Unicode database in the directory /lib/unicore (or formerly /lib/unicode), the Unicode version of normalization implemented by this module depends on your perl's version.
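Since the behavior depends on the perl in use, it can help to log which Unicode version that perl implements. A minimal sketch using UnicodeVersion() from the core module Unicode::UCD, alongside the module's own $VERSION:

```perl
use strict;
use warnings;
use Unicode::UCD qw(UnicodeVersion);
use Unicode::Normalize ();    # load only; no imports needed here

my $unicode = UnicodeVersion();   # a version string such as "x.y.z"
printf "perl %vd, Unicode %s, Unicode::Normalize %s\n",
    $^V, $unicode, $Unicode::Normalize::VERSION;
```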

In older Unicode versions, a small number of characters (all of which are CJK compatibility ideographs, as far as has been found) may have an erroneous decomposition mapping (see NormalizationCorrections.txt). However, this module neither refers to NormalizationCorrections.txt nor provides any specific version of normalization. Therefore, this module running on an older perl with an older Unicode database may blindly use the erroneous decomposition mappings, conforming to that database.

Revised definition of canonical composition

In Unicode 4.1.0, the definition D2 of canonical composition (which affects NFC and NFKC) was changed (see Public Review Issue #29 and recent UAX #15). This module has used the newer definition since version 0.07 (Oct 31, 2001), and it does not support normalization according to the older definition, even if the Unicode version implemented by perl is lower than 4.1.0.