Unicode Collation Algorithm (UCA)

The Unicode Collation Algorithm (UCA) is an algorithm for sorting the entire Unicode character set. It provides linguistically
correct comparison, ordering, and case conversion. The UCA was developed as part of the Unicode standard. SQL Anywhere implements
the UCA using the International Components for Unicode (ICU) open source library, developed and maintained by IBM.

Note

The default UCA ordering sorts most characters in most languages into an appropriate order. However, because of the sorting
and comparison variations between languages sharing characters, the UCA cannot provide proper sorting for all languages. For
this purpose, ICU provides a syntax for tailoring the UCA. See Collation tailoring options.

The UCA provides advanced comparison, ordering, and case conversion at a small cost in space and time: the mapped form of
a string is longer than the original string. In return, the algorithm provides sophisticated handling of more complex
characters.

Unlike the SQL Anywhere Collation Algorithm (SACA), the UCA is only for use with single-byte and UTF-8 character sets,
and it separates each character into one or more attributes. For letters, these attributes are base character, accent,
and case.

Non-letters typically have only one attribute, the base character.
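
The following Python sketch illustrates the idea of splitting a character into attributes. It is not how SQL Anywhere or ICU implements the UCA: it approximates the split with Unicode canonical decomposition from the standard unicodedata module, and the function name attributes is hypothetical.

```python
import unicodedata

def attributes(ch):
    """Split one character into (base, accent, case); a simplified model only."""
    # NFD separates a precomposed letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    has_mark = any(unicodedata.combining(c) for c in decomposed)
    if not base.isalpha():
        # Non-letters carry only a base character in this model.
        return (base, None, None)
    accent = "accent" if has_mark else "none"
    case = "upper" if base.isupper() else "lower"
    return (base.lower(), accent, case)
```

With this sketch, attributes("ë") returns ('e', 'accent', 'lower') and attributes("N") returns ('n', 'none', 'upper'), while a digit such as "7" yields only its base character.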

UCA compares character strings as follows:

1. Compare the base characters. If one string of base characters differs from the other, then the comparison is complete.
   Accent and case are not considered.

2. If the database is accent sensitive, compare the accents. If the accents differ, then the comparison is complete. Case
   is not considered.

3. If the database is case sensitive, compare the case of each character.

The original string values are equal if and only if the base characters, accents, and case are the same for both strings.
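
The level-by-level comparison described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the ICU implementation: base characters are compared by code point instead of by UCA weights, accents are detected through canonical decomposition, lowercase is assumed to sort before uppercase, and the names levels and compare are hypothetical.

```python
import unicodedata

def levels(s):
    """Return three parallel lists for s: base characters, accents, case."""
    base, accents, case = [], [], []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            if accents:
                accents[-1] = 1  # mark the preceding base character as accented
            continue
        base.append(ch.lower())
        accents.append(0)
        case.append(1 if ch.isupper() else 0)  # assumption: lower sorts first
    return base, accents, case

def compare(a, b, accent_sensitive=True, case_sensitive=True):
    """Return -1, 0, or 1; the first enabled level that differs decides."""
    enabled = (True, accent_sensitive, case_sensitive)
    for x, y, use in zip(levels(a), levels(b), enabled):
        if use and x != y:
            return -1 if x < y else 1
    return 0
```

For example, compare("Noel", "noël") returns -1 because the accent level decides before case is examined, and compare("noel", "Noël", accent_sensitive=False, case_sensitive=False) returns 0 because only the base characters are compared.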

Suppose UCA is used to compare the strings in the first column of the table below. The subsequent columns describe the three
attributes for each string. Notice that the base characters are identical; the words differ only in accents and case.

String  Base characters  Accents                   Case
noel    noel             none, none, none, none    lower, lower, lower, lower
noël    noel             none, none, accent, none  lower, lower, lower, lower
Noel    noel             none, none, none, none    upper, lower, lower, lower
Noël    noel             none, none, accent, none  upper, lower, lower, lower

The following table shows the ordering that would occur in the four possible combinations of accent- and case-sensitivity
using UCA: