10.4.4.2 LDML Syntax Supported in MySQL

This section describes the LDML syntax that MySQL recognizes.
This is a subset of the syntax described in the LDML
specification available at
http://www.unicode.org/reports/tr35/, which
should be consulted for further information. MySQL supports
all the rules described here, with one exception: character
sorting occurs only at the primary level. Rules that specify
differences at secondary or higher sort levels are recognized
(and thus can be included in collation definitions), but MySQL
treats those differences as equality at the primary level.

Character Representation

Characters named in LDML rules can be written in
\unnnn format,
where nnnn is the hexadecimal
Unicode code point value. Within hexadecimal values, the
digits A through F are
not case sensitive; \u00E1 and
\u00e1 are equivalent. Basic Latin letters
A-Z and a-z can also be
written literally (this is a MySQL limitation; the LDML
specification permits literal non-Latin1 characters in the
rules). Only characters in the Basic Multilingual Plane (BMP),
code points 0000 through FFFF, can be
specified; the \unnnn notation cannot
represent characters outside that range.
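
The character notation described above can be sketched in a few
lines of Python. This is only an illustration for preparing or
checking rule text outside the server; the function names
to_ldml_escape and from_ldml_escape are hypothetical and are not
part of MySQL.

```python
import re

# Exactly four hexadecimal digits after \u; A-F may be in either lettercase.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def to_ldml_escape(ch: str) -> str:
    """Return the \\unnnn form of a single BMP character (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code_point = ord(ch)
    if code_point > 0xFFFF:
        # Characters outside the BMP cannot be written in this notation.
        raise ValueError(f"U+{code_point:X} is outside the BMP")
    return f"\\u{code_point:04X}"


def from_ldml_escape(text: str) -> str:
    """Decode one \\unnnn escape back to its character (hypothetical helper)."""
    match = _ESCAPE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a \\unnnn escape: {text!r}")
    return chr(int(match.group(1), 16))


print(to_ldml_escape("\u00e1"))
print(from_ldml_escape("\\u00E1") == from_ldml_escape("\\u00e1"))
```

Both spellings of the escape decode to the same character, and a
supplementary character such as U+1F600 is rejected because it
lies outside the range 0000 to FFFF.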

The Index.xml file itself should be
written using ASCII encoding.

Syntax Rules

LDML has reset rules and shift rules to specify character
ordering. Orderings are given as a set of rules that begin
with a reset rule that establishes an anchor point, followed
by shift rules that indicate how characters sort relative to
the anchor point.

A <reset> rule does not specify
any ordering in and of itself. Instead, it
“resets” the ordering so that subsequent shift
rules are taken in relation to a given character. Either of
the following rules causes subsequent shift rules to be taken
in relation to the letter 'A':

<reset>A</reset>
<reset>\u0041</reset>
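
To see that the two reset rules above name the same anchor, the
following Python sketch parses a single rule and decodes its
content. It is a minimal illustration, not how the server reads
Index.xml; the helper name reset_anchor is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches the \unnnn notation: backslash, 'u', four hexadecimal digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def reset_anchor(rule_xml: str) -> str:
    """Return the anchor character named by one <reset> rule (hypothetical helper)."""
    elem = ET.fromstring(rule_xml)
    if elem.tag != "reset":
        raise ValueError(f"expected a <reset> rule, got <{elem.tag}>")
    text = (elem.text or "").strip()
    # Replace each \unnnn escape with the character it denotes;
    # literal Basic Latin letters pass through unchanged.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


literal = reset_anchor("<reset>A</reset>")
escaped = reset_anchor(r"<reset>\u0041</reset>")
print(literal == escaped)
```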

The <p>,
<s>, and
<t> shift rules define primary,
secondary, and tertiary differences of a character from
another character:

Use primary differences to distinguish separate
letters.

Use secondary differences to distinguish accent
variations.

Use tertiary differences to distinguish lettercase
variations.
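
The three difference levels can be pictured with a toy model in
Python. The sketch below assumes that each shift rule places its
character just after the previous character in the rule sequence,
and it uses a simple (primary, secondary, tertiary) tuple as the
sort key. The tuple weights and the names build_keys and
equal_at_primary_level are assumptions made for illustration; they
do not reflect the weights or internals that MySQL actually uses.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int, int]  # (primary, secondary, tertiary) toy weights


def build_keys(rules: List[Tuple[str, str]]) -> Dict[str, Key]:
    """Assign toy sort keys from a sequence of (tag, character) rules."""
    keys: Dict[str, Key] = {}
    current: Key = (0, 0, 0)
    for tag, ch in rules:
        if tag == "reset":
            # A reset assigns no new ordering; it only moves the anchor.
            current = keys.setdefault(ch, (0, 0, 0))
            continue
        primary, secondary, tertiary = current
        if tag == "p":      # primary difference: a separate letter
            current = (primary + 1, 0, 0)
        elif tag == "s":    # secondary difference: an accent variation
            current = (primary, secondary + 1, 0)
        elif tag == "t":    # tertiary difference: a lettercase variation
            current = (primary, secondary, tertiary + 1)
        else:
            raise ValueError(f"unsupported rule: {tag!r}")
        keys[ch] = current
    return keys


def equal_at_primary_level(keys: Dict[str, Key], x: str, y: str) -> bool:
    """Compare two characters using only the primary weight."""
    return keys[x][0] == keys[y][0]


rules = [("reset", "a"), ("t", "A"), ("s", "\u00e1"), ("p", "b")]
keys = build_keys(rules)
print(equal_at_primary_level(keys, "a", "\u00e1"))
print(equal_at_primary_level(keys, "a", "b"))
```

In this model only the <p> rule changes the primary weight, so
characters introduced by <s> or <t> compare equal to the anchor
when sorting occurs only at the primary level, as described at
the beginning of this section.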

Either of these rules specifies a primary shift rule for
the 'G' character: