This is in response to action 133-A044 “Provide a proposal for the January 2013 UTC meeting for a principle for how to allow characters to have explicit script values and multiple script extension values; and suggested changes to the text and properties to accord with that.” (referenced doc)

At the bottom of this document is a comparison of the current (U6.2) Script property and Script Extensions property values, where they are not identical, followed by a list of the affected characters.

Textual Changes to UAX #24

Here are suggested changes to the text, following Ken’s suggestion of making this an exception rather than a “principle”.

2.1 Special vs. Explicit Script Property Values

...

OLD

If a character is only regularly used with a single script, then it is given that specific Script property value (as opposed to Common or Inherited). This facilitates the use of the script property for common tasks such as regular expressions, but it also means that some characters that are definite members of a given script, based on their forms and history, nevertheless are assigned one of the generic values. As more data on the usage of individual characters is collected, the Script property value assigned to a character may change. Rarely would a character change from one specific script to another. However, if it becomes established that a character is regularly used with more than one script, it will be assigned the Common or Inherited Script property value. Similarly, if it becomes established that a character is regularly used with only a single, specific script, it will be assigned a specific Script property value. The occasional use of character from one script in the context of another script, as for instance the citation of a Greek letter used as a mathematical constant in the midst of Latin text, or the use of a Latin letter in the midst of Han text, is not considered sufficient evidence of "regular use" requiring a designation of Common Script property value. It is also possible for a character, once given a Common or Inherited Script property value, upon further research, to be changed to a specific script, instead.

NEW

If a character is only regularly used with a single script, then it is given that specific Script property value (as opposed to Common or Inherited). In a few instances, characters known to be used with more than one script, but which are overwhelmingly associated with and used with a single script, also take the Script property value of that script. The assignment of a single script facilitates the use of the script property for common tasks such as regular expressions, but it also means that some characters that are definite members of a given script, based on their forms and history, nevertheless are assigned one of the generic values.

As more data on the usage of individual characters is collected, the Script property value assigned to a character may change. Rarely would a character change from one specific script to another. However, if it becomes established that a character is regularly used with more than one script, it may be assigned the Common or Inherited Script property value. Similarly, if it becomes established that a character is regularly used with only a single, specific script, it may be assigned a specific Script property value.

The occasional use of a character from one script in the context of another script, as for instance the citation of a Greek letter used as a mathematical constant in the midst of Latin text, or the use of a Latin letter in the midst of Han text, is not considered sufficient evidence of "regular use" requiring a designation of Common Script property value. It is also possible for a character, once given a Common or Inherited Script property value, upon further research, to be changed to a specific script, instead.

2.9 Script_Extensions Property

(add just before “The Script_Extensions property values are given in the file ScriptExtensions.txt in the Unicode Character Database [UCD].”)

However, there are some invariants that can be depended on:

The Script Extensions property value for a character will never contain Common or Inherited, unless that value is the only item, and it is identical with the Script property value for that character.

For example, ScriptExtensions={Common, Arab} will not occur.

If the Script property value is explicit, then the Script Extensions property value will include it.

For example, Script=Arab & ScriptExtensions={Latn, Deva} will not occur.
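The two invariants can be expressed as a simple check. The following is only an illustrative sketch in Python, not part of the proposed text: the function name check_invariants and the representation of property values as plain strings and sets are assumptions made for this example.

```python
from typing import List, Set

# The two generic (non-explicit) Script property values named in the invariants.
GENERIC = {"Common", "Inherited"}


def check_invariants(script: str, script_extensions: Set[str]) -> List[str]:
    """Return a list of invariant violations; an empty list means the pair is valid.

    Hypothetical helper for illustration only.
    """
    problems = []

    # Invariant 1: Common or Inherited may appear in Script Extensions only
    # as the sole item, and only if it equals the Script property value.
    if script_extensions & GENERIC and script_extensions != {script}:
        problems.append("generic value in Script Extensions is not the sole item matching Script")

    # Invariant 2: an explicit Script value must be included in Script Extensions.
    if script not in GENERIC and script not in script_extensions:
        problems.append("explicit Script value missing from Script Extensions")

    return problems


# The two counterexamples given above are both rejected.
print(check_invariants("Common", {"Common", "Arab"}))
print(check_invariants("Arab", {"Latn", "Deva"}))
```

Each counterexample yields a single violation message, while a pair such as Script=Arab with ScriptExtensions={Arab} yields an empty list.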

A character could have any of the following combinations of properties:

Script=Arab; ScriptExtensions={Arab}

Meaning: the character is only regularly used with Arabic.

Script=Arab; ScriptExtensions={Arab, Thaa}

Meaning: the character is regularly used with Arabic, but is occasionally also used with Thaana. The Script property value is just Arab, because the overwhelming use is with Arabic script characters.

Script=Common; ScriptExtensions={Arab, Deva}

Meaning: the character is regularly used with Arabic and with Devanagari.

Script=Common; ScriptExtensions={Common}

Meaning: the character is regularly used with many scripts; it is not primarily used with some single script or subset of scripts.
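The four combinations above can be told apart mechanically. The sketch below is illustrative only and assumes the pair already satisfies the invariants; the function name describe and the label strings are hypothetical and do not appear in UAX #24.

```python
from typing import Set

# The two generic (non-explicit) Script property values.
GENERIC = {"Common", "Inherited"}


def describe(script: str, script_extensions: Set[str]) -> str:
    """Classify a (Script, Script Extensions) pair into one of the four cases.

    Hypothetical helper; assumes the invariants already hold for the pair.
    """
    if script not in GENERIC:
        if script_extensions == {script}:
            # e.g. Script=Arab; ScriptExtensions={Arab}
            return "single script"
        # e.g. Script=Arab; ScriptExtensions={Arab, Thaa}
        return "primary script with occasional use in others"
    if script_extensions == {script}:
        # e.g. Script=Common; ScriptExtensions={Common}
        return "used with many scripts"
    # e.g. Script=Common; ScriptExtensions={Arab, Deva}
    return "shared among the listed scripts"


print(describe("Arab", {"Arab", "Thaa"}))  # primary script with occasional use in others
```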

Minor editorial fix

I found the following while looking at the text. Although we define “explicit” in the following, we don’t always use it consistently. We should search for “specific” and change if necessary.

“All other Script property values are referred to as explicit script values, because they each refer to one specific script.”

Property Changes

In accordance with the first change above, we’d make the following property changes: