I re-read the East_Asian_Width part of UAX44 to understand the suggested wording better.
I'm fine with the wording John suggested (with "background" replaced by "details", given Asmus's suggestion):
    The East_Asian_Width property of the Unicode database [UAX44] can
    be used to ... (see [UAX11] for more details on this property).
But if you read UAX44, it merely says:
    See Unicode Standard Annex #11, "East Asian Width" [UAX11] and
    DerivedEastAsianWidth.txt for more details.
And
    Property values are described in Unicode Standard Annex #11,
    "East Asian Width" [UAX11].
So I suppose implementers have to read UAX11 anyway, and the link to the data file is in the "6.3 Data File" section of UAX11 [1]. I couldn't find that link in UAX44.
Either works for me, but it seems to me that referring to UAX44 for the East_Asian_Width property adds one unnecessary hop on the way to the data file.
I hope we're okay with referring to UAX11 directly, as I think it makes everyone's life a little easier.
[1] http://unicode.org/reports/tr11/#DataFile
Regards,
Koji
-----Original Message-----
From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Asmus Freytag
Sent: Thursday, May 05, 2011 4:13 AM
To: John Daggett
Cc: fantasai; www-style@w3.org; WWW International; Addison Phillips
Subject: Re: [css3-writing-modes] referring to Unicode
On 5/4/2011 12:18 AM, John Daggett wrote:
> fantasai wrote:
>
>> The EastAsianWidth.txt file is referenced from UAX11. UAX11 gives
>> the explanation of what it means, how to use it, etc. So I think
>> that referring to UAX11 is the correct thing to do here. I'll let
>> Addison correct me if I'm wrong.
>>
>> I really don't think it's at all ambiguous what the spec means by
>> "do this with characters classified as fullwidth (F), see UAX11".
>> You think it's ambiguous?
> It's not ambiguous, it just buries the underlying reference. Unless immediately
> obvious, we should be defining CSS properties with respect to specific
> properties in the Unicode database and consistently referring to the location
> of that database, including other references that explain the handling of
> those properties in more detail.
>
> For example:
>
> The East_Asian_Width property of the Unicode database [UAX44] can
> be used to ... (see [UAX11] for more background on this property).
>
> Regards,
>
>
UAX#11 EAW is a good example of why it's important not to bypass the
documentation for Unicode Property data. The UAX makes clear that there
are two levels of character classification, one of which takes into
account context other than character properties.
For many applications, the main issue is whether some characters behave
like "wide" characters (i.e. similar to ideographic characters) or not.
UAX#11 gives a prescription for how to make this determination based on a
number of basic character properties. However, one important class of
characters in the data are "A" (for ambiguous).
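
To make the raw classification concrete, here is a minimal sketch that assumes Python's standard unicodedata module as the source of East_Asian_Width values. The helper name raw_width_class and the sample characters are chosen for illustration only and do not come from UAX#11 or from this thread. Note that it reports only the raw database value, so characters of class "A" come back unresolved.

```python
import unicodedata

def raw_width_class(ch: str) -> str:
    """Return the raw East_Asian_Width value (e.g. 'W', 'F', 'A') for one character."""
    return unicodedata.east_asian_width(ch)

# Illustrative samples: Latin A, a CJK ideograph, fullwidth A,
# halfwidth katakana A, and inverted exclamation mark.
samples = ["A", "\u4e2d", "\uff21", "\uff71", "\u00a1"]

for ch in samples:
    # Print the code point next to its raw property value.
    print(f"U+{ord(ch):04X} {raw_width_class(ch)}")
```

The last sample, U+00A1, is reported as "A", which says nothing yet about how it should actually be treated.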
According to UAX#11, the intent for these characters is to use context
to determine whether they need to be handled like ideographs or like
"regular" characters - without applying this resolution step, characters
of class "A" cannot be handled correctly.
On a system that maps these characters to legacy character sets and uses
legacy fonts, they would be displayed as "wide" characters, while on
systems that no longer follow these legacy practices, some or all of
these characters would be displayed as ordinary (aka narrow) characters.
Therefore, to handle these characters, one must know whether the
environment treats (all or some of) them as a legacy system would.
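
The resolution step described above might be sketched as follows. This is a simplification under stated assumptions, not the procedure from UAX#11: the function name is_wide and the single boolean legacy_east_asian_context are hypothetical, and a real environment may treat only some class "A" characters as wide rather than all of them.

```python
import unicodedata

def is_wide(ch: str, legacy_east_asian_context: bool) -> bool:
    """Decide whether one character should be treated as wide.

    legacy_east_asian_context is a hypothetical flag standing in for
    whatever the environment knows about legacy character sets and fonts.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        # Wide and Fullwidth are wide regardless of context.
        return True
    if eaw == "A":
        # Ambiguous: the raw value alone cannot answer; context decides.
        return legacy_east_asian_context
    # Na, H and N are treated as narrow in this sketch.
    return False

# The same ambiguous character resolves differently depending on context.
print(is_wide("\u00a1", legacy_east_asian_context=True))
print(is_wide("\u00a1", legacy_east_asian_context=False))
```

The point of passing the context explicitly is that no default for class "A" is baked into the lookup itself; the caller has to state what kind of environment it is in.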
This contingent quality of the classification is something that's not
apparent from the raw values in the database, and rises above mere
"background" information.
A./
PS: the same ambiguity in character handling carries through to UAX#14,
based on the same issue: legacy and non-legacy systems, fonts, etc.
differ fundamentally in how they represent certain characters, hence it
is necessary to supply context.