Hi JK,
On Feb 5, 2009, at 4:18 PM, Jonathan Kew wrote:
> On 5 Feb 2009, at 14:02, Benjamin Blanco wrote:
>
>> On Thu, Feb 5, 2009 at 1:06 AM, Robert J Burns <rob@robburns.com>
>> wrote:
>> Hi Benjamin,
>>
>> On Feb 4, 2009, at 9:17 PM, Benjamin wrote:
>>> Also, I can see a difference between the characters; The two
>>> brackets at the top and the one on the bottom left are duller,
>>> while the other three are sharper. This difference is apparent in
>>> both the browser and the text editor (not sure if it matters,
>>> though).
>>
>> I would say that is a bug in your font. Fonts, by using separate
>> glyphs for canonically equivalent characters, contribute to the
>> confusion authors face when creating content. The glyph
>> distinctions lead authors to treat the characters as semantically
>> distinct (which shouldn't happen). Fonts play an important role in
>> this (on par with input systems) since the fonts control the glyphs
>> used. For example, if a font uses the same glyphs for "½" as the
>> font maker uses for the compatibility-equivalent sequence "1⁄2",
>> this helps with Unicode authoring. It is remarkable how few font
>> makers take the minimal amount of time necessary to do this.
>
> Fully comprehending and addressing issues of Unicode-to-glyph
> mapping, canonical-equivalent sequences and alternatives, etc,
> requires far from a "minimal amount of time" for font makers.
I'm sorry. I didn't mean to imply that it was a small amount of work
to understand all of this. Clearly it is not. What I meant was that
once someone has become a font maker (and therefore has necessarily
achieved a certain level of understanding of Unicode and Unicode
imaging), it is a minimal amount of work to check that canonically
equivalent characters (and in some cases compatibility equivalent
characters) share the same rendering (or at least the same rendering
up to a relevant transformation for the compatibility equivalent
characters).
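To illustrate the kind of check I have in mind (this is only a sketch
using Python's standard unicodedata module, not anything from a
particular font tool), the equivalences themselves can be verified
mechanically:

    import unicodedata

    # U+2329 LEFT-POINTING ANGLE BRACKET is canonically equivalent to
    # U+3008 LEFT ANGLE BRACKET; normalization maps it there.
    assert unicodedata.normalize('NFC', '\u2329') == '\u3008'
    assert unicodedata.normalize('NFD', '\u2329') == '\u3008'

    # U+00BD "½" is only compatibility equivalent to "1" + U+2044
    # FRACTION SLASH + "2", so it survives NFC/NFD unchanged.
    assert unicodedata.normalize('NFKD', '\u00bd') == '1\u20442'
    assert unicodedata.normalize('NFC', '\u00bd') == '\u00bd'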
> Also, most fonts are targeted at a particular market (such as
> Western Europe), and make no claim to support languages or writing
> systems outside this area. Even in the non-Latin world, fonts are
> developed for limited markets; for example, an Arabic-script font
> might support Arabic, Persian, and Urdu, but not necessarily the
> Arabic-script orthographies of West African languages. However, as
> browser developers we are (or should be) aiming to serve a worldwide
> market, and this does come with additional costs.
Agreed. However, even though fonts necessarily target subsets of the
Unicode repertoire, they should always map canonically equivalent
characters to the same glyphs, simply because it is such a trivial
thing to do once everything else about the font has been completed. It
doesn't require any additional glyphs, but only a few bytes added to a
glyph mapping table.
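For instance (just a sketch; the font file name is hypothetical, and
this assumes the fontTools Python library), one can see at a glance
whether a font's character-to-glyph table already maps a canonically
equivalent pair to the same glyph:

    from fontTools.ttLib import TTFont

    font = TTFont('SomeFont.ttf')          # hypothetical font file
    cmap = font['cmap'].getBestCmap()      # code point -> glyph name map

    # Do U+2329/U+232A point at the same glyphs as U+3008/U+3009?
    for old, canonical in ((0x2329, 0x3008), (0x232A, 0x3009)):
        print(hex(old), cmap.get(old),
              '|', hex(canonical), cmap.get(canonical))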
>> This is a similar problem to font/glyph issues outlined earlier by
>> Andrew Cunningham with various African and Eastern languages.
>>
>> I've tried several different fonts, and they all render the glyphs
>> differently, despite canonical equivalence.
>
> This is somewhat tangential to the real issue, but FWIW.... I
> suspect that in most (or perhaps all) cases, what's really happening
> is that the font you're using does not support the characters U+3008
> and U+3009, and your software is performing a font fallback and
> rendering these from its default CJK font instead. So it's not that
> font developers are providing different glyphs for canonically-
> equivalent characters, but rather, they are not necessarily
> supporting the equivalent characters at all.
I hadn't thought of that, but you're probably right. However, this is
either 1) a variation on the same bug I described earlier or 2) a font
that is old and not yet updated to support U+3008 and U+3009. Again,
an updated font, if it supports a particular character, should support
all the characters canonically equivalent to it, since doing so does
not require producing another glyph, but simply adding a mapping from
another character (or character sequence) to an already designed
glyph.
But I think you're right that a likely explanation is that what
Benjamin witnessed was caused by an older font that did not support
the NFC characters, which triggered a font fallback to a newer font
that simply had a different glyph for the two canonically equivalent
characters. Normalization in the text processor would have avoided
this issue as well.
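And for what it's worth, the normalization step itself is cheap for a
text processor; here is a sketch (in Python, assuming the text to be
laid out is already in a string):

    import unicodedata

    text = 'x\u2329y\u232az'                   # the non-NFC angle brackets
    text = unicodedata.normalize('NFC', text)  # now contains U+3008/U+3009
    # Only the NFC characters reach the font machinery, so one font
    # (and one pair of glyphs) is used regardless of how the author
    # originally typed the brackets.
    assert '\u3008' in text and '\u3009' in text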
Take care,
Rob