Hello everybody,
Sorry to be late to this discussion.
On 2011/01/29 0:59, Phillips, Addison wrote:
>>>
>>> I want "ソース" consists of three, so from what you said, it sounds
Whether the above is two or three syllables depends on the definition.
In a detailed linguistic analysis, one ends up with three *morae*
(see http://en.wikipedia.org/wiki/Mora_(linguistics)#Japanese)
and two syllables. But such details are lost on non-experts, Japanese
and non-Japanese alike.
>>> like "grapheme cluster" is the right choice of words to use here.
A grapheme cluster doesn't combine 'ソ' and 'ー', as far as I understand.
"ソー" isn't a "user-perceived character", which is the description given
at the start of http://unicode.org/reports/tr29/. The fact that line breaks
between 'ソ' and 'ー' are a bad idea is handled by disallowing 'ー' at
the start of a line.
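
To make this concrete, here is a minimal Python sketch (standard library
only). The function name `wrap_no_leading_choon` and the fixed-width model
are my own assumptions for illustration; this is not how any particular
layout engine works, and pulling 'ー' onto the previous line is only one of
several possible strategies. It just shows that 'ー' is not a combining mark,
yet can still be kept off the start of a line by a separate rule.

```python
import unicodedata

CHOON = "\u30fc"  # KATAKANA-HIRAGANA PROLONGED SOUND MARK 'ー'

# 'ー' is a modifier letter with combining class 0, so it does not attach
# to the preceding 'ソ' the way a combining mark would.
category = unicodedata.category(CHOON)
combining_class = unicodedata.combining(CHOON)


def wrap_no_leading_choon(text, width):
    """Hypothetical fixed-width wrapper (illustration only).

    Breaks after every `width` characters, but never lets a line start
    with 'ー': such a character is pulled onto the previous line instead.
    """
    lines = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        # Extend the current line while the next line would start with 'ー'.
        while end < len(text) and text[end] == CHOON:
            end += 1
        lines.append(text[i:end])
        i = end
    return lines
```

With a width of 1, `wrap_no_leading_choon("ソース", 1)` keeps "ソー" together
on one line and puts "ス" on the next, without treating "ソー" as a single
grapheme cluster.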
So the question of whether Japanese typography uses characters or
grapheme clusters for line breaking essentially depends on what it does
for non-Japanese (e.g. Indic, Thai, ...) text. That also includes Ainu,
where decomposed Kana are needed in some cases. For high precision,
grapheme clusters indeed seem to be the right unit, although I guess
a lot of Japanese layout software wouldn't (yet) be able to handle
Indic grapheme clusters correctly.
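
As a small illustration of the Ainu case, here is a Python sketch (standard
library only). The helper `approx_clusters` is a hypothetical name and a
deliberately crude approximation: it only attaches mark characters to the
preceding base and does not implement the full rules of the report cited
above. The example uses 'セ' followed by U+309A, which as far as I know has
no precomposed form.

```python
import unicodedata


def approx_clusters(text):
    """Crude approximation of grapheme clusters (illustration only).

    Attaches characters of category Mn/Mc/Me to the preceding character.
    This ignores many rules of real grapheme cluster segmentation.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# 'セ' + COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK (used for Ainu)
ainu_ce = "\u30bb\u309a"

# Normalization to NFC cannot compose this pair, so it stays two code points.
still_decomposed = unicodedata.normalize("NFC", ainu_ce) == ainu_ce

# The combining mark stays with 'セ', while 'ー' remains a separate unit.
clusters = approx_clusters(ainu_ce + "ソース")
```

Breaking per code point could split 'セ' from its mark, while breaking per
cluster would not. 'ー' remains a separate unit either way.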
Regards, Martin.
>> I agree.
>>
>> Addison
>>
>> Addison Phillips
>> Globalization Architect (Lab126)
>> Chair (W3C I18N, IETF IRI WGs)
>>
>> Internationalization is not a feature.
>> It is an architecture.
>>
>>
>>
>
--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp