On Wed, 08 Feb 2006 11:47:53 +0900, Mark Davis
<mark.davis@icu-project.org> wrote:
>
>> For all these languages you have transliteration schemes which describe
>> how to convert a string in the original script to a version which uses
>> only latin letters. I think nearly for one of these languages there is
>> a "standardized", totally accepted scheme. But it seems that for your
>> purpose it should be enough to choose just one scheme.
> This is not really the case; most non-Latin to Latin transliterations
> vary quite widely.
>
> Путин ↔ Putin, Poutine, ...
> Горбачёв ↔ Gorbachev, Gorbacev, Gorbatchev, Gorbačëv, Gorbachov,
> Gorbatsov, Gorbatschow, ...
Sorry, [I think nearly for one of these] should have been [I think for
nearly *none* of these].

Felix
>
> Mark
>
> Felix Sasaki wrote:
>>
>> Hi Paul,
>>
>> Sorry for the late follow-up. Just a remark to your question below.
>>
>> On Fri, 03 Feb 2006 06:26:40 +0900, <Paul.V.Biron@kp.org> wrote:
>>
>>>
>>>> Conversions such as the one you mention from Kanji to Romaji
>>>> have the advantage that the result is still fairly legible,
>>>> but there are various disadvantages:
>>>> - large dictionary needed
>>>> - not deterministic (there is often more than one way to
>>>> pronounce a Kanji or Kanji combination)
>>>> - language-specific, which means a different solution for
>>>> each language is needed
>>>
>>> To provide context for this question from the databinding WG, our goal
>>> is to provide guidance to implementors of databinding toolkits: tools
>>> that take a schema and produce a set of programming language bindings,
>>> e.g., Java classes, that know how to manipulate instances conforming to
>>> the schema. Most binding tools do something like the following. Given
>>> this schema document fragment
>>>
>>> <xs:complexType name='MyType'>
>>>   <xs:sequence>
>>>     <xs:element name='child1' type='xs:string'/>
>>>     <xs:element name='child2' type='xs:string' maxOccurs='unbounded'/>
>>>   </xs:sequence>
>>> </xs:complexType>
>>>
>>> they will produce a class such as:
>>>
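>>> import java.util.List;
>>>
>>> class MyType
>>> {
>>>     String child1;        // from <xs:element name='child1'>
>>>     List<String> child2;  // maxOccurs='unbounded' becomes a list
>>> }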
>>>
>>> where the element and type names have become names in the programming
>>> language (Java in this case).
>>>
>>> The range of characters that are legal for XML names is much wider than
>>> that supported by many programming languages. The question is: what
>>> guidance should we give binding tool implementors about what they
>>> should do in the face of XML names that contain characters that aren't
>>> legal in that programming language?
>>>
>>> One option is: replace "bad" characters with punctuation, etc.
>>> Another option is: for languages that have something resembling a
>>> kanji to romaji mapping, automate the mapping (if possible/reasonable).
>>> If such automation is not possible/reasonable, perhaps the tool could
>>> provide a configuration option to allow the user to "manually" specify
>>> the mapping for the particular names used in the schema.
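To make the two options in the quoted paragraph concrete, here is a
minimal sketch, not a recommendation. It assumes a target language that
accepts only ASCII letters, digits and '_' in identifiers (Java itself is
more permissive). The class name NameMapper, the method toIdentifier and
the "_uHHHH" escape convention are all hypothetical, chosen only for
illustration. A user-supplied mapping is consulted first; anything else
falls back to escaping each unsupported character.

```java
import java.util.Map;

// Hypothetical helper: not taken from any existing binding toolkit.
class NameMapper {
    // User-configured "manual" mappings from XML name to identifier.
    private final Map<String, String> overrides;

    NameMapper(Map<String, String> overrides) {
        this.overrides = overrides;
    }

    String toIdentifier(String xmlName) {
        // Option 2: a mapping specified by the user wins.
        String manual = overrides.get(xmlName);
        if (manual != null) {
            return manual;
        }
        // Option 1: replace each unsupported character with an escape.
        StringBuilder out = new StringBuilder();
        xmlName.codePoints().forEach(cp -> {
            boolean asciiLetter = (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
            boolean asciiDigit = cp >= '0' && cp <= '9';
            // A digit is only kept when it is not the first character.
            if (asciiLetter || cp == '_' || (asciiDigit && out.length() > 0)) {
                out.appendCodePoint(cp);
            } else {
                // Assumed convention: "_u" followed by the hex code point.
                out.append("_u").append(Integer.toHexString(cp).toUpperCase());
            }
        });
        return out.toString();
    }
}
```

With an empty override map, toIdentifier("child-1") yields "child_u2D1".
The escape scheme is deterministic but makes no attempt to guarantee that
two different XML names never map to the same identifier, so a real tool
would still need a collision check.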
>>>
>>> We were wondering if i18n had any other options they could recommend or
>>> any advice in general about this problem.
>>>
>>> One question I had was whether languages other than CJK have something
>>> similar to kanji -> romaji? For instance, do Hebrew, Greek, Thai, etc.
>>> have this concept?
>>
>> For all these languages you have transliteration schemes which describe
>> how to convert a string in the original script to a version which uses
>> only latin letters. I think nearly for one of these languages there is
>> a "standardized", totally accepted scheme. But it seems that for your
>> purpose it should be enough to choose just one scheme.
>>
>> -- Felix
>>
>>>
>>>> - not reversible (there are many Kanji or Kanji combinations
>>>> that lead to the same Romaji)
>>>
>>> That should not be a problem since the binding tool can store the
>>> original XML name as metadata for each name in the language binding for
>>> use in serializing instances.
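One way the "original XML name as metadata" idea could look in Java is a
runtime annotation on the generated field, read back by reflection when
serializing. This is only a sketch: the annotation OriginalXmlName, the
helper XmlNames and the field name namae are hypothetical, though the
approach is similar in spirit to JAXB's @XmlElement(name = ...).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical annotation carrying the name as written in the schema.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface OriginalXmlName {
    String value();
}

// What a binding tool might generate for an element named U+540D U+524D.
class MyType {
    @OriginalXmlName("\u540D\u524D")
    String namae; // identifier chosen by the tool or by the user
}

// Hypothetical lookup used by a serializer to recover the XML name.
class XmlNames {
    static String of(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            OriginalXmlName meta = field.getAnnotation(OriginalXmlName.class);
            // Fall back to the Java name when no metadata was recorded.
            return meta != null ? meta.value() : fieldName;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
    }
}
```

Because the serializer always writes the stored name, the mapping from XML
name to identifier does not need to be reversible, which is the point made
in the quoted reply.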
>>>
>>> pvb
>>>