digitalmars.D - Re: Questions about Unicode, particularly Japanese

Sorry if this shows up as a top post again in your mail clients. I'll try to
figure out what's going on later today.

1. Am I correct in all of that?

Yes. That's the reason I was saying that UTF-16 is *NOT* a lousy encoding. It
really depends on the situation. The advantage is not only space but also
faster processing (even for letters that take 2 bytes in UTF-8 as well, such as
Greek and Cyrillic), since those 2 bytes can be read in one memory access as a
single UTF-16 code unit, as opposed to being decoded byte by byte from UTF-8.
Also consider this: it's easier (and cheaper) to convert from ANSI to UTF-16,
since a direct lookup table can be created. For UTF-8, on the other hand, you
have to do some bit shifting to build a multi-byte sequence for every non-ASCII
letter (even accented Latin letters such as é).
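To make the conversion argument concrete, here is a minimal sketch in Python. It assumes ISO-8859-5 (a single-byte Cyrillic code page) stands in for "ANSI", and uses Python's own codec only once, to fill the 256-entry table. The names `TO_UTF16`, `ansi_to_utf16`, `encode_utf8` and `ansi_to_utf8` are hypothetical, chosen for illustration. Each input byte becomes one UTF-16 code unit through a single lookup, while the UTF-8 path needs branching plus shifts and masks to assemble a 2- or 3-byte sequence.

```python
# Sketch: converting a single-byte "ANSI" code page to UTF-16 and to UTF-8.
# Assumption: ISO-8859-5 (Cyrillic) stands in for the ANSI code page.
# Python's own codec is used only once, to fill the 256-entry table.
TO_UTF16 = [ord(bytes([b]).decode("iso8859_5")) for b in range(256)]


def ansi_to_utf16(data):
    """One table lookup per input byte yields one UTF-16 code unit."""
    return [TO_UTF16[b] for b in data]


def encode_utf8(cp):
    """Hand-rolled UTF-8 encoder for code points below U+10000 (enough here)."""
    if cp < 0x80:  # ASCII: 1 byte, copied as-is
        return bytes([cp])
    if cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def ansi_to_utf8(data):
    """Same table lookup, then shifts and masks to build each sequence."""
    return b"".join(encode_utf8(TO_UTF16[b]) for b in data)


sample = "Привет".encode("iso8859_5")  # 6 bytes in the code page
print(len(ansi_to_utf16(sample)))      # 6 UTF-16 code units (12 bytes)
print(len(ansi_to_utf8(sample)))       # 12 UTF-8 bytes
```

Every entry in this particular table fits in one 16-bit code unit, so the UTF-16 path stays a plain lookup. The extra work on the UTF-8 side is the per-character branching and bit manipulation in `encode_utf8`. This sketch does not measure timing.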
Which encoding is better depends on your taste, language, applications, etc. I
was simply pointing out that it's quite nice to have a universal 'tchar' type.
My argument was never about which encoding is better; that's hard to say in
general. Besides, many people still use ANSI and not UTF-8.
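You can check the claim that the better choice depends on the language by counting bytes. Here is a minimal Python sketch. The sample strings are arbitrary illustrations, not measurements of real text, and the helper names `encoded_sizes` and `smaller_encoding` are hypothetical.

```python
# Compare how many bytes short sample strings need in UTF-8 and UTF-16.
# The samples are arbitrary illustrations, not measurements from real text;
# real documents also contain ASCII digits, punctuation, and markup.
SAMPLES = {
    "English": "Hello",
    "Greek": "Γεια",
    "Russian": "Привет",
    "Japanese": "日本語のテキスト",
}


def encoded_sizes(text):
    """Return (UTF-8 bytes, UTF-16 bytes), UTF-16 without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def smaller_encoding(text):
    """Name the more compact encoding for this text, or 'tie'."""
    utf8, utf16 = encoded_sizes(text)
    if utf8 < utf16:
        return "UTF-8"
    if utf16 < utf8:
        return "UTF-16"
    return "tie"


for language, text in SAMPLES.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{language:<9} UTF-8={utf8:>2}  UTF-16={utf16:>2}  "
          f"smaller: {smaller_encoding(text)}")
```

With these samples, English is smaller in UTF-8 (5 vs 10 bytes). Greek and Russian come out even, because their letters take 2 bytes in both encodings. Japanese is 24 bytes in UTF-8 but 16 in UTF-16, because kanji and kana take 3 bytes each in UTF-8.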


It appears as a top-post in my newsreader too.


Wouldn't this suggest that the choice of character type depends more on
what language you speak than on what OS you are running?
-Steve
