Internationalising strings [was: Re: new string library]

From:

John Darrington

Subject:

Internationalising strings [was: Re: new string library]

Date:

Sun, 11 Jun 2006 15:18:18 +0800

User-agent:

Mutt/1.5.4i

On Sat, Jun 10, 2006 at 02:42:36PM -0700, Ben Pfaff wrote:
> 2. Obviously macros like CC_ALNUM are only correct for the C locale.
> Not a problem so long as everyone's aware of it, but naive
> programmers might make some mistakes ...
> I'm aware of the problem and trying to think of a good solution.
I've been thinking a bit about it too. In the case of parsing input
syntax, I think the only solution is to convert the syntax to
(wchar_t *) using mbstowcs before doing anything with it.
Thus, functions like
    bool lex_is_id1(char c);        (from data/identifier.c)
become
    bool lex_is_id1(wchar_t c);
Testing for alphanumeric characters is then a matter of calling
iswalnum from wctype.h.
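A rough sketch of what I have in mind follows. The names
lex_is_id1_wide and syntax_to_wide are made up for illustration, and
the real rule in data/identifier.c may well accept more than plain
letters, so treat this as an assumption-laden outline rather than a
patch. It relies on the fact that a multibyte string never converts to
more wide characters than it has bytes, and it uses whatever locale is
in effect when it is called.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical wide-character counterpart of lex_is_id1.
   Assumption: only characters the current locale classifies as
   alphabetic may start an identifier. */
bool
lex_is_id1_wide (wchar_t c)
{
  return iswalpha ((wint_t) c) != 0;
}

/* Hypothetical helper: converts the multibyte string S to a newly
   allocated wide string, using the current locale's encoding.
   Returns NULL on an invalid multibyte sequence or if allocation
   fails.  The caller frees the result. */
wchar_t *
syntax_to_wide (const char *s)
{
  /* Each wide character consumes at least one byte of S, so
     strlen (s) + 1 elements always leave room for the terminator. */
  size_t max = strlen (s) + 1;
  wchar_t *w = malloc (max * sizeof *w);
  if (w == NULL)
    return NULL;
  if (mbstowcs (w, s, max) == (size_t) -1)
    {
      free (w);
      return NULL;
    }
  return w;
}
```

The caller would need setlocale (LC_ALL, "") somewhere at startup for
this to do anything beyond the C locale.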
At a pinch, we could convert only strings to wchar_t * (i.e. things
inside "" or '') but it might be easier and just as efficient to
convert the entire syntax file or line. Some strings will end up
being converted back again (e.g. variable names) but I don't think this
is too great a price to pay.
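Converting back could look roughly like the following. Again,
wide_to_multibyte is a made-up name, and this is only a sketch under
the assumption that the locale in effect is the same one used for the
original conversion.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts the wide string W back to a newly
   allocated multibyte string in the current locale's encoding.
   Returns NULL if W contains a character the locale cannot
   represent, or if allocation fails.  The caller frees the result. */
char *
wide_to_multibyte (const wchar_t *w)
{
  /* No wide character needs more than MB_CUR_MAX bytes. */
  size_t max = wcslen (w) * MB_CUR_MAX + 1;
  char *s = malloc (max);
  if (s == NULL)
    return NULL;
  if (wcstombs (s, w, max) == (size_t) -1)
    {
      free (s);
      return NULL;
    }
  return s;
}
```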
> Unfortunately, i18n seems to be difficult no matter what you do.
Yes, it's hard. Largely because so many existing libraries don't
follow the rules. I was talking to a guy from Germany recently who had
a problem with a special-purpose compiler. It turned out that under
the de_DE locale, one layer of the compiler was producing decimal
commas when the next layer was expecting decimal points, and crashing
when it got a comma it didn't expect.
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.