
Internationalization (i18n), Unicode - Questions

hi there,
i'm turning to you because i've run into a wall concerning simple unicode questions (to you pros they might be simple, i suppose), maybe you can help me out (please!).

i'm not an absolute beginner coder but absolutely new to internationalization (i18n). in particular and to make things worse, i also want to code for different systems (windows9x up to xp).
i have some basic questions which i can't answer myself even after hours (days) of googling - either i search for the wrong things or the things i look for just aren't stated anywhere in a summed up form. i find zillions of pages about string/wstring-usage, unicode vs vc++ multibyte etc.
but my questions are far more fundamental... i guess that's why they are not answered anywhere, a japanese user takes things for granted ("what does a .txt file look like") and doesn't write about it or think about differences to other systems.

- could somebody sum up in a few sentences, what "unicode-support" really means? i know all the nt-apis exist doubled (an ...A and a ...W version) so everything doable with apis (including file names with, for example, japanese signs) must be possible with unicode as well.

- windows9x doesn't support unicode (there is an addon unicode layer, but you can't count on it) - how does japanese windows 95, for example, work? does it only rely on fonts (there is an IME for non-japanese users wanting to use japanese signs and stuff)? if it is only fonts, then how do japanese w95 users deal with FILES, can they use their signs or only ascii for filenames and folders?

- if i look at a .txt file created with notepad on a japanese system with a hex-editor, will i see double-byte representations, though it's only a simple .txt? is a .txt file of a windows 95 japanese system only ascii? can somebody provide links to such files (a windows95/98/xp .txt file created on a japanese system - or any other unicode, like russian, i guess?)?

- i want to provide easy localisation to users, they should be able to use an .ini file to change language easily. if i look at other solutions to this, like nirsoft.net's excellent programs, for example a russian .ini file looks like this: http://www.nirsoft.net/utils/mspass_russian.zip
i don't understand this, it seems like unicode and ascii are mixed up in one .ini, how does this work? what does a unicode .ini look like, the keys still look ascii-only to me, the values seem to be able to contain unicode, but not all do so.

- what is the easiest (most robust etc.) way to accomplish cross-version working of my apps in just ONE .exe (not an extra unicode build)? do i have to LoadLibrary the ...W unicode apis on nt-systems so that the prog also runs on 9x (complicated... debugging=argh)? the unicode layer for 9x is no option as it adds too much bloat and i don't trust it.

i would really appreciate help on this, maybe i'd also write the stuff down for future coders beginning in this area who will have the same "dumb" questions as i do now.

Unicode support is indeed a flimsy term. Ideally, it means that the system recognizes Unicode characters and knows how to display, process, read etc. them correctly. The problem is that many systems are able to support Unicode but are still ASCII-oriented. Win95 is an example.

You're asking a lot about Japanese so I'll answer with respect to this particular language. The first thing to remember is that there's not one Japanese writing system but three. Two of them, hiragana and katakana, are syllabaries (they encode syllables, and there aren't so many of them in any language), so even on an ASCII-based system you can somehow get along with a simplified Japanese alphabet. The third, kanji, consists of thousands of ideograms borrowed from Chinese. This one is really problematic because it can't be encoded in 8 bits. Before the days of Unicode, Asian languages used a multibyte encoding system whereby certain characters are encoded as a single byte and others are encoded as two bytes. This system was rather kludgy and very problematic with pointers because many string operations assume that incrementing a pointer advances exactly one character.
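To make the pointer problem concrete, here is a minimal sketch that counts characters in a Shift-JIS string (the double-byte encoding used by Japanese Windows code page 932). The function names are made up for illustration; on Windows you would normally rely on API helpers such as IsDBCSLeadByte or CharNext rather than hand-coding the lead-byte ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if the byte starts a two-byte Shift-JIS character
// (lead byte ranges 0x81-0x9F and 0xE0-0xFC).
bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters, not bytes, in a Shift-JIS encoded string.
// Hypothetical helper, written only to illustrate the stepping logic.
std::size_t sjis_char_count(const std::string& bytes) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        // Step over two bytes for a double-byte character, one otherwise.
        // A lead byte at the very end (truncated input) counts as one.
        i += (is_sjis_lead_byte(b) && i + 1 < bytes.size()) ? 2 : 1;
        ++count;
    }
    return count;
}
```

Note that the second byte of a double-byte character can fall in the ASCII range (for example 0x7B, which is '{' on its own), which is why naive byte-wise searching or `++p` stepping breaks on such strings.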

With respect to having a single .exe that supports multiple languages: it's doable but I doubt that that's what you really want. It would mean a lot of runtime LoadString calls, and a lot of other overhead. The easiest (relatively speaking) way to handle it is by separating the business logic, the bitmaps and other culture-independent components from culture-dependent ones. Alternatively, there can be multiple .exe versions: one for Asian languages, one for Western European languages etc. It mostly depends on the specific locales you have in mind. For example, supporting American English and Canadian French at once isn't difficult. However, add Hebrew (written from right to left), Vietnamese (with its numerous tonal diacritics) and Chinese (no need to add much about it...) to the equation and things start to become more difficult. So when you localize an app, you have to know which target locales you're about to support.

thanks for your quick reply, and sorry for my late one,
thank you a lot for the information, to me this means totally getting rid of the idea of designing one-exe-cross-platform-international programs, though it's possible and would be nice.
i will start with an ascii-only version which will be enough for the american and european market.

how many potential customers may i lose this way? if you say too many, i might consider taking the challenge to at least code the thing thoroughly (TCHAR etc... uarx.) to be able to throw out an ascii and a unicode version.

and another question: i'm looking for a tutorial on windows api and the STL, i read that one could use STL vectors to handle much of the ugly buffer stuff the api requires. i just can't manage to find this information again...
i'm used to quick visual basic + win32 api stuff where all this dealing with buffers is a lot easier and less error prone. i want to use c++ and have at least a bit of comfort when dealing with the api (and less of the old c-style functions, brrrr).

Were you planning on translating all the in-program text into all those languages? It's not enough just to have the letters supported, and if you had not planned to translate (user manual, help, menus, buttons, everything) then you haven't lost a thing. You can support just about everything besides some forms of Japanese and Chinese with ASCII + some font manipulation. There are probably a couple of other huge alphabets but I don't know them. Those can be supported too but you should not do it that way now that Unicode exists.

There are more issues as well... some countries display numbers as 111,234 instead of 111.234 and other common symbols are used differently. Not to mention right to left and bottom to top text 'orientation' of some languages (are there modern bottom up languages?)

It's a big deal to get (a lot of text) right for "everyone" but not a big deal to get (a little text) right for "many", as so many countries learn at least a little English, and Latin/Germanic languages are similar enough that buttons and such can be "read" by others (I have used programs in German, for example, knowing no German but still able to figure out most of the words by placement of buttons and English roots).

<...>There are more issues as well... some countries display numbers as 111,234 instead of 111.234 and other common symbols are used differently. Not to mention right to left and bottom to top text 'orientation' of some languages (are there modern bottom up languages?)

Yes, there are. And in fact, there are even more complicated languages: Korean for instance uses a syllabic system whereby each syllable is written in a triangle or rectangle, so you start from one direction, then move down. Take the word 'and' in English. In a Korean newspaper it would be written like this:

a n
d

(in Korean letters, of course).

There's more to localization than just choosing the language and fonts: dates are written differently even in the cases where the target countries use the same calendar system, and the Persian calendar, the Arabic calendar, the Jewish calendar etc. are entirely different systems, so you can't even assume that May 18th 2005 means anything in certain countries.
Numerals are also not universal. The Arabic system has different symbols, for example, so a simple number such as 112.45 must be written with different numerals. Currency symbols and writing conventions are different. In German they use a space to separate the thousands:
10 000, whereas in other countries they use a comma: 10,000.

i hadn't thought about different calendars until now, phew... the problem is not that i wouldn't like to write nicely localized progs, it's the lack of access to testing systems - and i guess this is the same for many coders.
i think this is a pity and that os-producers should think about such issues and maybe open themselves up to developers, it would just be really great to be able to test.
----------
jonnin: it's not only about labeling the GUI controls properly and customizing the text output - the main problem which isn't touched by this is filenames, i guess. your program could be oh so pretty and neat to read in some foreign language (e.g. chinese), but if you can't work with their filenames, you're lost.
----------
thanks for the STL tutorial, i've already used vectors plenty of times, i am looking for a win32 api specific tutorial to the STL (or more an STL-specific tutorial to win32 api) - vectors can somehow be used directly in api calls instead of char buffers, for example, but i've forgotten how.
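for anyone looking for the same thing, a minimal sketch of the vector-as-buffer idea follows. `get_name` is a hypothetical stand-in for a C-style API that fills a caller-supplied buffer (it is not a real win32 function); the same two-step pattern (ask for the length, size the vector, pass a pointer to its first element) is the usual approach for calls such as GetWindowTextLength + GetWindowText.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a C-style API: copies a name into `buffer`
// (at most `size` bytes including the terminator) and returns the length
// of the full name, excluding the terminator.
std::size_t get_name(char* buffer, std::size_t size) {
    static const char name[] = "C:\\WINDOWS";
    const std::size_t len = std::strlen(name);
    if (buffer != nullptr && size > 0) {
        const std::size_t n = (len < size - 1) ? len : size - 1;
        std::memcpy(buffer, name, n);
        buffer[n] = '\0';
    }
    return len;
}

// Wraps the C-style call: the vector owns the memory, so there is no
// manual new/delete and no fixed-size stack array to overflow.
std::string get_name_as_string() {
    const std::size_t len = get_name(nullptr, 0);  // first call: query length
    std::vector<char> buffer(len + 1);             // room for the terminator
    assert(!buffer.empty());
    get_name(&buffer[0], buffer.size());           // second call: fill buffer
    return std::string(&buffer[0]);
}
```

this works because a vector stores its elements contiguously, so `&buffer[0]` can be handed to any function expecting a `char*` plus a size.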
----------
concerning decimal and thousands symbols, it's actually even worse: in german, the standard windows format uses "." to separate thousands, which clashes badly with the english representation as comma and dot are switched as well:

1.234,56 is perfect german style which is the same as
1,234.56 in most other systems.
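the two styles above can be sketched in standard c++ with a custom `std::numpunct` facet. `Punct` and `format_number` are names made up for this example; a real windows program would more likely take the separators from the user's settings (e.g. GetLocaleInfo with LOCALE_SDECIMAL / LOCALE_STHOUSAND) instead of hard-coding them.

```cpp
#include <cassert>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: lets the caller choose decimal and thousands symbols
// without depending on which named locales are installed on the system.
class Punct : public std::numpunct<char> {
public:
    Punct(char decimal, char thousands)
        : decimal_(decimal), thousands_(thousands) {}

protected:
    char do_decimal_point() const override { return decimal_; }
    char do_thousands_sep() const override { return thousands_; }
    std::string do_grouping() const override { return "\3"; }  // groups of 3

private:
    char decimal_;
    char thousands_;
};

// Formats `value` with two decimals using the given separators.
std::string format_number(double value, char decimal, char thousands) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new Punct(decimal, thousands)));
    out << std::fixed << std::setprecision(2) << value;
    return out.str();
}
```

`format_number(1234.56, ',', '.')` gives the german style and `format_number(1234.56, '.', ',')` the english one.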

this is also a funny issue when coding with german keyboard layout, as the keypad decimal "point" is actually a comma. this doesn't matter in daily usage as programs like excel et al use the comma as decimal symbol, because they use the windows standard - but it DOES matter when coding, as the programming languages (and VBA, for example) do use the decimal point, of course. this makes it impossible to use the keypad to type such numbers in programming languages unless you remap the comma keypad key or use another layout.

look at the following url to see the standard german keyboard layout. this is an online generator for key-remapping, it produces registry files which patch the win 2k/xp registry (just click a button and then another to replace it, the middle button outputs the registry file): http://www.dirk-schwarzmann.de/progr...figurator.html

but another question: can i assume MFC on systems from windows 95 up? i'm still looking for the least error prone and time consuming yet most portable way of writing apps for win9x-xp. i'm asking because i read a lot of times "you need the msvbrun... dll, download here" but never "you need the mfc...dlls" so far.
WTL seems to be a little relieving but it's very hard to maintain a slick look of your app as well (there are nice mfc controls available which do newest outlook and office style).

Don't assume too much about this, for two reasons: there will always be clients with different versions of the MFC DLL, and you don't want to have to sort out the discrepancies yourself, especially when talking about a 10 year time-span... If you really want to stick to one .exe, don't assume anything and let users download a bloated file. It's a bit annoying but it's guaranteed to work, whereas the assumption that everyone has the MFC DLL -- won't. You can also ship two versions: one for clients that have that DLL installed on their machine (you need to specify the precise lowest version needed) and one for clients who don't have that DLL or don't know whether they have it.

what do you think about this? and is there anything which comes close to this and is cheaper?
or even the same for WTL? this would be a lot better in matters of the bloat factor :/ but i haven't found anything near to this for WTL or "like" WTL (= api wrapper), which is a pity - do you know any other "looks nice and takes a lot of work from your shoulders" api wrappers which could be used to produce small and independent progs?

jonnin: what exactly do you mean, do these problems also occur if you make a bloated exe with the MFC linked into it?

I don't know if linking the libs to the least common version statically works or not. I have seen things that just didn't work properly (missing menu bars, various things that were missing or did not work properly) from 98 - 2k. If you have trouble, that's something to try.

I also don't know what products to use. Codeproject is a good site, if they recommend something it's a plus. 600 is a lot of money if you're at home; for a (software) company, or if you develop a product to sell, it's nothing in the grand scheme of things.

I agree $600 is quite a lot of money, so unless you have a sound customer base, say a few dozen or a hundred customers that are paying for your product, I wouldn't spend so much money on something that I can do on my own (albeit with extra effort). Remember that you'll always have newer versions of the software, so you will have plenty of opportunities to improve, enhance and localize your code base. For the time being, just go for the English speaking market and see if there's demand from other markets.

i18n is a dangerous beast... I can give you my two cents here, adding to the excellent posts already in this thread. Bear in mind that the only OSes we support are Unix and XP; personally I do not have any experience with 98/ME.

My best (and only) reference is this (it is at the top of my Favorites): http://www.microsoft.com/globaldev/default.mspx
It has everything (and more) to know about global development. If you are new to i18n, start browsing and reading it.

First, it is very difficult to change an application of a reasonable size to support Unicode; it is much better to plan in advance. And it is even more difficult to change the design later. Think well in advance about how to implement localization.

We use satellite DLLs to store translations. We have a library that opens the DLL for the current locale (if present, otherwise it defaults to English) and returns strings by ID (an HRESULT). The main application creates the library, monitors changes of the locale, asks the library to load a new language file, and in case of success passes the library to all the GUI components, which refresh themselves. Pretty easy (at least in words).
In this way we have only one exe, and distribute the language dlls to the customers who paid for it. More info about satellite dlls in the above link.
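The lookup part of such a library can be sketched roughly as follows. This is not our actual code: `StringTable` and its methods are invented for illustration, and the per-language tables live in memory here, whereas in the real setup each one would be loaded from a satellite DLL.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical string table: one map of ID -> text per language,
// with English ("en") as the fallback.
class StringTable {
public:
    void add(const std::string& locale, int id, const std::string& text) {
        tables_[locale][id] = text;
    }

    // Switches to `locale` if a table for it exists; otherwise falls back
    // to English and returns false so the caller knows nothing was loaded.
    bool set_locale(const std::string& locale) {
        const bool found = tables_.count(locale) != 0;
        current_ = found ? locale : "en";
        return found;
    }

    // Returns the string for `id` in the current language, the English
    // string if the translation is missing, or an empty string otherwise.
    std::string get(int id) const {
        const std::string* text = find(current_, id);
        if (text == nullptr) text = find("en", id);
        return text != nullptr ? *text : std::string();
    }

private:
    const std::string* find(const std::string& locale, int id) const {
        auto table = tables_.find(locale);
        if (table == tables_.end()) return nullptr;
        auto entry = table->second.find(id);
        return entry == table->second.end() ? nullptr : &entry->second;
    }

    std::map<std::string, std::map<int, std::string>> tables_;
    std::string current_ = "en";
};
```

The GUI components only ever call `get(id)`, so switching the language at runtime comes down to one `set_locale` call followed by a refresh.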

Remember: instead of a good product in English only, a Japanese customer will always buy a mediocre one that is in Japanese (this is valid for many other countries as well, even European ones).
A lot depends on the kind of customers. For a scientific program used in a research lab, English is probably not a problem. If the program is used in an emergency phone dispatcher room, for example, better not to risk it: it must be localized.

Marco

"There are two ways to write error-free programs. Only the third one works."
Unknown