Unicode... Am I missing something here?

I must be missing something... A #define or a header... something...
In Windows, all functions respond to the UNICODE define, ...

Define both UNICODE and _UNICODE or un-define both - preferably at the project settings level.

It's good to understand what TCHARs are and why they were invented - but I don't recommend using them. They were useful for targeting Win9x/MBCS and NT/Unicode simultaneously, but that's not a requirement for new development these days.

>> I should just use the wstring (etc) type everywhere and do everything in unicode?
Or all narrow. I tend to stick with all wide on Windows.

>> The only exception will be files, but they're always a pain anyway...
Well, the wide, formatted I/O interfaces to streams/files in the standard libraries will convert your wide strings to narrow strings based on the LC_CTYPE of the corresponding locale. Newer MS standard libraries do have some extensions in that area for supporting a few Unicode file encodings.

>> I should just use the wstring (etc) type everywhere and do everything in unicode?
>> Or all narrow. I tend to stick with all wide on Windows.

Hmmmm.... ASCII isn't going to work with anything except English, really. It won't make much difference for the statics in dialogs and such, but it could pose a real problem for anyone who has filenames etc. in non-Latin text... directory listings (for example) would just be scrambled eggs...

>> The only exception will be files, but they're always a pain anyway...
>> Well, the wide, formatted I/O interfaces to streams/files in the standard libraries will convert your wide strings to narrow strings based on the LC_CTYPE of the corresponding locale. Newer MS standard libraries do have some extensions in that area for supporting a few Unicode file encodings.

gg

Even that can be a problem... if the system automatically converts (for example) a Unicode/Cyrillic file to ASCII, it's toast... even if it was just sent to me to be reprocessed for some reason. If my locale causes a type conversion, or even an endianness conversion, it's going to scramble the file... not good.

>> ascii isn't going to work with anything except English
"Narrow" in standard C or Windows isn't just ASCII. There are 8bit character sets for a whole bunch of languages - in standard C and in Windows. Of all the languages that Windows supports, however, only a handful are "Unicode-only".

>> it could pose a real problem for anyone who has filenames etc. in non-latin text...
Windows filesystems have supported international characters fairly well. They suffer only slightly from certain round-trip issues with ACP<->Unicode conversions. But since NTFS stores names in UTF-16LE, using the wide Win32 APIs typically avoids these issues. FAT/FAT32 can have "what's my encoding" issues (more below).

>> if the system automatically converts (for example) a unicode/cyrillic file to ascii, it's toast...
Not really. For those Windows locales that are not Unicode-only, the conversion from Unicode to the system's ANSI code page (ACP) is well defined. And on those systems, notepad.exe expects the typical (8bit) TXT file to be encoded with that ACP, so when it reads and displays the contents all is good.

>> even if it was just sent to me to be reprocessed for some reason
Yes, a common ACP-encoded text file has nothing in it that says "hey! this is my encoding!". So on a system with a different ACP, notepad.exe can map the same bytes to totally different glyphs. The same issue can occur with FAT/FAT32, which stores names using the system's ACP. So changing your own ACP, or swapping thumb-drives with a foreign friend, can result in strange glyphs for filenames. Most folks have learned not to use non-ASCII characters in pathnames in order to mitigate this issue.

But keep in mind that formats like HTML have solved the "what's my encoding" problem by specifying the encoding.
For text files on Windows, a Unicode encoding with BOM avoids these issues when using characters outside of the ASCII character set.

>> LC_CTYPE is what exactly?
It's part of standard C's support for locales. For example, when you call setlocale(LC_ALL, ""), you're saying "I want to use the user's default locale settings". One of those settings is LC_CTYPE, which among other things specifies the 8bit character encoding expected by the standard C char APIs - and the encoding to which wchar_t strings are converted in (non-binary) formatted stream I/O.

So I should take it that within a given area, especially on the same machine, this is something of a non-issue if everything is Unicode.

The reason it's a matter of some concern is that I've already bumped into it with my current Freeware offering... One of the first things to happen was that I installed a copy on a friend's computer, and he has a number of folders (his music collection in particular) that were set up in the Ukraine before he immigrated here... So he's actually got a mix of text on his hard disk. I solved the problem by re-writing the project in Unicode and basically not caring what was in any text strings it worked with.

I did, however, stay with the idea of being able to compile it both ways; primarily out of habit, I think.

But I'll take your assurances and I do appreciate your help...
As the one page I stumbled through looking up the LC_CTYPE thing said "Say goodbye to Char*" ....

>> he has a number of folders (his music collection in particular) that were set up in the Ukraine before he immigrated here.
If you use the ANSI Win32 APIs, then Windows tries to convert the Unicode pathnames to the system's ACP. If the system's ACP does not support a particular glyph, you end up with something like "?". Or even more fun, a "best fit" character is found, so the pathname looks correct but doesn't work.