How do the common controls convert between ANSI and Unicode?

Everything goes through CP_ACP, pretty much by definition. The ANSI code page is CP_ACP. That's what ACP stands for, after all.

Now, there are some function families that do not use ANSI. The console subsystem, for example, prefers the OEM character set for its 8-bit strings, and file system functions can go either way, based on the setting controlled by the SetFileApisToANSI and SetFileApisToOEM functions.
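To see why the ANSI versus OEM distinction matters for 8-bit strings, here is a minimal Python sketch. It assumes a US English system where the ANSI code page is 1252 and the OEM code page is 437; both values are assumptions for illustration, and Python's codecs stand in for the Windows conversion functions.

```python
# Assumed code pages for a US English system (illustrative only).
ANSI_CODEC = "cp1252"
OEM_CODEC = "cp437"

# One 8-bit string: the ASCII letters "caf" followed by the byte 0xE9.
raw = b"caf\xe9"

# The same bytes yield different text depending on which code page is used.
as_ansi = raw.decode(ANSI_CODEC)
as_oem = raw.decode(OEM_CODEC)

print(as_ansi)
print(as_oem)
```

The ASCII portion agrees in both interpretations; only the byte above 127 changes meaning.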

In the scenario Chris describes, I suspect that the problem is not the ANSI-to-Unicode conversion but rather that the font selected into the listview didn't support the necessary characters.

Actually, he does say it’s running on a Traditional Chinese system. I think my point is still valid, though — in my experience it’s usually that the program is expecting CP_ACP to be something when in reality it’s actually something else (my wife is Korean, so I see lots of programs expecting a Korean CP_ACP, when my computer is English).

My guess is that Chris’s problem was a Chinese app running on an English (US) computer (or something that *wasn’t* Chinese), so you end up getting gibberish. Basically, CP_ACP is set to English, but the application was assuming that CP_ACP was set to Chinese.
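The mismatch described here can be simulated with a short Python sketch. It assumes the application's strings are stored in code page 950 (Traditional Chinese) and that the machine's CP_ACP is 1252; both are assumptions for illustration, and the sample text is made up.

```python
# Hypothetical sample text; the app assumes CP_ACP is 950 (Traditional Chinese).
text = "中文"
raw = text.encode("cp950")

# On a machine where CP_ACP is really 1252, the same bytes are read as Western text.
garbled = raw.decode("cp1252")

# The bytes themselves are intact: reinterpreting them with the code page
# the app expected gets the original text back.
recovered = garbled.encode("cp1252").decode("cp950")

print(garbled)
print(recovered)
```

Nothing is lost in the bytes; the gibberish comes purely from applying the wrong code page at conversion time.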

You can use the AppLocale[1] tool from Microsoft to "trick" the application into thinking it’s running with CP_ACP set to Chinese.

At the time I posted the question I had a poor understanding of how a font’s character set selection would influence GDI’s later selection of code page.

The situation was, I was embedding a listview control in a standard dialog box. I can’t remember what font was chosen – which is a problem because it is likely meaningful to this question.

I then loaded some strings from a string table and added the strings to some STATIC, EDIT and ListView controls using the relevant ANSI APIs.

All the user32 controls displayed the text correctly. The ListView, however, displayed a string that was corrupt in places. When I explored using LocaleExplorer, the result was consistent with interpreting the string using the other Chinese code page.
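The kind of check described here, working out which code page would produce the corrupt text that was observed, can be sketched in Python. The helper name matching_code_pages, the sample string, and the candidate list are all hypothetical; code pages 950 and 936 are assumed to be the two Chinese code pages in question.

```python
def matching_code_pages(raw, observed, candidates):
    """Return the candidate codecs under which `raw` decodes to `observed`."""
    matches = []
    for codec in candidates:
        try:
            if raw.decode(codec) == observed:
                matches.append(codec)
        except UnicodeDecodeError:
            # These bytes are not valid in this code page; skip it.
            continue
    return matches


# Hypothetical string-table entry, stored as code page 950 bytes.
intended = "中文"
raw = intended.encode("cp950")

# Simulate the corrupt display by reading the bytes as code page 936.
observed = raw.decode("cp936")

result = matching_code_pages(raw, observed, ["cp950", "cp936", "cp1252"])
print(result)
```

If the observed corruption matches exactly one candidate, that is good evidence for which code page was actually applied to the bytes.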

We ultimately solved the problem by fixing the font, but I was intrigued as to why the ListView, on that particular Windows / language combo, was second-guessing the code page I’d use to be something other than CP_ACP.

The console can’t support Unicode, not even UTF-8, since this would break some apps. In DOS there is ASCII (byte values 0 – 127), which maps directly to UTF-8. Anything past that, however, is different in UTF-8. UTF-8 uses those byte values only to encode non-English letters, math symbols, etc., but DOS text environments had no GUIs, so those values were also used for block and line characters that let you create nice-looking text interfaces.

Since some DOS programs still use these, you can’t simply drop Unicode support into the console and its font. Output from Windows and DOS apps can be on the console at once: some DOS apps use the "extended ASCII" characters while your Windows apps would use UTF-8, so one of them would end up looking ugly and broken.
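The byte-level conflict between "extended ASCII" line-drawing characters and UTF-8 can be illustrated with a small Python sketch. It assumes the DOS program uses code page 437, which is an assumption for illustration; the box fragment is made up.

```python
# Bytes 0-127 mean the same thing in ASCII and in UTF-8.
ascii_bytes = bytes(range(128))
same_below_128 = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# A DOS-style top edge of a box, written with code page 437 line-drawing bytes.
box_top = b"\xda\xc4\xc4\xbf"
as_oem = box_top.decode("cp437")

# The same bytes are not a valid UTF-8 sequence.
try:
    box_top.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Encoding the same four characters as UTF-8 takes three bytes each.
utf8_bytes = as_oem.encode("utf-8")

print(same_below_128, as_oem, valid_utf8, len(utf8_bytes))
```

A single byte stream therefore cannot be read as both code page 437 and UTF-8 once it goes beyond plain ASCII, which is the conflict this comment is pointing at.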

At least that’s what I understand it to be given my limited understanding of Unicode and NTVDM.

I don’t know when Unicode was first widely used, but MS only introduced Unicode support at the OS level with NT. On 9x/ME, to provide Unicode support to an app you needed to bundle unicows.dll (Microsoft Layer for Unicode).