Re: wchar_t encoding?

On Mon, May 24, 2010 at 10:01:53AM -0400, Paul Koning wrote:
> For that reason, gdb does a conversion when it reads string data from
> target memory. It comes from target memory as a byte string, and it
> needs to convert that into something the host can use.
OK, I nearly understand what you are trying to explain, but I
don't understand why any conversion is needed at all. If you have
a char * variable on the target and the host wants to display its content,
it can only do that reasonably if it knows the target's current LC_CTYPE
and applies "a compatible LC_CTYPE" on the host. Why is a wchar_t string
different?
> From what I've learned, "ucs-4" (more precisely,
> "ucs-4be" or "ucs-4le" depending on the host byte order) is the right
> answer most of the time but apparently not all the time.
So you are saying that the target reads the wchar_t * from memory, converts
it to ucs-4* and transfers the result to the host? Maybe for the purpose of
debugging this is close enough to be an acceptable solution; it should even
work (modulo some loss) when the target's internal wchar_t representation
is currently jis/kuten.
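To check my own understanding, here is a minimal sketch (plain Python, not
gdb code) of what I imagine the decoding step looks like. It assumes the
target's wchar_t is a fixed-width integer holding a UCS-4 code point, which
is exactly the assumption that may not always hold. The function name
decode_target_wchar and its parameters are made up for illustration.

```python
def decode_target_wchar(raw: bytes, width: int = 4, byteorder: str = "little") -> str:
    """Decode a NUL-terminated wide string given as raw target bytes.

    Assumption: each wchar_t is `width` bytes and holds a UCS-4 code point.
    """
    chars = []
    # Walk the buffer one wchar_t at a time.
    for offset in range(0, len(raw) - width + 1, width):
        code = int.from_bytes(raw[offset:offset + width], byteorder)
        if code == 0:  # terminating L'\0'
            break
        # chr() raises ValueError if the value is not a valid code point,
        # which is what typically happens with the wrong byte order.
        chars.append(chr(code))
    return "".join(chars)


# Bytes as they would sit in the memory of a little-endian target.
raw = "héllo".encode("utf-32-le") + b"\x00" * 4
print(decode_target_wchar(raw, byteorder="little"))
```

The bytes alone do not say how wide a wchar_t is or which byte order it
uses, so the same buffer read with byteorder="big" fails here instead of
yielding the string.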
But why (besides gdb folks not having it designed this way) couldn't the
target convert the string to something that it and the host have agreed upon? Or
maybe even apply no conversion at all and have the user manually set some
compatible environment on the host?
Sorry for the stupid questions, I still feel quite confused.
Martin