char is UTF-8 (and by technicality ASCII)
wchar is UTF-16 LE/BE, and yes it is two bytes
dchar is UTF-32 LE/BE, and is four bytes
Casting between them is more-or-less transparent. Any function with signature:
void foo(wchar[])
will accept a char[], wchar[], or dchar[] as argument. Problem is, DMD's implicit cast between string types just changes the byte boundaries. If you actually want to translate between encodings, then import std.utf and use the toUTF8(), toUTF16(), and toUTF32() functions.
So then calling foo() with a char[] would look like:
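A minimal sketch of that call, written for present-day D rather than the 2005 compiler discussed here. String literals are immutable in current D, so string and const(wchar)[] stand in for char[] and wchar[]. The names foo and s are illustrative only:

```d
import std.stdio;
import std.utf : toUTF16;

// Size in bytes of one code unit of each character type.
static assert(char.sizeof == 1);
static assert(wchar.sizeof == 2);
static assert(dchar.sizeof == 4);

// Illustrative function that only accepts UTF-16 text;
// it returns the number of UTF-16 code units it received.
size_t foo(const(wchar)[] x)
{
    return x.length;
}

void main()
{
    string s = "h\u00e9llo";   // "héllo": 6 UTF-8 code units (the accented e needs two)
    // foo(s);                 // does not compile: no implicit char[] -> wchar[] conversion
    writeln(foo(toUTF16(s)));  // transcode first; 5 UTF-16 code units
}
```

Note that the code-unit count changes (6 to 5) because the text is actually re-encoded, not reinterpreted.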
-- Chris S
nix wrote:
> Thanks a lot. I have only changed from char to wchar.
> Does anybody know why this does the same?
> wchar f = 0xfc; char[] e = \xfc; writef("f = %s\n",f); writef("e = %s\n",e);
> Is wchar a char with 2 bytes? How can I cast from wchar to char[]?
>

"Chris Sauls" <ibisbasenji@gmail.com> wrote in message news:cv2re4$oce$1@digitaldaemon.com...
> char is UTF-8 (and by technicality ASCII)
> wchar is UTF-16 LE/BE, and yes it is two bytes
> dchar is UTF-32 LE/BE, and is four bytes
>
> Casting between them is more-or-less transparent. Any function with signature:
> void foo(wchar[])
> will accept a char[], wchar[], or dchar[] as argument.

really? it errors for me:
test.d(4): function test.foo (wchar[]x) does not match argument types (char[])
wchart.d(4): cannot implicitly convert expression y of type char[] to wchar[]

On Thu, 17 Feb 2005 13:39:03 -0600, Chris Sauls <ibisbasenji@gmail.com> wrote:
> char is UTF-8 (and by technicality ASCII)
> wchar is UTF-16 LE/BE, and yes it is two bytes
> dchar is UTF-32 LE/BE, and is four bytes
>
> Casting between them is more-or-less transparent. Any function with signature:
> void foo(wchar[])
> will accept a char[], wchar[], or dchar[] as argument. Problem is, DMD's implicit cast between string types just changes the byte boundaries.
Often referred to as 'painting'... which is odd.
I think of it as being similar to a cast from int to uint or vice versa: the cast does not modify the data in any way, it simply interprets the same bits differently.
This is different from a cast from int to float or vice versa, where the data is actually converted from one format to the other.
The program at the end demonstrates this.
> If you actually want to translate between encodings, then import std.utf and use the toUTF8(), toUTF16(), and toUTF32() functions.
"translate between encodings" == transcode.
I think explicit transcoding of char[] etc. can be compared to an explicit cast from an integer type to a floating point type: neither is, nor perhaps should be, implicit (too many side effects, perhaps?), but both need to convert the data in order for the result to be valid.
If explicit casts between char[], wchar[] and dchar[] transcoded, you couldn't paint a char[] as a wchar[] directly, but you could still paint using byte[] as an intermediary. To me, this actually makes more sense. I also don't see it as much of a drawback: painting is inexpensive, and with char[] and friends you are more likely to want to convert than to paint.
Further, char[] and friends have a specified encoding, so a char[] that is not in that encoding is invalid. The compiler ensures they're correctly encoded at compile time, and in some cases even at runtime. It seems to make sense that it should convert on casts also.
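In present-day D the two operations Regan distinguishes already have different spellings: std.conv.to transcodes between string types, while an array cast only repaints the bytes. A minimal sketch, routing the paint through ubyte[] (unsigned byte[]) as the intermediary he suggests; paintAsUtf16 is a hypothetical helper, not a library function:

```d
import std.conv : to;
import std.stdio;

// Hypothetical helper: reinterpret ("paint") UTF-8 bytes as UTF-16 code units.
// Nothing is decoded; an odd byte length makes the cast fail at runtime.
const(wchar)[] paintAsUtf16(const(char)[] s)
{
    auto raw = cast(const(ubyte)[]) s;  // same memory, viewed as raw bytes
    return cast(const(wchar)[]) raw;    // same memory, viewed as 2-byte units
}

void main()
{
    string s = "test";

    auto painted    = paintAsUtf16(s);  // 2 code units, not the text "test"
    auto transcoded = to!wstring(s);    // 4 code units: "test" re-encoded as UTF-16

    writeln(painted.length, " ", transcoded.length);  // prints "2 4"
    writeln(transcoded == "test"w);                   // prints "true"
}
```

Keeping the paint behind an explicitly named helper makes the rare reinterpretation visible at the call site, while the common case (conversion) reads as an ordinary conversion.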
Regan
# import std.stdio;
#
# void main() {
#     float fdata;
#     uint udata;
#     int data;
#
#     ubyte[] raw;   // ubyte, not byte, so bytes >= 0x80 print as two hex digits
#
#     data = 5;
#
#     raw = (cast(ubyte*)&data)[0..data.sizeof];
#     writefln("Value of int: %d", data);
#     writef("Value of bytes in int: ");
#     foreach (ubyte b; raw)
#         writef("%02x ", b);
#     writef("\n\n");
#
#     // int -> uint only reinterprets the bits: the bytes are unchanged
#     udata = cast(uint)data;
#     raw = (cast(ubyte*)&udata)[0..udata.sizeof];   // inspect udata's own bytes
#     writefln("Value of uint: %d", udata);
#     writef("Value of bytes in uint: ");
#     foreach (ubyte b; raw)
#         writef("%02x ", b);
#     writef("\n\n");
#
#     // int -> float converts the value: the bit pattern changes
#     fdata = cast(float)data;
#     raw = (cast(ubyte*)&fdata)[0..fdata.sizeof];
#     writefln("Value of float: %f", fdata);
#     writef("Value of bytes in float: ");
#     foreach (ubyte b; raw)
#         writef("%02x ", b);
#     writef("\n\n");
# }

On Thu, 17 Feb 2005 16:08:39 -0500, Ben Hinkle wrote:
> "Chris Sauls" <ibisbasenji@gmail.com> wrote in message news:cv2re4$oce$1@digitaldaemon.com...
>> char is UTF-8 (and by technicality ASCII)
>> wchar is UTF-16 LE/BE, and yes it is two bytes
>> dchar is UTF-32 LE/BE, and is four bytes
>>
>> Casting between them is more-or-less transparent. Any function with signature:
>> void foo(wchar[])
>> Will accept a char[], wchar[], or dchar[] as argument.
>
> really? it errors for me:
> test.d(4): function test.foo (wchar[]x) does not match argument types (char[])
> wchart.d(4): cannot implicitly convert expression y of type char[] to wchar[]
If you only have one signature with one of the 'string' forms, char[], wchar[], or dchar[], then you can simply use it for all string literals. However, if you attempt to pass a variable with a different data type, you need to do an explicit conversion.
For example ..
void foo(wchar[] x)
{ . . . }
dchar[] y;
foo(y); // Will fail.
foo(toUTF16(y)); // works.
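Derek's example filled out as a minimal program for present-day D, where dstring and const(wchar)[] stand in for dchar[] and wchar[] and the body of foo is illustrative. It also covers the literal case from the paragraph above: with a single overload, a plain literal is accepted without any conversion:

```d
import std.stdio;
import std.utf : toUTF16;

// Single overload taking UTF-16 text; returns its length in code units.
size_t foo(const(wchar)[] x)
{
    return x.length;
}

void main()
{
    writeln(foo("abcdef"));     // literal is typed to fit the only overload: 6

    dstring y = "abcdef"d;
    // foo(y);                  // does not compile: no implicit dchar[] -> wchar[]
    writeln(foo(toUTF16(y)));   // explicit transcoding: 6
}
```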
You also get errors if you have two or more different signatures and supply a string literal.
void foo(char[] x) { . . . }
void foo(wchar[] x) { . . . }
void foo(dchar[] x) { . . . }
foo("abcdef"); // will fail.
foo(cast(dchar[])"abcdef"); // works
It would be *SO NICE* if we could decorate string literals with the required storage format. For example ...
d"abcdef" // A dchar[] string
w"abcdef" // A wchar[] string
n"abcdef" // A char[] string (narrow).
I know the syntax above will not actually work, as we still need raw string capabilities, but something easier than constantly typing 'cast(dchar[])' must be possible.
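Present-day D does provide decorations along these lines, as postfixes rather than prefixes: c, w and d (for example "abcdef"d) fix a literal's type, so overloads like the ones above resolve without a cast. A minimal sketch, where each illustrative foo overload just reports which one was chosen:

```d
import std.stdio;

// One overload per character width; each reports which one was picked.
string foo(const(char)[] x)  { return "char[]"; }
string foo(const(wchar)[] x) { return "wchar[]"; }
string foo(const(dchar)[] x) { return "dchar[]"; }

void main()
{
    writeln(foo("abcdef"c));  // "abcdef"c is a string:  prints char[]
    writeln(foo("abcdef"w));  // "abcdef"w is a wstring: prints wchar[]
    writeln(foo("abcdef"d));  // "abcdef"d is a dstring: prints dchar[]
}
```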
--
Derek
Melbourne, Australia
18/02/2005 9:52:23 AM

On Fri, 18 Feb 2005 12:47:54 +1300, Regan Heath wrote:
> On Fri, 18 Feb 2005 10:06:59 +1100, Derek Parnell <derek@psych.ward> wrote:
>> foo(y); // Will fail.
>> foo(toUTF16(y)); // works.
>
> This also 'works' .. not! It compiles, but the output is garbage.
>
> # import std.stdio;
> #
> # void foo(wchar[] x)
> # {
> # writefln(x);
> # }
> #
> # void main()
> # {
> # char[] a = "test";
> # foo(cast(wchar[])a);
> # }
You are correct, and I didn't mention this 'technique' because, as you say, it compiles but does not do what you'd expect.
The confusion is no doubt caused by 'cast' currently working differently depending on the context.
For instance, when using cast on a real to get a long, it does storage format conversion. That is, code is generated by the compiler to convert from an 80-bit IEEE floating point format to a 64-bit signed integer format.
However, when cast is used on character arrays, it only pretends that something is really something else. So using cast(dchar[]) on a char[] variable only tells the compiler to treat the bytes in the char[] variable as if they were already in a dchar[] arrangement.
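The two behaviours of cast described above can be compared side by side. A minimal sketch for present-day D (hence the .dup, since literals are immutable there); realToLong and paintAsWchar are illustrative names, not library functions:

```d
import std.stdio;

// Value conversion: the compiler emits code that turns the
// floating-point value into an integer, truncating toward zero.
long realToLong(real r)
{
    return cast(long) r;
}

// Reinterpretation: the same bytes are relabelled as wchar code units;
// no decoding or re-encoding takes place.
wchar[] paintAsWchar(char[] a)
{
    return cast(wchar[]) a;
}

void main()
{
    writeln(realToLong(3.75));                        // prints 3

    char[] a = "test".dup;                            // 4 bytes of UTF-8
    wchar[] w = paintAsWchar(a);
    writeln(a.length, " ", w.length);                 // prints "4 2": same bytes, half as many units
    writeln(cast(void*) a.ptr == cast(void*) w.ptr);  // prints "true": no copy was made
}
```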
> Can we have explicit casts between types with a specified encoding (the char types for example) cause transcoding, i.e. make it call toUTFxx
>
> Please?
Sounds nice, but I suspect that we need to have *both* capabilities available to the coder: a way to tell the compiler to convert from one storage format to another, and a way to tell the compiler that even though the explicit data type is 'FOO' we actually want it to be treated as if it were really stored in RAM as a 'BAR'.
This gives the coder and the compiler some useful flexibility.
--
Derek
Melbourne, Australia
18/02/2005 11:07:09 AM

On Fri, 18 Feb 2005 11:26:57 +1100, Derek Parnell <derek@psych.ward> wrote:
> On Fri, 18 Feb 2005 12:47:54 +1300, Regan Heath wrote:
>> This also 'works' .. not! It compiles, but the output is garbage.
> You are correct, and I didn't mention this 'technique' because, as you say,
> it compiles but does not do what you'd expect.
>
> The confusion is no doubt caused by 'cast' currently working differently
> depending on the context.
>
> For instance, when using cast on a real to get a long, it does storage
> format conversion. That is, code is generated by the compiler to convert
> from an 80-bit IEEE floating point format to a 64-bit signed integer
> format.
>
> However, when cast is used on character arrays, it only pretends
> that something is really something else. So using cast(dchar[]) on
> a char[] variable only tells the compiler to treat the bytes in the
> char[] variable as if they were already in a dchar[] arrangement.
Yep, see my other post in this thread.
>> Can we have explicit casts between types with a specified encoding (the
>> char types for example) cause transcoding, i.e. make it call toUTFxx
>>
>> Please?
>
> Sounds nice, but I suspect that we need to have *both* capabilities
> available to the coder. Namely a way to tell the compiler to convert from
> one storage format to another, and a way to tell the compiler that even
> though the explicit data type is 'FOO' we actually want it to be treated as
> if it were really stored in RAM as a 'BAR'.
>
> This gives the coder and the compiler some useful flexibility.
Yep, see my other post in this thread.
:)
Regan