On my system:
'€'.unpack('U*')
Produces:
=> [8364]
I would have expected this:
=> [342, 202, 254]
In fact, I could have sworn that things used to work this way... Am I
going crazy? The following seems to confirm that the string is indeed
using a UTF-8 representation internally.
'€'.collect
=> ["\342\202\254"]
I get exactly the same results whether $KCODE is set to 'NONE' or 'u'.
Cheers,
Greg

[Greg Hurrell <greg.hurrell@gmail.com>, 2007-02-24 20.00 CET]
> => [342, 202, 254]
>
> In fact, I could have sworn that things used to work this way... Am I
> going crazy? The following seems to confirm that the string is indeed
> using a UTF-8 representation internally.
>
> '€'.collect
> => ["\342\202\254"]
>
> I get exactly the same results whether $KCODE is set to 'NONE' or 'u'.
The Unicode code point for the euro sign is 8364. In your string you have
that number encoded as the sequence of bytes [226, 130, 172] (in octal,
342 202 254). That encoding is known as UTF-8. #unpack('U*') decodes that
sequence of bytes and gives you the number.
As an analogy, suppose you had the string "\254 \000\000" and called
#unpack("I"). The sequence of bytes [172, 32, 0, 0] also represents the
number 8364, but this time encoded in the internal format my computer
uses (here, a little-endian 32-bit integer). #unpack retrieves that
number. The fact that UTF-8 is used for encoding Unicode code points is
incidental to this.
To unpack the bytes from a string use #unpack("C*").
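A minimal sketch of the difference, in case it helps. It assumes Ruby 1.9 or later (the "\u20AC" escape is not available on the 1.8 interpreters of this thread's era), and it uses the 'V' directive (32-bit little-endian) in place of "I" so the result does not depend on the machine's native integer layout. The variable names are only illustrative.

```ruby
# The euro sign, written as an escape so the source file's encoding does not matter.
euro = "\u20AC"

# 'U*' decodes the UTF-8 byte sequence into Unicode code points.
codepoints = euro.unpack('U*')

# 'C*' reads each byte as an unsigned 8-bit integer, with no decoding.
bytes = euro.unpack('C*')

# The same bytes in octal, matching the "\342\202\254" escapes shown earlier.
octal = bytes.map { |b| b.to_s(8) }

# The analogy: the same number stored as a 32-bit little-endian integer.
packed       = [8364].pack('V')
packed_bytes = packed.unpack('C*')
unpacked     = packed.unpack('V')

p codepoints    # [8364]
p bytes         # [226, 130, 172]
p octal         # ["342", "202", "254"]
p packed_bytes  # [172, 32, 0, 0]
p unpacked      # [8364]
```

In both cases the bytes are just one representation of the number 8364; which directive you pass to #unpack decides how those bytes are interpreted.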
HTH.

On 24 Feb, 23:47, Carlos <a...@quovadis.com.ar> wrote:
> UNICODE codepoints is incidental to this.
>
> To unpack the bytes from a string use #unpack("C*").
Thanks a million, Carlos. I never would have figured that out for
myself. I misunderstood the documentation for String#unpack:
C | Fixnum | extract a character as an unsigned integer
U | Integer | UTF-8 characters as unsigned integers
unpack('C*') does indeed give me what I want...
Cheers,
Greg

Greg Hurrell wrote:
> Thanks a million, Carlos. I never would have figured that out for
> myself. I misunderstood the documentation for String#unpack:
> C | Fixnum | extract a character as an unsigned integer
> U | Integer | UTF-8 characters as unsigned integers
The problem here is the inconsistent use of the word "character" in the
documentation. A character is *not* a byte. The documentation
should be revised to use the two words only in their correct
contexts, with annotations to remind people of the distinction.
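To illustrate the distinction, here is a rough sketch. It assumes Ruby 1.9 or later, where strings carry an encoding and String#length counts characters; the sample string and variable names are made up for the example.

```ruby
# "a" followed by the euro sign: two characters, but four bytes in UTF-8.
text = "a\u20AC"

char_count = text.length    # number of characters
byte_count = text.bytesize  # number of bytes in the UTF-8 encoding

# 'C*' yields one integer per byte; 'U*' yields one integer per UTF-8 character.
as_bytes      = text.unpack('C*')
as_codepoints = text.unpack('U*')

puts "#{char_count} characters, #{byte_count} bytes"
p as_bytes       # [97, 226, 130, 172]
p as_codepoints  # [97, 8364]
```

For plain ASCII text the two directives return the same numbers, which is probably why the documentation's wording rarely causes trouble until a multi-byte character shows up.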
Clifford Heath.