You're doing two different things.
> $ cat test.rb
> a = "Der gro\xdfe BilderSauger"
That's a double-quoted string, so Ruby interprets escape sequences inside
it. A common example is \n meaning "newline"; in this case, \xNN means
the single byte with hex value NN. So when you call each_byte, that's
what you get: one byte, 0xdf (223).
Change the double-quotes to single-quotes and you'll actually get the
four separate characters.
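
To see the difference side by side, here is a minimal sketch. The shortened sample string and the variable names are just for illustration, not taken from your script:

```ruby
# Double quotes: Ruby turns \xdf into one byte (0xdf = 223).
double = "gro\xdfe"
# Single quotes: backslash, 'x', 'd', 'f' stay as four separate characters.
single = 'gro\xdfe'

double_bytes = double.bytes.to_a
single_bytes = single.bytes.to_a

p double_bytes   # 223 appears as a single byte
p single_bytes   # 92, 120, 100, 102 appear instead
```

The single-quoted version gives the same 92, 120, 100, 102 sequence you saw when reading from the file.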
> But when read from a file:
...
> l.each_byte {|b| puts b}
...
> 92 <- Here
> 120 <- we
> 100 <- are
> 102 <- as 4 ASCII chars '\xdf'
That proves that the file actually contains the four characters
'\', 'x', 'd', 'f'. If you want further proof, try
hexdump -C test.in
to take Ruby out of the loop completely.
So the file contains no non-ASCII bytes at all: neither a UTF-8 sequence
nor an ISO-8859-1 byte for the 'ß', just plain ASCII characters.
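
If you'd rather check this from Ruby than from hexdump, something like the following should work. It's only a sketch: it writes a temporary file as a stand-in for test.in (I'm assuming the file holds the same line with a literal backslash sequence), and the variable names are made up.

```ruby
require "tempfile"

# Hypothetical stand-in for test.in; single quotes keep the backslash literal.
sample = 'Der gro\xdfe BilderSauger' + "\n"

file = Tempfile.new("test_in")
file.binmode
file.write(sample)
file.close

# Read the raw bytes, with no encoding conversion involved.
raw = File.binread(file.path)
escape_bytes = raw.bytes.to_a[7, 4]          # the four bytes right after "Der gro"
ascii_only   = raw.bytes.all? { |b| b < 128 }

p escape_bytes   # the codes for '\', 'x', 'd', 'f'
p ascii_only     # true when every byte in the file is 7-bit ASCII

file.unlink
```

Point File.binread at your real file instead of the temp file to run the same check on it.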
If you want to turn those escape sequences into real bytes, you have to
process the line yourself. For example:
l.gsub!(/\\x([0-9a-f]{2})/i) { $1.hex.chr }
# or in Ruby 1.9, if you want to tag the encoding:
l.gsub!(/\\x([0-9a-f]{2})/i) { $1.hex.chr("ISO-8859-1") }
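
Here is the whole round trip as a rough sketch for Ruby 1.9 or later. I'm assuming the escaped bytes were meant as ISO-8859-1 and that you want UTF-8 in the end. The variable names are only for illustration, and I use the non-destructive gsub so the original line stays untouched.

```ruby
# The line as it comes from the file: a literal backslash, 'x', 'd', 'f'.
line = 'Der gro\xdfe BilderSauger'

# Replace each \xNN escape with the single byte it stands for.
decoded = line.gsub(/\\x([0-9a-f]{2})/i) { $1.hex.chr }
decoded_bytes = decoded.bytes.to_a

# Assumption: those bytes are ISO-8859-1. Tag them as such, then transcode.
utf8 = decoded.force_encoding("ISO-8859-1").encode("UTF-8")

p decoded_bytes[7]   # 223, the single byte that replaced the four characters
puts utf8
```

If the data was escaped from some other encoding, change the name passed to force_encoding accordingly.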