2-Cent Tips

2-cent Tip: Unicode conversion

A couple of years ago, I decided to stop wrestling with what I call
"encoding craziness" for various bits of non-English text that I have
scattered around my file system. Russian, for example, has at least four
different encodings that I've run into - and guessing which one a given
text file was written in was like a game of darts played in the dark. At
300 yards. With your hands tied behind you, so you had to use your toes.
Oh, and while you were severely drunk on Stoli vodka. UTF-8 (Unicode)
allowed me to, well, unify all of that into one single encoding that was
readable without scrambling for whichever character set I needed (and
may or may not have had installed). Better yet, Unicode usually displays
just fine in web browsers - no special entity encoding is required.

For some reason, though, good converters appear to be something of a
black art - and finding one that actually works, as opposed to all those
that merely claim to, was rather frustrating. Therefore, I decided to
write one in my favorite language, Perl - only to find that the job had
already been done for me, via the 'encoding' pragma. In other words,
conversion from, say, KOI8-R to UTF-8 takes nothing more than a single
Perl command line.

It is literally that simple. Pretty much every encoding you can imagine
is available (see 'perldoc Encode::Supported' for the naming conventions
and charsets). The conversion does not have to be to UTF-8 - it'll do
any of the listed charsets - but why would you care?

# Print the Kanji for 'Rakuda' (Camel) from multibyte strings:
perl -Mencoding=euc-jp,STDOUT,utf-8 -wle'print "Follow the \xF1\xD1\xF1\xCC!"'
Follow the 駱駝!
# Or you can do it in Hiragana, but using Unicode values instead:
perl -Mencoding=shift-jis,STDOUT,utf8 -wle'print "Follow the \x{3089}\x{304F}\x{3060}!"'
Follow the らくだ!

Kat likes to tell people she's one of the youngest people to have learned
to program using punchcards on a mainframe (back in '83); but the truth is
that since then, despite many hours in front of various computer screens,
she's a computer user rather than a computer programmer.

When away from the keyboard, her hands have been found full of knitting
needles, various pens, henna, red-hot welding tools, upholsterer's shears,
and a pneumatic scaler.