03 March 2011

It all started when I tried upgrading to ruby 1.9.2 and learned more
than I ever wanted to know about character encodings. All of a sudden,
my site was showing text humans were never supposed to read, with
gibberish where the accented letters of foreign words should have been.

I tried using the mysql2 gem and setting Encoding.default_external = 'UTF-8' in my environment.rb. These steps were necessary, but not enough.
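
For anyone wondering what that setting actually governs, here is a small sketch. It is not from my original setup, and it assumes a modern Ruby (Tempfile.create needs 2.1+). Encoding.default_external is the encoding Ruby tags onto text it reads from files and other IO. It never rewrites bytes already sitting in the database, which fits with it being necessary but not enough.

```ruby
require 'tempfile'

# Text read from files/IO without an explicit encoding gets tagged with this.
Encoding.default_external = 'UTF-8'

# Write raw bytes to a temp file, then read them back in text mode.
def read_back(bytes)
  Tempfile.create('enc-demo') do |f|
    f.binmode
    f.write(bytes)
    f.flush
    File.read(f.path) # tagged with Encoding.default_external
  end
end

text = read_back("café".b) # "café".b is the raw UTF-8 bytes 63 61 66 C3 A9
text.encoding  # => #<Encoding:UTF-8>
text == "café" # => true
```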

After much googling, it became evident that I had to go through each text field in each row of each table and convert each latin-1 character to utf-8.
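
To make "convert each latin-1 character" a bit more concrete, here is a Ruby-side sketch of the same per-character idea. It assumes the garbling is the common kind, where UTF-8 bytes were read as latin-1 (so "é" shows up as "Ã©"), and the character list is hypothetical. The post itself does the replacing inside MySQL; this only shows the mapping.

```ruby
# Hypothetical list of characters the data should contain; extend for your own content.
CHARS = %w[é è ê à ç ñ ö ü].freeze

# How a UTF-8 character looks once its bytes are read as latin-1 and re-encoded
# to UTF-8 ("é" -> "Ã©"). ISO-8859-1 is an approximation here: MySQL's latin1 is
# really Windows-1252, which differs for bytes 0x80-0x9F (none of the bytes above).
def garbled(char)
  char.b.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# garbled sequence => intended character
FIXES = CHARS.to_h { |c| [garbled(c), c] }.freeze

# Swap every known garbled sequence back to the character it stands for.
def repair(text)
  text.gsub(Regexp.union(FIXES.keys), FIXES)
end

repair("cafÃ© crÃ¨me") # => "café crème"
```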

You would think that ALTER TABLE #{table} CONVERT TO CHARACTER SET utf8
would do the trick, but no. You would be wrong. At least, I was.
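
One plausible explanation, assuming the common situation where the columns were declared latin1 but actually held UTF-8 bytes: CONVERT TO CHARACTER SET transcodes the stored bytes from the old character set, so bytes that were already UTF-8 get encoded a second time. The Ruby below only simulates that (no database involved), again using ISO-8859-1 as a stand-in for MySQL's latin1.

```ruby
# Simulation only. Assumption: a latin1 column that really holds UTF-8 bytes.

# Roughly what CONVERT TO CHARACTER SET utf8 does: treat the stored bytes
# as latin1 and transcode them to UTF-8.
def convert_like_mysql(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# What the data actually needed: the same bytes, simply read as UTF-8.
def relabel_as_utf8(raw)
  raw.dup.force_encoding(Encoding::UTF_8)
end

stored = "café".b          # bytes 63 61 66 C3 A9, as they sit in the column
convert_like_mysql(stored) # => "cafÃ©" (still garbled)
relabel_as_utf8(stored)    # => "café"
```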

I didn't want to do all the work he did, and figured a rails/activerecord
migration might ease the pain somewhat. Below you'll find what I came up
with. Reuse it as you please. You'll need to specify the table and column
names that need converting, and you might want to check that I've covered
all the characters that matter to you.
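
To give an idea of what specifying the table/column names and the replacements might look like, here is a hypothetical, database-free sketch that only builds the SQL strings. The table names, columns and replacement pairs are made up, the quoting helper is deliberately minimal, and this is not the migration from this post. It also assumes the connection itself talks UTF-8, so the literals reach MySQL intact.

```ruby
# Hypothetical spec: table => text columns to repair. Replace with your own.
TABLES = {
  'posts'    => %w[title body],
  'comments' => %w[body]
}.freeze

# Hypothetical replacement pairs: garbled sequence => intended character.
FIXES = {
  'Ã©' => 'é',
  'Ã¨' => 'è',
  'Ã¼' => 'ü',
  'Ã±' => 'ñ'
}.freeze

# Minimal MySQL string-literal quoting (backslashes and single quotes only).
# In a real migration, connection.quote is the safer choice.
def sql_quote(str)
  "'" + str.gsub('\\') { '\\\\' }.gsub("'") { "''" } + "'"
end

# One UPDATE ... REPLACE(...) per table, column and replacement pair.
def repair_statements(tables, fixes)
  tables.flat_map do |table, columns|
    columns.flat_map do |column|
      fixes.map do |bad, good|
        "UPDATE `#{table}` SET `#{column}` = " \
          "REPLACE(`#{column}`, #{sql_quote(bad)}, #{sql_quote(good)})"
      end
    end
  end
end

repair_statements(TABLES, FIXES).first
# => "UPDATE `posts` SET `title` = REPLACE(`title`, 'Ã©', 'é')"

# Inside a Rails 3.0-era migration this might be used as (untested here):
#   def self.up
#     repair_statements(TABLES, FIXES).each { |sql| execute(sql) }
#   end
```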

Basically, all this does is iterate over the tables and columns you
specify, then over all the shady latin-1 characters you need to fix,
asking mysql to replace each one with its utf-8 equivalent. Someone
with stronger mysql-fu might find a cleverer way to do this; in the
meantime, here goes: