MySql Latin1 to UTF8 Conversion

For those of you who follow me due to my social media strategy, this is a little break from the norm. I am a nerd and I used a lot of resources to develop this script, so in an effort to give back to the community I’m gonna nerd out for a post.

The Problem with ISO-8859-1 aka Latin1

I am not an expert in character encoding, so I’ll break it down in human terms. Latin1 cannot hold as many characters as UTF8. This becomes a problem when you have encoding from applications such as Microsoft Word, or different languages. UTF8 on the other hand can store everything latin1 can and a whole lot more. As an example, and to learn more about how to make your websites UTF-8 compatible, see Geir Berset’s Mother of all UTF-8 Check lists.

Why not just convert to UTF8 from Latin1

Well, that’s the goal, but the problem is there are a ton of variables in this situation if you couldn’t tell from that checklist above, if you get even one messed up or multiple settings messed up you might (or might not) have a problem. IT gets more complex as you can store UTF8 encoded characters into a latin1 database. (Don’t ask me how, but we did – often!)

The Problem: Our Setup or Why We’re still on Latin1

We’re a small hosting company, we have over 200 sites on our infrastructure; some code we control, but half or more is legacy code which needs to run but we don’t control. We have over 100 databases on our mysql server. We were also running an older version of Apache and PHP. In an effort to bring us out of the stone ages into a more compliant system we decided to move it to a current version of Apache, PHP and mysql. Fortunately I can do this is stages, but because I have to do it in stages there is no option to shut down the site, convert all the settings, the data, the code and then turn the server back on, check verify and clean. I would have to do that 200 times. That’s just not feasible. I’ll focus on the mysql portion of this conversion for now, because I found the help out there to be lacking. The rest of the conversion information is easily found online.

I made a separate environment for testing and I tried and tried the lessons I learned from the inter-searches and could not get the right mix. Mostly because I had UTF-8 characters stored in the Latin1 (ISO 8859 1) encoding.

Mysqldump, Iconv

Many sources suggest taking a mysqldump (not what you are thinking!) and then use the linux command iconv to convert the encoding of the dump to UTF8.

Lots of problems occurred here. What I didn’t know is that mysqldump is a client, and thereby has a character set that it requests from mysql. Mysql in it’s genius-ness can convert data to whatever format the client requests. So the UTF8 characters in the db were assumed to be latin1, and the encoding was lost when iconv tried to convert it.

Did you miss it? I did to. But another hint is that mysql itself can preform data encoding conversions. No matter what the data is in the table/database you can request it in a different format and mysql will do the heavy lifting of converting it. This means we don’t need iconv or any other conversion tool.

Once I learned this and with trial and error, and a bit of luck I found a 5 step method that works every time on our tables. It is a 5 step method, but it feels like a dance, kinda like the hokey pokey. Do this do that do this again and it works.

Five Step Mysql Latin1 Encoding to UTF8 Encoding Script Explained

1. Mysqldump, Force it to be UTF8

Grab the data from mysql and force it to be encoded as UTF8. One might assume this makes it all UTF8 and you simple import it, sorry charlie! We still have 4 steps to go. The Conversion engine gets tricked in this step I think. Since this is copied from a bash script you will see that I use variables, you’ll see the whole script in a minute, but hopefully you can understand this.

2. Lie to Mysql; Tell Mysql the Data is Latin1

Inside the dump file there are two settings which tell the mysql server upon import that this is utf8 encoded. One says this is UTF8 data coming in and the other says this was pulled from utf8 data. We need to change one of those to say latin1. Reason? The Mysql conversion system saw that there were UTF8 characters in the table and then didn’t convert the data in step 1. This way mysql will convert the characters on import.

sed-i's/utf8/latin1/g'$TEMPFILELOC$TEMPFILE

sed -i 's/utf8/latin1/g' $TEMPFILELOC$TEMPFILE

If you don’t know sed, then you don’t know linux. Sed is a line editor which is doing a search and replace inline. Every instance of utf8 will be replaced with latin1 (the other instance of utf8 in the dump is actually utf-8 so it only replaces one.)

3. Import the Mysql Data

This is what hung me up before, once I do the above steps and then re-import the data I assumed that mysql’s conversion engine would put it into the table correctly. But it never worked. Stay tuned for Step 4, the magic.

Travel Blog Posts: Home Along The Way

[et_pb_section admin_label=”section”][et_pb_row admin_label=”row”][et_pb_column type=”4_4″][et_pb_text admin_label=”Text”] Last year for Christmas we were in La Paz, Baja California Sur, Mexico. Becky asked the kids afterwards what they liked and didn’t like about Christmas that year as it was our first Christmas not in Michigan, our first Christmas not with extended family. So it was our opportunity to […]

When we left San Miguel de Allende after being parked still for 6 months we knew we’d be starting another adventure full of issues, stories and breakdowns. But we had no idea how much all of that would happen on day one! We spent the night in a mall parking lot, however we neglected to […]

[et_pb_section admin_label=”section”][et_pb_row admin_label=”row”][et_pb_column type=”4_4″][et_pb_text admin_label=”Text”] Yesterday was the final leg of a marathon as far a our family and Mexican “normal” is concerned. We got sick a week and a half ago. Zander is always cute, but when he’s sick there’s just something endearing about his ‘I’m cute and sick’. Thys was so sick/tired he […]

[et_pb_section admin_label=”section”][et_pb_row admin_label=”row”][et_pb_column type=”4_4″][et_pb_text admin_label=”Text”] No we are not at this RV part right now, but Becky and I just got off of a Skype call with another family who’d like to RV into Mexico and were interested in Clam Beach RV Park, when we dug into our blog posts we found out that we […]

[et_pb_section admin_label=”section”][et_pb_row admin_label=”row”][et_pb_column type=”4_4″][et_pb_text admin_label=”Text”] Last week we made our visa run to Texas and came back with a loaded vehicle, it’s crazy how many things break down and parts are only available in the US. So we took that opportunity. However, we had no idea how quickly we’d be headed back to the US! […]