11.1.7 Column Character Set Conversion

To convert a binary or nonbinary string column to use a
particular character set, use ALTER
TABLE. For successful conversion to occur, one of the
following conditions must apply:

If the column has a binary data type
(BINARY,
VARBINARY,
BLOB), all the values that it
contains must be encoded using a single character set (the
character set you're converting the column to). If you use a
binary column to store information in multiple character
sets, MySQL has no way to know which values use which
character set and cannot convert the data properly.

If the column has a nonbinary data type
(CHAR,
VARCHAR,
TEXT), its contents should be
encoded in the column character set, not some other
character set. If the contents are encoded in a different
character set, you can convert the column to use a binary
data type first, and then to a nonbinary column with the
desired character set.
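The single-character-set requirement for binary columns can be illustrated with a small Python model. This is a sketch, not MySQL code. It assumes Python's iso8859_7 and cp1252 codecs as stand-ins for MySQL's greek and latin1 character sets. The helper convert_column is hypothetical: it reads every stored byte string with one character set, which is roughly what converting a binary column to a nonbinary one does.

```python
# A binary (VARBINARY/BLOB) column stores raw bytes only; nothing records
# which character set produced each value.
greek_bytes = "Ελλάδα".encode("iso8859_7")  # stand-in for MySQL's greek
latin1_bytes = "café".encode("cp1252")      # stand-in for MySQL's latin1
column = [greek_bytes, latin1_bytes]


def convert_column(values: list[bytes], charset: str) -> list[str]:
    """Hypothetical helper: read every stored value using one character set,
    the way a binary-to-nonbinary conversion applies a single label."""
    return [value.decode(charset) for value in values]


as_greek = convert_column(column, "iso8859_7")
as_latin1 = convert_column(column, "cp1252")

# Whichever character set is chosen, one of the two values comes out wrong.
print(as_greek)
print(as_latin1)
```

In this model, both byte strings are valid in both character sets, so neither decode raises an error. The corruption is silent, which is why the values have to be in a single character set before the conversion.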

Suppose that a table t has a binary column
named col1 defined as
VARBINARY(50). Assuming that the information
in the column is encoded using a single character set, you can
convert it to a nonbinary column that has that character set.
For example, if col1 contains binary data
representing characters in the greek
character set, you can convert it as follows:

ALTER TABLE t MODIFY col1 VARCHAR(50) CHARACTER SET greek;

If your original column has a type of
BINARY(50), you could convert it to
CHAR(50), but the resulting values will be
padded with 0x00 bytes at the end, which may
be undesirable. To remove these bytes, use the
TRIM() function:

UPDATE t SET col1 = TRIM(TRAILING 0x00 FROM col1);
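The padding behavior and the TRIM() fix can be modeled in Python as a rough sketch. It assumes a BINARY(50) column that right-pads every stored value with 0x00 bytes. The helper names store_binary and trim_trailing_nul are hypothetical.

```python
COLUMN_WIDTH = 50  # models BINARY(50)


def store_binary(value: bytes, width: int = COLUMN_WIDTH) -> bytes:
    """Hypothetical helper: right-pad a value with 0x00 bytes to the
    column width, as a BINARY(width) column stores it."""
    if len(value) > width:
        raise ValueError("value is longer than the column width")
    return value.ljust(width, b"\x00")


def trim_trailing_nul(value: bytes) -> bytes:
    """Hypothetical helper mirroring TRIM(TRAILING 0x00 FROM col1):
    strip every trailing 0x00 byte."""
    return value.rstrip(b"\x00")


stored = store_binary(b"hello")
print(len(stored))                # 50: the padding is part of the stored value
print(trim_trailing_nul(stored))  # b'hello'
```

Like the UPDATE statement, this also strips any 0x00 bytes that were genuinely part of the end of a value. The fix therefore suits columns that hold text rather than arbitrary binary data.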

Suppose that table t has a nonbinary column
named col1 defined as CHAR(50)
CHARACTER SET latin1, but you want to convert it to use
utf8 so that you can store values from many
languages. The following statement accomplishes this:

ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;

Conversion may be lossy if the column contains characters that
are not in both character sets.
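Whether a conversion is lossy depends on its direction. The Python sketch below assumes cp1252 as an approximation of MySQL's latin1. It uses the encoder's errors="replace" mode, which substitutes '?' for characters the target set lacks. MySQL may behave similarly, or may reject the statement with an error depending on the server's SQL mode. The helper convert_value is hypothetical.

```python
def convert_value(text: str, target_charset: str) -> str:
    """Hypothetical helper: re-encode text in the target character set,
    replacing characters that the target set cannot represent with '?'."""
    return text.encode(target_charset, errors="replace").decode(target_charset)


# latin1 -> utf8: every latin1 character also exists in utf8, so nothing is lost.
widened = convert_value("café", "utf-8")

# utf8 -> latin1: the Greek letter Ω has no latin1 code, so it becomes '?'.
narrowed = convert_value("café Ω", "cp1252")

print(widened)   # café
print(narrowed)  # café ?
```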

A special case occurs if you have old tables from before MySQL
4.1 where a nonbinary column contains values that actually are
encoded in a character set different from the server's default
character set. For example, an application might have stored
sjis values in a column, even though MySQL's
default character set was latin1. It is
possible to convert the column to use the proper character set,
but an additional step is required. Suppose that the server's
default character set was latin1 and
col1 is defined as
CHAR(50) but its contents are
sjis values. The first step is to convert the
column to a binary data type, which removes the existing
character set information without performing any character
conversion:

ALTER TABLE t MODIFY col1 BLOB;

The next step is to convert the column to a nonbinary data type
with the proper character set:

ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET sjis;

This procedure requires that the table not have been modified
already with statements such as
INSERT or
UPDATE after an upgrade to MySQL
4.1 or later. In that case, MySQL would store new values in the
column using latin1, and the column will
contain a mix of sjis and
latin1 values and cannot be converted
properly.
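The reason for the detour through BLOB can be sketched in Python. The model assumes shift_jis and cp1252 as stand-ins for MySQL's sjis and latin1, and both helper functions are hypothetical:

- direct_conversion models changing the column straight from a latin1 CHAR to an sjis CHAR. This converts the characters that the bytes represent in latin1.
- two_step_conversion models the BLOB-then-sjis sequence. This keeps the bytes untouched and only changes how they are read.

```python
# Shift-JIS bytes that an old application stored in a column MySQL
# believes is latin1.
stored = "日本".encode("shift_jis")


def direct_conversion(raw: bytes) -> str:
    """Hypothetical model of MODIFY col1 CHAR(50) CHARACTER SET sjis on a
    latin1 column: the bytes are read as latin1 characters, and those
    characters are re-encoded in sjis ('?' where sjis has no equivalent)."""
    as_latin1_text = raw.decode("cp1252")
    return as_latin1_text.encode("shift_jis", errors="replace").decode("shift_jis")


def two_step_conversion(raw: bytes) -> str:
    """Hypothetical model of MODIFY col1 BLOB followed by
    MODIFY col1 CHAR(50) CHARACTER SET sjis: the BLOB step keeps the
    bytes as they are, so the second step simply reads them as sjis."""
    unchanged = raw  # no character conversion happens on the way to BLOB
    return unchanged.decode("shift_jis")


print(two_step_conversion(stored))  # 日本
print(direct_conversion(stored))    # garbled text containing '?'
```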

If you specified attributes when creating a column initially,
you should also specify them when altering the table with
ALTER TABLE, because MODIFY replaces the entire column
definition. For example, if you specified NOT NULL and an
explicit DEFAULT value, you should also provide them in the
ALTER TABLE statement. Otherwise, the resulting column
definition will not include those attributes.

<?php
/* $Id: mysqlupgrade.php,v 1.3 2005/01/31 22:04:02 shimon Exp $ */
// Upgrade CHARACTER SET for MySQL 4.1.0+
//
// Did you export all databases, including the mysql database, before running this file?
//
// Known bug: this program doesn't know how to treat FULLTEXT indexes.
//
// by Shimon Doodkin shimon_d@hotmail.com

$conn = mysql_connect("localhost", "mashovim.co.il", "***");
$printonly = true; // change this to false to alter on the fly
$charset = "hebrew";
$collate = "hebrew_general_ci";
$altertablecharset = true;
$alterdatabasecharser = true;

I have a problem with this method, at least when going from latin1_swedish_ci to utf8_general_ci. When switching back to VARCHAR after changing the charset, I receive errors on unique fields because MySQL treats Éleanore and Eleanore as the same value (note the É). I'm not sure whether this is a bug (which it looks like) or whether I've missed something that this method doesn't cover.

You can change the charset of all databases or only of selected ones, and you can choose databases to skip. You can also just print the queries and execute them yourself, via phpMyAdmin for example.

Posted by Stephen Balukoff on July 8, 2009

The manual page states that: "This procedure requires that the table not have been modified already with statements such as INSERT or UPDATE after an upgrade to MySQL 4.1 or later. In that case, MySQL would store new values in the column using latin1, and the column will contain a mix of sjis and latin1 values and cannot be converted properly."

This applies to tables which have rows with different character sets. While the above statement is probably true for the sjis and latin1 character sets, it turns out that if you have a mix of latin1 and utf-8 in a single table there's probably a "clean" way to fix this. We had to do this recently for a new customer of ours, and since the process was somewhat of a pain to come up with, I wrote about it here (in the hope that I can save someone else out there some time): http://www.blueboxgrp.com/news/2009/07/mysql_encoding
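A latin1/UTF-8 mix can be more tractable than an sjis/latin1 mix because UTF-8 has a strict byte structure: latin1 text containing accented characters is rarely valid UTF-8. A per-value heuristic based on that property is sketched below in Python.

This illustrates the general idea only. It is not necessarily the procedure in the linked article. It assumes cp1252 as an approximation of MySQL's latin1, and the helper decode_mixed is hypothetical. It can also misclassify latin1 values whose bytes happen to form valid UTF-8.

```python
def decode_mixed(raw: bytes) -> str:
    """Hypothetical helper for a column mixing UTF-8 and latin1 values:
    accept the value as UTF-8 if it decodes cleanly, otherwise fall back
    to cp1252 (an approximation of MySQL's latin1)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


rows = [
    "café".encode("utf-8"),   # b'caf\xc3\xa9'
    "café".encode("cp1252"),  # b'caf\xe9': 0xE9 starts a UTF-8 sequence that never completes
    b"plain ascii",           # identical in both encodings
]
print([decode_mixed(row) for row in rows])  # ['café', 'café', 'plain ascii']
```

Pure-ASCII values decode identically either way, so the ambiguity only matters for values containing non-ASCII bytes.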