{{Note box|This article is not relevant to anyone running MythTV 0.22 or higher.}}

{{Cleanup|Too Wordy, too much explanation}}

{{Cleanup|Too Wordy, too much explanation}}

Line 5:

Line 6:

In 0.21-fixes and below, MythTV intentionally stored UTF-8 in latin1 encoded database/tables. Since MythTV didn't set any charset related connection options it '''required''' that the database be set up with latin1 as the default character set (charset). That prevents charset conversions in MySQL since it thoughts that text columns are encoded using the latin1 encoding and that all communications with the database be done using the latin1 encoding. Some distros (notably, Gentoo) and some users have configured their MySQL servers with a different default charset (MySQL changed the default to UTF-8 with MySQL upstream). Using a different charset may not effect the day to day usage with 0.21-fixes and below (as long as MythTV gets the exact same data back everything will look right) but results in a different on disk data representation.

In 0.21-fixes and below, MythTV intentionally stored UTF-8 in latin1 encoded database/tables. Since MythTV didn't set any charset related connection options it '''required''' that the database be set up with latin1 as the default character set (charset). That prevents charset conversions in MySQL since it thoughts that text columns are encoded using the latin1 encoding and that all communications with the database be done using the latin1 encoding. Some distros (notably, Gentoo) and some users have configured their MySQL servers with a different default charset (MySQL changed the default to UTF-8 with MySQL upstream). Using a different charset may not effect the day to day usage with 0.21-fixes and below (as long as MythTV gets the exact same data back everything will look right) but results in a different on disk data representation.

−

This different on disk data representation will result in a failing database schema update after upgrading to post-r16789 SVN trunk (which means that upgrades to 0.22 will fail) and any non-Latin characters in data will be improperly displayed, even in MythTV 0.21-fixes and below. Before attempting to upgrade to 0.22, those users who have been running MythTV against a misconfigured database ''must'' fix the data encoding as described below. Those users running MythTV 0.21-fixes or below with non-Latin characters in their data should probably fix the data encoding as soon as possible.

+

This different on disk data representation will result in a failing database schema update after upgrading to 0.22 will fail and any non-Latin characters in data will be improperly displayed, even in MythTV 0.21-fixes and below. Before attempting to upgrade to 0.22+, those users who have been running MythTV against a misconfigured database ''must'' fix the data encoding as described below. Those users running MythTV 0.21-fixes or below with non-Latin characters in their data should probably fix the data encoding as soon as possible.

If your database server is configured to use UTF-8 with 0.21-fixes and below, it's not your fault or your distro's fault. Until the upgrade past 0.21-fixes, no one had identified the requirement that was imposed upon the server configuration by MythTV's use of the Qt MySQL drivers. So, now, all you need to do is clean up the data.

If your database server is configured to use UTF-8 with 0.21-fixes and below, it's not your fault or your distro's fault. Until the upgrade past 0.21-fixes, no one had identified the requirement that was imposed upon the server configuration by MythTV's use of the Qt MySQL drivers. So, now, all you need to do is clean up the data.

Line 13:

Line 14:

The approach described below for fixing this type of database corruption assumes that all the data in your database is "equally" corrupted. If you have, over time, run MythTV systems using multiple differently-configured systems (i.e. a mix of different distros)--primarily, if you've run the mythconverg database on differently-configured MySQL servers--then your data is likely only partially-corrupt (meaning some values in a given column of a given table are corrupt and others are not), and fixing only the corrupt portions will require a significant amount of manual work. If this is the case, your best bet is likely to do a full [[Database_Backup_and_Restore|backup of your 0.21-fixes database]], then drop the database, ensure your database server is [[Fixing_Corrupt_Database_Encoding#Changing_the_MySQL_server_configuration|properly configured for 0.21-fixes]], create a new database by running the mc.sql script, then create a new schema by running the 0.21-fixes version of mythtv-setup or mythbackend, then do a [[Database_Backup_and_Restore#Partial_restore_of_a_backup|partial restore]]. Finally, create a backup of the partially-restored database and upgrade using the partial 0.21-fixes database. If you take this approach, do not perform the steps listed below for fixing your database.

The approach described below for fixing this type of database corruption assumes that all the data in your database is "equally" corrupted. If you have, over time, run MythTV systems using multiple differently-configured systems (i.e. a mix of different distros)--primarily, if you've run the mythconverg database on differently-configured MySQL servers--then your data is likely only partially-corrupt (meaning some values in a given column of a given table are corrupt and others are not), and fixing only the corrupt portions will require a significant amount of manual work. If this is the case, your best bet is likely to do a full [[Database_Backup_and_Restore|backup of your 0.21-fixes database]], then drop the database, ensure your database server is [[Fixing_Corrupt_Database_Encoding#Changing_the_MySQL_server_configuration|properly configured for 0.21-fixes]], create a new database by running the mc.sql script, then create a new schema by running the 0.21-fixes version of mythtv-setup or mythbackend, then do a [[Database_Backup_and_Restore#Partial_restore_of_a_backup|partial restore]]. Finally, create a backup of the partially-restored database and upgrade using the partial 0.21-fixes database. If you take this approach, do not perform the steps listed below for fixing your database.

−

See http://www.gossamer-threads.com/lists/mythtv/users/406111#406111 for details and a blank 0.21-fixes database backukp you can use.

+

See http://www.gossamer-threads.com/lists/mythtv/users/406111#406111 for details and a blank 0.21-fixes database backuk you can use.

=== When do I need to fix my database configuration? ===

=== When do I need to fix my database configuration? ===

−

You '''must''' fix your database configuration before upgrading to post-r16789 SVN trunk or to 0.22. If you plan to run MythTV 0.21-fixes, you may continue to use your system without fixing your database configuration. To prevent an "equally-corrupt" database from becoming "partially-corrupt" (and, therefore, making clean up /much/ more difficult), it probably makes sense to wait until you upgrade to post-r16789 SVN trunk or to 0.22 before fixing your database.

+

You '''must''' fix your database configuration before upgrading to 0.22+. If you plan to run MythTV 0.21-fixes, you may continue to use your system without fixing your database configuration. To prevent an "equally-corrupt" database from becoming "partially-corrupt" (and, therefore, making clean up /much/ more difficult), it probably makes sense to wait until you upgrade 0.22+ before fixing your database.

Note also that those users whose data contains non-Latin characters will see character/data corruption even in MythTV 0.21-fixes until they fix their database configurations. Those who notice character corruption should fix their database configurations as soon as possible.

Note also that those users whose data contains non-Latin characters will see character/data corruption even in MythTV 0.21-fixes until they fix their database configurations. Those who notice character corruption should fix their database configurations as soon as possible.

Line 103:

Line 104:

The important difference between the output of a misconfigured versus a properly-configured server is that some or all of the character sets in a misconfigured database will be utf8. For a properly-configured MythTV 0.21-fixes or below system, the character sets will be latin1.

The important difference between the output of a misconfigured versus a properly-configured server is that some or all of the character sets in a misconfigured database will be utf8. For a properly-configured MythTV 0.21-fixes or below system, the character sets will be latin1.

−

Note that once upgraded to post-r16789 SVN trunk (or 0.22 or above), the output of the status command will differ--namely, the <code>Db characterset</code> will be <code>utf8</code> in a properly-configured system. The value of the others is not important on post-r16789 SVN trunk/0.22+ versions of MythTV.

+

Note that once upgraded to 0.22 or above, the output of the status command will differ--namely, the <code>Db characterset</code> will be <code>utf8</code> in a properly-configured system. The value of the others is not important on 0.22+ versions of MythTV.

== Fixing the database corruption ==

== Fixing the database corruption ==

Line 164:

Line 165:

=== Restoring the backup ===

=== Restoring the backup ===

−

[[Database Backup and Restore#Full restore of a specific backup file|Restore the backup]], making sure to specify the filename using <code>--filename mythconverg-to_uncorrupt.sql</code> and, if required, the directory (using <code>--directory /path/to/backup</code>). During this process, the data stored inside the database will be converted to the format required by MythTV. You may need to connect to database as root to complete the restore.

+

[[Database Backup and Restore#Full restore of a specific backup file|Restore the backup]], making sure to specify the filename using <code>--filename mythconverg-to_uncorrupt.sql</code> and, if required, the directory (using <code>--directory /path/to/backup</code>). During this process, the data stored inside the database will be converted to the format required by MythTV. Ensure your '''~/.mythtv/backuprc''' is set to use your MySQL administrative account (usually root).

=== Backing up the uncorrupted database ===

=== Backing up the uncorrupted database ===

Now that the database server and database data have been fixed, you should back up your database again, as described by [[Database Backup and Restore]]. Once complete, you should ensure you never restore a backup created before this time (a backup of the corrupt database).

Now that the database server and database data have been fixed, you should back up your database again, as described by [[Database Backup and Restore]]. Once complete, you should ensure you never restore a backup created before this time (a backup of the corrupt database).

+

+

=== Sidenote on "corrupted" UTF8 characters ===

+

+

As described above mythtv actually stores UTF8 values to table that has ISO-Latin1 charset.

+

Not sure how I managed to break my "recorded" table when upgrading to 0.23 but the good news is that I could fix it.

+

If you need to fix a table which shows "wrong" characters on Frontend UI you may want to try these steps. If you have both good & corrupted data in the same table then you need to actually find a way to separate them - these steps will break (or strip) the correct data while fixing the broken one.

+

+

First -as always - make sure that you have up- to date backups on your mythconverg-db and you know how to restore them. This is not covered here.

Check that the data shows ok in the terminal (it should - if it does not then you have some other issue):

+

<pre>

+

# less recorded.sql

+

</pre>

+

Then convert the data to the format which mythtv expects it to be (this is the wonky part here - we actually take the data in as utf8 and then convert it to latin. -c means "strip all corrupted data". ):

In the meantime, you may "revert" to the invalid configuration you were using previously by restoring your /etc/mysql/my.cnf-orig and the backup file you made at the beginning of the process (i.e. the one from before the zcat/sed script above (in the example, the one called, <code>mythconverg-1214-20081217145744.sql.gz</code>).

In the meantime, you may "revert" to the invalid configuration you were using previously by restoring your /etc/mysql/my.cnf-orig and the backup file you made at the beginning of the process (i.e. the one from before the zcat/sed script above (in the example, the one called, <code>mythconverg-1214-20081217145744.sql.gz</code>).

−

== After Upgrading to current SVN trunk/0.22 ==

+

== After Upgrading to current 0.22+ ==

−

Once you've upgraded MythTV to current SVN trunk or 0.22 (when released), you may restore the old mysql configuration file, if desired.

+

Once you've upgraded MythTV to current 0.22+, you may restore the old mysql configuration file, if desired.

== Miscellaneous information ==

== Miscellaneous information ==

Revision as of 08:43, 6 October 2011

Note: This article is not relevant to anyone running MythTV 0.22 or higher.

Cleanup: This article or section may require cleanup. Discuss the issue on the talk page

Background

In 0.21-fixes and below, MythTV intentionally stored UTF-8 in latin1 encoded database/tables. Since MythTV didn't set any charset related connection options it required that the database be set up with latin1 as the default character set (charset). That prevents charset conversions in MySQL since it thoughts that text columns are encoded using the latin1 encoding and that all communications with the database be done using the latin1 encoding. Some distros (notably, Gentoo) and some users have configured their MySQL servers with a different default charset (MySQL changed the default to UTF-8 with MySQL upstream). Using a different charset may not effect the day to day usage with 0.21-fixes and below (as long as MythTV gets the exact same data back everything will look right) but results in a different on disk data representation.

This different on disk data representation will result in a failing database schema update after upgrading to 0.22 will fail and any non-Latin characters in data will be improperly displayed, even in MythTV 0.21-fixes and below. Before attempting to upgrade to 0.22+, those users who have been running MythTV against a misconfigured database must fix the data encoding as described below. Those users running MythTV 0.21-fixes or below with non-Latin characters in their data should probably fix the data encoding as soon as possible.

If your database server is configured to use UTF-8 with 0.21-fixes and below, it's not your fault or your distro's fault. Until the upgrade past 0.21-fixes, no one had identified the requirement that was imposed upon the server configuration by MythTV's use of the Qt MySQL drivers. So, now, all you need to do is clean up the data.

Partial vs "equal" corruption

The approach described below for fixing this type of database corruption assumes that all the data in your database is "equally" corrupted. If you have, over time, run MythTV systems using multiple differently-configured systems (i.e. a mix of different distros)--primarily, if you've run the mythconverg database on differently-configured MySQL servers--then your data is likely only partially-corrupt (meaning some values in a given column of a given table are corrupt and others are not), and fixing only the corrupt portions will require a significant amount of manual work. If this is the case, your best bet is likely to do a full backup of your 0.21-fixes database, then drop the database, ensure your database server is properly configured for 0.21-fixes, create a new database by running the mc.sql script, then create a new schema by running the 0.21-fixes version of mythtv-setup or mythbackend, then do a partial restore. Finally, create a backup of the partially-restored database and upgrade using the partial 0.21-fixes database. If you take this approach, do not perform the steps listed below for fixing your database.

When do I need to fix my database configuration?

You must fix your database configuration before upgrading to 0.22+. If you plan to run MythTV 0.21-fixes, you may continue to use your system without fixing your database configuration. To prevent an "equally-corrupt" database from becoming "partially-corrupt" (and, therefore, making clean up /much/ more difficult), it probably makes sense to wait until you upgrade 0.22+ before fixing your database.

Note also that those users whose data contains non-Latin characters will see character/data corruption even in MythTV 0.21-fixes until they fix their database configurations. Those who notice character corruption should fix their database configurations as soon as possible.

If the MySQL server you're using to run the mythconverg database is also used for other databases, you should definitely wait until you upgrade to current trunk or 0.22 to fix your database, as the other databases may require conversions to run with a latin1 configuration (or just may not work with a latin1 configuration), and trunk/0.22 do not require any specific encoding configuration to work properly. At that point, you'll want to backup all your databases, then drop all your databases, then fix the mythconverg database (including upgrading it to 0.22) as described below, then backup the (good, 0.22) mythconverg database. Once you've got a good 0.22 mythconverg database, drop the mythconverg database, then reconfigure your MySQL server as it was previously configured, and finally, restore all your databases, including the good 0.22 mythconverg database.

Note on MythTV 0.21-fixes and below character encoding

Note that the use of the latin1 encoding does not prevent MythTV from using non-Latin characters. Instead, MythTV actually writes UTF-8 characters into the column and does all required conversions when reading/writing database data. The fact that the database uses latin1 encoding simply means that the MySQL server is unable to understand the data in the database--i.e. any 8-bit or multi-byte characters will be encoded in such a way that the MySQL server or any mysql client will see gibberish rather than the expected character. However, MythTV programs (including mythfrontend's GUI and OSD and MythWeb) will be able to interpret the data properly, meaning the user will see the proper characters through MythTV programs.

For example, in a misconfigured database, the character, "å", would appear as shown when using MySQL programs, such as the mysql command-line client. When properly encoded for MythTV, the character would be shown as, "Ã¥", by MySQL clients. Similarly, the character, "á", would be represented as "Ã¡" in a properly-encoded database. Note that--properly encoded--these characters (which are both represented as 2-byte characters in UTF-8) seem to MySQL programs to be 2 distinct characters. Similarly, 3-byte UTF-8 characters would be represented as 3 characters when viewed using MySQL programs.

Determining if your database is misconfigured

Note that once you start to upgrade your database, the status information shown below will differ. Therefore, once you've begun the database upgrade process, the process described for detecting a misconfigured database server will not work. This is especially true if a failure occurs when upgrading your database schema.

Before the database upgrade occurs (when your database is still a 0.21-fixes or below database), you may determine if your database is misconfigured by executing the following command from the shell (fixing the path to the mysql command-line client and/or the username/database name as required) and type in the password when prompted:

mysql -umythtv -p mythconverg -e 'status;'

Note that the default USE flags for configuring MySQL on Gentoo-based systems include the -latin1 flag. This flag configures the MySQL server incorrectly for MythTV. While Gentoo users may check which USE flags were specified for MySQL, they should also verify their current configuration is incorrect by querying the server status, as shown above, and comparing results with the examples below.

Recognizing the difference

The important difference between the output of a misconfigured versus a properly-configured server is that some or all of the character sets in a misconfigured database will be utf8. For a properly-configured MythTV 0.21-fixes or below system, the character sets will be latin1.

Note that once upgraded to 0.22 or above, the output of the status command will differ--namely, the Db characterset will be utf8 in a properly-configured system. The value of the others is not important on 0.22+ versions of MythTV.

Fixing the database corruption

Note that the approach described here will only work on a database that was corrupted due to a misconfigured MySQL server, as described above. If you have any other type of corruption, applying the procedure outlined below will further corrupt your database--likely breaking it.

Backing up the MythTV database

To fix the broken encoding, you will first need to create a database backup, as described by Database Backup and Restore. Note that this is not an optional step that's recommended solely for safety--the backup will actually be used to "uncorrupt" the data.

Changing the backup file

The backup file created above was encoded using UTF-8 encoding. However, restoring it, unchanged, will restore an exact copy of the corrupt database. Instead, we will modify the backup to "re-encode" the data into the format that MythTV expects. To do so, execute the following command from the shell while in the directory that contains the backup created above (change the backup filename, as appropriate):

If your backup file is not gzip'ed (if it is uncompressed), use cat rather than zcat.

Changing the MySQL server configuration

Then, once you have successfully created a database backup and modified it to "uncorrupt" your database, you will need to reconfigure your MySQL server such that it does not specify a database server default character set. In so doing, rather than forcing all database clients to use utf8 encoding for all communications (even though a program, such as MythTV, may have been written to use a different encoding), you will have configured your server such that, by default, clients use the character encoding of the database to which they connect (but may still request a specific character encoding). Therefore, changing this should not break other programs using other databases on the server; however, verifying this is up to the user. Note, also, that the original configuration--the one that won't work with MythTV--may have been causing data corruption for data used by the other applications the same way it did for MythTV applications, and changing the configuration may suddenly make the corruption visible.

The Gentoo database configuration file, /etc/mysql/my.cnf, should be similar to the one shown below. To fix the server configuration to work with MythTV 0.21-fixes, we must remove the lines starting with, "character-set-server" and "default-character-set". You can do so by executing the following commands as root (or with root permissions, using sudo):

These commands will preserve the original my.cnf (as /etc/mysql/my.cnf-orig ) and create a copy at /etc/mysql/my.cnf with the incorrect lines commented.

Once this is done, restart the mysql server and verify the new server configuration, as above.

Clearing the corrupt database

For the following commands, you will need to use the mysql command-line client. The MySQL user you choose to use must have sufficient permissions to perform the commands. The MySQL root user will have sufficient permissions. You can log in to the database as the root user by executing the following command from the shell (change the database name--here, mythconverg--if appropriate):

mysql -uroot -p mythconverg

Once logged in, drop the corrupt the database (change the database name, if appropriate):

DROP DATABASE IF EXISTS mythconverg;

Then, create a new empty database (change the database name, if appropriate):

CREATE DATABASE IF NOT EXISTS mythconverg;

Then, set the appropriate character set (change the database name, if appropriate):

ALTER DATABASE mythconverg DEFAULT CHARACTER SET latin1;

Finally, exit the mysql client:

quit

Restoring the backup

Restore the backup, making sure to specify the filename using --filename mythconverg-to_uncorrupt.sql and, if required, the directory (using --directory /path/to/backup). During this process, the data stored inside the database will be converted to the format required by MythTV. Ensure your ~/.mythtv/backuprc is set to use your MySQL administrative account (usually root).

Backing up the uncorrupted database

Now that the database server and database data have been fixed, you should back up your database again, as described by Database Backup and Restore. Once complete, you should ensure you never restore a backup created before this time (a backup of the corrupt database).

Sidenote on "corrupted" UTF8 characters

As described above mythtv actually stores UTF8 values to table that has ISO-Latin1 charset.
Not sure how I managed to break my "recorded" table when upgrading to 0.23 but the good news is that I could fix it.
If you need to fix a table which shows "wrong" characters on Frontend UI you may want to try these steps. If you have both good & corrupted data in the same table then you need to actually find a way to separate them - these steps will break (or strip) the correct data while fixing the broken one.

First -as always - make sure that you have up- to date backups on your mythconverg-db and you know how to restore them. This is not covered here.

Check that the data shows ok in the terminal (it should - if it does not then you have some other issue):

# less recorded.sql

Then convert the data to the format which mythtv expects it to be (this is the wonky part here - we actually take the data in as utf8 and then convert it to latin. -c means "strip all corrupted data". ):

What to do if problems occur

If problems occur while attempting to follow the above steps, please post a message to the [mailing list] explaining your configuration (distro/output of MySQL status), what part failed, and the error messages you received (please copy/paste the messages to ensure all relevant info is provided). You may also get a faster response if you mention my nick (sphery) in the message.

In the meantime, you may "revert" to the invalid configuration you were using previously by restoring your /etc/mysql/my.cnf-orig and the backup file you made at the beginning of the process (i.e. the one from before the zcat/sed script above (in the example, the one called, mythconverg-1214-20081217145744.sql.gz).

After Upgrading to current 0.22+

Once you've upgraded MythTV to current 0.22+, you may restore the old mysql configuration file, if desired.