Wow, I am definitely holding off upgrading mythtv to 0.22 for as long as possible. I am a longtime user and I have partial corruption (sigh). I also use databases other than mythconverg, so I am also thinking that changing a "misconfigured server" /etc/mysql/my.cnf (as explained in "Fixing Corrupt Database Encoding") might not be a good idea.

I'm curious as to how you know that you have partial corruption. Have you tried to upgrade already? If your mysql server has always been configured for either latin1 or utf8 (but not changed from one to the other) you shouldn't have partial corruption.

Also...reconfiguring your database is NOT needed once you've upgraded to 0.22. Those instructions are for people who fix their database BEFORE upgrading and wish to continue running 0.21...a practice that (from my experience mentioned earlier here) simply does NOT work anyway. I wouldn't recommend that anyone fix their database and attempt to run 0.21. But again...as long as you upgrade mythtv, there's NO need to change your my.cnf at all, as mythtv handles the connection character set stuff itself.

Edit: this is from the wiki:

Quote:

Once you've upgraded MythTV to current SVN trunk or 0.22 (when released), you may restore the old mysql configuration file, if desired.

After re-reading the wiki, I realized that they do in fact tell you to reconfigure mysql before restoring the database. I may not have done that. That part confuses me a bit, as my understanding was that the SET NAMES in the backup file forced the character set of the connection.

In any case, you could disable the other applications you have that use mysql long enough to do that restore fix. You can switch your my.cnf back after doing the restore as long as you're running 0.22.

Wow...you know just when I think I had some understanding of this whole thing...jeez. I wonder if the wiki changed since I did the fix. This part totally confuses me:

Quote:

Then, once you have successfully created a database backup and modified it to "uncorrupt" your database, you will need to reconfigure your MySQL server such that it does not specify a database server default character set. In so doing, rather than forcing all database clients to use utf8 encoding for all communications (even though a program, such as MythTV, may have been written to use a different encoding), you will have configured your server such that, by default, clients use the character encoding of the database to which they connect (but may still request a specific character encoding). Therefore, changing this should not break other programs using other databases on the server; however, verifying this is up to the user.

...and then they describe steps that expressly FORCE the default character set settings to latin1. Is it me or does that make no sense at all?

I have more databases than mythtv, is it really safe to just change those my.cnf parameters from utf8? Do I need to dump them and load 'em back changed to latin1? Will the apps using these databases understand the change? Do I need to recompile them?

Hopefully encoding is stored locally per database (as it claims no change is necessary for the other databases). But changing the default char set makes me nervous. Especially since it would be against the default of utf8 on gentoo.

The instructions on the wiki only require that you change that configuration in order to do the fix restore. Changing that configuration will not affect any other database unless some application is writing to those databases while the configuration is changed.

Whatever your additional databases are, it should be easy to make sure they're not getting accessed while your configuration is changed...just don't run those applications until you're done and you've changed your config back (and restarted mysql).

I don't really want to question the mythtv folks that wrote that wiki, but personally, I don't understand why changing that configuration should be required in order to do the restore...though it wouldn't do any harm. Here's why I find it hard to believe it's needed:

Once you've modified your backup, it will be doing a "SET NAMES latin1" command. Once it does that, your my.cnf character set settings should have no effect on the restore whatsoever. Observe:

"The server character set and collation are used as default values if the database character set and collation are not specified in CREATE DATABASE statements. They have no other purpose."

...affecting NOTHING but the CREATE DATABASE statement, which means it has no effect whatsoever on any of this, as you're manually creating the database and altering the character set. The wiki also says that the my.cnf changes they describe make it so "by default, clients use the character encoding of the database to which they connect" which is simply NOT the case...their changes expressly default the client connections to latin1.
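
Since the argument above hinges on the modified backup carrying its own "SET NAMES latin1", it may be worth checking what the dump actually declares before restoring it. Here is a minimal sketch in Python, assuming the dump contains either a plain SET NAMES statement or the conditional-comment form that mysqldump writes; the function name and the sample text are made up for illustration and are not taken from the wiki.

```python
import re

# Matches plain and mysqldump-style conditional statements, for example:
#   SET NAMES latin1;
#   /*!40101 SET NAMES utf8 */;
SET_NAMES = re.compile(r"\bSET\s+NAMES\s+'?([A-Za-z0-9_]+)'?", re.IGNORECASE)


def declared_charsets(dump_text):
    """Return every character set named by a SET NAMES statement, in order."""
    return [m.group(1).lower() for m in SET_NAMES.finditer(dump_text)]


# Hypothetical dump fragment, only for demonstration.
sample = "/*!40101 SET NAMES latin1 */;\nCREATE TABLE people (name VARCHAR(128));\n"
print(declared_charsets(sample))
```

An empty list would mean the restore falls back to whatever connection character set the client and server negotiate by default, which is the case where the my.cnf settings could matter.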

Somehow a lot of what they're saying there just doesn't add up to me. If anyone sees something I'm mistaken on there I'd love to know about it.

I'm curious as to how you know that you have partial corruption. Have you tried to upgrade already? If your mysql server has always been configured for either latin1 or utf8 (but not changed from one to the other) you shouldn't have partial corruption.
Tom

No, no upgrade yet. I know I have previously had problems with Swedish chars and latin1 and utf8, and had to change the mythconverg db charset to latin1 (per upgrade advice). I misunderstood: I thought I could run the mythtv_0.22_corruption_test.pl test on the 0.21 database. So maybe I don't know.

Ok, did the Fixing the database corruption, upgraded to 0.22, let it run the db schema upgrade scripts. The mythtv_0.22_corruption_test.pl script says "No failure detected". However the Swedish chars are fubar. Typically it looks like utf-8 chars displayed as latin1 chars. That is, one Swedish char is represented by two chars. For example: säsong is shown as sÃ¤song. I have had a similar experience upgrading before. At some point I think there was a fix posted involving patching the db char set to latin1 (don't remember what exactly). Sigh...
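
The symptom described here (utf-8 bytes being shown as if they were latin1) can be reproduced in a couple of lines of Python. This is only an illustration of the display effect, not of what MythTV or MySQL do internally; the function name is made up.

```python
def misread_utf8_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


print(misread_utf8_as_latin1("säsong"))  # prints sÃ¤song
```

The single character ä becomes the two bytes c3 a4 in utf-8, and each of those bytes is a printable character in latin1, which is why one letter turns into two.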

PS. I had =.../...-0.21* in keywords for mythtv, contributing factor? DS.

Ok, did the Fixing the database corruption, upgraded to 0.22, let it run the db schema upgrade scripts. The mythtv_0.22_corruption_test.pl script says "No failure detected". However the Swedish chars are fubar.

I think (though I'm not certain) that that might be expected. Had you not run the restore fix, it's almost certain that the upgrade would have failed altogether.

I think that in 0.21, once characters like those get written over that utf8 connection as if they were latin1, they may never truly get fixed. However the restore fix at least gets it to a state where the upgrade won't fail due to things like unique key violations etc.

The wiki isn't clear about this. However I can tell you this: If the fix truly corrected the database as though you'd been running latin1 connections all along, then I would have been fine doing the fix early, changing my configurations on the front/backend to latin1, and continuing to run 0.21 until I was ready to upgrade. That in fact did not work for me, and made a mess out of my database.

For example, after the fix, when mythfilldatabase again encountered names that were already in the people table and contained those sorts of characters, it did not recognize the existing entries as being the same, and so created new entries, making a big raging mess.

For most things, any incorrect looking characters will be temporary. Most or all characters that don't display correctly will work their way out of the system as old recordings get deleted, and as the guide refreshes etc. More importantly, going forward with 0.22, all will be fine.

...
For most things, any incorrect looking characters will be temporary. Most or all characters that don't display correctly will work their way out of the system as old recordings get deleted, and as the guide refreshes etc. More importantly, going forward with 0.22, all will be fine.

Tom

I need to fix this. I have too much history and too many titles that I want recorded that have Swedish characters in the title.
However, looking at the mythconverg-to_uncorrupt.sql file, it seems that somehow I have Swedish characters that seem to have gone through the utf-8 conversion twice. That is, one char that should be two bytes is now actually four bytes.
For example: iso-8859-1 ö (f6) -> utf-8 Ã¶ (c3 b6) -> (double utf-8) c3 83 c2 b6
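
The byte sequences in that example can be checked with a short Python sketch. It assumes the second conversion treated the utf-8 bytes as ISO-8859-1 and encoded them to utf-8 again; the helper name hexbytes is made up.

```python
def hexbytes(data):
    """Format a bytes object as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


original = "ö"
latin1 = original.encode("latin-1")
utf8 = original.encode("utf-8")
# Second round: the UTF-8 bytes are misread as Latin-1 and encoded again.
double = utf8.decode("latin-1").encode("utf-8")

print(hexbytes(latin1))  # f6
print(hexbytes(utf8))    # c3 b6
print(hexbytes(double))  # c3 83 c2 b6
```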

Hmm, it should be fairly easy to dump to a file and replace the four-byte sequences with the proper two-byte sequences. The four-byte sequences do not occur naturally in Swedish text (well, two visible).
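
A minimal sketch of that search-and-replace idea in Python, working on the raw bytes of the dump. It assumes the damage is exactly one extra ISO-8859-1 to utf-8 round, and it only covers the lowercase letters å, ä and ö; the function names are made up for illustration.

```python
def build_repair_table(chars="åäö"):
    """Map each double-encoded byte sequence to the correct UTF-8 sequence."""
    table = {}
    for ch in chars:
        good = ch.encode("utf-8")                         # e.g. c3 b6
        bad = good.decode("latin-1").encode("utf-8")      # e.g. c3 83 c2 b6
        table[bad] = good
    return table


def repair_double_utf8(data, table=None):
    """Replace known double-encoded sequences in a bytes object."""
    if table is None:
        table = build_repair_table()
    for bad, good in table.items():
        data = data.replace(bad, good)
    return data


broken = "säsong".encode("utf-8").decode("latin-1").encode("utf-8")
print(repair_double_utf8(broken).decode("utf-8"))  # prints säsong
```

The uppercase letters are left out on purpose: the second utf-8 byte of Å, Ä and Ö falls in the range 0x80-0x9f, where ISO-8859-1 and Windows-1252 disagree, so the broken sequence for those depends on which mapping did the damage. To use this on a dump, read the file in binary mode, pass the bytes through repair_double_utf8, and write the result to a new file rather than overwriting the original.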

Hmm, using mythconverg_backup.pl does a utf-8 translation again. So running it on the current 0.22 db I got 8 (eight) bytes for what really is one iso-8859-1 char (two bytes in utf-8).
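
When it is not clear how many conversion rounds a string has been through (four bytes, eight bytes, ...), the layers can be peeled off one at a time until the result stops being valid utf-8. A sketch in Python, assuming every extra round was an ISO-8859-1 to utf-8 conversion; the function name and the max_rounds limit are made up.

```python
def peel_utf8_layers(data, max_rounds=5):
    """Undo repeated Latin-1 -> UTF-8 re-encoding; return (bytes, rounds undone)."""
    rounds = 0
    while rounds < max_rounds:
        try:
            candidate = data.decode("utf-8").encode("latin-1")
            candidate.decode("utf-8")  # the peeled result must still be valid UTF-8
        except (UnicodeDecodeError, UnicodeEncodeError):
            break
        if candidate == data:  # pure ASCII, nothing left to peel
            break
        data = candidate
        rounds += 1
    return data, rounds


# Build the eight-byte case: correct UTF-8 plus two extra rounds.
eight = "ö".encode("utf-8")
for _ in range(2):
    eight = eight.decode("latin-1").encode("utf-8")

print(len(eight))                # 8
print(peel_utf8_layers(eight))   # (b'\xc3\xb6', 2)
```

This is a heuristic: text that legitimately contains a sequence such as Ã¶ would be "repaired" too, so it is safer to run it per field and eyeball the results than to apply it blindly to a whole dump.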

Ok, I think it went something like this... I had a late 0.21 (~) with a partially utf-8 converted database... ugh, a database with both latin1 and utf8. How I got there, I think, is from some fix I googled after having problems upgrading to a late 0.21 version... at least that is my theory...

Update: changing the double-converted utf-8 text to normal utf8 worked fine (for the most part). A little bit of a problem with the people table (seems similar to tld's problem with that table): I had a lot of duplicates in name, but I didn't have the energy to deal with them now, so I just added ' dup' on the end.
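
The duplicates in the people table come from a broken name and its correct twin collapsing into the same string once the encoding is repaired. A small Python sketch that lists such collisions before touching the database; the names in the example are invented, and undo_one_round assumes a single extra ISO-8859-1 to utf-8 round.

```python
from collections import defaultdict


def undo_one_round(name):
    """Undo one Latin-1 -> UTF-8 misconversion; leave the name alone if it does not apply."""
    try:
        return name.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


def find_collisions(names, repair=undo_one_round):
    """Group names by their repaired form and keep groups with more than one member."""
    groups = defaultdict(list)
    for name in names:
        groups[repair(name)].append(name)
    return {fixed: found for fixed, found in groups.items() if len(found) > 1}


# Invented sample rows, only for demonstration.
names = ["BjÃ¶rn Exempel", "Björn Exempel", "Anna Exempel"]
print(find_collisions(names))
```

Each group in the result is a set of rows that would end up with the same name after the repair, which is where merging or renaming would be needed.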

I think this is specific to mythtv installs running a late 0.21 where someone did manual fixing for latin1/utf8.