utf8mb4_general_ci or utf8mb4_unicode_ci for collation

Mon 12th Nov '18, 12:09am

I saw in Wayne's October article on mysql best practices that he recommends a database character set of utf8mb4 and a collation of utf8mb4_general_ci. From what I read online, most people recommend utf8mb4_unicode_ci; apparently this is a newer version. Is there a reason you recommend general, or can we use either?

The collation determines how characters are sorted, nothing more. This would be used in alphabetical sorts and unstructured data lookups in MySQL. They do not determine what is stored as that is the role of the character set. Either will work for general purposes. If your site has a lot of multi-byte characters (Arabic, Cyrillic, Hebrew, or Asian languages) then utf8mb_unicode_ci will probably suit you better. It is mostly Latin language characters, you probably won't see a difference.

The character set of UTFMB4 is the most important part though. This provides support for the majority of characters in languages around the world as well as some speciality characters like the UTF-8 Emoji Standard.

Comment

Wayne - Thanks for the info.
Regarding utf8mb8, my database is currently in latin1 which is creating random junk characters in various posts (this was discovered in my other post about ipad issues). I have no idea why the database was setup in latin1, but I know I need to change it. I plan to modify the database from latin1 to utf8mb4 this evening. Other than the standard myslq commands to modify the database char set and collation (for all tables), is there anything else that I should be mindful of when moving to utf8mb4? My biggest concern is data loss or truncation. I'm assuming I can alter the database, change the language settings in VB to UTF-8, clear the cache and everything will be good, or are there any other specific steps or cautions that I need to be aware of??

Admin for hobbysquawk.com
VB 5.5.2
PHP 7.3.5
Maria DB 10.2.24

Comment

latin1 was the default format for MySQL until 5.7.something. The default for new databases is UTF8MB4 in MySQL 8.0 (there is no MySQL 6 or 7). If vBulletin's installer doesn't create the database, it will use the MySQL Server's default. In current versions of vBulletin's installer will create the database with the UTF8MB4 character set and the collation will be utf8mb4_general_ci (hence my recommendation). The installer only affects new installations though. Upgrades do not change the character set and collation.

Attached is a set of experimental tools to convert your database to UTF8MB4 with a utf8mb4_general_ci collation. The documentation is in the readme.md text file. These should be run on a copy of your database for testing purposes before trying to convert your live database. They are experimental but I have used them on several dozen databases without issue. However, they have always been English databases. These tools will only look at standard vB5 tables in their present configuration. If you have other tables, you'll have to manually change them.

Comment

Wayne,
I just finished running the scripts on a backup of our vbulletin database. A couple of questions:
1. A few tables remain as MyISAM. They include: tagcontent, phrase, language, impexerror, userfield and mybb_attachments. Is this OK or should I attempt to change the type?
2. See attached image. About 30 or so tables did not convert, they remain in latin1. Is this ok, or do I need to change these manually?

Comment

1. language, phrase and userfield are left as MyISAM because our minimum required version of MySQL is 5.5.8. You need to be running MySQL 5.7 to change them and we haven't tested those changes. tagcontent and mybb_attachments are not vBulletin 5 tables.

2. The tables that weren't converted in the image do not appear to be vBulletin 5 tables. The upgrade scripts should have removed most of these if your database user has permission to delete tables.

Comment

I just completed the conversion. Everything went smoothly. Two quick follow-up questions:

1. Do I need to anything with search, such as re-index, or will that sort itself out over time?

2. I seem to have 20 or so tables in the database that don't belong, at least they didnt convert to utf8mb4 and they little or no data within them. Many appear to be related to VB4. Do you have a list of all valid tables in VB5.x so I can compare the master list to what's inside my database?

Comment

We are using Sphinx. Can we rebuild the search index by going to Admin CP > Maintenance > General Update Tools > Rebuild Search Index, or do we need to do it via the command line with the searchindex.php file?

If we can go through the Admin CP, do we need to Empty the search index first or just click Rebuild the Search Index?

We process personal data about users of our site, through the use of cookies and other technologies, to deliver our services, personalize advertising, and to analyze site activity. We may share certain information about our users with our advertising and analytics partners. For additional details, refer to our Privacy Policy.

By clicking "I AGREE" below, you agree to our Privacy Policy and our personal data processing and cookie practices as described therein. You also consent to the transfer of your data to our servers in the United States, where data protection laws may be different from those in your country.