Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It only takes a minute to sign up.

I can store and display unicode characters such as the micron (µ) Omega (Ω) and em dash (—) in the database by importing them from csv files.

However, as soon as I started to try out some queries using unicode characters I hit a problem. Basically, entering a "more than 7 bit" unicode character in a quoted string seems to hide the closing quote mark from the mysql client, which then stays stuck in quote mode. Simple example below-

MariaDB [uom]> -- Using plain ASCII- this works
MariaDB [uom]> select replace('cm', 'c', 'm');
+-------------------------+
| replace('cm', 'c', 'm') |
+-------------------------+
| mm |
+-------------------------+
1 row in set (0.000 sec)
MariaDB [uom]> -- Try using a unicode character in the query, this fails
MariaDB [uom]> select replace('µm', 'µ', 'u');
'>
'>
'> '
-> ;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '' at line 1

Interestingly, if I run the same query non-interactively it executes, but the output seems to indicate something in the mysql command line path is not using unicode as the unicode characters are converted to "?" on output-

I'm clearly missing a piece of the puzzle here. Any ideas much appreciated.

EDIT

I'm working on a non-production db, so I was able (I thought) to set everything top to bottom to utf8mb4 and re-import my data. Still seeing oddities-

The system is ignoring the setting for character_set_server in my.cnf. Even though I have the line-

character_set_server = utf8mb4

in my.cnf, the system still reports it is set to "utf8" after restarting the server. If I issue the commmand set character_set_server='utf8mb4'; interactively in the client, the variable changes value for the session, but this has no affect on the other issues.

The "micro" character, which should be hex B5 is stored as xC2B5 which is a chinese / Hangul character. With the server, client, and connection all set to utf8mb4, and the csv file i'm importing identified as latin1, I can;t see why double-encoding is happening.

The oddities with the myql client program (emitting "?" characters from the command below, even though it's set to run in utf8mb4-

1 Answer
1

When using the cmd prompt, set that window to use the code page chcp 65001:

The command "chcp" controls the "code page". chcp 65001 provides utf8, but it needs a special charset installed, too. some code pages. To set the font in the console window: Right-click on the title of the window → Properties → Font → pick Lucida Console

I set the code page early on. I was using Consolas as the terminal font, but i've just re-tried with the font set to Lucida Console and there's no change. That link to the windows-specific info on Unicode support is new though. I'm going to check that out now.
– IncansFeb 18 at 0:27

I think I am seeing 2 separate problems. Importing the latin1 (cp1252) data from the text file is obviously the main one. The fact that the console output doesn't display unicode properly when the mysql client is run in "-e" execute mode, but function parameters with a character like "µ" DO work (whereas interactively it doesn't) seems to imply an issue between the mysql client and Windows console input/output. I tried the options recommended in the linked doc, but that made no difference. mysql --default-character-set=utf8mb4 -e "select replace('µm', 'µ', 'm')
– IncansFeb 18 at 0:52

Next step is to upgrade Mariadb to the latest stable version (10.3.12), i'm currently on v10.3.8
– IncansFeb 18 at 0:55