I recently updated MySQL on several servers from 5.7 to 8. We had put off the upgrade because of a number of other compatibility changes we needed to make to the websites/apps running on these servers. We were already using UTF-8 for everything and had no problems with accented or other non-ASCII characters.
But after the update, we noticed that some usernames in some apps were mangled.
For example, a user named Zoë now shows up as Zoë.
Looking into this further, I see that MySQL has somehow “double encoded” the existing 2-byte UTF-8 sequences.
In MySQL 5.7, the ë in Zoë was stored as 2 bytes, c3 ab (read as single-byte Latin-1 rather than UTF-8, those two bytes display as ë).
But now, MySQL 8 has taken EACH of those bytes and re-encoded it as a separate character!
So it encoded à as c3 83 and « as c2 ab,
and now I’m left with the 4 bytes c3 83 c2 ab, which display as ë instead of ë.
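The byte-level change described above can be reproduced outside MySQL. Below is a minimal Python sketch, assuming the damage is exactly one extra round of “read the UTF-8 bytes as Latin-1, then encode the result as UTF-8 again”. The variable names are only for illustration, and nothing here touches the database.

```python
# Reproduce the suspected double encoding: UTF-8 bytes misread as Latin-1,
# then encoded to UTF-8 a second time.
original = "Zoë"

utf8_bytes = original.encode("utf-8")      # correct UTF-8: ë -> c3 ab
misread = utf8_bytes.decode("latin-1")     # each byte becomes its own character
double_encoded = misread.encode("utf-8")   # Ã -> c3 83, « -> c2 ab

print(utf8_bytes.hex(" "))       # 5a 6f c3 ab
print(misread)                   # ZoÃ«
print(double_encoded.hex(" "))   # 5a 6f c3 83 c2 ab

# Undo one round: turn the mangled text back into its Latin-1 bytes,
# then decode those bytes as the UTF-8 they originally were.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired)                  # Zoë
```

If the reverse step turns an affected value back into readable text, that supports the theory of exactly one extra Latin-1 to UTF-8 conversion. Note that the reverse step should only be applied to values known to be mangled: running it on a correctly stored "Zoë" raises `UnicodeDecodeError`, because the single byte `eb` is not valid UTF-8 on its own.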
I’m using PHP 8.1.28, and php.ini has default_charset=UTF-8.
And for MySQL, here is some output:
> show variables like 'char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8mb4                          |
| character_set_connection | utf8mb4                          |
| character_set_database   | utf8mb4                          |
| character_set_filesystem | binary                           |
| character_set_results    | utf8mb4                          |
| character_set_server     | utf8mb4                          |
| character_set_system     | utf8mb3                          |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.01 sec)
Does anyone know why this happened or how I can fix it?