I’m struggling to detect the right encoding to insert a UTF-8 CSV dataset into a database.
In my DB, all the text fields use the utf8mb4_unicode_520_ci collation (this is how my WordPress is configured, so I can’t really change that), so I assume the columns store some kind of UTF-8.
I run every field through the conversion below. Without it, all the inserted values contained strange characters; with it, the fields look good.
$row_data[$key] = mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
… except for two fields. These two are in the same CSV but come from another source (another website), so I suspect the encoding differs for some fields in the CSV.
Here is an example with a sample value that won’t insert correctly into the DB.
<?php
$text = "Gergő Rácz";
// Detect the encoding
$encoding = mb_detect_encoding($text);
echo "encoding detected: " . $encoding;
$utf8_text = mb_convert_encoding($text, 'ISO-8859-1','UTF-8');
echo "\ntext to UTF-8 : " . $utf8_text;
# php ./p.php
encoding detected: UTF-8
text to UTF-8 : Gerg▒? Rácz
It’s like it’s already UTF-8 but not really. And I can’t identify which encoding it is. Garbage characters in, garbage out.
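For what it’s worth, here is a minimal sketch of a check that could narrow this down, assuming the mbstring extension is loaded. It tests whether a value survives a conversion from UTF-8 to ISO-8859-1 and back. The helper name `survives_latin1_round_trip` is hypothetical and not part of the import script. The characters are written as `\u{...}` escapes so the result does not depend on how the PHP file itself is saved.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns true only if
// $value is valid UTF-8 AND every character in it exists in ISO-8859-1.
function survives_latin1_round_trip(string $value): bool
{
    // Input that is not valid UTF-8 cannot be converted reliably.
    if (!mb_check_encoding($value, 'UTF-8')) {
        return false;
    }
    // Characters missing from ISO-8859-1 are replaced by a substitute
    // character during conversion, so the round trip no longer matches.
    $latin1 = mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
    $back   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    return $back === $value;
}

var_dump(survives_latin1_round_trip("R\u{00E1}cz"));  // á (U+00E1) is in ISO-8859-1
var_dump(survives_latin1_round_trip("Gerg\u{0151}")); // ő (U+0151) is not
```

The first call should report `true` and the second `false`, since ő belongs to ISO-8859-2 rather than ISO-8859-1. That would be consistent with the `?` in the output above, though it may not be the whole story for these two fields.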
Any ideas?
Many thanks!