I am trying to import a lot of data into a MariaDB table. I generated a large SQL script file which contains a lot of special characters (mainly µ, °, and NBSP).
I would like to run my script to put the data into SQL and preserve these characters. Each time I try to run the script, I either get “Incorrect string value” errors, or the characters appear as “?” when I query the data (See below). Is there a way that I can get this script to work and import these special characters?
I’ll provide an example sql script at the bottom and explain how I have tried to run it.
I tried running my script in the Linux terminal, mysql -e "source Test.sql"
. This stopped after the first error, so I’ll use examples from the mysql terminal since it will show all of the errors I encountered in my small example script.
Notice how I first tried to run my script file in the SQL terminal. I got an “Incorrect string value” for each of the characters mentioned above (xB0 = °, xB5 = µ, xA0 = NBSP). Running the script doesn’t work, however, when I copy/paste the script into my mysql session, it runs just fine. This gives me the desired output, but this would be cumbersome to do in my full-scale script.
Any idea how to get this to work by running the script?
#### Trying to run my script from SQL terminal
MariaDB [atlas]> source Test.sql
Query OK, 0 rows affected (0.017 sec)
Query OK, 0 rows affected (0.027 sec)
ERROR 1366 (22007) at line 12 in file: 'Test.sql': Incorrect string value: 'xB0)' for column `atlas`.`TestSQL`.`comment` at row 1
ERROR 1366 (22007) at line 14 in file: 'Test.sql': Incorrect string value: 'xB5)' for column `atlas`.`TestSQL`.`comment` at row 1
ERROR 1366 (22007) at line 16 in file: 'Test.sql': Incorrect string value: 'xA0' for column `atlas`.`TestSQL`.`comment` at row 1
#### Copying the lines from the script into SQL works...
MariaDB [atlas]> INSERT INTO TestSQL (timestamp,comment)
-> VALUES ('2019-03-04 13:11:35','This one contains a deg symbol (°)');
Query OK, 1 row affected (0.005 sec)
MariaDB [atlas]> INSERT INTO TestSQL (timestamp,comment)
-> VALUES ('2020-02-04 13:48:04','This one has a mu (µ)');
Query OK, 1 row affected (0.004 sec)
MariaDB [atlas]> INSERT INTO TestSQL (timestamp,comment)
-> VALUES ('2022-06-12 16:07:50','This one has a line break >
'> And it contains a NBSP > ');
Query OK, 1 row affected (0.005 sec)
This is what I want the output to look like after I run the script
MariaDB [atlas]> select * from TestSQL;
+---------------------+--------------------------------------------------------+
| timestamp | comment |
+---------------------+--------------------------------------------------------+
| 2019-03-04 13:11:35 | This one contains a deg symbol (°) |
| 2020-02-04 13:48:04 | This one has a mu (µ) |
| 2022-06-12 16:07:50 | This one has a line break >
And it contains a NBSP > |
+---------------------+--------------------------------------------------------+
3 rows in set (0.000 sec)
MariaDB [atlas]>
Here is my SQL script TestSQL.sql
to duplicate this issue.
DROP TABLE IF EXISTS TestSQL;
--SET CHARACTER SET utf8mb4;
CREATE TABLE IF NOT EXISTS TestSQL (
timestamp timestamp DEFAULT CURRENT_TIMESTAMP,
comment varchar(256) COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (timestamp)
) CHARACTER SET 'utf8mb4';
INSERT INTO TestSQL (timestamp,comment)
VALUES ('2019-03-04 13:11:35','This one contains a deg symbol (°)');
INSERT INTO TestSQL (timestamp,comment)
VALUES ('2020-02-04 13:48:04','This one has a mu (µ)');
INSERT INTO TestSQL (timestamp,comment)
VALUES ('2022-06-12 16:07:50','This one has a line break >
And it contains a NBSP > ');
Other things I’ve tried:
- In the script above, I used the line “SET CHARACTER SET utf8mb4;”. When the script is run with this uncommented, it seems to work, however it has question marks when I query the data:
MariaDB [atlas]> select * from TestSQL;
+---------------------+-------------------------------------------------------+
| timestamp | comment |
+---------------------+-------------------------------------------------------+
| 2019-03-04 13:11:35 | This one contains a deg symbol (?) |
| 2020-02-04 13:48:04 | This one has a mu (?) |
| 2022-06-12 16:07:50 | This one has a line break >
And it contains a NBSP >? |
+---------------------+-------------------------------------------------------+
3 rows in set (0.000 sec)
I get this same result when I use the command in the SQL terminal before running my script.
- I’ve tried messing around with the
/etc/my.cnf
file. I didn’t really see any changes here. I tried restarting the mariadb server just in case that had an effect too. Here is my config file for completion:
[client]
database=atlas
default-character-set=utf8mb4
[mysqld]
bind-address=0.0.0.0
max_allowed_packet=64M
collation-server = utf8mb4_unicode_ci
character-set-client=utf8mb4
character-set-server=utf8mb4
init-connect = 'SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
[client-server]
Here is the output of the questions asked in a similar post (How to fix “Incorrect string value” errors?):
MariaDB [atlas]> show variables like '%colla%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8mb4_general_ci |
+----------------------+--------------------+
3 rows in set (0.001 sec)
MariaDB [atlas]> show variables like '%charac%';
+--------------------------+------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mariadb/charsets/ |
+--------------------------+------------------------------+
8 rows in set (0.001 sec)