I have large CSV files with stock values and I have successfully added these to my MariaDB database using the “LOAD DATA INFILE” command:
<code>LOAD DATA LOCAL
INFILE '/mnt/HD/HD_b2/Public2/raw_data/DEUIDXEUR_01.csv'
INTO TABLE raw_data_index_germany_at_eur
FIELDS TERMINATED BY '|'
LINES TERMINATED BY ';'
(timestamp_, bid_price_, ask_price_, bid_volume_, ask_volume_);
</code>
<code>LOAD DATA LOCAL
INFILE '/mnt/HD/HD_b2/Public2/raw_data/DEUIDXEUR_01.csv'
INTO TABLE raw_data_index_germany_at_eur
FIELDS TERMINATED BY '|'
LINES TERMINATED BY ';'
(timestamp_, bid_price_, ask_price_, bid_volume_, ask_volume_);
</code>
LOAD DATA LOCAL
INFILE '/mnt/HD/HD_b2/Public2/raw_data/DEUIDXEUR_01.csv'
INTO TABLE raw_data_index_germany_at_eur
FIELDS TERMINATED BY '|'
LINES TERMINATED BY ';'
(timestamp_, bid_price_, ask_price_, bid_volume_, ask_volume_);
This works just fine for data files with around 3-4GB of data but any bigger and the query will start removing the X million rows that it just added and leave me with an empty table.
The csv’s syntax:
<code>2024-08-13 16:52:24.857|439.935|441.292|0.0120000001043081|0.0179999992251396;
2024-08-13 16:53:28.445|439.911|441.003|0.0120000001043081|0.0120000001043081;
2024-08-13 16:54:03.299|438.87|440.836|0.0120000001043081|0.0120000001043081;
2024-08-13 16:56:18.582|438.602|439.68|0.0120000001043081|0.0120000001043081;
2024-08-13 16:56:21.290|438.747|439.636|0.0120000001043081|0.0120000001043081;
2024-08-13 16:56:40.607|438.891|440.83|0.0120000001043081|0.0120000001043081;
2024-08-13 16:57:02.280|438.985|440.927|0.0120000001043081|0.0120000001043081;
2024-08-13 16:57:10.698|438.917|440.853|0.0120000001043081|0.0120000001043081;
2024-08-13 16:58:33.543|439.011|440.95|0.0120000001043081|0.0120000001043081;
2024-08-13 16:58:45.200|439.165|441.128|0.0120000001043081|0.0120000001043081;
2024-08-13 16:59:01.138|439.367|441.13|0.0120000001043081|0.0120000001043081;
</code>
<code>2024-08-13 16:52:24.857|439.935|441.292|0.0120000001043081|0.0179999992251396;
2024-08-13 16:53:28.445|439.911|441.003|0.0120000001043081|0.0120000001043081;
2024-08-13 16:54:03.299|438.87|440.836|0.0120000001043081|0.0120000001043081;
2024-08-13 16:56:18.582|438.602|439.68|0.0120000001043081|0.0120000001043081;
2024-08-13 16:56:21.290|438.747|439.636|0.0120000001043081|0.0120000001043081;
2024-08-13 16:56:40.607|438.891|440.83|0.0120000001043081|0.0120000001043081;
2024-08-13 16:57:02.280|438.985|440.927|0.0120000001043081|0.0120000001043081;
2024-08-13 16:57:10.698|438.917|440.853|0.0120000001043081|0.0120000001043081;
2024-08-13 16:58:33.543|439.011|440.95|0.0120000001043081|0.0120000001043081;
2024-08-13 16:58:45.200|439.165|441.128|0.0120000001043081|0.0120000001043081;
2024-08-13 16:59:01.138|439.367|441.13|0.0120000001043081|0.0120000001043081;
</code>
2024-08-13 16:52:24.857|439.935|441.292|0.0120000001043081|0.0179999992251396;
2024-08-13 16:53:28.445|439.911|441.003|0.0120000001043081|0.0120000001043081;
2024-08-13 16:54:03.299|438.87|440.836|0.0120000001043081|0.0120000001043081;
2024-08-13 16:56:18.582|438.602|439.68|0.0120000001043081|0.0120000001043081;
2024-08-13 16:56:21.290|438.747|439.636|0.0120000001043081|0.0120000001043081;
2024-08-13 16:56:40.607|438.891|440.83|0.0120000001043081|0.0120000001043081;
2024-08-13 16:57:02.280|438.985|440.927|0.0120000001043081|0.0120000001043081;
2024-08-13 16:57:10.698|438.917|440.853|0.0120000001043081|0.0120000001043081;
2024-08-13 16:58:33.543|439.011|440.95|0.0120000001043081|0.0120000001043081;
2024-08-13 16:58:45.200|439.165|441.128|0.0120000001043081|0.0120000001043081;
2024-08-13 16:59:01.138|439.367|441.13|0.0120000001043081|0.0120000001043081;
What am I missing? If the buffer size is the culprit, shouldn’t MariaDB take care of the import and do it in chunks? Or should I somehow address this prior to the query?
Table engine is InnoDB btw.