In my S3 bucket, I have a CSV file that looks like this:
"id","created_at","subject","description","priority","status","recipient",
Now, in one row, the value of the description column looks like this:
"
Internet Marketing Experts
Note: If you are not interested, then you can reply with a simple "NO",We will never contact you again.
"
This has normal text but also some special characters. Notice this part in the cell’s value:
"NO",We
The first comma (the one after interested) does not cause an issue, but the second one, in "NO",We, causes a split: We ends up in the next column (priority) when the data is loaded into MySQL, even though it should be part of the same column (description).
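As far as I can tell, the root cause is that the quotes around NO are not escaped the way a CSV writer normally escapes them (by doubling them to ""NO""). Because the quote right after NO is immediately followed by a comma, MySQL takes it as the end of the enclosed description field and starts a new field at that comma. A quick sketch with Python's csv module (hypothetical values, only to show what a properly escaped row would look like):

import csv
import io

# Hypothetical row values; only the description matters here.
row = [
    "123",
    "2024-01-01 10:00:00",
    "Spam",
    'Note: If you are not interested, then you can reply with a simple "NO",We will never contact you again.',
    "low",
    "open",
    "me@example.com",
]

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
print(buf.getvalue())
# The embedded quotes come out doubled: ...a simple ""NO"",We will never...
# My file has single quotes around NO instead, so MySQL ends the enclosed
# field at the quote that is followed by a comma, and the columns shift.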
Now I use this to load data from S3 into MySQL:
f"""LOAD DATA FROM S3 's3://{s3_bucket_name}/{s3_key}'
REPLACE
INTO TABLE {schema_name}.stg_tickets
CHARACTER SET utf8mb4
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
IGNORE 1 LINES
(id, created_at, subject,
description, priority, status, recipient)
"""
This works for all other rows because their description values contain few commas or special characters. But this particular record shifts the columns because MySQL cannot tell where the description field actually ends.
I want to handle it using the MySQL command itself. Pre-processing the S3 files wouldn’t be an option at the moment.
Note: I am running the query using Python.
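For context, the statement is executed roughly like this (a simplified sketch; I'm using pymysql purely for illustration, and the connection details are placeholders):

import pymysql

# Placeholder values; the real ones come from configuration.
s3_bucket_name = "my-bucket"
s3_key = "exports/tickets.csv"
schema_name = "helpdesk"

load_sql = f"""LOAD DATA FROM S3 's3://{s3_bucket_name}/{s3_key}'
REPLACE
INTO TABLE {schema_name}.stg_tickets
CHARACTER SET utf8mb4
FIELDS
    TERMINATED BY ','
    ENCLOSED BY '"'
IGNORE 1 LINES
(id, created_at, subject, description, priority, status, recipient)
"""

# Aurora reads the file from S3 on the server side, so a plain execute is enough.
conn = pymysql.connect(host="aurora-endpoint", user="user", password="password",
                       database=schema_name)
try:
    with conn.cursor() as cur:
        cur.execute(load_sql)
    conn.commit()
finally:
    conn.close()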
I have already tried ESCAPED BY; it doesn't throw an error, but it doesn't make any difference to the data split either:
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '\\'
What else could I try?