I need to run two commands against an Aurora MySQL database for which a Glue connection is already in place. The first command is TRUNCATE TABLE and the second is LOAD DATA FROM S3 into a table. I know I could easily do this with a Lambda function in Python, but the 15-minute timeout limit is not enough, since the data I need to load is an 11 GB text file. Also, I’ve read that I could perform the TRUNCATE TABLE as a preaction.
I don’t think this needs source and target nodes like a regular ETL job; all I need is a way to run a SQL command over an existing connection.
I’ve built a visual ETL job that works fine. It has a couple of transformations that I need to do, which would not be needed if I could run the LOAD DATA FROM S3 command. Also, I suspect that approach would take less time (the job currently takes 7 hours).
Consider using PyMySQL to execute the LOAD DATA FROM S3 operation. You can import this library into your AWS Glue jobs, for example by setting the `--additional-python-modules` job parameter to `pymysql`.
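As a rough illustration of that suggestion, here is a minimal sketch of a Glue Python script that truncates the table and then runs LOAD DATA FROM S3 through PyMySQL. The function names, the table name, the S3 URI, the delimiters, and the connection details are all placeholders I made up for the example, not values from your setup. It also assumes the Aurora cluster already has an IAM role attached that allows it to read from the bucket, and that the job runs inside a network that can reach the database.

```python
import re


def build_statements(table, s3_uri, field_sep=",", line_sep="\n"):
    """Return (sql, params) pairs for the truncate and the S3 load.

    `table` is checked against a conservative pattern because identifiers
    cannot be passed as query parameters.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError(f"unexpected table name: {table!r}")
    truncate_sql = f"TRUNCATE TABLE `{table}`"
    load_sql = (
        "LOAD DATA FROM S3 %s "
        f"INTO TABLE `{table}` "
        "FIELDS TERMINATED BY %s "
        "LINES TERMINATED BY %s"
    )
    return [(truncate_sql, None), (load_sql, (s3_uri, field_sep, line_sep))]


def run_load(host, user, password, database, table, s3_uri):
    """Connect with PyMySQL and execute both statements in order."""
    # Imported here so the module loads even where PyMySQL is not installed.
    import pymysql

    conn = pymysql.connect(
        host=host,
        user=user,
        password=password,
        database=database,
        autocommit=True,  # TRUNCATE commits implicitly; the load commits on success
    )
    try:
        with conn.cursor() as cursor:
            for sql, params in build_statements(table, s3_uri):
                cursor.execute(sql, params)
    finally:
        conn.close()


# Hypothetical usage; every value below is a placeholder:
# run_load(
#     host="my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
#     user="admin",
#     password="...",
#     database="mydb",
#     table="my_table",
#     s3_uri="s3://my-bucket/path/data.txt",
# )
```

Since the load itself runs inside Aurora, the Glue job only has to keep the connection open while the statement executes, so a small Python shell job is probably sufficient. In practice you would read the credentials from the Glue connection or from Secrets Manager rather than hard-coding them.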