I have a Delta table stored in S3. An AWS Glue job reads a set of CSV files into a PySpark DataFrame and then updates the Delta table by appending the DataFrame's rows to it. Before I can append, I need to delete the rows in the Delta table whose date matches any of the dates that appear in the DataFrame created from the CSVs.
I have tried the following, but it’s too slow:
for date in incremental_day_list:
    bronze_df.delete(col("day") == date)
I have searched for SQL commands, but I haven't found any examples that interpolate a list of strings. I'm also not sure whether the result is saved in Delta format in S3 if I run a select-and-filter query.
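One direction that might avoid the per-date loop is to build a single IN predicate from the list and issue one delete. The sketch below is only that, under stated assumptions: bronze_df is a delta.tables.DeltaTable (for example from DeltaTable.forPath), whose delete method accepts a SQL string condition; the day values are plain strings without quotes or backslashes; and the helper names build_day_predicate and delete_days are hypothetical, made up for illustration.

```python
def build_day_predicate(days, column="day"):
    """Build one SQL predicate covering every day in the list.

    Hypothetical helper: turns ["2023-01-01", "2023-01-02"] into
    day IN ('2023-01-01', '2023-01-02').
    """
    # Deduplicate and sort so the predicate is deterministic.
    unique_days = sorted({str(d) for d in days})
    if not unique_days:
        raise ValueError("days must not be empty")
    for value in unique_days:
        # Assumption: values are simple date strings. Anything containing a
        # quote or backslash is rejected rather than escaped.
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in value: {value!r}")
    quoted = ", ".join(f"'{value}'" for value in unique_days)
    return f"{column} IN ({quoted})"


def delete_days(delta_table, days, column="day"):
    """Delete all rows for the given days with a single delete call.

    Assumption: delta_table is a delta.tables.DeltaTable, whose delete()
    accepts a SQL string condition. An empty list is a no-op.
    """
    if not days:
        return
    delta_table.delete(build_day_predicate(days, column))


# Building the predicate needs no Spark session:
print(build_day_predicate(["2023-01-02", "2023-01-01", "2023-01-02"]))
# → day IN ('2023-01-01', '2023-01-02')

# In the Glue job this would be called as (not executed here):
#   delete_days(bronze_df, incremental_day_list)
```

An equivalent form without string building would be bronze_df.delete(col("day").isin(incremental_day_list)), since Column.isin accepts a list. Either way the delete is a single operation on the Delta table itself, so it is recorded as one commit in the table's transaction log, rather than one commit per date.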
Any help would be much appreciated.