I am trying to write a pyspark dataframe to a csv using the familiar
df.coalesce(1).write.csv(f"s3://{bucket_name}/{path}/", mode="overwrite", header=True)
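For context, here is a minimal sketch of how the job is set up (the bucket name, prefix, and sample data below are placeholders, not the real values):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-export").getOrCreate()

# Placeholder values for illustration only
bucket_name = "my-bucket"
path = "exports/daily"

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Collapse to a single partition so only one part-000* file is written,
# then overwrite the target prefix on each run.
df.coalesce(1).write.csv(f"s3://{bucket_name}/{path}/", mode="overwrite", header=True)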
The CSV file starting with “part-000” is created correctly under {bucket-name}/{path}, and re-running the same script overwrites it correctly. However, I am noticing that an additional object called {bucket-name}_$folder$ is also created. Subsequent re-runs still work correctly, but the _$folder$ object remains, and if I delete it, it gets created again on the next run.
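For reference, the extra object can be seen by simply listing the bucket with boto3 after the Spark write finishes (a minimal sketch; the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")

# List the keys in the bucket after the Spark write finishes.
resp = s3.list_objects_v2(Bucket="my-bucket")
for obj in resp.get("Contents", []):
    print(obj["Key"])

# Alongside the expected part-000* CSV key, an extra key ending in
# "_$folder$" shows up, and it reappears after being deleted.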
Is this a known issue when writing DataFrames to CSV on S3 directly? Are there any workarounds, or a more stable alternative approach?