I’m working with PySpark in Synapse notebooks and I need to load a Parquet file into a DataFrame, apply some transformations (e.g., renaming columns), and then save the modified DataFrame back to the same location, overwriting the original file. However, when I save the DataFrame, it creates a directory named Test.parquet containing two files: one named SUCCESS and another whose name is a string of random letters and numbers.
Here’s the code I am using:
%%pyspark
# Load the original Parquet file
df = spark.read.load('path/to/Test.parquet', format='parquet')
display(df.limit(10))

# Rename columns according to the mapping
column_mapping = {
    "FullName": "Full Name",
}
for old_col, new_col in column_mapping.items():
    df = df.withColumnRenamed(old_col, new_col)
display(df.limit(10))

# Write the result back to the same location
df.write.parquet('path/to/Test.parquet', mode='overwrite')
Here is how it overwrites the file: instead of replacing Test.parquet with a single Parquet file, the write produces the Test.parquet folder with the two files described above.
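To illustrate the single-file result I’m after, here is a rough sketch of the kind of post-processing I imagine might be needed. The use of coalesce(1), mssparkutils.fs, and the temporary path below are my assumptions; I haven’t confirmed this is the right (or a safe) way to do it in Synapse:

from notebookutils import mssparkutils

# Write the renamed data to a temporary directory as a single part file
temp_dir = 'path/to/Test_tmp'  # hypothetical temp location
df.coalesce(1).write.parquet(temp_dir, mode='overwrite')

# Find the single part file Spark generated (skipping the _SUCCESS marker)
part_file = [f.path for f in mssparkutils.fs.ls(temp_dir)
             if f.name.endswith('.parquet')][0]

# Replace the original file with the part file, then remove the temp directory
# (deleting before copying risks data loss if the copy fails; this is only a sketch)
mssparkutils.fs.rm('path/to/Test.parquet', True)
mssparkutils.fs.cp(part_file, 'path/to/Test.parquet')
mssparkutils.fs.rm(temp_dir, True)

Even if something like this works, it feels clunky, so I’d prefer a more idiomatic approach if one exists.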
How can I correctly overwrite the original Parquet file without creating additional files or directories? Any help is appreciated!