I am using Spark 2.4.5 and Scala 2.11.
I have a Delta table set up on S3. On every run of my application, a new partition of data is generated and appended:
df
  .write
  .format("delta")
  .mode("append")
  .save(deltaPath)
Once the partition is appended, the application also runs:
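import io.delta.tables.DeltaTable
// Load the table at deltaPath and regenerate its symlink manifest
val deltaTable = DeltaTable.forPath(deltaPath)
deltaTable.generate("symlink_format_manifest")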
This symlink_format_manifest generation takes around 20 minutes, while the total job time is 28 minutes. I checked the generated files under _symlink_format_manifest/ and it seems that the manifests of all the older partitions get rewritten every time. I confirmed this by checking the last-modified timestamps of the manifest files of older partitions.
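For reference, the last-modified check can be reproduced with something like the sketch below. It uses the Hadoop FileSystem API and assumes the manifests live under the table root in _symlink_format_manifest/; the helper name manifestModificationTimes is hypothetical and not part of Delta Lake.

```scala
import java.time.Instant
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper (not a Delta Lake API): returns every file under
// <deltaPath>/_symlink_format_manifest with its last-modified time, oldest first.
def manifestModificationTimes(deltaPath: String, conf: Configuration): Seq[(String, Instant)] = {
  val manifestDir = new Path(deltaPath, "_symlink_format_manifest")
  val fs = manifestDir.getFileSystem(conf)
  // Recursive listing, so manifests inside partition sub-directories are included
  val files = fs.listFiles(manifestDir, true)
  val result = ArrayBuffer.empty[(String, Instant)]
  while (files.hasNext) {
    val status = files.next()
    result += ((status.getPath.toString, Instant.ofEpochMilli(status.getModificationTime)))
  }
  result.sortBy(_._2.toEpochMilli).toList
}
```

Calling manifestModificationTimes(deltaPath, spark.sparkContext.hadoopConfiguration).foreach(println) after a run should show whether the manifests of old partitions carry a fresh timestamp (assuming spark is the active SparkSession).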
What do I need to change so that generate("symlink_format_manifest") only registers the new partition and does not rewrite the manifests of all the previous ones every time?