The code below runs incrementally every 10 minutes in an Azure Synapse pipeline, inserting 3k to 4k records per run, but the insert into the target table is taking too long. We are not able to partition the table. How can we get each run to insert the records into the target table in less than 5 minutes?
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")
numberof_partitions = 10
col_data = employeedata.coalesce(numberof_partitions)
col_data.write .format("delta") .mode("append").saveAsTable(f"`{database_name}`.`{table_name}`")
spark.sql(f"OPTIMIZE {database_name}.{table_name}")
Cluster configuration:
Small (4 vCores / 32 GB per node), 3 to 5 nodes
Allocated vCores: 12
Allocated memory: 96 GB
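To show the state of the table, here is a quick diagnostic sketch (hedged: DESCRIBE DETAIL is a standard Delta Lake command, but its availability on this Synapse Spark runtime is an assumption) that reports the file count, which I suspect keeps growing across runs:

# Check whether small files are accumulating in the target table
# (DESCRIBE DETAIL is a Delta Lake command; availability on a given
# Synapse runtime is an assumption here).
detail = spark.sql(f"DESCRIBE DETAIL `{database_name}`.`{table_name}`")
detail.select("numFiles", "sizeInBytes").show()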