I am trying to persist data into a Delta table from a scheduled notebook that reads daily from a Postgres DB. To make sure no duplicates are stored, I perform a merge operation with the new data:
from delta.tables import DeltaTable

delta_table = DeltaTable.forPath(spark, delta_table_path)
(delta_table
    .alias("target")
    .merge(
        remote_df.alias("source"),
        "target.time = source.time AND target.devicename = source.devicename")  # unique key condition
    .whenMatchedUpdateAll()      # update all columns when the key matches
    .whenNotMatchedInsertAll()   # insert new rows when there is no match
    .execute()
)
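For context, remote_df is the daily pull from Postgres via a plain JDBC read. A minimal sketch of that step, assuming standard PostgreSQL JDBC options (the host, credentials and source table name below are placeholders, not my actual values):

# Sketch of the daily Postgres read; connection details and table name are placeholders
remote_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<database>")
    .option("dbtable", "public.device_measurements")  # hypothetical source table
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "org.postgresql.Driver")
    .load()
)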
I created an external location in S3 (s3://databricks-managed-location) beforehand and made sure that the catalog and schema were created accordingly:
create_catalog_sql = "CREATE CATALOG IF NOT EXISTS catalog123 MANAGED LOCATION 's3://databricks-managed-location/catalog123';"
create_schema_sql = "CREATE SCHEMA IF NOT EXISTS catalog123.myschema MANAGED LOCATION 's3://databricks-managed-location/catalog123';"
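Both statements are executed once from the notebook, roughly like this (a sketch of the setup cell):

# One-time setup: run the DDL strings through spark.sql
spark.sql(create_catalog_sql)
spark.sql(create_schema_sql)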
The table is initially created via:
remote_df.write.mode("overwrite").saveAsTable("catalog123.myschema.mytable")
The error I receive is:
Input path url 's3://databricks-managed-location/project123' overlaps with managed storage within 'CheckPathAccess' call.