I can run the following, which performs a merge against the entire target table:
from delta.tables import DeltaTable

# target
target = DeltaTable.forPath(spark, target_path)
# source
src = updates
# merge (the chained calls are wrapped in parentheses so Python parses them)
(
    target.alias("target")
    .merge(src.alias("source"), "source.id = target.id AND source.cust_name = target.cust_name")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .whenNotMatchedBySourceDelete()
    .execute()
)
But how can I perform a merge on only a subset of the target, for example only the rows where target.id=4, so that all other existing records in the table are not affected by the merge at all? Is there a way to achieve this?
from delta.tables import DeltaTable
from pyspark.sql.functions import col

# pseudocode: trying to merge into a filtered subset of the target.
# This does not work: DeltaTable has no filter() method.
target = DeltaTable.forPath(spark, target_path).filter(col("tbl_id").isin(4))
# source
src = updates
# merge
(
    target.alias("target")
    .merge(src.alias("source"), "source.id = target.id AND source.cust_name = target.cust_name")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .whenNotMatchedBySourceDelete()
    .execute()
)
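One possible approach, sketched here under assumptions rather than as a confirmed answer: instead of filtering the DeltaTable, push the subset predicate into the merge itself. The sketch assumes the subset is defined by the `tbl_id` column used in the filter above and that `updates` is a DataFrame with the same schema as the target; `scoped_conditions` and `scoped_merge` are hypothetical helper names. The important detail is that `whenNotMatchedBySourceDelete` (Delta Lake 2.3 and later) needs its own condition: once `target.tbl_id = 4` is part of the merge condition, every target row outside that subset counts as "not matched by source" and would be deleted by an unconditional delete.

```python
from typing import Tuple


def scoped_conditions(tbl_id: int) -> Tuple[str, str]:
    """Hypothetical helper: build the merge and delete conditions for one tbl_id slice."""
    # Adding the slice predicate to the ON clause means only target rows
    # in the slice can ever match a source row.
    merge_cond = (
        "source.id = target.id AND source.cust_name = target.cust_name "
        f"AND target.tbl_id = {tbl_id}"
    )
    # Rows outside the slice never match, so they are "not matched by source".
    # Restricting the delete to the slice keeps them untouched.
    delete_cond = f"target.tbl_id = {tbl_id}"
    return merge_cond, delete_cond


def scoped_merge(spark, target_path: str, updates, tbl_id: int) -> None:
    """Hypothetical helper: merge `updates` into only the tbl_id slice of the target."""
    # Imported inside the function so scoped_conditions works without Spark installed.
    from delta.tables import DeltaTable
    from pyspark.sql.functions import col

    merge_cond, delete_cond = scoped_conditions(tbl_id)
    target = DeltaTable.forPath(spark, target_path)
    # Assumption: the source carries tbl_id too; keep only rows for this slice
    # so inserts cannot land outside it.
    src = updates.filter(col("tbl_id") == tbl_id)
    (
        target.alias("target")
        .merge(src.alias("source"), merge_cond)
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        # Requires Delta Lake 2.3+ for whenNotMatchedBySourceDelete.
        .whenNotMatchedBySourceDelete(condition=delete_cond)
        .execute()
    )
```

With this sketch, `scoped_merge(spark, target_path, updates, 4)` would update, insert and delete only rows whose `tbl_id` is 4. Keeping the predicate in the merge condition, rather than only filtering the source, can also help Delta prune files outside the slice, particularly when `tbl_id` is a partition column.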