How are folks handling hard deletes in their Delta Live Table pipelines? I am working with the source team to see about getting them to update their processes to provide a change log but for right now, that option is off the table and I need to come up with a way to handle hard deletes in my sources at the apply changes step.
Our process is using autoloader to build out bronze tables and then using apply_changes to bring each of these into an SCD-1 or SCD-2 table at the silver level.
Some ideas we are considering are below. In all cases we would be doing a full primary key pull each day in addition to our incremental load of the data.
-
Using the primary key list, I could work out what has been deleted
after the call to apply_changes. -
Using the primary key list, I could modify the bronze table after
creation (or potentially during creation) to mark applicable records
as deleted. -
After the DLT pipeline completes, I could use the primary key list
to update the apply_changes DLT table with the deletes as a second
transaction.