I have a DLT pipeline in Databricks, where all tables are non-streaming (materialized views), except for the last one, which needs to be append-only.
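For context, the pipeline shape is roughly the following. This is a minimal sketch, not my actual code: the source table name `source_data` and the aggregation are placeholders, and `kpi` / `kpi_historized` stand for the last two tables in the pipeline. It only runs inside a Databricks DLT pipeline (the `dlt` module and `spark` session are provided there).

```python
import dlt
from pyspark.sql import functions as F

# Upstream: a regular materialized view, fully recomputed on every update.
# (All upstream tables in the pipeline look like this.)
@dlt.table
def kpi():
    return (
        spark.read.table("source_data")  # placeholder source
        .groupBy("kpi_id")
        .agg(F.sum("value").alias("value"))
    )

# Final table: should be append-only, so it is defined as a streaming
# table reading from the materialized view above.
@dlt.table
def kpi_historized():
    return (
        dlt.read_stream("kpi")
        .withColumn("snapshot_ts", F.current_timestamp())
    )
```

Because `kpi` is recomputed (rewritten) on every pipeline update rather than appended to, the streaming read in `kpi_historized` sees data updates in its source, which triggers the error below on the second run.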
The pipeline runs successfully the first time, but the second run fails with:
org.apache.spark.sql.streaming.StreamingQueryException:
[STREAM_FAILED] Query [id = 48f8dad4-1ae6-4203-9bd1-bcda239db9c3,
runId = 023d9d7f-33e0-4301-ae39-5c041a392ea5] terminated with
exception: [DELTA_SOURCE_TABLE_IGNORE_CHANGES] Detected a data update
(for example
part-00000-4ad8ffe0-5732-406e-b1b1-fd76107ab0a4-c000.snappy.parquet)
in the source table at version 26. This is currently not supported. If
you'd like to ignore updates, set the option 'skipChangeCommits' to
'true'. If you would like the data update to be reflected, please
restart this query with a fresh checkpoint directory. The source table
can be found at path abfss://sustainability
Setting the skipChangeCommits flag to true doesn't work either: any changes in the kpi table are simply ignored and kpi_historized remains unchanged. It seems that any streaming (append-only) table in DLT requires a streaming source, but none of the other tables in the pipeline need to be append-only, and I do not want to change the logic in all upstream tables just so the final table can be append-only.
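This is what the skipChangeCommits attempt looks like, again as a sketch with the same placeholder names as above (DLT-only code; to pass the option I read the source via `spark.readStream` with the `LIVE.` prefix instead of `dlt.read_stream`):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table
def kpi_historized():
    # skipChangeCommits tells the stream to skip commits that rewrite
    # existing files. A materialized-view refresh rewrites the whole
    # table, so every refresh is skipped and nothing new is ever
    # appended to kpi_historized.
    return (
        spark.readStream
        .option("skipChangeCommits", "true")
        .table("LIVE.kpi")
        .withColumn("snapshot_ts", F.current_timestamp())
    )
```

With this in place the error goes away, but kpi_historized never picks up any of the changes, which matches the behavior described above.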
All I am trying to do is have an append-only table at the very end of a DLT pipeline, and only at the end.