I have a Delta table with the following properties defined:
delta.enableChangeDataFeed: "true"
delta.columnMapping.mode: "name"
delta.autoOptimize.optimizeWrite: "true"
delta.columnMapping.maxColumnId: "11"
Since delta.logRetentionDuration
has not been explicitly set, I assumed it would take the default value of 30 days, as stated here.
When I run a DESCRIBE HISTORY
query on my Delta table, I get the following result:
version | timestamp | … |
---|---|---|
555 | 2024-08-01T23:08:23.000+00:00 | … |
554 | 2024-08-01T11:50:53.000+00:00 | … |
… | … | … |
520 | 2024-06-26T23:04:20.000+00:00 | … |
519 | 2024-06-25T23:09:27.000+00:00 | … |
This seems odd to me: the oldest entry (version 519) is about 37 days older than the newest (version 555), so the table history is being kept for longer than the expected 30 days.
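The 37-day figure is simply the gap between the oldest and newest timestamps listed above. A minimal Python sketch of that arithmetic, with the two timestamps copied by hand from the output (the variable names are only for illustration):

```python
from datetime import datetime

# Timestamps copied from the DESCRIBE HISTORY output above.
newest = datetime.fromisoformat("2024-08-01T23:08:23.000+00:00")  # version 555
oldest = datetime.fromisoformat("2024-06-25T23:09:27.000+00:00")  # version 519

# Difference between the newest and oldest retained history entries.
span = newest - oldest
span_days = span.total_seconds() / 86400  # 86400 seconds per day

print(f"History spans {span_days:.2f} days")
```

The gap comes out at just under 37 days, which is about a week more than the 30-day default I expected.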
I’ve been monitoring the table history for the last couple of days as new versions are appended, but no old entries have been cleared. The high version number suggests that old records are cleared at some point, but I don’t understand when that happens.
For how long is Delta table history kept when delta.logRetentionDuration
is not set?
Either the documentation is wrong or I’m missing something.
I suspect it’s the latter.
I’m using Databricks Runtime 10.3 and Scala 2.12.