I am writing an iterative algorithm and I am using localCheckpoint to break the DataFrame lineage as well as to persist the data for the next iteration. The problem is that unpersist doesn’t seem to work on a locally checkpointed DataFrame, so the checkpointed data is never cleared (the Storage tab in the UI confirms this).
The following code illustrates the iterative algorithm:
result = empty_dataframe()
while condition:
    df = load_partition(dt)
    old_result = result
    # localCheckpoint() truncates the lineage and persists the new result
    result = process_partition(old_result, df).localCheckpoint()
    old_result.unpersist()  # doesn't seem to work as intended
I believe unpersist only works for data stored with cache and persist, but I couldn’t find any resources on how to clear a locally checkpointed DataFrame.