I am using the RocksDB state backend with incremental checkpoints. Since my checkpoint size keeps growing, I wanted to read the checkpoints and understand what is inside them.
In the RocksDB local directory there are SST files, which are nothing but the current operator state saved to disk. An SST file contains column families, which are like table names, and inside each column family we have keys and values.
The following is from https://www.ververica.com/blog/manage-rocksdb-memory-size-apache-flink:
When used to store your Keyed state in Flink, the Key consists of the serialized bytes of the <Keygroup, Key, Namespace>, while the Value consists of the serialized bytes of your state.
I am able to find the key group and the key. What is the namespace?
My Flink job has a tumbling window with an aggregator, so I assume the value in the SST file is the result of the aggregation in serialized form. Is this assumption correct?
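For concreteness, here is a stripped-down sketch of the kind of job I mean (the checkpoint path, class names, and the aggregation logic are simplified placeholders, not my actual job):

    import org.apache.flink.api.common.functions.AggregateFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class WindowedAggregationJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // RocksDB state backend with incremental checkpoints enabled.
            env.setStateBackend(new RocksDBStateBackend("file:///tmp/checkpoints", true));
            env.enableCheckpointing(60_000);

            env.fromElements(Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 3L))
               .keyBy(t -> t.f0)
               .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
               .aggregate(new SumAggregator())
               .print();

            env.execute("windowed-aggregation");
        }

        // With an AggregateFunction, I expect the "window-contents" column family
        // to hold the accumulator (the running aggregate), not the raw elements.
        static class SumAggregator implements AggregateFunction<Tuple2<String, Long>, Long, Long> {
            @Override public Long createAccumulator() { return 0L; }
            @Override public Long add(Tuple2<String, Long> value, Long acc) { return acc + value.f1; }
            @Override public Long getResult(Long acc) { return acc; }
            @Override public Long merge(Long a, Long b) { return a + b; }
        }
    }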
I checked the column families in the RocksDB folder using the command below:

    ldb --db=/path/to/rocksdb/db list_column_families

The column families listed are {default, _timer_state/processing_window-timers, _timer_state/event_window-timers, window-contents}.
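For completeness, the same listing can be done from Java with the RocksDB API (a minimal sketch; the path is a placeholder):

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class ListColumnFamilies {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            // Lists the column family names without opening the DB for reads/writes.
            try (Options options = new Options()) {
                List<byte[]> families = RocksDB.listColumnFamilies(options, "/path/to/rocksdb/db");
                for (byte[] name : families) {
                    System.out.println(new String(name, StandardCharsets.UTF_8));
                }
            }
        }
    }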
When I tried to read an SST file using a RocksDB iterator, the iterator comes back invalid. Why?
If a new key-value pair is added using db.put(columnFamily, key, value), then the iterator becomes valid and I am able to read data from the .sst file. What could be the reason?
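For reference, here is a minimal sketch of how I am trying to read the local DB (the path is a placeholder; note that a fresh RocksIterator is invalid until it is positioned with a seek such as seekToFirst()):

    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import org.rocksdb.ColumnFamilyDescriptor;
    import org.rocksdb.ColumnFamilyHandle;
    import org.rocksdb.ColumnFamilyOptions;
    import org.rocksdb.DBOptions;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.RocksIterator;

    public class DumpStateColumns {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            final String path = "/path/to/rocksdb/db"; // placeholder

            // A DB with non-default column families must be opened with all of
            // them listed, otherwise the open call fails.
            final List<byte[]> cfNames;
            try (Options options = new Options()) {
                cfNames = RocksDB.listColumnFamilies(options, path);
            }
            final List<ColumnFamilyDescriptor> descriptors = new ArrayList<>();
            for (byte[] name : cfNames) {
                descriptors.add(new ColumnFamilyDescriptor(name, new ColumnFamilyOptions()));
            }

            final List<ColumnFamilyHandle> handles = new ArrayList<>();
            try (DBOptions dbOptions = new DBOptions();
                 RocksDB db = RocksDB.openReadOnly(dbOptions, path, descriptors, handles)) {
                for (int i = 0; i < handles.size(); i++) {
                    String cf = new String(cfNames.get(i), StandardCharsets.UTF_8);
                    try (RocksIterator it = db.newIterator(handles.get(i))) {
                        it.seekToFirst(); // a fresh iterator is invalid until positioned
                        while (it.isValid()) {
                            System.out.printf("%s: key=%d bytes, value=%d bytes%n",
                                    cf, it.key().length, it.value().length);
                            it.next();
                        }
                    }
                }
            }
        }
    }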
The generated .sst files are getting deleted seemingly at random. What may be the reason for the deletion of the .sst files? Is it RocksDB compaction?
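One way I thought of to check this, assuming it is compaction: open a throwaway test DB with automatic compaction disabled and see whether SST files still disappear there (setDisableAutoCompactions is from the RocksDB Java API; the path and the rest of the setup are illustrative only):

    import org.rocksdb.FlushOptions;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class CompactionProbe {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            try (Options options = new Options()
                     .setCreateIfMissing(true)
                     .setDisableAutoCompactions(true); // background compaction off
                 RocksDB db = RocksDB.open(options, "/tmp/rocksdb-compaction-test")) {
                db.put("key".getBytes(), "value".getBytes());
                // Force a memtable flush so an SST file is written immediately.
                try (FlushOptions flushOptions = new FlushOptions().setWaitForFlush(true)) {
                    db.flush(flushOptions);
                }
            }
            // If SST files persist here but vanish in the Flink-managed DB,
            // compaction merging and dropping obsolete files is the likely cause.
        }
    }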
How do the SST file sizes map to the checkpoint size in Flink? My db folder is only a few KB, but the checkpoint directory is several MB (de/serialization could add some data; anything else?). Why?