I have a stateful Spark Structured Streaming (version 3.3.1) application that processes input events in the pattern shown in the image below.
I use RocksDBStateStoreProvider to maintain state in memory and on disk. The state holds about 400 million rows, roughly 6 GB in total. Here is my RocksDB configuration:
"spark.sql.streaming.stateStore.providerClass": "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider",
"spark.sql.streaming.stateStore.rocksdb.compactOnCommit": true,
"spark.sql.streaming.stateStore.rocksdb.blockCacheSizeMB": 3
This app runs well on weekdays, including peak hours; CPU, memory, and disk usage all look healthy. But over the weekend, when the data volume drops, disk usage climbs continually until the executors die.
I logged into an executor and found the disk full of RocksDB WAL(?) log files:
My questions are:
- #1. What is the purpose of these log files?
- #2. Why is disk usage stable and low during peak hours, but growing when the data volume is very low?
- #3. How can I mitigate this issue? (I'm stuck on Spark 3.3.1, which exposes very few RocksDB configuration options.)
Thanks!