We have a Spark Structured Streaming job that reads data from Kafka and writes it to HBase. We run it on an HDInsight cluster with Spark version 3.3.1.5.4.5.
In our use case, there is one Kafka topic with 3 partitions, and 51 streaming queries (one per target table) consume data from this topic. Each streaming query has a different processing speed.
We have a checkpoint location where we keep only the last 3 offset entries.
Normally, after about 1 hour, our processing speed drops.
We have noticed that when we clear the checkpoint, or update the offset information in the checkpoint directory so that it is the same for all tables, the queries run faster, even after 1 hour.
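
Since aligning the offsets across tables seems to matter, it may help to measure how far the 51 queries have drifted apart before the slowdown starts. Below is a minimal sketch, not our production code. It assumes the checkpoint root has been copied to a local path with one subdirectory per query, and that each file under `offsets/` has a version line, a batch metadata line, and then one JSON line of Kafka offsets for the single source. The function names `read_latest_offsets` and `offset_spread` are hypothetical.

```python
import json
from pathlib import Path


def read_latest_offsets(checkpoint_dir):
    """Return (batch_id, {topic: {partition: offset}}) from the newest offset file."""
    offsets_dir = Path(checkpoint_dir) / "offsets"
    # Offset files are named by batch id; skip hidden/temp files such as ".7.crc".
    batch_files = [p for p in offsets_dir.iterdir() if p.name.isdigit()]
    if not batch_files:
        return None, {}
    latest = max(batch_files, key=lambda p: int(p.name))
    lines = latest.read_text().splitlines()
    # Assumed layout: line 0 = version, line 1 = batch metadata,
    # line 2 = offsets of the first (here: only) source, or "-" if absent.
    if len(lines) < 3 or lines[2].strip() == "-":
        return int(latest.name), {}
    return int(latest.name), json.loads(lines[2])


def offset_spread(checkpoint_root):
    """Map (topic, partition) to (min_offset, max_offset) across all queries."""
    spread = {}
    for query_dir in sorted(Path(checkpoint_root).iterdir()):
        if not (query_dir / "offsets").is_dir():
            continue
        _, offsets = read_latest_offsets(query_dir)
        for topic, partitions in offsets.items():
            for partition, offset in partitions.items():
                key = (topic, int(partition))
                lo, hi = spread.get(key, (offset, offset))
                spread[key] = (min(lo, offset), max(hi, offset))
    return spread
```

Calling `offset_spread` on the checkpoint root at intervals (for example every 10 minutes) would show whether the gap between the slowest and fastest query on each partition keeps growing until the slowdown appears.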
Any help with this would be appreciated.