Let’s say one of your consumers gets stuck doing something, becomes unresponsive, and exceeds its connect timeout. There will be a rebalance: the partitions that were assigned to the stuck consumer will be assigned to other consumers, and the message that was never acknowledged will be given to another consumer as well.
Now, the question: suppose the stuck consumer suddenly comes back and continues processing where it left off. The same message is then being processed by two consumers at the same time, which for our use case is very problematic.
What strategies do you follow in this case? I was thinking about using some kind of table to track execution status between consumers, but that has a lot of challenges and its own problems.
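To make the table idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under assumptions: SQLite stands in for whatever shared database the consumers would really use, and the table name `executions`, the helpers `open_store`, `try_claim`, and `mark_done`, and the message id format are all made up for this example. Each consumer tries to claim a message before working on it, and the primary key ensures only the first claim succeeds.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) shared execution-status table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS executions ("
        " message_id TEXT PRIMARY KEY,"
        " owner TEXT NOT NULL,"
        " status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def try_claim(conn, message_id, consumer_id):
    """Return True only for the first consumer that claims this message."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO executions (message_id, owner, status) "
        "VALUES (?, ?, 'in_progress')",
        (message_id, consumer_id),
    )
    conn.commit()
    # rowcount is 1 if the row was inserted, 0 if the key already existed.
    return cur.rowcount == 1


def mark_done(conn, message_id, consumer_id):
    """Mark the message as done; only the consumer that owns the claim may do so."""
    cur = conn.execute(
        "UPDATE executions SET status = 'done' "
        "WHERE message_id = ? AND owner = ?",
        (message_id, consumer_id),
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = open_store()
    # The consumer that got the message after the rebalance claims it first.
    print(try_claim(conn, "orders-0-42", "consumer-b"))
    # The stuck consumer wakes up and tries to claim the same message.
    print(try_claim(conn, "orders-0-42", "consumer-a"))
```

One of the challenges I mentioned shows up right away: if the stuck consumer had already claimed the message before it hung, the new consumer can never claim it, so the table would presumably also need some kind of expiry or takeover rule for stale claims.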