Flink job crashing due to OutOfMemoryError after ZooKeeper leadership change
I recently introduced ZooKeeper to my Flink job for checkpointing and high availability. Things were running smoothly until a ZooKeeper leadership change occurred and the JobManager disconnected. The pattern I noticed is that the JobManager kept trying to reconnect to ZK, and eventually succeeded once the new leader was elected. However, about 1 minute later, both the JobManager and the TaskManager hit an OutOfMemoryError and the Flink job crashed.