I’m using Flink 1.18.1 with Flink operator 1.7.
The startup time (from pod creation to the RUNNING state) is around 3 minutes. With 2 JobManagers, if the leader is killed or restarts, the job takes around 1 minute 45 seconds to start again.
This has been acceptable so far, but I’m now running a fairly low-latency job that needs this to be snappier. Is there anything I can do to improve the startup time of Flink deployments?
What I use today:
- Standalone mode
- Kafka as source
- HA k8s enabled
- GCS as external storage system (checkpoints and savepoints)
- k8s Flink operator
I checked the application logs, but nothing stood out as an obvious bottleneck.