I’m having trouble configuring Karpenter to handle EC2 Spot Instance interruptions without causing downtime. My goal is to ensure that when a Spot Instance is interrupted, there is no impact on the availability of my applications.
Here’s the situation:
Current Behavior:
I simulate an interruption using AWS Fault Injection Simulator (FIS).
Pods on the interrupted instance immediately start terminating.
New pods are created but remain in a pending state while waiting for a new instance to become available.
This causes downtime: the old pods are already terminated while the new pods are still pending, waiting for an instance to become ready.
Desired Behavior:
New pods are created and stay pending while waiting for a new instance.
The system should wait until the new pods are running (or until a timeout is reached) before terminating the old pods on the interrupted instance.
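To make the ordering I'm after concrete, here is a minimal sketch in Python. It is purely illustrative and does not use any Karpenter or Kubernetes API: `replacements_ready` and `terminate_old_pods` are hypothetical callbacks standing in for "new pods are Running" and "evict the pods on the interrupted node", and the 90-second timeout is an arbitrary placeholder.

```python
import time
from typing import Callable


def drain_after_replacements_ready(
    replacements_ready: Callable[[], bool],   # hypothetical: True once new pods are Running
    terminate_old_pods: Callable[[], None],   # hypothetical: evicts pods on the old node
    timeout_s: float = 90.0,                  # arbitrary placeholder value
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait for replacements (or the timeout), then terminate the old pods.

    Returns True if the replacements were ready before termination,
    False if the timeout was reached first.
    """
    deadline = clock() + timeout_s
    ready = replacements_ready()
    # Keep the old pods serving traffic while the new ones are still pending.
    while not ready and clock() < deadline:
        sleep(poll_interval_s)
        ready = replacements_ready()
    # Only now remove the old pods, whether or not the replacements made it.
    terminate_old_pods()
    return ready
```

In other words, termination of the old pods is always the last step, and the timeout only bounds how long it can be postponed.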
I’m not sure how to configure Karpenter to achieve this behavior. Are there specific settings or adjustments I need to make? Any guidance or tips would be greatly appreciated!
I also tested Pod Disruption Budgets (PDBs). With a PDB in place the pods stayed running, but no new instance was launched to replace the interrupted one.
Thanks in advance!