We’re experiencing unexpected auto-scaling events triggered by the Horizontal Pod Autoscaler (HPA) during deployments in Kubernetes.
The Problem
With the `RollingUpdate` deployment strategy, Kubernetes creates a new pod before terminating an old one. This temporarily increases `currentReplicas`, which feeds into the HPA formula:

`desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]`
Example
Suppose two pods are comfortably serving the current demand. I update the image version in their deployment, which starts a new pod with the new image. The values in the formula then become:

- `currentReplicas = 3` (includes the new pod)
- `currentMetricValue = 6430859672`
- `desiredMetricValue = 9172000000`

I do not need a third pod for the current demand, yet the result is an undesired `desiredReplicas = 3`.
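To make the effect concrete, here is a small sketch that reproduces the HPA formula with these numbers (`desired_replicas` is a hypothetical helper, not Kubernetes code):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, desired_metric: float) -> int:
    """HPA formula: ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    return math.ceil(current_replicas * current_metric / desired_metric)

# During the rolling update, the surge pod is counted as a third replica:
print(desired_replicas(3, 6430859672, 9172000000))  # -> 3 (ceil of 2.10, so HPA keeps 3)

# Counting only the two original pods, no scale-up would occur:
print(desired_replicas(2, 6430859672, 9172000000))  # -> 2 (ceil of 1.40)
```

The utilization ratio (~0.70) is below 1, so two pods would normally suffice; it is only the temporarily inflated replica count that pushes the ceiling to 3.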
Switching to the `Recreate` strategy avoids this issue, but we cannot afford the associated downtime.
Question
What’s the best way to prevent HPA from scaling up during deployments? Are there any widely adopted solutions or workarounds, such as modifying HPA behavior, or temporarily disabling scaling?
We considered flags like `--horizontal-pod-autoscaler-initial-readiness-delay`, but from the Kubernetes documentation we understood that it only affects the collection of CPU resource metrics.
Context
- HPA is based on memory metrics (`prometheus.googleapis.com/jvm_memory_used_bytes/gauge`).
- Deployments use `RollingUpdate`.
Any advice or recommendations would be greatly appreciated!