I have deployed a Spring Boot application (packaged as a JAR) to Google Kubernetes Engine (GKE), and I need to scale it based on the number of concurrent HTTP requests. Here are my constraints and requirements:
Constraints:
- I cannot use the Horizontal Pod Autoscaler (HPA) with CPU or memory metrics: CPU usage is very low, and my application does not release heap memory effectively, so memory usage stays high even when load is low.
- I want to scale based on current (in-flight) HTTP requests, not on pending (queued) requests.
- Each pod should handle at most 10 concurrent HTTP requests. If a pod is at capacity, additional requests should be passed to other pods.
- If there aren’t enough pods to handle the incoming requests, new pods should be scaled up.
- I have no experience with managed Prometheus, and I want to avoid any additional costs.
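To make the per-pod limit above concrete, a pod could track its in-flight (currently executing) requests and refuse new ones once 10 are running. The minimal Java sketch below assumes a hypothetical `InFlightLimiter` class built on `java.util.concurrent.Semaphore`. In a Spring Boot app it would presumably be called from a servlet filter that returns HTTP 503 when `tryAcquire()` fails and calls `release()` in a `finally` block. Note that a plain Kubernetes Service does not retry a rejected request on another pod by itself, so passing requests to other pods would still need a retrying client or proxy.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical per-pod limiter: admits at most maxConcurrent requests at once
 * and reports how many are currently being processed.
 */
final class InFlightLimiter {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();

    InFlightLimiter(int maxConcurrent) {
        if (maxConcurrent <= 0) {
            throw new IllegalArgumentException("maxConcurrent must be > 0");
        }
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Non-blocking: returns false immediately if the pod is already at capacity. */
    boolean tryAcquire() {
        if (permits.tryAcquire()) {
            inFlight.incrementAndGet();
            return true;
        }
        return false;
    }

    /** Call exactly once per successful tryAcquire(), typically in a finally block. */
    void release() {
        inFlight.decrementAndGet();
        permits.release();
    }

    /** Requests currently executing (not requests waiting in a queue). */
    int inFlight() {
        return inFlight.get();
    }
}
```

The `inFlight()` value corresponds to the "current requests" figure described above, as opposed to a count of requests waiting in a queue.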
Current Setup:
We currently use a group of VM instances running the JARs, scaling up and down based on HTTP request load.
Requirements:
- Implement a minimal-cost setup.
- Scale the pods in GKE based on HTTP requests without using any paid tools.
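If each pod should carry at most 10 in-flight requests, the required replica count is the total in-flight count across all pods divided by 10, rounded up. This matches the HPA formula `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)` when the metric is the per-pod average, because `currentReplicas * (total / currentReplicas)` is just `total`. The hypothetical `ReplicaCalculator` below is only an illustration; the real HPA also applies a tolerance and stabilization windows that this sketch omits.

```java
/**
 * Hypothetical helper: how many replicas are needed so that each pod
 * carries at most targetPerPod in-flight requests, clamped to [min, max].
 */
final class ReplicaCalculator {
    private ReplicaCalculator() {}

    static int desiredReplicas(long totalInFlight, int targetPerPod, int minReplicas, int maxReplicas) {
        if (totalInFlight < 0 || targetPerPod <= 0 || minReplicas < 0 || minReplicas > maxReplicas) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Integer ceiling division avoids floating-point rounding surprises.
        long needed = (totalInFlight + targetPerPod - 1) / targetPerPod;
        // Clamp to the configured replica range.
        return (int) Math.max(minReplicas, Math.min(maxReplicas, needed));
    }
}
```

For example, 35 in-flight requests with a target of 10 per pod gives 4 replicas, and 500 requests with a maximum of 20 replicas is capped at 20.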
I have tried KEDA for HTTP-based scaling, but it scales pods based on pending HTTP requests, whereas I need it to scale based on current (in-flight) requests. The default metrics server is pre-installed on GKE.
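The default metrics server on GKE only serves resource metrics (CPU and memory), so scaling on current requests requires the application to expose its in-flight count in some form. The sketch below shows one assumed way to do that: a hypothetical `InFlightMetricServer` that uses the JDK's built-in `com.sun.net.httpserver.HttpServer` to publish the count as JSON at a hypothetical `/metrics/inflight` path. Which component polls and aggregates this value across pods (for example, a KEDA scaler that reads an HTTP endpoint) is left open here; polling a single Service URL would only return one pod's value.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: publishes this pod's current in-flight request count
 * as JSON on GET /metrics/inflight (path is an assumption).
 */
final class InFlightMetricServer {
    private final HttpServer server;

    /** inFlight is the counter that the request-handling code increments and decrements. */
    InFlightMetricServer(AtomicInteger inFlight, int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics/inflight", exchange -> {
            byte[] body = renderJson(inFlight.get()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }

    /** Renders the metric payload, e.g. {"inFlight":3}. */
    static String renderJson(int inFlight) {
        return "{\"inFlight\":" + inFlight + "}";
    }

    void start() {
        server.start();
    }

    /** Actual bound port; useful when constructed with port 0 (ephemeral port). */
    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }
}
```

A caller would pass in the same `AtomicInteger` that the request-handling code updates and then call `start()`; passing port 0 binds an ephemeral port that can be read back via `port()`.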
Could you please help me set this up correctly?
Atul Verma is a new contributor to this site.