I’m having issues scaling my Kubernetes cluster properly with a huge number of WebSocket connections.
Before I moved to Kubernetes I was running my server on a t2.medium AWS EC2 instance. It held up for a while until I started getting a huge amount of traffic (WebSocket connections, to be precise), so as a temporary fix I upgraded the instance to a t2.large, which is currently handling the traffic. But with the current growth in users I know the new instance won’t be able to hold up for more than the next 4-6 weeks, so I decided to move to Kubernetes for horizontal autoscaling.
I have set up Kubernetes via AWS EKS, with all the needed worker nodes and pods, Prometheus and Grafana for metrics monitoring, the Cluster Autoscaler for scaling my worker nodes, and an HPA for scaling my pod replicas.
But while load testing I ran into an issue. Even though the CPU and memory thresholds (which I set to 500m CPU and 500Mi memory per pod) weren’t exceeded at that traffic level, all the WebSockets were getting disconnected as soon as a message was sent over the connection.

A similar thing happened earlier on the t2.large, and it turned out to be a file descriptor limit on Ubuntu, which I had to increase for the worker processes of my web server (I’ve sketched roughly what that fix looked like below). But I can’t find a way to do the same thing on Kubernetes: first off, I can’t SSH into any of the EC2 instances the Cluster Autoscaler spins up, and I can’t do OS-level configuration from inside a pod.

In Grafana my memory utilization sits constant at 4/16 Gi and CPU utilization is only a little above 2%, meaning even the existing 6 replica pods are underutilized and still can’t handle the connections.
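To be concrete about that earlier fix on the t2.large, it was something along these lines (I’m reconstructing it roughly from memory, so the exact values and files may not be exactly what I used):

# /etc/security/limits.conf - raise the per-process open file limit
*    soft    nofile    65535
*    hard    nofile    65535

plus an equivalent bump for the web server’s own worker processes (in my case a setting in the server config / a LimitNOFILE= override on its systemd unit).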
I might be wrong about the possible solution here, which is why I’m asking people experienced with Kubernetes; my assumption is based on my experience with a single EC2 instance. But I would still love to know whether there’s a way to increase the file descriptor limit in an AWS EKS Kubernetes cluster.
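I assume I could at least check what limit the pods are actually running with using something like the commands below (the pod name is just a placeholder), but that still doesn’t tell me how to raise it:

kubectl exec <web-app-pod-name> -- sh -c 'ulimit -n'
kubectl exec <web-app-pod-name> -- cat /proc/1/limits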
Here’s my Deployment and Service YAML configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: web-app-image:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8000
          env:
            - name: REDIS_HOST
              value: "redis-stack-0.redis"
            - name: REDIS_PORT
              value: "6379"
            - name: CELERY_BROKER_URL
              value: "redis://redis-stack-0.redis:6379/0"
          resources:
            requests:
              cpu: "500m"
              memory: "500Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: web-app-service
spec:
  type: LoadBalancer
  selector:
    app: web-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
And here’s my HPA YAML configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app-deployment
  minReplicas: 6
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 50
So in summary I’m asking two questions: first, can I increase the file descriptor limit, and second, based on your experience with Kubernetes, what else could be wrong here? Given my limited experience, there’s a high chance my assumption is off.