I am deploying a Django app with Celery workers on AWS EKS. Everything runs as expected, except that K8s stops Celery worker replicas before they finish their ongoing tasks. The same thing happens whenever I make a new deployment or push new code to the master branch.
What I have tried:
- Setting a very large grace period. This didn't work because we have tasks that run for hours, so no fixed grace period is guaranteed to cover the longest one.
- Setting a preStop hook. This also didn't work, since K8s doesn't wait for the hook to finish once it exceeds the grace period. (A sketch of the manifest I used for both attempts is below this list.)
- Pinning a fixed replica count. That avoids scale-down kills, but it's obviously not a solution, and rollouts still terminate the pods.
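For completeness, this is roughly what the Deployment looked like for the first two attempts (the project name, image, and timings are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: celery-worker
  template:
    metadata:
      labels:
        app: celery-worker
    spec:
      # SIGTERM triggers Celery's warm shutdown (stop taking new tasks,
      # finish running ones), but K8s SIGKILLs the pod once this grace
      # period elapses, no matter what the worker is still doing.
      terminationGracePeriodSeconds: 21600  # 6 hours; still not enough for the longest tasks
      containers:
        - name: worker
          image: myapp:latest  # placeholder image
          command: ["celery", "-A", "myproject", "worker", "--loglevel=info"]
          lifecycle:
            preStop:
              exec:
                # Block until this worker reports no active tasks.
                # K8s stops waiting for this hook once the grace
                # period above expires, which is exactly the problem.
                command:
                  - /bin/sh
                  - -c
                  - until celery -A myproject inspect active --destination "celery@$HOSTNAME" | grep -q empty; do sleep 30; done
```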
More information:
I have Celery set up with Redis as the message broker and result backend. After some research I started considering KEDA, but from reading the docs it seems it will only let me scale Celery pods based on queue length; it doesn't give me the task-aware termination mechanism I am looking for.
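Based on my reading of the docs, the KEDA setup I was considering would look something like the sketch below (the Deployment name and Redis address are placeholders). As far as I can tell it only drives the replica count from the Redis list length and says nothing about draining a worker before scale-in:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: celery-worker-scaler
spec:
  scaleTargetRef:
    name: celery-worker  # the worker Deployment above
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: redis-master.default.svc.cluster.local:6379  # placeholder service address
        listName: celery      # Celery's default queue key in Redis
        listLength: "10"      # target queue depth per replica
```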
Is there any workaround for this issue?