I am running a Kubernetes cluster where part of the workload is transcoding videos. It works as follows: an API server publishes a message through RabbitMQ whenever a video file on S3 needs to be transcoded, and these messages are consumed by worker pods running on c5.2xlarge instances. Each such instance runs exactly one transcoding pod plus a handful of DaemonSet pods (CloudWatch agent, S3 mount, and so on).
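To make this concrete, each worker is essentially a blocking RabbitMQ consumer. The sketch below is illustrative rather than my exact code (the host, credentials, queue name, and the `transcode()` stub are placeholders), but it shows the shape of the loop:

```python
import pika

def transcode(message_body):
    """Placeholder: pull the file from S3, run the transcode, upload the result."""

# Placeholder connection details
params = pika.ConnectionParameters(
    host="rabbitmq.default.svc",
    credentials=pika.PlainCredentials("user", "password"),
)
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.queue_declare(queue="transcode-jobs", durable=True)
channel.basic_qos(prefetch_count=1)  # one job at a time per worker

def on_message(ch, method, properties, body):
    transcode(body)
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after the job finishes

channel.basic_consume(queue="transcode-jobs", on_message_callback=on_message)
channel.start_consuming()  # blocks; the pod just sits here while idle
```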
I want to automatically scale the number of transcoding pods (and the nodes they run on) based on the workload. To do this, I came up with the following strategy:
a) Track when a pod is idle for >10 mins
b) Deprovision idle pods and scale down the deployment
c) Deprovision pods with <30% resource utilization for >10 mins
d) Make sure no pods other than the transcoder and DaemonSets run on transcoding instances, so that (c) always works
e) Track the number of messages in the queue and scale up the deployment when it reaches a certain threshold (see the sketch after this list)
f) Have the node autoscaler provision more nodes when transcoding pods are pending
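For (e), what I have in mind is a small script run on a schedule (e.g. as a CronJob) that reads the queue depth from the RabbitMQ management API and raises the deployment's replica count. This is only a rough sketch; the management URL, credentials, namespace/deployment names, and the one-replica-per-message heuristic are all placeholders:

```python
import requests
from kubernetes import client, config

# Placeholders: management URL (vhost "/" is %2F), credentials, names, and cap.
RABBIT_MGMT_URL = "http://rabbitmq.default.svc:15672/api/queues/%2F/transcode-jobs"
DEPLOYMENT, NAMESPACE = "transcoder", "video"
MAX_REPLICAS = 20

def desired_replicas() -> int:
    # "messages" = ready + unacknowledged messages in the queue
    depth = requests.get(RABBIT_MGMT_URL, auth=("user", "password"), timeout=10).json()["messages"]
    return min(depth, MAX_REPLICAS)  # naive one-replica-per-message heuristic

def scale_up() -> None:
    config.load_incluster_config()  # the script runs inside the cluster
    apps = client.AppsV1Api()
    scale = apps.read_namespaced_deployment_scale(DEPLOYMENT, NAMESPACE)
    target = desired_replicas()
    # Only scale up here; scale-down is the (a)/(b) problem described below.
    if target > scale.spec.replicas:
        scale.spec.replicas = target
        apps.patch_namespaced_deployment_scale(DEPLOYMENT, NAMESPACE, scale)

if __name__ == "__main__":
    scale_up()
```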
My problem is with (a) and (b) specifically. What is the proper way to track when a pod is idle (i.e. waiting to receive a message from the queue) and to deprovision it safely, making sure it does not consume a message in the middle of being deprovisioned and leave that message unfulfilled?
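For illustration, this is the direction I am considering for the worker's shutdown path: a minimal sketch (again with placeholder names and a stub `transcode()`) that catches SIGTERM, finishes any in-flight message, and cancels the consumer so anything unacknowledged gets requeued:

```python
import signal
import pika

shutting_down = False

def handle_sigterm(signum, frame):
    # Kubernetes sends SIGTERM when the pod is being terminated; just set a
    # flag and let the main loop drain the in-flight message before exiting.
    global shutting_down
    shutting_down = True

def transcode(message_body):
    """Placeholder for the actual transcoding work."""

signal.signal(signal.SIGTERM, handle_sigterm)

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host="rabbitmq.default.svc")  # placeholder host
)
channel = connection.channel()
channel.basic_qos(prefetch_count=1)  # never hold more than one unacked message

# consume() with an inactivity_timeout yields (None, None, None) periodically,
# so the shutdown flag also gets checked while the worker is idle.
for method, properties, body in channel.consume("transcode-jobs", inactivity_timeout=5):
    if method is not None:
        transcode(body)
        channel.basic_ack(delivery_tag=method.delivery_tag)  # ack only when done
    if shutting_down:
        break

# Cancel the consumer; any delivered-but-unacked message is requeued by RabbitMQ.
channel.cancel()
connection.close()
```

If I go this route, I assume terminationGracePeriodSeconds on the pod would have to be at least as long as the longest transcode so the drain can actually finish. Is something along these lines race-free, or is there a better mechanism for deciding which pod is idle and terminating exactly that one?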