I have a celery work and flower running on a container group on Azure connected to an Azure Redis cache. It works great for a while, maybe a couple hours, and then suddenly all of the workers go offline. The celery workers show in the logs that they missed the heartbeat of the other workers and the flower container shows:
Error 8 while writing to socket. EOF occurred in violation of protocol.
which appears to be an error with Redis. Any idea why this might be happening? The containers don’t appear to be asleep, but maybe the long-lived connection to redis is being dropped. If so, I don’t understand why they don’t just reconnect.
If I restart the container group, it all comes back up, but I don’t want to have to do that every few hours.