I have deployed a sentence transformer model for semantic checks on an AWS server that sits behind a load balancer and belongs to an autoscaling group. I run the server with Gunicorn, using Uvicorn as the worker class and a timeout of 3000 seconds. I have also set max requests to 1000 and max requests jitter to 50. However, when the application server calls this server, it receives a 504 response. The sentence transformer model is defined at the root of my API (FastAPI) file. What could cause the upstream server to return a 504 error?
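For reference, the Gunicorn setup described above corresponds roughly to the config file sketched below. This is only an illustration: the file name `gunicorn.conf.py`, the bind address, and the worker count are assumed placeholders, while the worker class, timeout, max requests, and jitter values are the ones stated in the question.

```python
# gunicorn.conf.py (hypothetical file name; Gunicorn config files are plain Python)

# Assumed placeholders, not taken from the actual deployment.
bind = "0.0.0.0:8000"
workers = 2

# Values stated in the question.
worker_class = "uvicorn.workers.UvicornWorker"  # run the FastAPI app on Uvicorn workers
timeout = 3000            # seconds before an unresponsive worker is killed and restarted
max_requests = 1000       # restart a worker after it has handled this many requests
max_requests_jitter = 50  # random extra (0 to 50) added per worker to stagger restarts
```

A file like this would be passed with `gunicorn -c gunicorn.conf.py main:app`, where `main:app` is an assumed module and application name. Note that these settings only affect Gunicorn itself; the load balancer in front of it has its own, separately configured timeout.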
I have tried loading the model at the root of my API file. The system, which used to hang earlier, no longer hangs, but the 504 error still persists.