GPU Memory Issues When Handling Multiple Simultaneous Requests with Flask-SocketIO, uWSGI, and a Hugging Face Model
I am deploying a chatbot to production using Flask-SocketIO with uWSGI and gevent. The chatbot uses a Hugging Face model that occupies about 2 GB of GPU memory.