Tag Archive for python-asyncio, fastapi, openai-api, large-language-model

Enforcing a tokens-per-minute rate limit in FastAPI

I’m developing FastAPI endpoints that rely on OpenAI’s chat completion API. Some endpoints use asyncio.gather to make concurrent requests to OpenAI, so I’m worried about hitting rate limits. I’m currently using an asyncio.Semaphore to cap concurrent requests (and so requests per minute), but I’m not sure how to enforce a tokens-per-minute limit.
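
One way to approach this is to pair the semaphore with a sliding-window token budget: before each call, estimate how many tokens the request will consume (e.g. prompt tokens via tiktoken plus the request's max_tokens), and block until that estimate fits within the tokens spent in the last 60 seconds. The sketch below is a minimal illustration of that idea, not a definitive implementation; `TokenRateLimiter`, the `90_000` quota, and the `gpt-4o-mini` model name are all illustrative values I'm assuming, not part of the question.

```python
import asyncio
import time


class TokenRateLimiter:
    """Naive sliding-window limiter for a tokens-per-minute (TPM) budget."""

    def __init__(self, tpm_limit: int) -> None:
        self.tpm_limit = tpm_limit
        self._lock = asyncio.Lock()
        self._window: list[tuple[float, int]] = []  # (timestamp, token estimate)

    async def acquire(self, estimate: int) -> None:
        """Block until `estimate` tokens fit within the last 60 seconds' budget."""
        if estimate > self.tpm_limit:
            raise ValueError("a single request exceeds the whole TPM budget")
        while True:
            async with self._lock:
                now = time.monotonic()
                # Keep only entries from the last 60 seconds.
                self._window = [(t, n) for t, n in self._window if now - t < 60]
                used = sum(n for _, n in self._window)
                if used + estimate <= self.tpm_limit:
                    self._window.append((now, estimate))
                    return
                # Budget exhausted: wait until the oldest entry ages out.
                wait = 60 - (now - self._window[0][0])
            await asyncio.sleep(max(wait, 0.1))
```

Combined with the existing semaphore, a call site might look like this (assuming the openai v1 async client):

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()
request_semaphore = asyncio.Semaphore(10)         # existing requests/concurrency guard
tpm_limiter = TokenRateLimiter(tpm_limit=90_000)  # illustrative TPM quota

async def chat(messages: list[dict], estimated_tokens: int):
    # Reserve token budget first, then a request slot, then call OpenAI.
    await tpm_limiter.acquire(estimated_tokens)
    async with request_semaphore:
        return await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
        )
```

Because the completion length is unknown up front, the estimate is inherently approximate; a common compromise is to reserve prompt tokens plus max_tokens pessimistically, and optionally reconcile against the `usage` field returned in the response.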