I have implemented a system where requests are processed asynchronously via Azure Service Bus. Each request initiates an operation that, once complete, returns a response to the initiating actor. My objective is to manage the flow of requests to OpenAI efficiently, specifically by capping the number of concurrent requests at 8 to prevent token bottlenecks.
I have configured my host.json as follows to achieve this limit:
{
  "extensions": {
    "serviceBus": {
      "batchOptions": {
        "maxMessageCount": 1,
        "autoComplete": true
      },
      "prefetchCount": 8,
      "messageHandlerOptions": {
        "autoComplete": true,
        "maxConcurrentCalls": 8
      }
    }
  }
}
Additionally, I have set the environment variables WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT and FUNCTIONS_WORKER_PROCESS_COUNT to 1, so that all processing happens in a single worker process on a single instance.
Despite these configurations, which follow the Azure Functions documentation, the Service Bus queue often contains many messages while the actual number of concurrent requests stays low (typically 1 or 2).
Why might the Azure Service Bus trigger function not be achieving the expected level of concurrency given these settings? What changes should I consider so that the number of concurrent requests consistently reaches the configured limit of 8?