We have an Azure API App with a continuous WebJob running a .NET 8 WebJobs SDK app with a Service Bus triggered function. The function can run for prolonged periods, and we need to properly handle Azure infrastructure updates, when the underlying VMs are rebooted.
According to https://github.com/projectkudu/kudu/wiki/WebJobs#soft-vs-hard-shutdowns we should have 3 minutes after our WebJob is notified that it needs to stop in the aforementioned case:
The VM the Web App lives on needs to be upgraded, and the Web App needs to be moved to a different VM and the shutdown time limit is 3 mins.
However, we cannot directly simulate scheduled maintenance, so I thought we could at least test the following scenario from the same link:
The App Service Plan’s instance count was reduced and some VMs need to be removed and the shutdown time limit is 3 mins.
The test WebJob’s MaxConcurrentCalls parameter is set to 1 so that each App Service instance processes 1 Service Bus message at a time.
I manually scale the App Service plan out to 2 instances and submit 2 Azure Service Bus messages to trigger the function. The function's run time is 15 minutes, and stopping_wait_time in settings.job is set to 180 seconds.
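For reference, a settings.job with that setting would look roughly like this. Only the stopping_wait_time value is taken from the setup described above; any other settings are omitted.

```json
{
  "stopping_wait_time": 180
}
```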
After I manually scale the App Service plan in to 1 instance, the 2 function invocations continue to run until they finish normally, with no signal through the CancellationToken argument and no file created at the WEBJOBS_SHUTDOWN_FILE path.
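To make it clear what "no signals" means here, both shutdown signals can be checked with something like the following. This is a minimal sketch rather than our exact code: `ShutdownProbe` and `RunAsync` are hypothetical names, the loop only stands in for the 15-minute workload, and it assumes the CancellationToken is the one passed to the triggered function.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the WebJobs SDK): stands in for the
// long-running work and reports which shutdown signal, if any, is seen first.
public static class ShutdownProbe
{
    public static async Task<string> RunAsync(
        TimeSpan duration, TimeSpan pollInterval, CancellationToken cancellationToken)
    {
        // Path of the file Kudu creates to request a graceful shutdown.
        // The variable is only expected to be set when running as a WebJob.
        var shutdownFile = Environment.GetEnvironmentVariable("WEBJOBS_SHUTDOWN_FILE");
        var deadline = DateTime.UtcNow + duration;

        while (DateTime.UtcNow < deadline)
        {
            if (cancellationToken.IsCancellationRequested)
            {
                return "cancellation-token";
            }

            if (!string.IsNullOrEmpty(shutdownFile) && File.Exists(shutdownFile))
            {
                return "shutdown-file";
            }

            // Delay without the token so the loop reports the signal
            // instead of surfacing an OperationCanceledException.
            await Task.Delay(pollInterval);
        }

        return "completed";
    }
}
```

Called with a 15-minute duration from inside the function, this returns "completed" in the scale-in test and "cancellation-token" when the WebJob is stopped from the portal.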
If I stop the WebJob from the API App's WebJobs section, it does receive the cancellation signal and the stopping_wait_time value is respected.
Why are the App Service instances not shut down after a manual scale-in, even after waiting for 10 minutes, when the documented limit is 3 minutes?
How can I test that the 3 minute limit is described correctly in the docs for the scheduled maintenance/VM reboot case?