Recently (in the last month) my API has been hanging intermittently, and when I check the logs I see this:
System.Data.SqlClient.SqlException (0x80131904): Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement
- I have set the maximum connection pool size to 3,000 and the timeout to 60 seconds.
- The API is an ASP.NET Core 6 Web API deployed on an Azure App Service
- The API uses EF Core to connect to the database
- The database is an Azure SQL database using the Standard tier (DTU based purchasing model)
- Nothing seems wrong on the database side: when I check the metrics, everything is consistently very low.
- There are no long-running queries (3 seconds is the longest, for one very specific call).
- The database firewall is configured to allow networked Azure apps to access it, and this works fine most of the time.
Semi-regularly the front end hangs, and a single call that would usually take less than 1 second can take a few minutes to return. When I check the logs, I see the error above.
Note: The database is responsive when this error is thrown from the deployed API, and sp_who2 does not show many connections. I have 2 APIs pointing to this same database, and both experience this hang/timeout whenever the triggering event occurs.
From what I gather, the request never leaves the client (the API). Could this be an Azure networking issue? If not, does anyone have an idea why this is happening?
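One rough way to test the networking theory is to time raw TCP connects to the database endpoint from the same environment as the API while a hang is happening. The sketch below is only a diagnostic aid, not part of the API: the helper names `time_tcp_connect` and `probe` are made up for illustration, port 1433 is assumed to be the SQL endpoint port, and the host name in the usage note is a placeholder. Since the pre-login handshake starts only after the TCP connection is established, fast TCP connects alongside the same timeout would point past basic reachability, while slow or failed connects would point at the network path.

```python
import socket
import time


def time_tcp_connect(host, port=1433, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return None


def probe(host, port=1433, attempts=5, pause=1.0, timeout=5.0):
    """Time several connects in a row; each entry is a duration or None."""
    results = []
    for i in range(attempts):
        results.append(time_tcp_connect(host, port, timeout))
        if i < attempts - 1:
            time.sleep(pause)
    return results
```

Usage would look something like `probe("your-server.database.windows.net")`, where the server name is a placeholder for your own. This only checks that a TCP connection can be opened; it does not perform the TDS pre-login or login steps.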
Verify that AutoClose isn’t set for the database, since it can cause the first connections to time out while waiting for the database to wake back up. The same applies to serverless databases and dedicated pools, which can auto-pause (or can be explicitly paused programmatically).
Could also be this:
When you connect to an Azure SQL Database, idle connections may be terminated by a network component (such as a firewall) after a period of inactivity. There are two types of idle connections, in this context:
– Idle at the TCP layer, where connections can be dropped by any number of network devices.
– Idle by the Azure SQL Gateway, where TCP keepalive messages might be occurring (which makes the connection not idle from a TCP perspective), but the connection has not had an active query in 30 minutes. In this scenario, the Gateway determines that the TDS connection has been idle for 30 minutes and terminates it.
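If dropped idle connections are the cause, the usual mitigation is to retry the failed operation on a fresh connection with a growing delay. EF Core's SQL Server provider has a built-in option for this (EnableRetryOnFailure); the sketch below only illustrates the general idea in a language-neutral way. The function name `retry_with_backoff`, the `is_transient` predicate and the default delays are all assumptions for illustration, not values taken from Azure guidance.

```python
import random
import time


def retry_with_backoff(operation, is_transient, max_attempts=5,
                       base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == max_attempts or not is_transient(exc):
                raise
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add jitter so that several clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the waiting can be replaced in tests. Deciding which errors count as transient is left to the caller, because that depends on the driver and the error codes it reports.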
Could also be something like this:
After an interval of roughly 2-5 minutes, the app will go to “sleep” and require a visit to the function app site to wake it up. This takes about 30s to 90s and does not work 100% of the time.