We have a system that runs on AKS (Azure Kubernetes Service) where one of our deployments hosts Python code that connects to Snowflake using the following connection:
import os
import snowflake.connector

def create_snowflake_connection(account, access_token, warehouse, database, user, role):
    conn = snowflake.connector.connect(
        user=user,
        host=os.getenv('SNOWFLAKE_ACCOUNT'),
        token=access_token,
        role=role,
        account=account,
        warehouse=warehouse,
        database=database,
        authenticator='oauth',
        client_session_keep_alive=True,
        max_connection_pool=100,
    )
    return conn
Now, when we call that deployment to run a set of queries (function_name), it returns a 504 Gateway Time-out server error for the URL https://url-of-deployment.com/endpoint/function_name.
However, when the same Python code is run directly from the user's machine, without Kubernetes, it works just fine.
So we tried to troubleshoot the issue and find the cause, looking at both the Kubernetes configuration and the Snowflake connection configuration.
1. Snowflake connection troubleshooting results:
1.a Snowflake Connector Version:
snowflake-connector-python==3.12.0
1.b Increase Snowflake Connection Timeout Parameters:
We tried to increase these parameters following the documentation below:
Managing Snowflake Connection Timeout
like the following:
import os
import snowflake.connector

def create_snowflake_connection(account, access_token, warehouse, database, user, role):
    conn = snowflake.connector.connect(
        user=user,
        host=os.getenv('SNOWFLAKE_ACCOUNT'),
        token=access_token,
        role=role,
        account=account,
        warehouse=warehouse,
        database=database,
        authenticator='oauth',
        client_session_keep_alive=True,
        max_connection_pool=100,
        login_timeout=300,
        network_timeout=300,
        socket_timeout=300
    )
    return conn
But we are still getting the same error when calling the endpoint. It actually seems that increasing socket_timeout in the Python connector code does not affect the default value of DEFAULT_SOCKET_CONNECT_TIMEOUT found in:
Python Snowflake Connector Network.py
according to this post:
Github Snowflake Timeout Issue Post
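As a side note, one way we are considering to sidestep gateway-level timeouts entirely is to stop holding the HTTP request open for the whole query duration: submit the query asynchronously (the connector exposes `cursor.execute_async` and `conn.get_query_status` for this) and have the endpoint poll for completion. Below is a minimal, connector-agnostic sketch of the polling side; `poll_until_done` and the `check_status` callback are hypothetical names standing in for the connector's query-status API, not part of it.

```python
import time

def poll_until_done(check_status, poll_interval=2.0, max_wait=3600.0):
    """Repeatedly call check_status() until the query is no longer running.

    check_status is a zero-argument callable returning one of the strings
    'RUNNING', 'SUCCESS', or 'FAILED' (illustrative statuses standing in
    for the connector's QueryStatus values). Raises TimeoutError if the
    query is still running after max_wait seconds.
    """
    deadline = time.monotonic() + max_wait
    while True:
        status = check_status()
        if status != 'RUNNING':
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError('query still running after %.0f s' % max_wait)
        time.sleep(poll_interval)
```

With this pattern the client request returns quickly with a query ID, and the long wait happens in short polling requests that never trip a 60-second gateway limit.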
2. Kubernetes troubleshooting results:
So, we also checked the manifests of our Kubernetes deployment for any timeout config set to 60s. This is because, in most of the 504 timeout cases we were getting, the Kubernetes logs show that a timeout error is returned whenever the awaiting_response time goes beyond 60 seconds.
2.a Deployment manifest timeout values:
These are the timeout configs that are set in our deployment manifest:
...
readinessProbe:
  initialDelaySeconds: 1
  periodSeconds: 2
  timeoutSeconds: 300  # This one was set to 60 seconds and increased to 300 seconds
  successThreshold: 1
  failureThreshold: 1
...
nginx.org/proxy-connect-timeout: 3600s
nginx.org/proxy-read-timeout: 3600s
nginx.org/proxy-send-timeout: 3600s
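One thing we are not sure about is the annotation prefix: nginx.org/* annotations belong to the NGINX Inc. ingress controller, while the community ingress-nginx controller uses nginx.ingress.kubernetes.io/*, and each controller silently ignores the other's annotations. If our cluster actually runs the community controller, the equivalent of the above would look like this (a sketch assuming that controller; note its proxy timeouts are plain seconds without the "s" suffix):

    metadata:
      annotations:
        nginx.ingress.kubernetes.io/proxy-connect-timeout: "3600"
        nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"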
It is also worth noting that we use an oauth2 reverse-proxy pod to authenticate user requests sent to the above deployment: users authenticate to the oauth2 reverse proxy (associated with an app registration), receive an access token, and then send their requests. However, we did not set any timeout config value in that proxy's manifest.
I found this setting in the Kubernetes ingress-nginx documentation:
nginx.ingress.kubernetes.io/auth-keepalive-timeout
which defaults to 60 seconds (probably the cause of our gateway timeout), but I do not understand how it is related to this setting:
nginx.ingress.kubernetes.io/auth-keepalive
The documentation says:
Defaults to 60 and only applied if auth-keepalive is set to higher than 0
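My reading so far (which I am not sure about) is that auth-keepalive sets the number of idle keepalive connections nginx may keep open to the external auth service, defaulting to 0, which disables keepalive, and only then does auth-keepalive-timeout control how long such an idle connection stays open. If that reading is right, the two would be set together on the Ingress like this (values are illustrative, not recommendations):

    metadata:
      annotations:
        nginx.ingress.kubernetes.io/auth-keepalive: "10"           # > 0 enables keepalive to the auth service
        nginx.ingress.kubernetes.io/auth-keepalive-timeout: "300"  # only honored when auth-keepalive > 0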