When I call services in my EKS Kubernetes cluster through the domain name, the SSL handshake takes upwards of 9 seconds on average.
SSL certificates are managed by cert-manager on the cluster. The ingress controller is nginx, to which the AWS load balancer routes calls going to the cluster. The ingress configuration is set to use the certificate managed by cert-manager. The CA is Let's Encrypt.
When I remove the TLS configuration from the ingress configuration, calls to the cluster take less than a second.
My biggest suspicion is that the issue occurs when the ingress controller retrieves the certificate from etcd, but I don't have proof of that or a way to measure it.
I really do not know what else to check.
I have checked the ingress controller logs for any indication of issues with SSL negotiation. What I notice is that calls only appear in the logs after the SSL negotiation has finished, and the total call time the controller records for a request is usually in milliseconds. But the client experiences more than 9 seconds.
I have checked other infrastructure metrics. I have overprovisioned the cluster resources: node memory is at 10% usage, CPU is at 10% usage, and storage usage on the nodes is below 10%.
Intra-cluster networking is the one thing I have not been able to get eyes on.
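To narrow down where the client-side time goes, a minimal Python sketch like the one below could time the DNS lookup, the TCP connect, and the TLS handshake separately. It uses only the standard library. The function names `measure_phases` and `slowest_phase` are hypothetical helpers for illustration, not part of any tool mentioned above, and the sketch assumes the service is reachable on port 443 with a certificate the client trusts.

```python
import socket
import ssl
import time


def measure_phases(host, port=443, timeout=30.0):
    """Return seconds spent in DNS lookup, TCP connect, and TLS handshake."""
    timings = {}

    # Resolve the name first so DNS time is not folded into the connect time.
    start = time.perf_counter()
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    timings["dns"] = time.perf_counter() - start

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        start = time.perf_counter()
        sock.connect(sockaddr)
        timings["tcp_connect"] = time.perf_counter() - start

        context = ssl.create_default_context()
        start = time.perf_counter()
        # On an already-connected socket, wrap_socket runs the handshake
        # before returning (do_handshake_on_connect defaults to True).
        tls_sock = context.wrap_socket(sock, server_hostname=host)
        timings["tls_handshake"] = time.perf_counter() - start
        tls_sock.close()
    finally:
        sock.close()
    return timings


def slowest_phase(timings):
    """Return the name of the phase that took the longest."""
    return max(timings, key=timings.get)
```

Calling `measure_phases` with the service's domain name, and then again from a pod inside the cluster, would show whether the 9 seconds sit in name resolution, in connection setup, or in the handshake itself, and whether the delay is already present before traffic reaches the ingress controller.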