First, let us have a look at a pod with a "real" nslookup utility:
mark@L-R910LPKW:~$ k -n chip exec deployments/toolbox -- sh -c 'ls -l $(which nslookup) ; nslookup default-redis-redis.system-d-kb-redis.svc'
-rwxr-xr-x 1 root root 100600 Feb 13 2024 /usr/bin/nslookup
Server: 10.0.0.10
Address: 10.0.0.10#53
Name: default-redis-redis.system-d-kb-redis.svc.cluster.local
Address: 10.0.6.5
mark@L-R910LPKW:~$
Works as expected.
Now a pod based on busybox:
mark@L-R910LPKW:~$ k exec deployments/redis-insights-redisinsight -- sh -c 'ls -l $(which nslookup) ; nslookup default-redis-redis.system-d-kb-redis.svc'
lrwxrwxrwx 1 root root 12 Jun 18 14:16 /usr/bin/nslookup -> /bin/busybox
Server: 10.0.0.10
Address: 10.0.0.10:53
** server can't find default-redis-redis.system-d-kb-redis.svc: NXDOMAIN
** server can't find default-redis-redis.system-d-kb-redis.svc: NXDOMAIN
command terminated with exit code 1
mark@L-R910LPKW:~$
I am not sure what to make of it.
The second pod uses the following image:
mark@L-R910LPKW:~$ k get deployments/redis-insights-redisinsight -o yaml | yq .spec.template.spec.containers[].image
redislabs/redisinsight:2.54.0@sha256:938c50cf95f7389bc93ce4d26e6eed6855736a8e5b5b05f7e640f01d1be2d514
mark@L-R910LPKW:~$
I deploy it using the Helm chart provided by https://truecharts.org.
This is not merely a question of the nslookup utility. DNS resolution simply does not work as expected on that pod. When the app running there tries to resolve the address of default-redis-redis.system-d-kb-redis.svc, it gets back the getaddrinfo ENOTFOUND error.
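To separate the application's resolver path from the nslookup binary, one option is to call getaddrinfo directly from inside the pod. Below is a minimal sketch in Python, assuming a Python interpreter is available in the container (which may not be true for this image); the helper name `resolve` is hypothetical.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses getaddrinfo yields, or [] on failure."""
    try:
        # proto=IPPROTO_TCP avoids one duplicate entry per socket type.
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name resolution failed; Node.js reports this case as ENOTFOUND.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Service name taken from the question.
    print(resolve("default-redis-redis.system-d-kb-redis.svc"))
```

An empty list here would suggest that the failure comes from the pod's resolver configuration rather than from the busybox nslookup applet.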
So I tried to reproduce it with the latest busybox image and I could not. I could not even reproduce it with the RedisInsight image in question (redislabs/redisinsight:2.54.0) as is. However, it does reproduce with the RedisInsight deployment based on that same image, but originating from the respective Helm chart.
It turns out the deployment contains the following at .spec.template.spec:
dnsConfig:
  options:
    - name: ndots
      value: "1"
I am going to look up what exactly this option does, but it is the root cause. Once I removed it, the problem with the missing .cluster.local suffix disappeared.
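For illustration, here is a rough sketch of how a stub resolver typically uses ndots to decide the order of names to query. It is a simplified model, not the actual code of busybox, musl, or glibc. The search list is an assumption (the usual list for a pod in the default namespace), and the `fallback_to_search` flag is a hypothetical switch modelling resolvers that reportedly do not fall back to the search list after the literal name fails.

```python
# Assumed search list for a pod in the "default" namespace.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidate_names(name, search, ndots, fallback_to_search=True):
    """Return the query names a stub resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already fully qualified, no search list applied
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: the literal name is tried first (or exclusively).
        return [absolute] + (expanded if fallback_to_search else [])
    # Too few dots: search domains are tried first, the literal name last.
    return expanded + [absolute]


if __name__ == "__main__":
    name = "default-redis-redis.system-d-kb-redis.svc"
    # Kubernetes default (ndots: 5): the search list is applied first.
    print(candidate_names(name, SEARCH, ndots=5))
    # Chart setting (ndots: 1) with no fallback: only the literal name.
    print(candidate_names(name, SEARCH, ndots=1, fallback_to_search=False))
```

Under these assumptions, the name has two dots, so with ndots set to 1 it is treated as already qualified, and a resolver without fallback never tries default-redis-redis.system-d-kb-redis.svc.cluster.local. That would be consistent with the NXDOMAIN shown in the question.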