I have a `metrics-server` deployment (v0.7.1) running in EKS on the Fargate runtime.
I noticed something rather strange: the metrics server fails to scrape kubelet (node) metrics on the Fargate node it gets scheduled on.
The logs keep spitting out these errors (IP addresses anonymised):
E0515 09:34:47.960043 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.1.2.3:10250/metrics/resource\": dial tcp 10.1.2.3:10250: connect: connection refused" node="fargate-ip-10-1-2-3.ec2.internal"
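In case it helps with diagnosis, the same endpoint can be probed from a throwaway pod (pod name and image below are arbitrary, and since every Fargate pod gets its own node the probe may not land next to metrics-server); a 401/403 response code would at least show the kubelet port is reachable, while "connection refused" would match what metrics-server sees:
kubectl run kubelet-probe --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -sk -o /dev/null -w '%{http_code}\n' https://10.1.2.3:10250/metrics/resource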
The node does indeed exist, and that is the right IP to scrape:
fargate-ip-10-1-2-3.ec2.internal Ready <none> 24m v1.29.0-eks-680e576 10.1.2.3 <none> Amazon Linux 2 5.10.215-203.850.amzn2.x86_64 containerd://1.6.6
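(For reference, that row comes from the wide node listing, i.e. roughly:)
kubectl get nodes -o wide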
When I check the Fargate node's status itself, it does seem to indicate the kubelet port is exposed:
"addresses": [
{
"type": "InternalIP",
"address": "10.1.2.3"
},
{
"type": "InternalDNS",
"address": "ip-10-1-2-3.ec2.internal"
}
],
"daemonEndpoints": {
"kubeletEndpoint": {
"Port": 10250
}
},
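That snippet is from the node object's .status; something along these lines should print the same address and kubelet-port fields directly (same node name as above):
kubectl get node fargate-ip-10-1-2-3.ec2.internal \
  -o jsonpath='{.status.addresses}{"\n"}{.status.daemonEndpoints.kubeletEndpoint.Port}{"\n"}'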
When I delete the `metrics-server` pod and it gets rescheduled onto another Fargate node in EKS, the same thing happens: the node it gets scheduled on fails to be scraped.
Does anyone have any ideas what’s going on?