I have a Docker Swarm cluster with one manager and one worker node so far. I am trying to set up node-exporter with Prometheus. I initialized everything with the swarm-monitoring stack template in Portainer, which uses the following Docker Compose stack file (I'll only show the Prometheus and node-exporter services).
version: "3.8"
services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - '--config.file=/etc/prometheus/prometheus.yml' # Location of the Prometheus configuration file inside the container.
      - '--log.level=error'
      - '--storage.tsdb.path=/prometheus' # Directory where Prometheus stores its time-series data inside the container.
      - '--storage.tsdb.retention.time=7d' # How long Prometheus keeps data; anything older than 7 days is deleted.
      - '--web.enable-lifecycle' # Enables hot reload via a request to http://<Node-IP>:9090/-/reload. No need to restart the service.
    ports:
      - "9090:9090"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
          - node.labels.monitoring == true
    volumes:
      - prometheus-data:/prometheus
      - /root/prometheus.yml:/etc/prometheus/prometheus.yml
    networks:
      - net
      - monitor-net
  node-exporter:
    image: prom/node-exporter:v1.5.0
    command:
      - '--path.sysfs=/host/sys'
      - '--path.procfs=/host/proc'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
      - '--no-collector.ipvs'
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
    ports:
      - target: 9100
        published: 9100
        protocol: tcp
        mode: host
    volumes:
      - type: bind
        source: /
        target: /rootfs
        read_only: true
      - type: bind
        source: /proc
        target: /host/proc
        read_only: true
      - type: bind
        source: /sys
        target: /host/sys
        read_only: true
    networks:
      - net
      - monitor-net
volumes:
  prometheus-data:
networks:
  net:
    driver: overlay
  monitor-net:
    external: true
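As the comment on --web.enable-lifecycle says, the flag lets me hot-reload the configuration without restarting the service. For reference, the reload is triggered like this (replace <Node-IP> with the manager node's address; the lifecycle endpoint expects a POST):

```shell
# Trigger a Prometheus configuration reload (requires --web.enable-lifecycle).
# This needs the stack to be running, so it is shown here only as a usage note.
curl -X POST http://<Node-IP>:9090/-/reload
```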
And here is the prometheus.yml file I was using:
global:
  scrape_interval: 4s
  evaluation_interval: 60s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names:
          - 'tasks.cadvisor'
        type: 'A'
        port: 8080

  - job_name: 'node-exporter'
    dns_sd_configs:
      - names:
          - 'tasks.node-exporter'
        type: 'A'
        port: 9100
Now the problem: in the Targets section of Prometheus, only one node-exporter instance shows up, the one running on the manager node. However, when I run

docker service ps cluster-monitor_node-exporter

I can confirm that the service is indeed running on both nodes.
Instead of relying on DNS service discovery in prometheus.yml, I could manually enter the static targets for each node, and that would work.
However, I need to understand why service discovery is not working in Docker Swarm. From searching around, this appears to be a known issue. Is there a reliable workaround that makes service discovery easy?
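For reference, the manual fallback I mentioned would look like this (the addresses are placeholders for my actual node IPs; this works here because node-exporter publishes port 9100 in host mode):

```yaml
# Hypothetical static fallback, bypassing DNS service discovery entirely.
- job_name: 'node-exporter'
  static_configs:
    - targets:
        - '<manager-node-ip>:9100'
        - '<worker-node-ip>:9100'
```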
I have tried a lot of things. I have opened the following ports:

2377/tcp - the default Swarm control-plane port, configurable with docker swarm join --listen-addr
4789/udp - the default overlay (VXLAN) data traffic port, configurable with docker swarm init --data-path-port
7946/tcp and 7946/udp - used for communication among nodes, not configurable
My monitor-net network is an overlay network. I have made it attachable (though I know that is only useful for attaching standalone containers, not tasks from services).
I have tried a variety of approaches and can confirm that DNS-based service discovery is the issue; it seems to be notorious for this. I need a workaround.
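Since monitor-net is attachable, one way to see what Swarm's internal DNS actually returns is to attach a throwaway container to it and resolve the tasks.* names directly (the service and network names below match my stack; they would need adjusting for a different setup):

```shell
# Query Swarm DNS for the per-task A records of the node-exporter service.
# tasks.<service> should return one IP per running task.
docker run --rm --network monitor-net alpine nslookup tasks.node-exporter

# For comparison, the bare service name resolves to the service VIP.
docker run --rm --network monitor-net alpine nslookup node-exporter
```

If tasks.node-exporter returns only a single A record here, Prometheus can only ever discover one target, which matches what I am seeing.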
Any help would be appreciated here. Thanks in advance.