I’m running into a rather odd issue.
I’ve set up a Docker Compose file on an EC2 machine that includes my monitored containers, a cAdvisor container, and a CloudWatch agent container that forwards logs and metrics to CloudWatch.
I’m feeding the cAdvisor metrics into the CloudWatch agent through its built-in StatsD engine, by configuring StatsD as cAdvisor's storage driver. Here are the cadvisor and cw-agent snippets from the compose file:
cadvisor:
  image: gcr.io/cadvisor/cadvisor:v0.45.0
  container_name: cadvisor
  deploy:
    resources:
      limits:
        cpus: '0.5'
        memory: 256M
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:ro
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
  ports:
    - "8081:8080"
  command:
    - "--storage_driver=statsd"
    - "--storage_driver_host=cloudwatch-agent:8127"
    - "--enable_metrics=memory"
    - "--docker_only=true"
  restart: always
  privileged: true
  networks:
    - docker-network

cw-agent:
  image: amazon/cloudwatch-agent:latest
  container_name: cloudwatch-agent
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - ./aws-cw-agent/cw-agent-config-dev.json:/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json
    - ./logs/app1:/app1-logs
    - ./logs/app2:/app2-logs
    - ./logs/app3:/app3-logs
  ports:
    - "8127:8127/udp"
  restart: always
  networks:
    - ruby-network
In addition, here is the CloudWatch agent config file I've set up:
{
  "agent": {
    "run_as_user": "root"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/app1-logs/*.log",
            "log_group_name": "/log-group/dev/app1",
            "log_stream_name": "{instance_id}",
            "timestamp_format": "[%Y-%m-%dT%H:%M:%S.%f%z]",
            "retention_in_days": 365
          },
          {
            "file_path": "/app2-logs/*.log",
            "log_group_name": "/log-group/dev/app2",
            "log_stream_name": "{instance_id}",
            "timestamp_format": "[%Y-%m-%dT%H:%M:%S.%f%z]",
            "retention_in_days": 365
          },
          {
            "file_path": "/app3-logs/*.log",
            "log_group_name": "/log-group/dev/app3",
            "log_stream_name": "{instance_id}",
            "timestamp_format": "[%Y-%m-%dT%H:%M:%S.%f%z]",
            "retention_in_days": 365
          }
        ]
      }
    }
  },
  "metrics": {
    "namespace": "cAdvisor/dev-Containers",
    "metrics_collected": {
      "statsd": {
        "metrics_collection_interval": 1,
        "metrics_aggregation_interval": 1,
        "service_address": ":8127"
      }
    }
  }
}
The Docker Compose stack starts without any errors, but if I then look at the cAdvisor logs, I see the following error for every metric it's trying to send:
E0729 16:14:17.857874 1 memory.go:94] failed to send data "cadvisor.app1-container.memory_usage:50401280|g": write udp 172.21.0.4:45489->172.21.0.2:8127: write: connection refused
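For context, the payload in that error is just a plain-text StatsD gauge line pushed over UDP. Here is a minimal, hypothetical Python sketch of the kind of write cAdvisor's StatsD driver performs. The address, port, and metric line are copied from the error above; nothing else is part of my setup. It also illustrates why a "connection refused" can show up on a UDP write at all:

import socket

# Hypothetical sketch of the write cAdvisor's StatsD storage driver performs.
# The address and the metric line are taken from the error log above; nothing
# else here is from my actual setup.
payload = b"cadvisor.app1-container.memory_usage:50401280|g"  # StatsD gauge: name:value|g

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.connect(("172.21.0.2", 8127))  # connected UDP socket, similar to Go's net.Dial("udp", ...)

try:
    sock.send(payload)
    # UDP itself is fire-and-forget, but when nothing is bound to the destination
    # port the kernel receives an ICMP "port unreachable" reply and reports it as
    # ECONNREFUSED on a later operation on the connected socket, which is what
    # cAdvisor is logging here.
    sock.send(payload)
    print("a listener appears to be bound to 8127")
except ConnectionRefusedError:
    print("nothing is listening on 8127 yet (the same condition cAdvisor logs)")
finally:
    sock.close()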
The strangest part is that the issue goes away on its own about 12 minutes after boot, without any noticeable difference in the logs.
When everything runs properly, I get the following log:
W0729 16:16:12.291433 1 machine_libipmctl.go:64] There are no NVM devices!
I also found this entry in the CloudWatch agent's log, about 8 minutes after the system booted up:
2024-07-29T16:14:19Z I! Started the statsd service on :8127
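To pin down exactly how long the listener takes to come up, rather than eyeballing timestamps, something like the sketch below could be run from another container on the same Compose network (or against localhost:8127 on the host, since the port is published). This is a hypothetical probe, not part of my setup, and it relies on the same connected-UDP behaviour shown earlier:

import socket
import time

# Hypothetical probe, not part of my compose stack: poll the agent's StatsD port
# until a datagram no longer triggers ECONNREFUSED, and report how long that took.
def wait_for_statsd(host="cloudwatch-agent", port=8127, interval=5.0):
    start = time.monotonic()
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.connect((host, port))
            sock.send(b"probe:1|c")
            time.sleep(0.2)          # give the kernel a moment to deliver any ICMP error
            sock.send(b"probe:1|c")  # raises ConnectionRefusedError while the port is closed
            return time.monotonic() - start
        except ConnectionRefusedError:
            time.sleep(interval)
        finally:
            sock.close()

print(f"StatsD became reachable after {wait_for_statsd():.0f}s")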
My question is: has anyone encountered anything similar, and does anyone know of a way to remediate this startup delay? Thanks in advance!