I’m using the OpenTelemetry Collector’s Prometheus receiver to scrape pods via Kubernetes service discovery. So far, none of my desired metrics have come through to my sink.
The collector is configured to scrape metrics with the prometheus receiver as follows:
otelDeployment:
  ...
  config:
    receivers:
      ...
      prometheus:
        config:
          scrape_configs:
            - job_name: 'kubernetes-qdrant-pod-metrics'
              scrape_interval: 1s
              scrape_timeout: 1s
              kubernetes_sd_configs:
                - role: pod
              relabel_configs:
                - source_labels: [__meta_kubernetes_pod_annotation_x_prometheus_scrape]
                  action: keep
                  regex: true
                - source_labels: [__meta_kubernetes_pod_annotation_x_prometheus_scheme]
                  action: replace
                  target_label: __scheme__
                  regex: (https?)
                - source_labels: [__meta_kubernetes_pod_annotation_x_prometheus_path]
                  action: replace
                  target_label: __metrics_path__
                  regex: (.+)
                - source_labels: [__address__, __meta_kubernetes_pod_annotation_x_prometheus_port]
                  action: replace
                  target_label: __address__
                  regex: ([^:]+)(?::\d+)?;(\d+)
                  replacement: $$1:$$2
    ...
    service:
      ...
      pipelines:
        ...
        metrics:
          receivers: [ otlp, prometheus ]
          processors: [ resource, batch ]
          exporters: [ otlp ]
  ...
My pod has the appropriate annotations corresponding to the relabel rules above. I know this because when I run promtool check service-discovery with the above config (just the prometheus part of it) in a pod that approximates the collector (meaning it has the same service account), service discovery works exactly as intended.
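For reference, the pod annotations look roughly like the sketch below. The keys shown here are illustrative placeholders; the only requirement is that the annotation names sanitize to the __meta_kubernetes_pod_annotation_x_prometheus_* meta labels used in the relabel rules (Prometheus replaces characters such as -, /, and . with _):

# Hypothetical annotation keys -- the real ones just need to map to
# x_prometheus_scrape / x_prometheus_scheme / x_prometheus_path / x_prometheus_port.
metadata:
  annotations:
    x-prometheus/scrape: "true"
    x-prometheus/scheme: "http"
    x-prometheus/path: "/metrics"
    x-prometheus/port: "6333"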
An example of what promtool returns for a given matching pod is the following:
{
  "__address__": "10.244.20.145:6333",
  "__metrics_path__": "/metrics",
  "__scheme__": "http",
  "__scrape_interval__": "1s",
  "__scrape_timeout__": "1s",
  "instance": "10.244.20.145:6333",
  "job": "kubernetes-qdrant-pod-metrics"
}
If you stitch the scheme, address, and path together for this one, you get:
# curl http://10.244.20.145:6333/metrics
# HELP app_info information about qdrant server
# TYPE app_info counter
app_info{name="qdrant",version="1.10.1"} 1
...
So the scrape rules seem to be configured correctly.
Additionally, as you can see above, I’ve added the prometheus receiver to the metrics pipeline’s list of receivers.
One way to debug this would be to increase the collector’s log level, in case there are issues that aren’t surfacing at the default level, but I haven’t been able to figure out how to do that yet either.
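For reference, I assume the knob lives under the collector’s own telemetry settings in the config, something like the sketch below, though I haven’t verified that it actually surfaces scrape errors:

# Assumed location of the collector's own log level (unverified on my side):
service:
  telemetry:
    logs:
      level: debug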