I’m currently scraping metrics from all of my deployment’s replicas (pods) using a ServiceMonitor
provided by the Prometheus Operator. I can display these metrics on a Grafana dashboard as expected, but things get complicated when I want to aggregate them.
Take, for example, a counter metric for an API call: every pod exposes its own count. I’ve tried something like this: sum by(job) (http_requests_total{handler="/endpoint"})
but as soon as one of the pods is scaled down or replaced, the aggregated graph drops. So I assume this is not the right approach.
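To make the drop concrete, here is a minimal sketch in Python that reproduces it outside Prometheus. The pod names and sample values are made up for illustration, and the per-pod "increase" calculation is only a rough stand-in for the idea of applying rate() before sum(); it is not how Prometheus actually computes anything.

```python
# Hypothetical scrapes: counter value per pod at each scrape, None = pod not present.
samples = {
    "pod-a": [100, 110, 120, None, None],  # scaled down after the third scrape
    "pod-b": [50, 60, 70, 80, 90],
    "pod-c": [None, None, None, 5, 15],    # replacement pod, counter starts near zero
}


def sum_raw(series_by_pod):
    """Sum the raw counter values across pods at each scrape (like sum by(job) on the counter)."""
    n = len(next(iter(series_by_pod.values())))
    return [
        sum(s[i] for s in series_by_pod.values() if s[i] is not None)
        for i in range(n)
    ]


def sum_of_increases(series_by_pod):
    """Sum per-pod increases between consecutive scrapes (the idea of rate-then-sum)."""
    n = len(next(iter(series_by_pod.values())))
    totals = []
    for i in range(1, n):
        total = 0
        for s in series_by_pod.values():
            prev, cur = s[i - 1], s[i]
            if prev is None or cur is None:
                continue  # pod absent at one of the two scrapes: nothing to compare
            # A lower value than before is treated as a counter reset.
            total += cur - prev if cur >= prev else cur
        totals.append(total)
    return totals


print(sum_raw(samples))           # [150, 170, 190, 85, 105]
print(sum_of_increases(samples))  # [20, 20, 10, 20]
```

The raw sum falls from 190 to 85 when pod-a disappears, even though requests never stopped, while the summed per-pod increases stay in the same range throughout.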
So the question is simple: what is the usual way to aggregate metrics for a service that exposes metrics to Prometheus from multiple replicas?
Thanks,