I have a Prometheus alert expression that tries to detect when we exceed a service's monthly quota by applying the increase
function to a counter metric over a 30-day window:
sum(increase(external_requests_total{cacheHit="false", environment="prod", partner="partner_name"}[30d])) > 10000
I believe we should use a recording rule to pre-calculate the value, so that we avoid crunching a month’s worth of time-series data on every rule evaluation. I also can’t help but feel that a Prometheus alert is not the right way to monitor this metric.
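To make the recording-rule idea concrete, this is roughly the kind of thing I have in mind. It is only a sketch: the group, rule, and alert names are made up, and I'm assuming that summing non-overlapping hourly increases over 30 days is a close enough approximation of the original 30d increase.

```yaml
groups:
  - name: partner_quota                # hypothetical group name
    interval: 1h                       # evaluate once per hour so the recorded windows do not overlap
    rules:
      # Pre-compute the hourly increase so the alert no longer reads 30 days of raw samples.
      - record: partner:external_requests:increase1h   # hypothetical rule name
        expr: |
          sum(increase(external_requests_total{cacheHit="false", environment="prod", partner="partner_name"}[1h]))
      # Sum the roughly 720 hourly values recorded over the last 30 days.
      - alert: PartnerMonthlyQuotaExceeded             # hypothetical alert name
        expr: sum_over_time(partner:external_requests:increase1h[30d]) > 10000
```

Note that this still covers a rolling 30-day window rather than a calendar month, and increase extrapolates, so the total would only be approximate.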
I’m open to suggestions for improving the rule, or for a better alternative for this kind of monitoring.