I have a set of applications which are being monitored by Prometheus. Below are sample time series of the metrics:
fastapi_responses_total{app_name="orion-gen-proj_managed-cus-test-1_checkout-api", job="app-b", method="GET", path="/metrics", status_code="200"}
fastapi_responses_total{app_name="orion-gen-proj_managed-cus-test-1_registration-api", job="app-a", method="GET", path="/metrics", status_code="200"}
These metrics are captured from two different applications. The application names are represented by the substrings checkout-api and registration-api. Each application runs under an organisation entity, which is represented by the substring managed-cus-test-1. The name of the organisation that an application belongs to always starts with the string “managed-”, but anything can follow that prefix, e.g. managed-cus-test-1, managed-cus-test-2, managed-cus-test-3.
To calculate the availability SLO for these applications I have prepared the following set of recording rules:
groups:
  - name: registration-availability
    rules:
      - record: slo:sli_error:ratio_rate5m
        expr: |-
          (avg_over_time( ( (
            (sum(rate(
              fastapi_responses_total{
                app_name=~".*registration-api.*",
                status_code!~"5.."}[5m]
            ))
            )
            / on(app_name) group_right()
            (sum(rate(
              fastapi_responses_total{
                app_name=~".*registration-api.*"}[5m])
            ))
          ) OR on() vector(0))[5m:60s])
          )
        labels:
          slo_id: registration-availability
          slo_service: registration
          slo: availability
  - name: checkout-availability
    rules:
      - record: slo:sli_error:ratio_rate5m
        expr: |-
          (avg_over_time( ( (
            (sum(rate(
              fastapi_responses_total{
                app_name=~".*checkout-api.*",
                status_code!~"5.."}[5m]
            ))
            )
            / on(app_name) group_right()
            (sum(rate(
              fastapi_responses_total{
                app_name=~".*checkout-api.*"}[5m])
            ))
          ) OR on() vector(0))[5m:60s])
          )
        labels:
          slo_id: checkout-availability
          slo_service: checkout
          slo: availability
The recording rules are evaluating correctly and they return two different SLO values, one for each of the applications. I have a requirement to calculate the overall SLO of these two applications. This overall SLO should be based on the organisation to which an application belongs.
For example, because the applications checkout-api and registration-api belong to the same organisation, the SLO calculation should return one consolidated value.
What I want is a label_replace that adds a new label “org”, with its value taken from the organisation part of app_name (e.g. managed-cus-test-1), and then groups by “org”.
The label_replace should add the new label and preserve the existing filter based on app_name, not replace it.
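To make the intent concrete, here is a rough, untested sketch of the kind of rule I have in mind. It assumes the organisation segment of app_name is delimited by underscores and contains no underscore itself, as in the sample series above. The group name, the slo_id value and the extraction regex are my own placeholders, and I have left out the avg_over_time wrapper and the vector(0) fallback to keep it short.

```yaml
groups:
  - name: org-availability            # hypothetical group name
    rules:
      - record: slo:sli_error:ratio_rate5m
        expr: |-
          # Numerator: non-5xx request rate, summed per organisation.
          sum by (org) (
            label_replace(
              rate(fastapi_responses_total{
                app_name=~".*(registration-api|checkout-api).*",
                status_code!~"5.."}[5m]),
              # Copy the "managed-..." segment of app_name into a new "org" label;
              # app_name itself is left untouched.
              "org", "$1", "app_name", ".*_(managed-[^_]+)_.*"
            )
          )
          /
          # Denominator: total request rate, summed per organisation.
          sum by (org) (
            label_replace(
              rate(fastapi_responses_total{
                app_name=~".*(registration-api|checkout-api).*"}[5m]),
              "org", "$1", "app_name", ".*_(managed-[^_]+)_.*"
            )
          )
        labels:
          slo_id: org-availability    # hypothetical
          slo: availability
```

The app_name matcher still restricts the input to the two applications, and label_replace only adds “org” on top of it. As far as I understand, if the regex does not match a series, label_replace returns that series unchanged, so it would end up grouped under an empty org value.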