I am trying to set up an alert in Grafana. My data source is Google Managed Prometheus. The query I want to alert on is:
max by(cluster_name, namespace_name, pod_name, container_name) (max_over_time(kubernetes_io:container_memory_limit_utilization{memory_type="non-evictable"}[1h]))
I think it is worth mentioning that we have quite a large number of Kubernetes clusters, pods, and containers. When I run this query, I get the following error:
Failed to evaluate queries and expressions: failed to execute conditions: failed to execute query A: client_error: client error: 429
Given the 429 status, I suppose I am hitting a rate limit set by Grafana itself, by Google Cloud Monitoring, or by Google Managed Prometheus (maybe someone can point me toward what I need to adjust). On the other hand, the browser console shows the error as a 400 (failed to load resource).
We use Grafana 9.0.5.
I expect the query to work. It is fairly heavy, but not huge. That said, I do not know how to define what counts as a heavy query, how to evaluate that, or what limits Grafana imposes.
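One way to narrow down which layer returns the 429 is to run the same query directly against the Prometheus HTTP API, bypassing Grafana, and look at the status code and response body. The following is a minimal sketch under several assumptions: the endpoint format in `build_query_url` is the Managed Service for Prometheus query URL as I understand it, `PROJECT_ID` and `ACCESS_TOKEN` are hypothetical environment variables (the token could come from `gcloud auth print-access-token`), and the helper names are made up for illustration.

```python
import os
import urllib.error
import urllib.parse
import urllib.request
from typing import Tuple

# The same PromQL expression used in the Grafana alert rule.
QUERY = (
    "max by(cluster_name, namespace_name, pod_name, container_name) "
    "(max_over_time(kubernetes_io:container_memory_limit_utilization"
    '{memory_type="non-evictable"}[1h]))'
)


def build_query_url(project_id: str, query: str) -> str:
    """Build an instant-query URL (assumed endpoint format, see lead-in)."""
    base = (
        "https://monitoring.googleapis.com/v1/projects/"
        f"{project_id}/location/global/prometheus/api/v1/query"
    )
    return base + "?" + urllib.parse.urlencode({"query": query})


def run_query(url: str, access_token: str) -> Tuple[int, str]:
    """Send the query and return (HTTP status, response body), even on errors."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses land here; the body usually names the real cause.
        return err.code, err.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    project_id = os.environ.get("PROJECT_ID")      # hypothetical variable name
    access_token = os.environ.get("ACCESS_TOKEN")  # hypothetical variable name
    if project_id and access_token:
        status, body = run_query(build_query_url(project_id, QUERY), access_token)
        print(status)
        print(body[:2000])  # truncate large result sets
    else:
        print("Set PROJECT_ID and ACCESS_TOKEN to run the query.")
```

If the direct call also returns 429, the limit is most likely on the Google side rather than in Grafana; if it succeeds, or returns a 400 with a descriptive message, that points back at how Grafana issues the request.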