I have been trying to set up an alert in Google Cloud Monitoring that fires whenever a workload has been running for too long. Because GCP can only look back 24h (actually 23h30m), this is not possible directly.
I have tried many queries in MQL, but I am always blocked by the look-back limit.
Now I am exploring PromQL; below is how far I got.
Checking whether an instance is present in an instance group:
avg(avg_over_time(compute_googleapis_com:instance_group_size{instance_group_name=~'.*pool-test.*'}[30d:1d] ))
A different approach is to look at how long the containers have been running. However, the query below does not let me use a subquery range such as [30d:1h]:
avg(kubernetes_io:container_uptime{monitored_resource="k8s_container",cluster_name="test-gke",namespace_name="app",container_name='nginx-1'} /3600 )
by (pod_name)
These are probably not perfect ways to measure uptime (how long a workload has been running), but they are all I could find. I am not sure whether there are better ways.
Now I need to measure from the 1st day of the month until now, and ideally get the data once a day.
I tried to play with time() and timestamp(), but I don't understand the syntax or how to calculate "1st day of the month until now".
If anyone knows how to do that in PromQL, or has any advice, that would be helpful.
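To make the "1st day of the month until now" window concrete, here is a minimal sketch in Python that computes the elapsed seconds since the start of the current month (in UTC, which is an assumption) and substitutes them into the instance group query above as the subquery range. The function names `seconds_since_month_start` and `build_query` are hypothetical helpers, not part of any Google or Prometheus API, and the sketch only builds the query string; it does not send it anywhere.

```python
from datetime import datetime, timezone


def seconds_since_month_start(now: datetime) -> int:
    """Return whole seconds elapsed since 00:00 UTC on the 1st of now's month.

    Assumes `now` is timezone-aware; UTC month boundaries are an assumption.
    """
    now = now.astimezone(timezone.utc)
    start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return int((now - start).total_seconds())


def build_query(now: datetime, step: str = "1d") -> str:
    """Build the instance group query with a month-to-date subquery range."""
    # Use at least 1s so the range is never a zero-length duration.
    window = max(seconds_since_month_start(now), 1)
    # Selector copied from the query in the post; only the range changes.
    selector = (
        "compute_googleapis_com:instance_group_size"
        "{instance_group_name=~'.*pool-test.*'}"
    )
    return f"avg(avg_over_time({selector}[{window}s:{step}]))"


if __name__ == "__main__":
    print(build_query(datetime.now(timezone.utc)))
```

The query is generated outside PromQL because, as far as I know, a range selector only accepts a literal duration, so the window cannot be computed from time() inside the brackets. Running the script once a day would produce a fresh query covering the month so far; whether Cloud Monitoring accepts a range that long is something I have not verified.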