I have a cron job that runs every 24h. If it fails, it retries every 4h, up to 5 times.
When the job succeeds, it emits 1 unit of a metric, let’s say custom.audience.success.
I need to create a Wavefront alert that fires if the job has not succeeded in the past 24h.
I tried with the SUM function and the MSUM function, but the alert is not firing when it should.
default(1, msum(24h, ts(custom.audience.success.count))) = 0
Any idea how to achieve this, please?
I think the problem is that Wavefront sees no data and falls back to the default() value, but removing default() doesn’t help either: the query then just shows no data all the time.
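To make the intended behavior concrete, here is a minimal Python sketch of the check I am after. It is only a model of the logic, not Wavefront's query language; the function name should_fire and the sample timestamps are made up for illustration. The key point is that a window with no success points at all should count as zero and fire the alert.

```python
from datetime import datetime, timedelta

def should_fire(success_times, now, window=timedelta(hours=24)):
    """Return True if no success point was reported in the window ending at `now`."""
    # Collect success points in the half-open interval (now - window, now].
    recent = [t for t in success_times if now - window < t <= now]
    # An empty window is treated as a sum of 0, which should fire the alert.
    return len(recent) == 0

# Hypothetical timeline: the job last succeeded on Jan 1 at 02:00.
successes = [datetime(2024, 1, 1, 2, 0)]
print(should_fire(successes, datetime(2024, 1, 1, 20, 0)))  # last success 18h ago -> False
print(should_fire(successes, datetime(2024, 1, 2, 3, 0)))   # last success 25h ago -> True
```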
Other parameters I have are:
Trigger Window: Alert fires if the condition has been true for last 10 minute(s).
Resolve Window: Alert resolves automatically if the condition is false for 15 minute(s).
Checking Frequency: Default time window between alert checks is 5 minute(s).
Thanks