Alertmanager notifications are delayed. Alertmanager re-sends notifications after (group_interval + repeat_interval) rather than after repeat_interval. We expect notifications for an alert that is still active to be re-sent after repeat_interval.
We have the following configuration:
group_wait: 10s
group_interval: 2h
repeat_interval: 2h
When the alert rule condition is met, Prometheus waits 30 minutes before firing the alert. After those 30 minutes the alert fires, and Alertmanager sends the initial notification after the 10-second group_wait. Everything works as expected up to this point, but the subsequent notifications for the same alert are re-sent every 4 hours. We expect them to be sent every 2 hours, as configured by repeat_interval. The alert state does not change, and no new alert joins the same group.
If we set both group_interval and repeat_interval to 1h, the initial notification is sent after 10 seconds, but the subsequent notifications for the same alert are re-sent every 2 hours instead of every hour.
With the configuration below, the subsequent notifications are re-sent after 1 hour 30 minutes instead of 1 hour.
group_interval: 30m
repeat_interval: 60m or 1h
So it looks like Alertmanager waits for (group_interval + repeat_interval) before re-sending notifications, instead of just repeat_interval. Why is this happening? I would appreciate it if someone could explain the reason and, if possible, suggest a solution. Thanks in advance.
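As a quick sanity check of the arithmetic, here is a minimal Python sketch that compares the three observed gaps with group_interval + repeat_interval. The names `observations` and `matches_sum` are made up for illustration, and the values are only the ones reported above. It checks the numbers and nothing more; it does not model how Alertmanager schedules notifications internally.

```python
from datetime import timedelta

# Each row: (group_interval, repeat_interval, observed gap between re-sent notifications).
# All values are taken from the three configurations described in this question.
observations = [
    (timedelta(hours=2), timedelta(hours=2), timedelta(hours=4)),
    (timedelta(hours=1), timedelta(hours=1), timedelta(hours=2)),
    (timedelta(minutes=30), timedelta(minutes=60), timedelta(minutes=90)),
]


def matches_sum(group_interval, repeat_interval, observed):
    """Return True if the observed gap equals group_interval + repeat_interval."""
    return observed == group_interval + repeat_interval


# One boolean per configuration tried.
results = [matches_sum(*row) for row in observations]

for (group_interval, repeat_interval, observed), ok in zip(observations, results):
    print(
        f"group_interval={group_interval}, repeat_interval={repeat_interval}, "
        f"observed={observed}, equals sum: {ok}"
    )
```

Any further configuration we try can be added as another row in `observations` to see whether it follows the same pattern.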