/etc/prometheus/rules/ansible_managed.rules > ansible managed alert rules
alert: InstanceDown
expr: up == 0
for: 5m
labels:
  severity: critical
annotations:
  description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for
    more than 5 minutes.'
  summary: Instance {{ $labels.instance }} down
| Labels | State | Active Since | Value |
|---|---|---|---|
| alertname="InstanceDown", instance="k001.kafka-100.insitechdev.ru:9308", job="kafka", severity="critical" | firing | 2025-11-17 12:28:29.318691705 +0000 UTC | 0 |
Annotations:
- description: k001.kafka-100.insitechdev.ru:9308 of job kafka has been down for more than 5 minutes.
- summary: Instance k001.kafka-100.insitechdev.ru:9308 down
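The `for: 5m` clause means `up == 0` must keep returning the series on every rule evaluation for five minutes before the alert moves from pending to firing. If a single evaluation returns nothing, the alert goes back to inactive. The Python sketch below is a simplified model of that state machine, written for illustration only (it is not Prometheus source code). The class name `AlertState` and the 60-second evaluation interval in the usage lines are assumptions; `active_since` roughly corresponds to the Active Since column above, i.e. the evaluation at which the expression first matched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Hypothetical, simplified model of one alert instance with a `for:` clause."""

    for_seconds: float                    # the rule's `for:` duration, e.g. 300 for 5m
    state: str = "inactive"               # "inactive", "pending" or "firing"
    active_since: Optional[float] = None  # time the expression first matched

    def evaluate(self, now: float, expr_matches: bool) -> str:
        """Apply one rule evaluation at time `now` and return the new state."""
        if not expr_matches:
            # The expression returned no series for this alert: it clears.
            self.state = "inactive"
            self.active_since = None
            return self.state
        if self.state == "inactive":
            # First evaluation where e.g. `up == 0` returned this series.
            self.state = "pending"
            self.active_since = now
        if self.state == "pending" and now - self.active_since >= self.for_seconds:
            # The condition has held for the whole `for:` window.
            self.state = "firing"
        return self.state


# InstanceDown with `for: 5m`, assuming the rule is evaluated every 60 s.
alert = AlertState(for_seconds=300)
states = [alert.evaluate(t, expr_matches=True) for t in range(0, 361, 60)]
print(states)  # ['pending', 'pending', 'pending', 'pending', 'pending', 'firing', 'firing']
```

With `for_seconds=0` the same code goes straight from inactive to firing in one evaluation, which matches a rule that has no `for:` clause at all.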
alert: Watchdog
expr: vector(1)
for: 10m
labels:
  severity: warning
annotations:
  description: |-
    This is an alert meant to ensure that the entire alerting pipeline is functional.
    This alert is always firing, therefore it should always be firing in Alertmanager
    and always fire against a receiver. There are integrations with various notification
    mechanisms that send a notification when this alert is not firing. For example the
    "DeadMansSnitch" integration in PagerDuty.
  summary: Ensure entire alerting pipeline is functional
| Labels | State | Active Since | Value |
|---|---|---|---|
| alertname="Watchdog", severity="warning" | firing | 2024-12-14 15:44:14 +0000 UTC | 1 |
Annotations:
- description: This is an alert meant to ensure that the entire alerting pipeline is functional.
  This alert is always firing, therefore it should always be firing in Alertmanager
  and always fire against a receiver. There are integrations with various notification
  mechanisms that send a notification when this alert is not firing. For example the
  "DeadMansSnitch" integration in PagerDuty.
- summary: Ensure entire alerting pipeline is functional
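Because `expr: vector(1)` always returns one sample with value 1 and no labels of its own, the alert never resolves, which is why the table shows it firing since 2024-12-14 with only `alertname` and `severity` labels. The useful check therefore lives on the receiving side: something must raise an alarm when the Watchdog notification stops arriving. The Python sketch below models that dead man's switch. It assumes Alertmanager re-sends the firing Watchdog notification periodically (governed by the route's `repeat_interval`) and that the timeout is set comfortably above that interval; the class `DeadMansSwitch` and its methods are hypothetical names, not part of any real integration.

```python
class DeadMansSwitch:
    """Hypothetical receiver-side check for an always-firing Watchdog alert."""

    def __init__(self, timeout_seconds: float, started_at: float) -> None:
        # Longest silence tolerated before the pipeline is considered broken.
        self.timeout_seconds = timeout_seconds
        # Treat start-up as the last heartbeat so there is an initial grace period.
        self.last_seen = started_at

    def heartbeat(self, now: float) -> None:
        """Record that a Watchdog notification was received at time `now`."""
        self.last_seen = now

    def pipeline_broken(self, now: float) -> bool:
        """True if no Watchdog notification has arrived within the timeout."""
        return now - self.last_seen > self.timeout_seconds


switch = DeadMansSwitch(timeout_seconds=600, started_at=0)
switch.heartbeat(300)
print(switch.pipeline_broken(800))   # False: last notification was 500 s ago
print(switch.pipeline_broken(1000))  # True: 700 s without a notification
```

The check deliberately inverts normal alerting: silence, not a notification, is the failure signal, so a broken Prometheus, Alertmanager, or network path between them is caught even though no rule can fire.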
alert: RebootRequired
expr: node_reboot_required > 0
labels:
  severity: warning
annotations:
  description: '{{ $labels.instance }} requires a reboot.'
  summary: Instance {{ $labels.instance }} - reboot required
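In PromQL a comparison without the `bool` modifier acts as a filter: `node_reboot_required > 0` returns only the series whose value is above 0, keeping their original sample value, and each returned series becomes its own alert instance. The same rule explains the Value of 0 in the InstanceDown row: `up == 0` keeps the matching `up` series with its value of 0. The Python sketch below models both forms on an instant vector represented as a dict from label set to value. The function names and the host names in the usage lines are hypothetical, and the sketch ignores details such as metric-name handling.

```python
from typing import Dict, FrozenSet, Tuple

# A series is identified by its label set; the dict value is its current sample.
Labels = FrozenSet[Tuple[str, str]]


def labels(**kv: str) -> Labels:
    """Build a hashable label set, e.g. labels(instance="host-a:9100")."""
    return frozenset(kv.items())


def filter_gt(vector: Dict[Labels, float], threshold: float) -> Dict[Labels, float]:
    """Model `metric > threshold`: drop failing series, keep original values."""
    return {ls: v for ls, v in vector.items() if v > threshold}


def compare_gt_bool(vector: Dict[Labels, float], threshold: float) -> Dict[Labels, float]:
    """Model `metric > bool threshold`: keep every series, value becomes 0 or 1."""
    return {ls: float(v > threshold) for ls, v in vector.items()}


# Hypothetical hosts: only host-a has a pending reboot.
node_reboot_required = {
    labels(instance="host-a:9100"): 1.0,
    labels(instance="host-b:9100"): 0.0,
}
firing = filter_gt(node_reboot_required, 0)
print([dict(ls)["instance"] for ls in firing])  # ['host-a:9100']
```

The filter form is the one to use in alerting rules: an alert fires for every series the expression returns, so `node_reboot_required > bool 0` would produce an alert instance for every host, including those with value 0.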