I have installed prometheus-operator via Helm and now want to set up a custom alert rule. Email notifications are set up, but currently I'm getting every notification; I want to "silence" those so that I only get emails for custom alerts.
alertmanager.yaml:
global:
  resolve_timeout: 5m
route:
  receiver: 'email-alert'
  group_by: ['job']
  routes:
  - receiver: 'email-alert'
    match:
      alertname: etcdInsufficientMembers
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: email-alert
  email_configs:
  - to: receiver@example.com
    from: sender@example.com
    # Your smtp server address
    smarthost: smtp.office365.com:587
    auth_username: sender@example.com
    auth_identity: sender@example.com
    auth_password: pass
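To receive email only for custom alerts, one common Alertmanager approach is to make the root route's default receiver one that delivers nothing, and add child routes only for the alerts you want emailed. A minimal sketch, assuming the custom alert is named RestartAlerts; the receiver name 'null' is an arbitrary, hypothetical choice:

```yaml
route:
  # Default receiver: any alert not matched by a child route ends up here
  receiver: 'null'
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  routes:
  # Only alerts named RestartAlerts (assumed name) are sent by email
  - receiver: 'email-alert'
    match:
      alertname: RestartAlerts
receivers:
# A receiver with no *_configs entries delivers nothing
- name: 'null'
- name: email-alert
  email_configs:
  - to: receiver@example.com
    from: sender@example.com
    smarthost: smtp.office365.com:587
    auth_username: sender@example.com
    auth_identity: sender@example.com
    auth_password: pass
```

Child routes inherit group_by and the timing settings from the root route, so more alerts can be emailed by adding further entries under routes.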
The above file was applied successfully. I then added the following lines at the end of it, as referenced here:
# Example group with one alert
groups:
- name: example-alert
  rules:
  # Alert about restarts
  - alert: RestartAlerts
    expr: count(kube_pod_container_status_restarts_total) > 0
    for: 1s
    annotations:
      summary: "More than 5 restarts in pod {{ $labels.pod-name }}"
      description: "{{ $labels.container-name }} restarted (current value: {{ $value }}s) times in pod {{ $labels.pod-namespace }}/{{ $labels.pod-name }}"
Then the pod logs show this error:
="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n line 28: field groups not found in type config.plain"
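For reference, a groups: block like the one above is Prometheus rule-file syntax, so Alertmanager's config parser rejects it. With prometheus-operator, such a group can instead be wrapped in a PrometheusRule resource and applied with kubectl apply -f. A minimal sketch; the resource name custom-rules is hypothetical, and the labels app: prometheus-operator and release: prometheus are assumptions that must match the ruleSelector of your Prometheus resource (visible via kubectl -n monitoring get prometheus -o yaml). It also uses the kube-state-metrics label names pod, container and namespace, and drops count() so those labels survive into the annotations:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-rules          # hypothetical name
  namespace: monitoring
  labels:
    # Assumed values: must match the Prometheus resource's ruleSelector
    app: prometheus-operator
    release: prometheus
spec:
  groups:
  - name: example-alert
    rules:
    # Alert about restarts
    - alert: RestartAlerts
      expr: kube_pod_container_status_restarts_total > 5
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "More than 5 restarts in pod {{ $labels.pod }}"
        description: "{{ $labels.container }} restarted {{ $value }} times in pod {{ $labels.namespace }}/{{ $labels.pod }}"
```

A separate resource like this is not one of the chart's own objects, so it is less likely to be reverted by a later helm upgrade than an edit to a chart-managed rule.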
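Solved. The groups block is Prometheus alerting-rule syntax, not Alertmanager configuration, which is why Alertmanager rejects it. With prometheus-operator, alerting rules live in PrometheusRule resources, so first list all available rules: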
kubectl -n monitoring get prometheusrules
NAME AGE
prometheus-prometheus-oper-alertmanager.rules 29h
prometheus-prometheus-oper-etcd 29h
prometheus-prometheus-oper-general.rules 29h
prometheus-prometheus-oper-k8s.rules 29h
prometheus-prometheus-oper-kube-apiserver-error 29h
prometheus-prometheus-oper-kube-apiserver.rules 29h
prometheus-prometheus-oper-kube-prometheus-node-recording.rules 29h
prometheus-prometheus-oper-kube-scheduler.rules 29h
prometheus-prometheus-oper-kubernetes-absent 29h
prometheus-prometheus-oper-kubernetes-apps 29h
prometheus-prometheus-oper-kubernetes-resources 29h
prometheus-prometheus-oper-kubernetes-storage 29h
prometheus-prometheus-oper-kubernetes-system 29h
prometheus-prometheus-oper-kubernetes-system-apiserver 29h
prometheus-prometheus-oper-kubernetes-system-controller-manager 29h
prometheus-prometheus-oper-kubernetes-system-kubelet 29h
prometheus-prometheus-oper-kubernetes-system-scheduler 29h
prometheus-prometheus-oper-node-exporter 29h
prometheus-prometheus-oper-node-exporter.rules 29h
prometheus-prometheus-oper-node-network 29h
prometheus-prometheus-oper-node-time 29h
prometheus-prometheus-oper-node.rules 29h
prometheus-prometheus-oper-prometheus 29h
prometheus-prometheus-oper-prometheus-operator 29h
Then choose one to edit, or delete all of them except the default one, prometheus-prometheus-oper-general.rules.
I chose to edit the node-exporter rule:
kubectl edit prometheusrule prometheus-prometheus-oper-node-exporter -n monitoring
Then I added these lines at the end of the file, as a new entry in the group's rules list:
- alert: RestartAlerts
  annotations:
    description: '{{ $labels.container }} restarted (current value: {{ $value}}s)
      times in pod {{ $labels.namespace }}/{{ $labels.pod }}'
    summary: More than 5 restarts in pod {{ $labels.container }}
  expr: kube_pod_container_status_restarts_total{container="coredns"} > 5
  # Prometheus durations use units like s, m, h ("1min" is not valid)
  for: 1m
  labels:
    severity: warning
Soon after, I received an email for this alert.
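Note that kube_pod_container_status_restarts_total is a cumulative counter for the life of the pod, so once a container passes 5 restarts the rule above keeps firing. If "more than 5 restarts recently" is what's wanted, a windowed variant is one option. A sketch, assuming a 1h window (an arbitrary choice):

```yaml
- alert: RestartAlerts
  annotations:
    description: '{{ $labels.container }} restarted {{ $value }} times in the last hour
      in pod {{ $labels.namespace }}/{{ $labels.pod }}'
    summary: More than 5 restarts in pod {{ $labels.pod }}
  # increase() over an assumed 1h window instead of the raw cumulative counter
  expr: increase(kube_pod_container_status_restarts_total{container="coredns"}[1h]) > 5
  for: 1m
  labels:
    severity: warning
```

Because increase() extrapolates across the window, $value may be fractional rather than a whole number of restarts.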