Disclaimer: This is my first time using Prometheus.
I am trying to send a Slack notification every time a Job ends successfully.
To achieve this, I installed kube-state-metrics, Prometheus and AlertManager.
Then I created the following rule:
rules:
- alert: KubeJobCompleted
  annotations:
    identifier: '{{ $labels.instance }}'
    summary: Job Completed Successfully
    description: Job *{{ $labels.namespace }}/{{ $labels.job_name }}* is completed successfully.
  expr: |
    kube_job_spec_completions{job="kube-state-metrics"} - kube_job_status_succeeded{job="kube-state-metrics"} == 0
  labels:
    severity: information
And I added the following AlertManager receiver text (template):
{{ define "custom_slack_message" }}
{{ range .Alerts }}
{{ .Annotations.description }}
{{ end }}
{{ end }}
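For context, a receiver using this template might be wired up roughly as in the sketch below. The receiver name, channel, webhook URL, template path and the timing values are placeholders for illustration, not values from my actual setup. Because every alert with the same alertname falls into one group, the range over .Alerts in the template prints every alert that is still firing in that group.

```yaml
# Sketch of an Alertmanager configuration (placeholder values throughout).
route:
  receiver: slack-jobs            # hypothetical receiver name
  group_by: ['alertname']         # all KubeJobCompleted alerts share one group
  group_wait: 30s
  group_interval: 5m              # how often an updated group is re-sent
  repeat_interval: 4h
receivers:
  - name: slack-jobs
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'  # placeholder
        channel: '#jobs'                                        # placeholder
        text: '{{ template "custom_slack_message" . }}'
templates:
  - '/etc/alertmanager/templates/*.tmpl'                        # placeholder path
```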
My current result: Every time a new Job completes successfully, I receive a Slack notification with the list of all Jobs that have completed successfully.
I don't mind receiving the whole list at first, but after that I would like to receive notifications that contain only the Job(s) newly completed within the specified group interval.
Is it possible?
I ended up using kube_job_status_completion_time and time() to dismiss past events, so that old completions are not re-fired on every repeat interval. The alert now only matches Jobs that completed less than 60 seconds ago:
rules:
- alert: KubeJobCompleted
  annotations:
    identifier: '{{ $labels.instance }}'
    summary: Job Completed Successfully
    description: Job *{{ $labels.namespace }}/{{ $labels.job_name }}* is completed successfully.
  expr: |
    time() - kube_job_status_completion_time < 60 and kube_job_spec_completions{job="kube-state-metrics"} - kube_job_status_succeeded{job="kube-state-metrics"} == 0
  labels:
    severity: information
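One thing to watch with this approach: the 60-second window only catches a Job if the rule is evaluated at least once while the Job is inside that window. A minimal sketch of a rule group that evaluates more often than the window is shown below. The group name and the 30s interval are my own assumptions, so adjust them to your scrape and evaluation settings.

```yaml
groups:
  - name: job-completion     # hypothetical group name
    interval: 30s            # assumed value; kept shorter than the 60s window
    rules:
      - alert: KubeJobCompleted
        expr: |
          time() - kube_job_status_completion_time < 60
            and
          kube_job_spec_completions{job="kube-state-metrics"} - kube_job_status_succeeded{job="kube-state-metrics"} == 0
        labels:
          severity: information
        annotations:
          description: 'Job *{{ $labels.namespace }}/{{ $labels.job_name }}* is completed successfully.'
```

Once a Job leaves the window the alert stops firing, so if resolved notifications are enabled on the receiver you may also get a "resolved" message for each Job.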
Just add a for clause to the rule:
for: 10m
Prometheus then fires the alert only after the expression has held continuously for 10 minutes, so a Job is reported once it has been complete for that long:
rules:
- alert: KubeJobCompleted
  annotations:
    identifier: '{{ $labels.instance }}'
    summary: Job Completed Successfully
    description: Job *{{ $labels.namespace }}/{{ $labels.job_name }}* is completed successfully.
  expr: |
    kube_job_spec_completions{job="kube-state-metrics"} - kube_job_status_succeeded{job="kube-state-metrics"} == 0
  for: 10m
  labels:
    severity: information