CrashLoopBackOff in Prometheus's AlertManager

11/29/2018

I am trying to set up AlertManager for my Kubernetes cluster. I followed this document (https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md) and everything worked.

To set up AlertManager, I am following this document (https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md).

I am getting CrashLoopBackOff for alertmanager-example-0. Please see the attached logs:

1st image : $ kubectl logs -f prometheus-operator-88fcf6d95-zctgw -n monitoring

2nd image : $ kubectl describe pod alertmanager-example-0

Can anyone point out what I am doing wrong? Thanks in advance.

-- JibinNajeeb
docker
kubernetes
prometheus
prometheus-alertmanager
prometheus-operator

1 Answer

11/29/2018

It sounds like an RBAC issue: the Service Account (system:serviceaccount:monitoring:prometheus-operator) used by your Alertmanager pods doesn't have enough permissions to talk to the kube-apiserver.
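
One way to confirm a permissions problem like this is to ask the API server directly what the Service Account is allowed to do. The sketch below is only an illustration: the helper names sa_can_i and check_operator_rbac are made up for this example, the four verb/resource pairs are a small sample taken from the ClusterRole in this answer, and it assumes that kubectl auth can-i exits non-zero when the answer is "no" and that your own user is allowed to impersonate Service Accounts.

```shell
#!/bin/sh
# Hypothetical helper: ask the API server whether a ServiceAccount may
# perform a verb on a resource, by impersonating it with --as.
# Usage: sa_can_i <namespace> <serviceaccount> <verb> <resource>
sa_can_i() {
  kubectl auth can-i "$3" "$4" --as="system:serviceaccount:$1:$2" </dev/null
}

# Hypothetical helper: check a few permissions the operator needs in order
# to manage Alertmanager, printing one "ok" or "DENIED" line per check.
# Usage: check_operator_rbac <namespace> <serviceaccount>
check_operator_rbac() {
  ns="$1"
  sa="$2"
  while read -r verb resource; do
    if sa_can_i "$ns" "$sa" "$verb" "$resource" >/dev/null 2>&1; then
      echo "ok $verb $resource"
    else
      echo "DENIED $verb $resource"
    fi
  done <<EOF
list alertmanagers.monitoring.coreos.com
create statefulsets.apps
get secrets
list pods
EOF
}
```

Running check_operator_rbac monitoring prometheus-operator against the cluster prints one line per check. Any DENIED line points at a rule that is missing from the ClusterRole, or at a binding that does not reach the Service Account.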

In your case, the Prometheus Operator has a ClusterRoleBinding named prometheus-operator that looks like this:

$ kubectl get clusterrolebinding prometheus-operator -o=yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app: prometheus-operator
  name: prometheus-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: monitoring
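
To check quickly that the binding in your cluster really points at the expected Service Account, you can extract just the subjects instead of reading the whole YAML. This is a rough sketch: binding_subjects and binding_has_sa are hypothetical helper names, and it assumes the binding is called prometheus-operator as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print one "kind namespace/name" line per subject
# of the named ClusterRoleBinding.
# Usage: binding_subjects <clusterrolebinding>
binding_subjects() {
  kubectl get clusterrolebinding "$1" \
    -o jsonpath='{range .subjects[*]}{.kind}{" "}{.namespace}{"/"}{.name}{"\n"}{end}'
}

# Hypothetical helper: succeed only if the given ServiceAccount is listed
# among the binding's subjects (exact, fixed-string line match).
# Usage: binding_has_sa <clusterrolebinding> <namespace> <serviceaccount>
binding_has_sa() {
  binding_subjects "$1" | grep -qxF "ServiceAccount $2/$3"
}
```

For example, binding_has_sa prometheus-operator monitoring prometheus-operator && echo bound prints "bound" only when the subject matches exactly. A binding whose subject sits in a different namespace is a common reason for the role not taking effect.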

More importantly, the ClusterRole should look something like this:

$ kubectl get clusterrole prometheus-operator -o=yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app: prometheus-operator
  name: prometheus-operator
rules:
- apiGroups:
  - extensions
  resources:
  - thirdpartyresources
  verbs:
  - '*'
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - '*'
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanager
  - alertmanagers
  - prometheus
  - prometheuses
  - service-monitor
  - servicemonitors
  - prometheusrules
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - delete
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  verbs:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - list
  - watch
-- Rico
Source: StackOverflow