prometheus operator alertmanager-main-0 Pending, then restarting

11/5/2019

What happened? Kubernetes version: 1.12, Prometheus Operator: release-0.1. I followed the README:

$ kubectl create -f manifests/

# It can take a few seconds for the above 'create manifests' command to fully create the following resources, so verify the resources are ready before proceeding.
$ until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done
$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done

$ kubectl apply -f manifests/ # This command sometimes may need to be done twice (to workaround a race condition).

Then I ran the following command, and the output looked like this:

[root@VM_8_3_centos /data/hansenwu/kube-prometheus/manifests]# kubectl get pod -n monitoring
NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0                   2/2     Running   0          66s
alertmanager-main-1                   1/2     Running   0          47s
grafana-54f84fdf45-kt2j9              1/1     Running   0          72s
kube-state-metrics-65b8dbf498-h7d8g   4/4     Running   0          57s
node-exporter-7mpjw                   2/2     Running   0          72s
node-exporter-crfgv                   2/2     Running   0          72s
node-exporter-l7s9g                   2/2     Running   0          72s
node-exporter-lqpns                   2/2     Running   0          72s
prometheus-adapter-5b6f856dbc-ndfwl   1/1     Running   0          72s
prometheus-k8s-0                      3/3     Running   1          59s
prometheus-k8s-1                      3/3     Running   1          59s
prometheus-operator-5c64c8969-lqvkb   1/1     Running   0          72s
[root@VM_8_3_centos /data/hansenwu/kube-prometheus/manifests]# kubectl get pod -n monitoring
NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0                   0/2     Pending   0          0s
grafana-54f84fdf45-kt2j9              1/1     Running   0          75s
kube-state-metrics-65b8dbf498-h7d8g   4/4     Running   0          60s
node-exporter-7mpjw                   2/2     Running   0          75s
node-exporter-crfgv                   2/2     Running   0          75s
node-exporter-l7s9g                   2/2     Running   0          75s
node-exporter-lqpns                   2/2     Running   0          75s
prometheus-adapter-5b6f856dbc-ndfwl   1/1     Running   0          75s
prometheus-k8s-0                      3/3     Running   1          62s
prometheus-k8s-1                      3/3     Running   1          62s
prometheus-operator-5c64c8969-lqvkb   1/1     Running   0          75s

I don't know why the pod alertmanager-main-0 goes Pending and then restarts. When I look at the events, I see:

72s         Warning   FailedCreate             StatefulSet   create Pod alertmanager-main-0 in StatefulSet alertmanager-main failed error: The POST operation against Pod could not be completed at this time, please try again.
72s         Warning   FailedCreate             StatefulSet   create Pod alertmanager-main-0 in StatefulSet alertmanager-main failed error: The POST operation against Pod could not be completed at this time, please try again.
-- edselwang
kubernetes
prometheus
prometheus-operator

1 Answer

11/5/2019

Most likely the alertmanager does not get enough time to start correctly.

Have a look at this answer: https://github.com/coreos/prometheus-operator/issues/965#issuecomment-460223268

You can set the paused field of the Alertmanager resource to true, and then modify the StatefulSet directly to test whether extending the liveness/readiness probe timings solves your issue.
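For example, a minimal sketch of that approach. `spec.paused` is a real field on the Alertmanager custom resource, and `/-/healthy` and `/-/ready` are Alertmanager's standard health endpoints; the probe timing values below are illustrative, not recommendations:

```yaml
# 1) Pause the operator's reconciliation of this Alertmanager so your
#    manual StatefulSet changes are not immediately reverted:
#
#    kubectl -n monitoring patch alertmanager main --type merge \
#      -p '{"spec":{"paused":true}}'
#
# 2) Edit the StatefulSet (kubectl -n monitoring edit statefulset alertmanager-main)
#    and relax the alertmanager container's probes, e.g.:
livenessProbe:
  httpGet:
    path: /-/healthy
    port: web
  initialDelaySeconds: 60   # give Alertmanager more time before the first check
  failureThreshold: 10      # tolerate more failed checks before a restart
readinessProbe:
  httpGet:
    path: /-/ready
    port: web
  initialDelaySeconds: 30
  failureThreshold: 10
```

Remember to set `paused` back to `false` afterwards, or the operator will stop reconciling changes to the Alertmanager resource.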

-- Nicolas F
Source: StackOverflow