Prometheus Operator + New Kubernetes Minikube = DeadMansSwitch + KubeControllerManagerDown + KubeSchedulerDown + TargetDown

10/31/2018

If I start a fresh, empty minikube and helm install the latest stable/prometheus-operator with strictly default settings, I see four active Prometheus alarms.

In this super simplified scenario where I have a clean fresh minikube that is running absolutely nothing other than Prometheus, there should be no problems and no alarms. Are these alarms bogus or broken? Is something wrong with my setup or should I submit a bug report and disable these alarms for the time being?

Here are my basic setup steps:

minikube delete
# Any lower memory/cpu settings will experience problems
minikube start --memory 10240 --cpus 4 --kubernetes-version v1.12.2
eval $(minikube docker-env)
helm init
helm repo update
# wait a minute for Helm Tiller to start up.
helm install --name my-prom stable/prometheus-operator

Wait several minutes for everything to start up, then run port forwarding on Prometheus server and on Grafana:

kubectl port-forward service/my-prom-prometheus-operato-prometheus 9090:9090
kubectl port-forward service/my-prom-grafana 8080:80

Then go to http://localhost:9090/alerts and see:

DeadMansSwitch (1 active)
KubeControllerManagerDown (1 active)
KubeSchedulerDown (1 active)
TargetDown (1 active)

Are these bogus? Is something genuinely wrong? Should I disable these?

Two of these alarms are missing metrics:

  • KubeControllerManagerDown: absent(up{job="kube-controller-manager"} == 1)
  • KubeSchedulerDown: absent(up{job="kube-scheduler"} == 1)

In http://localhost:9090/config, I don't see either job configured, but I do see closely related jobs with job_name values of default/my-prom-prometheus-operato-kube-controller-manager/0 and default/my-prom-prometheus-operato-kube-scheduler/0. This suggests that the job_name values are supposed to match and that there is a bug where they do not. I also don't see any collected metrics for either job. Are slashes allowed in job names?

The other two alarms:

  • DeadMansSwitch: The alarm expression is vector(1). I have no idea what this is.
  • TargetDown: This alarm is being triggered over up{job="kubelet"}, which has two metric values: one up with a value of 1.0 and one down with a value of 0.0. The up value is for endpoint="http-metrics" and the down value is for endpoint="cadvisor". Is that latter endpoint supposed to be up? Why wouldn't it be?
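One way to dig into the down endpoint without the web UI is to ask the Prometheus HTTP API directly. The sketch below assumes the port-forward from the setup steps is still running on localhost:9090; the helper names prom_query and prom_targets and the PROM_URL variable are made up for illustration.

```shell
# Base URL of the port-forwarded Prometheus server (assumed from the setup steps).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant PromQL query through the HTTP API and print the raw JSON.
# -G sends the URL-encoded data as a GET query string.
prom_query() {
  curl -sG --data-urlencode "query=$1" "${PROM_URL}/api/v1/query"
}

# List all scrape targets, including their health and last scrape error.
prom_targets() {
  curl -s "${PROM_URL}/api/v1/targets"
}
```

For example, prom_query 'up{job="kubelet"}' returns the kubelet series, and the prom_targets output carries health and lastError fields per target, which may indicate why the cadvisor endpoint cannot be scraped.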

When I go to http://localhost:9090/graph and run sum(up) by (job), I see 1.0 values for all of:

{job="node-exporter"}
{job="my-prom-prometheus-operato-prometheus"}
{job="my-prom-prometheus-operato-operator"}
{job="my-prom-prometheus-operato-alertmanager"}
{job="kubelet"}
{job="kube-state-metrics"}
{job="apiserver"}

FYI, kubectl version shows:

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-30T21:39:16Z", GoVersion:"go1.11.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:43:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
-- clay
kubernetes
minikube
prometheus
prometheus-operator

2 Answers

3/28/2019

The Watchdog alert (formerly named DeadMansSwitch) is:

An alert meant to ensure that the entire alerting pipeline is functional. This alert is always firing, therefore it should always be firing in Alertmanager and always fire against a receiver.

In Minikube, the kube-controller-manager and kube-scheduler listen by default on 127.0.0.1, so Prometheus cannot scrape metrics from them. You need to start Minikube with these components listening on all interfaces:

minikube start --kubernetes-version v1.12.2 \
--bootstrapper=kubeadm \
--extra-config=scheduler.address=0.0.0.0 \
--extra-config=controller-manager.address=0.0.0.0

Another cause of TargetDown is that the default service selectors created by the Prometheus Operator Helm chart don’t match the labels used by the Minikube components. You need to make them match by setting the kubeControllerManager.selector and kubeScheduler.selector Helm parameters.
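Before changing any selector, it may help to compare what the pods are actually labelled with against what the services select on. This is only a sketch: the function names are hypothetical, and it assumes the control-plane pods run in the kube-system namespace, as they do on a kubeadm-bootstrapped Minikube.

```shell
# Show the labels carried by the control-plane pods
# (kube-system namespace is an assumption for kubeadm-based Minikube).
show_pod_labels() {
  kubectl --namespace kube-system get pods --show-labels
}

# Show every service with its selector (the SELECTOR column of wide output),
# so the chart-created services can be compared with the pod labels above.
show_service_selectors() {
  kubectl get services --all-namespaces --output wide
}
```

Run show_pod_labels, then show_service_selectors, and check whether the selector on the kube-controller-manager and kube-scheduler services matches a label that the corresponding pod really has.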

Take a look at this article: Trying Prometheus Operator with Helm + Minikube. It addresses all these problems, how to solve them and much more.

-- Eduardo Baitello
Source: StackOverflow

11/1/2018

The DeadMansSwitch alarm is vector(1), an alarm that always fires. It is generally used to test whether your Alertmanager is working.
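To see that DeadMansSwitch is firing by design, you can look at the synthetic ALERTS series that Prometheus keeps for pending and firing alerts. A minimal sketch, assuming Prometheus is port-forwarded to localhost:9090 as in the question; alert_series and PROM_URL are illustrative names, not part of any tool.

```shell
# Base URL of the port-forwarded Prometheus server (assumption from the question).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Print the ALERTS series for one alert name as raw JSON.
# The alertstate label in the result is either "pending" or "firing".
alert_series() {
  curl -sG --data-urlencode "query=ALERTS{alertname=\"$1\"}" "${PROM_URL}/api/v1/query"
}
```

Calling alert_series DeadMansSwitch should return a series with alertstate="firing" whenever the rule is loaded, since vector(1) always evaluates to a value.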

You are possibly hitting this issue:

https://github.com/coreos/prometheus-operator/issues/1001

Hope this helps.

-- Prafull Ladha
Source: StackOverflow