Prometheus cannot scrape Kubernetes metrics

7/16/2021

I have set up a Kubernetes cluster using kubeadm. I then deployed Prometheus on it using the community Helm charts. I notice that Prometheus cannot scrape metrics from the scheduler, etcd, or the controller manager.

For example I see errors like this:

Get "https://192.168.3.83:10259/metrics": dial tcp 192.168.3.83:10259: connect: connection refused

The reason I get these errors is that there is in fact nothing listening on https://192.168.3.83:10259/metrics. This is because kube-scheduler has --bind-address set to 127.0.0.1.
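
I can confirm this on the control plane node itself:

# the scheduler's secure port is only bound to loopback
sudo ss -tlnp | grep 10259
grep bind-address /etc/kubernetes/manifests/kube-scheduler.yaml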

One way I can fix this is by manually editing the manifest files in /etc/kubernetes/manifests, changing --bind-address to 0.0.0.0.
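
For reference, the manual edit looks roughly like this (an excerpt of /etc/kubernetes/manifests/kube-scheduler.yaml, with the other flags omitted):

spec:
  containers:
  - command:
    - kube-scheduler
    - --bind-address=0.0.0.0   # was 127.0.0.1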

When I do this, Prometheus is able to scrape those metrics.

However, is this the correct solution? I assume that those manifest files are actually managed by Kubernetes itself, and that I should not edit them directly but do something else instead. But what?

Edit: I have since noticed that changes I make to the manifest files do indeed get overwritten when doing an upgrade, and now I have again lost the etcd and other metrics.

I must be missing something obvious here.

I thought that maybe changing the "ClusterConfiguration" configmap would do the trick, but whether you can do this (and how you should do it) is not documented anywhere.

I have an out-of-the-box Kubernetes and an out-of-the-box Prometheus, and it does not collect metrics. I cannot be the only one running into this issue. Is there really no solution?

-- Krist van Besien
kubernetes
prometheus

1 Answer

3/25/2022

Exposing kube-scheduler, etcd, or the kube-controller-manager (and persisting the changes)

You can expose the metrics on 0.0.0.0, just as you have done, by editing the configmap and then pulling those changes to each control plane node. These changes will then be persisted across upgrades. For etcd this can also be done in another way, which might be preferable (see further down).

First step: edit the configmap with the command below:

kubectl edit -n kube-system cm/kubeadm-config

Add/change the relevant bind addresses as described here, for example for etcd as outlined below:

kind: ClusterConfiguration
etcd:
  local:
    extraArgs:
      listen-metrics-urls: http://0.0.0.0:2381
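
The scheduler and controller manager bind addresses can go in the same configmap. A sketch (assuming the map-style extraArgs of the kubeadm v1beta2/v1beta3 ClusterConfiguration; newer API versions use a list format, so check your kubeadm documentation):

kind: ClusterConfiguration
scheduler:
  extraArgs:
    bind-address: 0.0.0.0
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0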

Second step: NOTE: Please read here to understand the upgrade command before applying it to any cluster you care about, since it might also update cluster component versions (unless you just did an upgrade :)

For the changes to be reflected you thus need to run kubeadm upgrade node on each control plane node (one at a time, please). This will bring down the affected pods (those to which you have made changes) and start new instances with the metrics exposed. You can verify before and after with, for example: netstat -tulpn | grep etcd
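
Concretely, per control plane node that could look like:

# re-render the static pod manifests from the updated kubeadm-config
sudo kubeadm upgrade node

# before and after: check which address etcd's metrics port is bound to
sudo netstat -tulpn | grep etcd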

For etcd the default port in Prometheus is 2379, so it also needs to be adjusted to 2381 as below in your Prometheus values file:

kubeEtcd:
  service:
    port: 2381
    targetPort: 2381
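
If your chart version still scrapes the scheduler and controller manager on their old insecure ports, something along these lines may also be needed. This is a sketch; the exact keys depend on your kube-prometheus-stack version, so check its values.yaml:

kubeScheduler:
  service:
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: true   # assumes the self-signed serving cert is acceptable
kubeControllerManager:
  service:
    port: 10257
    targetPort: 10257
  serviceMonitor:
    https: true
    insecureSkipVerify: true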

Source for the above solution: here

Accessing existing etcd metrics without exposing it further

For etcd metrics there is a second, perhaps preferred, way of accessing the metrics: using the already exposed https metrics endpoint on port 2379 (which requires authentication). You can verify this with curl:

curl https://<your IP>:2379/metrics -k --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key

For this to work we need to supply Prometheus with the correct certificates as a secret in Kubernetes. The steps are described here and outlined below:

Create a secret in the namespace where Prometheus is deployed.

kubectl -n monitoring create secret generic etcd-client-cert --from-file=/etc/kubernetes/pki/etcd/ca.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key

Add the following to your Prometheus Helm values file:

prometheus:
  prometheusSpec:
    secrets: ['etcd-client-cert']

kubeEtcd:
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    serverName: localhost
    caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    certFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.key

Prometheus should now be able to access the https endpoint with the certificates that we mounted from the secret. I would say this is the preferred way for etcd, since we don't expose the open http endpoint any further.

-- apisen
Source: StackOverflow