prometheus-adapter is not running properly

8/12/2021

After deploying prometheus-operator according to the documentation, I found that kubectl top nodes does not work properly.

$ kubectl get apiService v1beta1.metrics.k8s.io 
v1beta1.metrics.k8s.io                  monitoring/prometheus-adapter   False (FailedDiscoveryCheck)   44m


$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

$ kubectl get  --raw "/apis/metrics.k8s.io/v1beta1"
Error from server (ServiceUnavailable): the server is currently unable to handle the request
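
More detail on the failure is available from the APIService object itself; kubectl describe reports the condition message behind the FailedDiscoveryCheck status:

$ kubectl describe apiservice v1beta1.metrics.k8s.io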

prometheus-adapter.yaml

...
      - args:
        - --cert-dir=/var/run/serving-cert
        - --config=/etc/adapter/config.yaml
        - --logtostderr=true
        - --metrics-relist-interval=1m
        - --prometheus-url=http://prometheus-k8s.monitoring.svc.cluster.local:9090/prometheus
        - --secure-port=6443
...

While troubleshooting, I came across a suggested fix (#1060): adding hostNetwork: true to the configuration file.
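
For reference, this is roughly where that setting sits; a minimal sketch assuming the kube-prometheus layout (a Deployment named prometheus-adapter in the monitoring namespace):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-adapter
  namespace: monitoring
spec:
  template:
    spec:
      # Attach the Pod to the node's network namespace.
      hostNetwork: true
      containers:
      - name: prometheus-adapter
        args:
        - --secure-port=6443
        # ...remaining args as above...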

Just when I thought the fix had worked, I found that kubectl top nodes still does not work.

$ kubectl get apiService v1beta1.metrics.k8s.io
v1beta1.metrics.k8s.io   monitoring/prometheus-adapter   True        64m

$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

$ kubectl get  --raw "/apis/metrics.k8s.io/v1beta1"
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"metrics.k8s.io/v1beta1","resources":[{"name":"nodes","singularName":"","namespaced":false,"kind":"NodeMetrics","verbs":["get","list"]},{"name":"pods","singularName":"","namespaced":true,"kind":"PodMetrics","verbs":["get","list"]}]}

Viewing the logs of prometheus-adapter:

E0812 10:03:02.469561       1 provider.go:265] failed querying node metrics: unable to fetch node CPU metrics: unable to execute query: Get "http://prometheus-k8s.monitoring.svc.cluster.local:9090/prometheus/api/v1/query?query=sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B60s%5D%0A++%29%0A++%2A+on%28namespace%2C+pod%29+group_left%28node%29+%28%0A++++node_namespace_pod%3Akube_pod_info%3A%7Bnode%3D~%22node02.whisper-tech.net%7Cnode03.whisper-tech.net%22%7D%0A++%29%0A%29%0Aor+sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++windows_cpu_time_total%7Bmode%3D%22idle%22%2C+job%3D%22windows-exporter%22%2Cnode%3D~%22node02.whisper-tech.net%7Cnode03.whisper-tech.net%22%7D%5B4m%5D%0A++%29%0A%29%0A&time=1628762582.467": dial tcp: lookup prometheus-k8s.monitoring.svc.cluster.local on 100.100.2.136:53: no such host

The cause of the problem is that adding hostNetwork: true to prometheus-adapter prevents the Pod from resolving the in-cluster prometheus-k8s Service through CoreDNS.
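
This matches the log above: with hostNetwork: true and no explicit dnsPolicy, a Pod inherits the node's /etc/resolv.conf, which is why the lookup went to 100.100.2.136 instead of the CoreDNS Service. The effective setting can be checked on the Deployment (assuming the kube-prometheus name prometheus-adapter); an empty result means the default is in effect:

$ kubectl -n monitoring get deploy prometheus-adapter \
    -o jsonpath='{.spec.template.spec.dnsPolicy}'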

One idea I've come up with is to make the Kubernetes nodes themselves resolve cluster-internal names through CoreDNS.

Is there a better way to solve the current problem? What should I do?

-- Xsky
kubectl
kubernetes
prometheus-operator

1 Answer

8/12/2021

Your Pods are running with hostNetwork, so you should explicitly set their DNS policy to "ClusterFirstWithHostNet", as described in the Pod's DNS Policy documentation:

"ClusterFirstWithHostNet": For Pods running with hostNetwork, you should explicitly set its DNS policy "ClusterFirstWithHostNet".

I've created a simple example to illustrate how it works.


First, I created the app-1 Pod with hostNetwork: true:

$ cat app-1.yml
kind: Pod
apiVersion: v1
metadata:
  name: app-1
spec:
  hostNetwork: true
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"

$ kubectl apply -f app-1.yml
pod/app-1 created

We can verify that app-1 cannot resolve cluster-internal names such as kubernetes.default.svc:

$ kubectl exec -it app-1 -- sh

/ # nslookup kubernetes.default.svc
Server:         169.254.169.254
Address:        169.254.169.254#53

** server can't find kubernetes.default.svc: NXDOMAIN
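
The Server line above (169.254.169.254) is the node's resolver, not CoreDNS: with hostNetwork: true and the default DNS policy, the Pod inherits the node's /etc/resolv.conf, so cluster-internal Service names cannot be resolved. This can be confirmed from inside the Pod (output trimmed to the nameserver line, which matches the nslookup Server above):

/ # cat /etc/resolv.conf
nameserver 169.254.169.254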

Let's add the dnsPolicy: ClusterFirstWithHostNet to the app-1 Pod and recreate it:

$ cat app-1.yml
kind: Pod
apiVersion: v1
metadata:
  name: app-1
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"

$ kubectl delete pod app-1 && kubectl apply -f app-1.yml
pod "app-1" deleted
pod/app-1 created

Finally, we can check if the app-1 Pod is able to resolve kubernetes.default.svc:

$ kubectl exec -it app-1 -- sh
/ # nslookup kubernetes.default.svc
Server:         10.8.0.10
Address:        10.8.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.8.0.1

As you can see in the example above, everything works as expected with the ClusterFirstWithHostNet dnsPolicy.

For more information, see the DNS for Services and Pods documentation.

-- matt_j
Source: StackOverflow