error "No nodes are available that match all of the predicates: MatchNodeSelector (7), PodToleratesNodeTaints (1)" for kube-state-metrics

1/10/2020

I am getting the error "No nodes are available that match all of the predicates: MatchNodeSelector (7), PodToleratesNodeTaints (1)" for kube-state-metrics. Please guide me on how to troubleshoot this issue.

admin@ip-172-20-58-79:~/kubernetes-prometheus$ kubectl describe po -n kube-system kube-state-metrics-747bcc4d7d-kfn7t

Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  3s (x20 over 4m)  default-scheduler  No nodes are available that match all of the predicates: MatchNodeSelector (7), PodToleratesNodeTaints (1).
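
The two failed predicates can be read directly: MatchNodeSelector (7) means seven nodes did not match the pod's nodeSelector, and PodToleratesNodeTaints (1) means one node (typically the master) carries a taint the pod does not tolerate. A minimal check against the nodeSelector kubernetes.io/os: linux used in the deployment below:

$ kubectl get nodes --show-labels
$ # nodes matching the selector; an empty list explains MatchNodeSelector (7)
$ kubectl get nodes -l kubernetes.io/os=linux
$ # taints, e.g. the master's node-role.kubernetes.io/master:NoSchedule
$ kubectl describe nodes | grep Taints

Note that on clusters older than Kubernetes 1.14 the OS label is beta.kubernetes.io/os rather than kubernetes.io/os, in which case this selector matches no node at all.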

Is this issue related to memory on a node? If yes, how do I confirm it? I checked all nodes; only one node seems to be above 80%, the remaining are between 45% and 70% memory usage.
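
For context: the scheduler compares pod resource requests against each node's allocatable memory, not live usage, so monitoring percentages are not what it evaluates. Assuming metrics-server is working, a quick check:

$ kubectl top nodes
$ # what the scheduler actually sees: allocatable vs. requested
$ kubectl describe nodes | grep -A5 'Allocated resources'

Memory pressure would also surface as an explicit Insufficient memory predicate in the event above, which is absent here, so memory is unlikely to be the cause.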

Node with 44% memory usage:

Total cluster memory usage:

The following screenshot shows kube-state-metrics (0/1 up):

[screenshot]

Furthermore, Prometheus shows kubernetes-pods (0/0 up). Is that due to kube-state-metrics not working, or is there another reason? And why is kubernetes-apiservers (0/1 up), seen in the above screenshot, not up? How do I troubleshoot it?

[screenshot]
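
For context, 0/0 means Prometheus discovered zero targets for that job, while 0/1 means a target exists but the scrape fails. With the common annotation-gated kubernetes-pods scrape config (an assumption; check the relabel rules in your prometheus.yml), a pod is only picked up when it carries annotations such as:

  metadata:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8080"

So kubernetes-pods 0/0 usually means no pod has the scrape annotation, independent of kube-state-metrics; the kubernetes-apiservers scrape error is shown on the Prometheus /targets page and is quoted in the answer below.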

admin@ip-172-20-58-79:~/kubernetes-prometheus$ sudo tail -f /var/log/kube-apiserver.log | grep error

I0110 10:15:37.153827       7 logs.go:41] http: TLS handshake error from 172.20.44.75:60828: remote error: tls: bad certificate
I0110 10:15:42.153543       7 logs.go:41] http: TLS handshake error from 172.20.44.75:60854: remote error: tls: bad certificate
I0110 10:15:47.153699       7 logs.go:41] http: TLS handshake error from 172.20.44.75:60898: remote error: tls: bad certificate
I0110 10:15:52.153788       7 logs.go:41] http: TLS handshake error from 172.20.44.75:60936: remote error: tls: bad certificate
I0110 10:15:57.154014       7 logs.go:41] http: TLS handshake error from 172.20.44.75:60992: remote error: tls: bad certificate
E0110 10:15:58.929167       7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.42.187:58104: write: connection reset by peer
E0110 10:15:58.931574       7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.42.187:58098: write: connection reset by peer
E0110 10:15:58.933864       7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.42.187:58088: write: connection reset by peer
E0110 10:16:00.842018       7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.42.187:58064: write: connection reset by peer
E0110 10:16:00.844301       7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.42.187:58058: write: connection reset by peer
E0110 10:18:17.275590       7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.44.75:37402: write: connection reset by peer
E0110 10:18:17.275705       7 runtime.go:66] Observed a panic: &errors.errorString{s:"kill connection/stream"} (kill connection/stream)
E0110 10:18:17.276401       7 runtime.go:66] Observed a panic: &errors.errorString{s:"kill connection/stream"} (kill connection/stream)
E0110 10:18:17.277808       7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.44.75:37392: write: connection reset by peer
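
The recurring "bad certificate" handshake failures point at a certificate problem rather than load. The SANs of the serving certificate can be inspected from any node (IP and port taken from the logs above):

$ echo | openssl s_client -connect 172.20.58.79:443 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'

If 172.20.58.79 (or whatever address clients use) is missing from that list, the x509 error diagnosed in the answer below follows directly.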

Update after MaggieO's reply:

admin@ip-172-20-58-79:~/kubernetes-prometheus/kube-state-metrics-configs$ cat   deployment.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v1.8.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v1.8.0
    spec:
      containers:
      - image: quay.io/coreos/kube-state-metrics:v1.8.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics

Furthermore, I want to add this command block to the above deployment.yaml but am getting an indentation error, so please show me where exactly I should add it (see the sketch after the block below).

command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
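
A sketch of the placement: command: is a per-container field, so it sits inside the container list entry at the same indentation as image: and name: (note these are metrics-server flags, per the answer below, so they would normally go in metrics-server-deployment.yaml rather than the kube-state-metrics deployment):

    spec:
      containers:
      - image: quay.io/coreos/kube-state-metrics:v1.8.0
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        livenessProbe:
          ...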

Update 2: @MaggieO, even after adding the command/args it shows the same error and the pod is in a pending state:

Updated deployment.yaml:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/name":"kube-state-metrics","app.kubernetes.io/version":"v1.8.0"},"name":"kube-state-metrics","namespace":"kube-system"},"spec":{"replicas":1,"selector":{"matchLabels":{"app.kubernetes.io/name":"kube-state-metrics"}},"template":{"metadata":{"labels":{"app.kubernetes.io/name":"kube-state-metrics","app.kubernetes.io/version":"v1.8.0"}},"spec":{"containers":[{"args":["--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname"],"image":"quay.io/coreos/kube-state-metrics:v1.8.0","imagePullPolicy":"Always","livenessProbe":{"httpGet":{"path":"/healthz","port":8080},"initialDelaySeconds":5,"timeoutSeconds":5},"name":"kube-state-metrics","ports":[{"containerPort":8080,"name":"http-metrics"},{"containerPort":8081,"name":"telemetry"}],"readinessProbe":{"httpGet":{"path":"/","port":8081},"initialDelaySeconds":5,"timeoutSeconds":5}}],"nodeSelector":{"kubernetes.io/os":"linux"},"serviceAccountName":"kube-state-metrics"}}}}
  creationTimestamp: 2020-01-10T05:33:13Z
  generation: 4
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v1.8.0
  name: kube-state-metrics
  namespace: kube-system
  resourceVersion: "178851301"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/kube-state-metrics
  uid: b20aa645-336a-11ea-9618-0607d7cb72ed
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v1.8.0
    spec:
      containers:
      - args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        image: quay.io/coreos/kube-state-metrics:v1.8.0
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
          protocol: TCP
        - containerPort: 8081
          name: telemetry
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-state-metrics
      serviceAccountName: kube-state-metrics
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: 2020-01-10T05:33:13Z
    lastUpdateTime: 2020-01-10T05:33:13Z
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: 2020-01-15T07:24:27Z
    lastUpdateTime: 2020-01-15T07:29:12Z
    message: ReplicaSet "kube-state-metrics-7f8c9c6c8d" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 4
  replicas: 2
  unavailableReplicas: 2
  updatedReplicas: 1

Update 3: It is not able to get a node, as shown in the following screenshot. Let me know how to troubleshoot this issue.

[screenshot]
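
Since the event names MatchNodeSelector for every worker node, the nodeSelector is the first thing to verify (standard kubectl; the label is the one from the deployment above):

$ kubectl describe po -n kube-system -l app.kubernetes.io/name=kube-state-metrics
$ # does any node actually carry the selector's label?
$ kubectl get nodes -l kubernetes.io/os=linux
$ # if not, see which os label the nodes do have
$ kubectl get nodes --show-labels | tr ',' '\n' | grep os=

If no node has kubernetes.io/os=linux, either label the nodes (kubectl label nodes <node-name> kubernetes.io/os=linux) or change the nodeSelector to the label that is present (e.g. beta.kubernetes.io/os on pre-1.14 clusters).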

-- Ashish Karpe
kube-apiserver
kube-state-metrics
kubernetes
kubernetes-pod
prometheus

1 Answer

1/10/2020

The error on kubernetes-apiservers, Get https:// ...: x509: certificate is valid for 100.64.0.1, 127.0.0.1, not 172.20.58.79, means the API server's certificate does not cover the address being scraped. Control-plane nodes are targeted randomly, and the apiEndpoint only changes when a node is deleted from the cluster, so the problem is not immediately noticeable: it only shows up after the cluster's nodes change.

Workaround fix: manually synchronize kube-apiserver.pem between the master nodes and restart the kube-apiserver container.
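
A sketch of that workaround, assuming a kops-style layout with certificates under /srv/kubernetes and SSH access between masters (the path, user, and host below are placeholders):

$ scp /srv/kubernetes/kube-apiserver.pem admin@<other-master>:/srv/kubernetes/
$ # restart the apiserver container so it picks up the new cert
$ docker ps | grep kube-apiserver
$ docker restart <container-id>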

You can also remove the apiserver.* and apiserver-kubelet-client.* certificates and recreate them with the commands:

$ kubeadm init phase certs apiserver --config=/etc/kubernetes/kubeadm-config.yaml
$ kubeadm init phase certs apiserver-kubelet-client --config=/etc/kubernetes/kubeadm-config.yaml
$ systemctl stop kubelet
# delete the docker container with kubelet
$ systemctl restart kubelet
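
Note that kubeadm init phase certs skips generation when certificate files already exist, so "remove" above means deleting the old files first (the path is the kubeadm default; back them up before removing):

$ rm /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.key
$ rm /etc/kubernetes/pki/apiserver-kubelet-client.crt /etc/kubernetes/pki/apiserver-kubelet-client.key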

Similar problems: x509 certificate, kubelet-x509.

Then solve the problem with the metrics-server.

Change the metrics-server-deployment.yaml file, and set the following args:

command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP

The metrics-server is now able to talk to the node (it was failing before because it could not resolve the hostname of the node).

More information can be found here: metrics-server-issue.

-- MaggieO
Source: StackOverflow