Where are Prometheus errors for unfulfilled node-exporter pods?

4/9/2019

Installed Prometheus with:

helm install --name promeks --set server.persistentVolume.storageClass=gp2 stable/prometheus

Only saw 7 node-exporter pods created but there are 22 nodes.

$ kubectl get ds promeks-prometheus-node-exporter

NAME                               DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
promeks-prometheus-node-exporter   22        7         7         7            7           <none>          11d

$ kubectl describe ds promeks-prometheus-node-exporter

Name:           promeks-prometheus-node-exporter
Selector:       app=prometheus,component=node-exporter,release=promeks
Node-Selector:  <none>
Labels:         app=prometheus
                chart=prometheus-7.0.2
                component=node-exporter
                heritage=Tiller
                release=promeks
Annotations:    <none>
Desired Number of Nodes Scheduled: 22
Current Number of Nodes Scheduled: 20
Number of Nodes Scheduled with Up-to-date Pods: 20
Number of Nodes Scheduled with Available Pods: 20
Number of Nodes Misscheduled: 0
Pods Status:  20 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=prometheus
                    component=node-exporter
                    release=promeks
  Service Account:  promeks-prometheus-node-exporter
  Containers:
   prometheus-node-exporter:
    Image:      prom/node-exporter:v0.16.0
    Port:       9100/TCP
    Host Port:  9100/TCP
    Args:
      --path.procfs=/host/proc
      --path.sysfs=/host/sys
    Environment:  <none>
    Mounts:
      /host/proc from proc (ro)
      /host/sys from sys (ro)
  Volumes:
   proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
   sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
Events:            <none>

In which Prometheus pod will I find logs or events complaining that the remaining 15 pods can't be scheduled?

-- ProGirlXOXO
amazon-eks
aws-eks
kubernetes
kubernetes-helm
prometheus

1 Answer

4/10/2019

I was able to recreate your issue; however, I'm not sure the root cause was the same.

1) You can get all events from the whole cluster:

kubectl get events

In your case, with 22 nodes, it is better to pipe the output through grep:

kubectl get events | grep Warning

or

kubectl get events | grep daemonset-controller
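
As an alternative to grep, kubectl can filter events server-side with a field selector (assuming a reasonably recent kubectl and cluster version), which also catches events in other namespaces:

kubectl get events --all-namespaces --field-selector type=Warning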

2) SSH to a node without the pod and run:

docker ps -a

Locate the CONTAINER ID in the entry whose NAMES value includes the node-exporter pod name, then inspect it:

docker inspect <ContainerID>

You will get a lot of information about the container, which may help you determine why it is failing.
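
To pull out just the failure state instead of the full JSON, a Go-template format string works; the field names below assume Docker Engine's standard inspect schema:

docker inspect --format '{{.State.Status}} {{.State.Error}}' <ContainerID>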

In my case the issue was a PersistentVolumeClaim (the gp2 storage class did not exist) and insufficient CPU resources.
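
Both causes can also be checked from kubectl directly; this is a minimal sketch using standard commands (the grep patterns are only illustrative):

kubectl get pvc --all-namespaces | grep -i pending
kubectl describe nodes | grep -A 5 "Allocated resources"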

Storage classes can be listed with:

kubectl get storageclass
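
As a side note, if a storage class exists but PVCs that don't name one stay Pending, you can mark it as the default with the standard annotation patch (shown for gp2; your install pinned gp2 explicitly, so this only matters for charts relying on the default class):

kubectl patch storageclass gp2 -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'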
-- PjoterS
Source: StackOverflow