Installed Prometheus with:
helm install --name promeks --set server.persistentVolume.storageClass=gp2 stable/prometheus
Only 7 node-exporter pods were created, but there are 22 nodes:
$ kubectl get ds promeks-prometheus-node-exporter
NAME                               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
promeks-prometheus-node-exporter   22        7         7       7            7           <none>          11d
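To see which nodes are missing an exporter, the DaemonSet's pods can be listed with their node assignments, using the selector from the describe output below:
$ kubectl get pods -l app=prometheus,component=node-exporter,release=promeks -o wide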
$ kubectl describe ds promeks-prometheus-node-exporter
Name:           promeks-prometheus-node-exporter
Selector:       app=prometheus,component=node-exporter,release=promeks
Node-Selector:  <none>
Labels:         app=prometheus
                chart=prometheus-7.0.2
                component=node-exporter
                heritage=Tiller
                release=promeks
Annotations:    <none>
Desired Number of Nodes Scheduled: 22
Current Number of Nodes Scheduled: 20
Number of Nodes Scheduled with Up-to-date Pods: 20
Number of Nodes Scheduled with Available Pods: 20
Number of Nodes Misscheduled: 0
Pods Status:  20 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=prometheus
                    component=node-exporter
                    release=promeks
  Service Account:  promeks-prometheus-node-exporter
  Containers:
   prometheus-node-exporter:
    Image:      prom/node-exporter:v0.16.0
    Port:       9100/TCP
    Host Port:  9100/TCP
    Args:
      --path.procfs=/host/proc
      --path.sysfs=/host/sys
    Environment:  <none>
    Mounts:
      /host/proc from proc (ro)
      /host/sys from sys (ro)
  Volumes:
   proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
   sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
Events:  <none>
In which Prometheus pod will I find logs or events complaining that the remaining 15 node-exporter pods can't be scheduled?
I was able to recreate your issue; however, I'm not sure if the root cause was the same.
1) You can get all events from the whole cluster:
kubectl get events
In your case, with 22 nodes, it is better to filter the output with grep:
kubectl get events | grep Warning
or
kubectl get events | grep daemonset-controller
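Depending on your kubectl version, you may also be able to filter on the server side instead of piping through grep:
kubectl get events --field-selector type=Warning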
2) SSH to a node without the pod. Use the command:
docker ps -a
Locate the CONTAINER ID in the entry whose NAMES column includes the node-exporter name, then inspect it:
docker inspect <ContainerID>
You will get a lot of information about the container, which may help you determine why it is failing.
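If the full output is too noisy, you can narrow it to the container state with a Go template; this is just a sketch, and <ContainerID> is a placeholder as above:
docker inspect --format '{{.State.Status}}: {{.State.Error}}' <ContainerID>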
In my case, I had issues with a PersistentVolumeClaim (I did not have a gp2 storage class) and insufficient CPU resources.
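To check the CPU side, you can compare allocated versus allocatable resources on a suspect node (<node-name> is a placeholder):
kubectl describe node <node-name> | grep -A 7 "Allocated resources"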
For the storage side, you can list the available storage classes with:
kubectl get storageclass
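If gp2 is missing and you are on AWS (as the gp2 name suggests), a minimal sketch of creating it with the in-tree EBS provisioner is:
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
EOF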