We are running a Kubernetes (1.9.4) cluster with 5 masters and 20 worker nodes. Among other pods, the cluster runs one StatefulSet with 3 replicas. Initially the StatefulSet pods were distributed across 3 nodes. However, pod-2 on node-2 was evicted due to disk pressure on node-2, and when it was evicted it was rescheduled onto node-1, where pod-1 was already running and which was itself already under disk pressure. As per our understanding, the kube-scheduler should not schedule a (non-critical) pod onto a node that already has disk pressure. Is it the default behavior not to schedule pods onto a node under disk pressure, or is it allowed? We ask because, at the same time, node-0 had no disk issue at all, so we were expecting the pod evicted from node-2 to land on node-0 instead of node-1, which was under disk pressure.
Another observation: when pod-2 on node-2 was evicted, we saw that the same pod was successfully scheduled, spawned, and moved to the Running state on node-1. However, we still see a "Failed to admit pod" error on node-2, repeated many times, for the same pod-2 that was evicted. Is this an issue with the kube-scheduler?
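For reference, this is roughly how we checked the node conditions and the pod events (node and pod names here are placeholders for the real resource names):

kubectl describe node node-1 | grep DiskPressure
kubectl get events --all-namespaces | grep pod-2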
Yes, the scheduler should not assign a new pod to a node with a DiskPressure condition.
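On 1.9 this is handled by the scheduler's CheckNodeDiskPressure predicate (and, where the TaintNodesByCondition feature gate is enabled, the same condition also surfaces as a node.kubernetes.io/disk-pressure taint). A quick, illustrative way to list the condition across all nodes:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'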
However, I think you can approach this problem from a few different angles.
Look into the configuration of your scheduler:
./kube-scheduler --write-config-to kube-config.yaml
and check whether it needs any adjustments. You can find info about additional options for kube-scheduler here:
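In particular, if the scheduler was started with --policy-config-file, check that the disk-pressure predicate was not dropped from the policy. A minimal illustrative fragment of such a policy file (not your actual configuration) might look like:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "CheckNodeDiskPressure"},
    {"name": "PodFitsResources"},
    {"name": "MatchNodeSelector"}
  ]
}

If a custom policy omits CheckNodeDiskPressure, the scheduler will place pods on nodes that are already reporting DiskPressure.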
You can also configure additional scheduler(s) depending on your needs. A tutorial for that can be found here
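If you do run a second scheduler, pods opt into it through spec.schedulerName, and you can verify which scheduler actually placed a given pod (the pod name below is a placeholder):

kubectl get pod pod-2 -o jsonpath='{.spec.schedulerName}'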
Check the logs:

- kubectl logs (on the kube-scheduler pod): kube-scheduler event logs
- journalctl -u kubelet: kubelet logs
- /var/log/kube-scheduler.log (on the master)

Look more closely at the kubelet's eviction thresholds (soft and hard) and at how much node memory capacity is set.
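For the repeated "Failed to admit pod" messages on node-2, the kubelet log is the place to look, and the eviction thresholds are normally visible on the kubelet command line. For example (run on the node; unit and process names may differ in your setup):

journalctl -u kubelet | grep -i admit
ps -ef | grep [k]ubelet | tr ' ' '\n' | grep eviction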
Bear in mind that the kubelet maps eviction signals to node conditions such as DiskPressure, and the scheduler takes those conditions into account when placing new pods.
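For instance, a typical hard-eviction setting on the kubelet looks like this (the values are only an example, not your current configuration):

--eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%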
Please check out my suggestions and let me know if they helped.