I'm running a Kubernetes cluster with one manager node and four worker nodes. When I start a pod, it is correctly assigned to a worker and starts running. When I shut down the worker the pod was assigned to, the manager detects the node as NotReady after 40 seconds, and 2 seconds later the pod becomes Terminating. I set these tolerations for my pod:
spec:
  tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 2
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 2
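For completeness, a minimal standalone Pod manifest with these tolerations would look roughly like this (the pod name and image are just placeholders, not my actual workload):

apiVersion: v1
kind: Pod
metadata:
  name: toleration-test            # placeholder name
spec:
  containers:
  - name: app
    image: nginx                   # placeholder image
  tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 2
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 2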
So far the behavior is what I expected. What I did not expect is that the pod stays in Terminating status until the worker comes back to Ready; only when the worker is up again is the pod actually deleted from the system. My expectation was that once tolerationSeconds expires, the pod would be rescheduled on a different worker and run again. Below is the cluster with the node versions (output of kubectl get nodes -o wide):
NAME      STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
docker1   Ready    <none>   21d   v1.17.4   192.168.1.2   <none>        CentOS Linux 7 (Core)   5.5.9-1.el7.elrepo.x86_64    docker://19.3.8
docker2   Ready    <none>   21d   v1.17.4   192.168.1.3   <none>        CentOS Linux 7 (Core)   5.5.11-1.el7.elrepo.x86_64   docker://19.3.8
docker3   Ready    <none>   21d   v1.17.4   192.168.1.4   <none>        CentOS Linux 7 (Core)   5.6.4-1.el7.elrepo.x86_64    docker://19.3.8
docker4   Ready    <none>   19d   v1.17.4   192.168.1.5   <none>        CentOS Linux 7 (Core)   5.6.4-1.el7.elrepo.x86_64    docker://19.3.8
manager   Ready    master   22d   v1.17.4   192.168.1.1   <none>        CentOS Linux 7 (Core)   5.5.9-1.el7.elrepo.x86_64    docker://19.3.8
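In case it helps to reproduce, something like the following shows the pod status and the NoExecute taints that the node controller adds when the worker goes down (docker1 is just an example node; the tolerations above are meant to match these taints):

# watch the pod while the worker it runs on is shut down
kubectl get pods -o wide --watch

# check the node status and the taints added once it becomes NotReady
kubectl get nodes
kubectl describe node docker1 | grep -A3 Taints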
Can anyone suggest what I am missing, or whether this is the correct behavior?