Pods not moved on host failure

1/25/2020

I have set up a simple cluster with 1 master and 3 worker nodes running on Ubuntu, based on the book "Kubernetes: Up & Running" in combination with the official documentation.

It basically works until I shut down one of the worker nodes. After a few seconds the node's status switches to Unknown, but the pods located on the offline node keep reporting a Running state.

Shouldn't k8s move these pods to a different, healthy host? Am I missing something?

Thanks in advance!

-- thepill
kubernetes

3 Answers

1/25/2020

I was able to work around this with a script that force-drains any node that has been in NotReady status for more than 5 minutes (adjustable), and then uncordons the node after it comes back.

-- Devesh mehta
Source: StackOverflow

1/26/2020

With Kubernetes 1.13 and higher, pod eviction on node failure/not-ready conditions is controlled by taints and tolerations; the --pod-eviction-timeout parameter is no longer used.

When a node goes down or becomes not ready, the node controller adds the node.kubernetes.io/unreachable or node.kubernetes.io/not-ready taint to the node. All pods tolerate these taints for 300 seconds by default. You can control this toleration time cluster-wide for all pods with flags on kube-apiserver, and also per pod using the tolerations field in the pod spec.

Cluster Wide configuration:

You can modify the toleration time cluster-wide using the --default-not-ready-toleration-seconds and --default-unreachable-toleration-seconds flags on kube-apiserver.

From docs:

--default-not-ready-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.

--default-unreachable-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.
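
How these flags are actually passed depends on how the control plane is deployed. As a sketch only, assuming a kubeadm-managed cluster, they could be set through apiServer.extraArgs in the ClusterConfiguration (the 60-second values are just examples, not recommendations):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  extraArgs:
    # example values; the default for both is 300 seconds
    default-not-ready-toleration-seconds: "60"
    default-unreachable-toleration-seconds: "60"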

Per pod configuration:

You can also modify the toleration time per pod using the following configuration.

tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 120
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 120
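
For context, here is a minimal, self-contained Pod manifest showing where that tolerations block sits; the pod name and image are placeholders, not part of the original answer:

apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo            # placeholder name
spec:
  containers:
  - name: app
    image: nginx                   # placeholder image
  tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 120         # evict 120s after the node becomes unreachable
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 120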

https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#taint-based-evictions

-- Shashank V
Source: StackOverflow

1/25/2020

By default, pods won't be moved for 5 minutes, which is configurable via the --pod-eviction-timeout flag on the controller manager.
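
For clusters that still honor this flag, and assuming a kubeadm-managed control plane, it could be lowered via controllerManager.extraArgs in the ClusterConfiguration (the 1m0s value is only an example; as noted below, the flag is ignored from 1.13 onward):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    # example value; the default is 5m0s
    pod-eviction-timeout: "1m0s"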

If after 5 minutes the pods still aren't rescheduled (for example, pods of a StatefulSet), you need to delete the node using kubectl delete node, which triggers a reschedule of the pods that were on the node.

From Kubernetes 1.13 onward, pod eviction on node failure/not-ready conditions is controlled by taints and tolerations, and the --pod-eviction-timeout parameter is ignored.

Cluster-wide configuration can be done via kube-apiserver flags.

--default-not-ready-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.

--default-unreachable-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.

If you want to manage this setting at the pod level, you can add tolerations to the pod spec.

spec:
  tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 30
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 30

Check out this related issue.

https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#taint-based-evictions

-- Arghya Sadhu
Source: StackOverflow