Restore a node after its pods were evicted due to resource pressure

2/19/2020

I have a k8s cluster setup using kubespray.

Last week one of my k8s nodes ran very low on storage, so all of its pods were evicted, including some important ones such as calico-node and kube-proxy (I thought these pods were critical and would never be evicted, no matter what).

After that, all the calico-node pods became not ready. When I checked the log, it said: Warning: Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.xxx, where 192.168.0.xxx is the IP of the problematic node above.

My question is: how can I restore that node? Is it safe to just run kubespray's cluster.yml again?

My k8s version is v1.13.3

Thanks.

-- Hiep Ho
kubernetes
kubespray

1 Answer

2/20/2020

When a node is under disk pressure, its status changes to NotReady and a taint is added to the node: Taints: node.kubernetes.io/disk-pressure:NoSchedule.

All pods running on this node are evicted, except api-server, kube-controller and kube-scheduler. The eviction manager spares those pods and reports the error message: cannot evict a critical static pod [...]

Once the node is freed from disk pressure, its status changes back to Ready and the previously added taint is removed. You can check this by running kubectl describe node <node_name>. In the Conditions field you should see that DiskPressure has changed to False, which means the node has enough space available. Similar information can also be found in the Events field:

  Normal   NodeReady                1s                     kubelet, node1     Node node1 status is now: NodeReady
  Normal   NodeHasNoDiskPressure    1s (x2 over 1s)        kubelet, node1     Node node1 status is now: NodeHasNoDiskPressure
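
If you would rather script this check than read the describe output by eye, here is a minimal Python sketch. It assumes kubectl is on your PATH and reads the node object from kubectl get node <node_name> -o json; the helper names node_recovered and fetch_node are my own, not part of any tool. It treats the node as recovered only when Ready is True, DiskPressure is False, and the disk-pressure taint is gone.

```python
import json
import subprocess

# Taint key added to a node under disk pressure (quoted in the answer above).
DISK_PRESSURE_TAINT = "node.kubernetes.io/disk-pressure"


def node_recovered(node: dict) -> bool:
    """Return True if the node JSON shows Ready=True, DiskPressure=False
    and no disk-pressure taint. `node` is the parsed node object."""
    conditions = {
        c["type"]: c["status"]
        for c in node.get("status", {}).get("conditions", [])
    }
    # spec.taints is absent (or null) when the node has no taints.
    taints = node.get("spec", {}).get("taints") or []
    has_taint = any(t.get("key") == DISK_PRESSURE_TAINT for t in taints)
    return bool(
        conditions.get("Ready") == "True"
        and conditions.get("DiskPressure") == "False"
        and not has_taint
    )


def fetch_node(name: str) -> dict:
    """Hypothetical helper: fetch the node object via kubectl (assumed installed)."""
    result = subprocess.run(
        ["kubectl", "get", "node", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, node_recovered(fetch_node("node1")) should return True once the events above have appeared for node1.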

After confirming that the node is Ready and has sufficient disk space, you can restart kubelet and run kubespray's cluster.yml; the pods will then be redeployed on the node. Just make sure the node is ready to handle deployments.

-- KFC_
Source: StackOverflow