Kubernetes pod stuck in state=Terminating after node goes to status = NotReady?

10/22/2019

I have 3 nodes in a k8s cluster, where all are masters, i.e. I have removed the node-role.kubernetes.io/master taint.

I physically removed the network cable on foo2, so I have

kubectl get nodes
NAME   STATUS     ROLES    AGE     VERSION
foo1   Ready      master   3d22h   v1.13.5
foo2   NotReady   master   3d22h   v1.13.5
foo3   Ready      master   3d22h   v1.13.5

After several hours some of the pods are still in STATUS = Terminating, though I think they should be Terminated?

I read at https://www.bluematador.com/docs/troubleshooting/kubernetes-pod

In rare cases, it is possible for a pod to get stuck in the terminating state. This is detected by finding any pods where every container has been terminated, but the pod is still running. Usually, this is caused when a node in the cluster gets taken out of service abruptly, and the cluster scheduler and controller-manager do not clean up all of the pods on that node.

Solving this issue is as simple as manually deleting the pod using kubectl delete pod .

The pod's describe output says that being unreachable will be tolerated for 5 minutes ...

Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  etcd-data:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:      
    SizeLimit:   <unset>
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
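
If I understand it correctly, that 300s corresponds to the node.kubernetes.io/unreachable:NoExecute taint that the node controller places on the NotReady node, which should be visible with something like:

# Show the taints placed on the unreachable node
kubectl describe node foo2 | grep Taints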

I have tried kubectl delete pod etcd-lns4g5xkcw, which just hung, though forcing it does work as per this answer ...

kubectl delete pod etcd-lns4g5xkcw  --force=true --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "etcd-lns4g5xkcw" force deleted

(1) Why is this happening? Shouldn't it change to Terminated?

(2) Where is STATUS = Terminating even coming from? At https://v1-13.docs.kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/ I see only Waiting/Running/Terminated as the options?

-- k1eran
kubernetes

1 Answer

10/23/2019

Pod volume and network cleanup can take extra time while a pod is in Terminating status. The proper way to take a node out of service is to drain it, so that its pods are terminated gracefully within their grace period. Because you pulled the network cable, the node changed its status to NotReady while pods were still running on it, and those pods could not terminate.
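
For example, before taking a node out of service (assuming the node name foo2 from your question and the flags available in v1.13), something like:

# Mark the node unschedulable and evict its pods gracefully
kubectl drain foo2 --ignore-daemonsets --delete-local-data

# Once maintenance is done, allow scheduling on the node again
kubectl uncordon foo2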

You may find this information from k8s documentation about terminating status useful:

Kubernetes (versions 1.5 or newer) will not delete Pods just because a Node is unreachable. The Pods running on an unreachable Node enter the ‘Terminating’ or ‘Unknown’ state after a timeout. Pods may also enter these states when the user attempts graceful deletion of a Pod on an unreachable Node.
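
Regarding your question (2): as far as I know, Terminating is not one of the documented container states (Waiting/Running/Terminated); it is what kubectl prints in the STATUS column once the Pod object has a deletionTimestamp set but has not yet been removed from the apiserver. Assuming the pod name from your question, this can be checked with:

# Prints the deletion timestamp if the pod is being deleted, empty otherwise
kubectl get pod etcd-lns4g5xkcw -o jsonpath='{.metadata.deletionTimestamp}'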

There are 3 suggested ways to remove such a Pod from the apiserver:

- The Node object is deleted, either by you or by the Node Controller (see the sketch after this list).
- The kubelet on the unresponsive Node starts responding, kills the Pod and removes the entry from the apiserver.
- Force deletion of the Pod by the user.
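
A rough sketch of the first option, assuming the node name foo2 from your question: deleting the Node object lets the control plane remove the pods that were bound to it.

# Remove the Node object; the pods bound to it are then removed from the apiserver
kubectl delete node foo2

# If the machine comes back and the kubelet is left at its default of
# self-registering, it will re-create the Node object on its own.

The third option is the --force --grace-period=0 deletion you already used.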

Here you can find more information about background deletion in the k8s official documentation.

-- acid_fuji
Source: StackOverflow