I have 3 nodes in a k8s cluster, all of which are masters, i.e. I have removed the node-role.kubernetes.io/master taint.
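(For reference, the taint was removed with the usual command, something along these lines — exact invocation from memory:
kubectl taint nodes --all node-role.kubernetes.io/master-
)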
I physically removed the network cable on foo2, so I have:
kubectl get nodes
NAME   STATUS     ROLES    AGE     VERSION
foo1   Ready      master   3d22h   v1.13.5
foo2   NotReady   master   3d22h   v1.13.5
foo3   Ready      master   3d22h   v1.13.5
After several hours, some of the pods are still in STATUS = Terminating, though I think they should be in Terminated by now?
I read at https://www.bluematador.com/docs/troubleshooting/kubernetes-pod
In rare cases, it is possible for a pod to get stuck in the terminating state. This is detected by finding any pods where every container has been terminated, but the pod is still running. Usually, this is caused when a node in the cluster gets taken out of service abruptly, and the cluster scheduler and controller-manager do not clean up all of the pods on that node.
Solving this issue is as simple as manually deleting the pod using kubectl delete pod .
The pod describe output says that being unreachable will be tolerated for 5 minutes ...
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   True
  PodScheduled      True
Volumes:
  etcd-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
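For illustration, I believe those default tolerations correspond to something like this in the pod spec (a sketch only; the 300s values are injected by the DefaultTolerationSeconds admission controller, not set by me):
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300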
I have tried kubectl delete pod etcd-lns4g5xkcw, which just hung, though forcing it does work as per this answer ...
kubectl delete pod etcd-lns4g5xkcw --force=true --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "etcd-lns4g5xkcw" force deleted
(1) Why is this happening? Shouldn't it change to Terminated?
(2) Where is STATUS = Terminating even coming from? At https://v1-13.docs.kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/ I see only Waiting/Running/Terminated as the options?
Pod volume and network cleanup can take additional time while a pod is in the termination phase. The proper way to take a node out of service is to drain it first, so that its pods are terminated cleanly within their grace period. Because you pulled the network cable, the node went NotReady while pods were still running on it, and those pods could not complete termination.
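A planned removal would look roughly like this (flag names as of v1.13; --delete-local-data matters here because the pod uses an EmptyDir volume):
kubectl drain foo2 --ignore-daemonsets --delete-local-data
# ... perform maintenance on foo2 ...
kubectl uncordon foo2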
You may find this information from the Kubernetes documentation about the Terminating state useful:
Kubernetes (versions 1.5 or newer) will not delete Pods just because a Node is unreachable. The Pods running on an unreachable Node enter the ‘Terminating’ or ‘Unknown’ state after a timeout. Pods may also enter these states when the user attempts graceful deletion of a Pod on an unreachable Node:
There are 3 suggested ways to remove such a pod from the apiserver:
1. The Node object is deleted (either by you, or by the Node Controller).
2. The kubelet on the unresponsive Node starts responding, kills the Pod and removes the entry from the apiserver.
3. Force deletion of the Pod by the user.
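In your situation, the first and third options translate roughly into the following commands (names taken from your output):
kubectl delete node foo2
kubectl delete pod etcd-lns4g5xkcw --grace-period=0 --force
Deleting the Node object lets the controllers garbage-collect all pods bound to that node, while force deletion only removes the single pod object from the apiserver without waiting for confirmation from the kubelet.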
Here you can find more information about background deletion in the official Kubernetes documentation.