Pod stuck in Terminating state and Docker container can't exit

12/31/2019

Environment:

kubernetes=1.14.2
docker-ce=18.03.1~ce-0~ubuntu

Error description:

Pod rec-train-jzqzf-0 is stuck in the Terminating state.

kubelet log:

DEC 28 23:58:53 Node0296 kubelet[2283]: E1228 23:58:53.205139    2283 pod_workers.go:190] Error syncing pod 0058cf61-298a-11ea-901c-98039b61d091 ("rec-train-jzqzf-0_research(0058cf61-298a-11ea-901c-98039b61d091)"), skipping: rpc error: code = DeadlineExceeded desc = context deadline exceeded
DEC 28 23:59:02 Node0296 kubelet[2283]: I1228 23:59:02.205026    2283 kubelet.go:1823] skipping pod synchronization - PLEG is not healthy: pleg was last seen active 3m0.611016853s ago; threshold is 3m0s.

docker log:

DEC 28 23:53:09 Node0296 dockerd[3877]: time="2019-12-28T23:53:09.984864688+08:00" level=warning msg="Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap."
DEC 28 23:53:10 Node0296 dockerd[3877]: time="2019-12-28T23:53:10+08:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/418eebcb5db650dbeced34df2469ccfff633d15660724e2da6156e77445f12da/shim.sock" debug=false module="containerd/tasks" pid=130177
DEC 28 23:54:30 Node0296 dockerd[3877]: time="2019-12-28T23:54:30.512317111+08:00" level=info msg="Container 418eebcb5db650dbeced34df2469ccfff633d15660724e2da6156e77445f12da failed to exit within 30 seconds of signal 15 - using the force"
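The last dockerd line describes the usual stop sequence: the container's main process receives SIGTERM (signal 15), gets a grace period (30 seconds here), and is then sent SIGKILL ("using the force"). As a rough illustration only, that escalation can be sketched in POSIX shell as below. stop_with_timeout is a hypothetical name and this is not dockerd's code. Note that the sketch returns as soon as SIGKILL has been sent: sending the signal does not by itself guarantee that the process disappears, which is exactly what seems to go wrong in this question.

```shell
#!/usr/bin/env bash
# Illustrative sketch only, NOT dockerd's implementation.
# stop_with_timeout PID TIMEOUT: send SIGTERM (signal 15), poll once per
# second for up to TIMEOUT seconds, then fall back to SIGKILL.
# Returns 0 if the process exited after SIGTERM, 1 if SIGKILL had to be sent.
stop_with_timeout() {
  local pid=$1 timeout=$2 waited=0
  # If the PID no longer exists there is nothing to stop.
  kill -TERM "$pid" 2>/dev/null || return 0
  # kill -0 delivers no signal; it only checks that the PID still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "pid $pid failed to exit within $timeout seconds of signal 15 - using the force" >&2
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A process that handles SIGTERM (like a plain sleep) returns 0; one that ignores SIGTERM is only stopped by the SIGKILL fallback and returns 1.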

I also found that "journalctl -u docker -f" was stuck at DEC 29 13:46:02; no output appeared after these lines:

DEC 29 13:46:02 Node0296 dockerd[3877]: time="2019-12-29T13:46:02+08:00" level=info msg="shim reaped" id=24f4968b5352df5eba4462382067ded6c6402878f957b83db2917824b560f1a1 module="containerd/tasks"
DEC 29 13:46:02 Node0296 dockerd[3877]: time="2019-12-29T13:46:02.256200069+08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

Judging from the Docker log, the root cause appears to be that container 418eebcb5db650db* failed to exit. Can anyone tell me why the forced exit did not work here? (I have hit this error several times; my only workaround so far is to reboot the node, which is painful.)

-- shliph
docker
kubernetes
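On the question of why the forced exit had no visible effect: the logs above do not show it, so the following is an assumption rather than a diagnosis. One common cause is that the container's main process is in uninterruptible sleep (state D, typically blocked on disk or network-filesystem I/O); such a process may not act on SIGKILL until the blocking kernel call returns. A zombie (state Z) also cannot be killed, because it has already exited and is only waiting for its parent to reap it. A minimal Linux-only sketch that reads the state from /proc/PID/status follows; proc_state and explain_unkillable are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print the scheduler state letter of a Linux process, e.g. R (running),
# S (sleeping), D (uninterruptible sleep), Z (zombie). Prints nothing if
# /proc/PID/status does not exist.
proc_state() {
  local pid=$1
  awk '/^State:/ { print $2; exit }' "/proc/$pid/status" 2>/dev/null
}

# Hypothetical helper: say whether SIGKILL can be expected to take effect.
explain_unkillable() {
  local pid=$1 state
  state=$(proc_state "$pid")
  case "$state" in
    "") echo "pid $pid not found"; return 1 ;;
    D)  echo "pid $pid is in uninterruptible sleep (D): SIGKILL may stay pending until the kernel call returns" ;;
    Z)  echo "pid $pid is a zombie (Z): it already exited and is waiting to be reaped by its parent" ;;
    *)  echo "pid $pid is in state $state: SIGKILL should terminate it" ;;
  esac
}

# Usage on the affected node (requires Docker; the ID is the one from the log):
#   explain_unkillable "$(docker inspect --format '{{.State.Pid}}' 418eebcb5db650dbeced34df2469ccfff633d15660724e2da6156e77445f12da)"
```

If the state turns out to be D, the place to look is whatever the process is blocked on (for example a hung mount), since no signal sent from Docker or Kubernetes will help until that call returns.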

2 Answers

12/31/2019

Run the following command:

kubectl delete pod rec-train-jzqzf-0 --grace-period=0 --force
-- P Ekambaram
Source: StackOverflow

12/31/2019

Volume or network teardown is usually what takes time when a pod is deleted.

You can use the --now flag, as in the example below, so that resources are signaled for immediate shutdown (the same as using --grace-period=1):

kubectl delete pod <pod-name> --now

Using "--force --grace-period=0" may result in inconsistency or data loss when deleting pods.

Refer to "kubectl delete pod --help":

      --force=false: Only used when grace-period=0. If true, immediately remove resources from API and bypass graceful deletion. Note that immediate deletion of some resources may result in inconsistency or data loss and requires confirmation.

      --grace-period=-1: Period of time in seconds given to the resource to terminate gracefully. Ignored if negative. Set to 1 for immediate shutdown. Can only be set to 0 when --force is true (force deletion).

      --now=false: If true, resources are signaled for immediate shutdown (same as --grace-period=1).
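To make the three help-text rules above concrete, here is a small shell function that encodes them and prints the grace period a given flag combination requests ("default" meaning the pod's own setting applies). It is a hypothetical model for illustration: effective_grace_period is not part of kubectl, it follows only the quoted help text rather than kubectl's source, and rejecting --now together with an explicit --grace-period is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical model of the `kubectl delete pod --help` rules quoted above.
# Args: FORCE (true|false), GRACE (integer, -1 = flag not given), NOW (true|false)
effective_grace_period() {
  local force=$1 grace=$2 now=$3
  if [ "$now" = true ]; then
    # Assumption of this sketch: refuse --now combined with an explicit --grace-period.
    if [ "$grace" -ne -1 ]; then
      echo "error: --now and --grace-period given together" >&2
      return 1
    fi
    grace=1                              # "same as --grace-period=1"
  fi
  if [ "$grace" -lt 0 ]; then
    echo default                         # "Ignored if negative."
  elif [ "$grace" -eq 0 ]; then
    if [ "$force" != true ]; then
      # "Can only be set to 0 when --force is true (force deletion)."
      echo "error: --grace-period=0 requires --force" >&2
      return 1
    fi
    echo "0 (force: removed from API immediately)"
  else
    echo "$grace"                        # --force is "Only used when grace-period=0"
  fi
}
```

In this model, the first answer's command (--grace-period=0 --force) corresponds to effective_grace_period true 0 false, while the --now command from this answer corresponds to effective_grace_period false -1 true and prints 1, i.e. a one-second graceful shutdown instead of bypassing graceful deletion.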
-- DT.