Kubectl delete -f deployments/ --grace-period=0 --force does not work

9/19/2018

What happened:

Force terminate does not work:

[root@master0 manifests]# kubectl delete -f prometheus/deployment.yaml --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
deployment.extensions "prometheus-core" force deleted
^C <---- Manual Quit due to hanging. Waited over 5 minutes with no change.
[root@master0 manifests]# kubectl -n monitoring get pods
NAME                                  READY     STATUS        RESTARTS   AGE
alertmanager-668794449d-6dppl         0/1       Terminating   0          22h
grafana-core-576c68c58d-7nvbt         0/1       Terminating   0          22h
kube-state-metrics-69b9d65dd5-rl8td   0/1       Terminating   0          3h
node-directory-size-metrics-6hcfc     2/2       Running       0          3h
node-directory-size-metrics-w7zxh     2/2       Running       0          3h
node-directory-size-metrics-z2m5j     2/2       Running       0          3h
prometheus-core-59778c7987-vh89h      0/1       Terminating   0          3h
prometheus-node-exporter-27fjg        1/1       Running       0          3h
prometheus-node-exporter-2t5v6        1/1       Running       0          3h
prometheus-node-exporter-hhxmv        1/1       Running       0          3h

Then

What you expected to happen: Pod to be deleted

How to reproduce it (as minimally and precisely as possible): We feel that the there might have been an IO error with the storage on the pods. Kubernetes has its own dedicated direct storage. All hosted on AWS. Use of t3.xl

Anything else we need to know?: It seems to happen randomly but happens often enough as we have to reboot the entire cluster. Do stuck in termination can be ok to deal with, having no logs or no control to really force remove them and start again is frustrating.

Environment: - Kubernetes version (use kubectl version):

kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:08:34Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
AWS
- OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):

    Linux 3.10.0-862.6.3.el7.x86_64 #1 SMP Tue Jun 26 16:32:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools: Kubernetes was deployed with Kuberpray with GlusterFS as a container volume and Weave as its networking.

  • Others: 2 master 1 node setup. We have redeployed the entire setup and still get hit by the same issue.

I have posted this question on their issues page:

https://github.com/kubernetes/kubernetes/issues/68829

But no reply.

Logs from API:

[root@master0 manifests]# kubectl -n monitoring delete pod prometheus-core-59778c7987-bl2h4 --force --grace-period=0 -v9
I0919 13:53:08.770798   19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.771440   19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.772681   19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.780266   19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.780943   19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.781609   19973 loader.go:359] Config loaded from file /root/.kube/config
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
I0919 13:53:08.781876   19973 request.go:897] Request Body: {"gracePeriodSeconds":0,"propagationPolicy":"Foreground"}
I0919 13:53:08.781938   19973 round_trippers.go:386] curl -k -v -XDELETE  -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: kubectl/v1.11.0 (linux/amd64) kubernetes/91e7b4f" 'https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4'
I0919 13:53:08.798682   19973 round_trippers.go:405] DELETE https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4 200 OK in 16 milliseconds
I0919 13:53:08.798702   19973 round_trippers.go:411] Response Headers:
I0919 13:53:08.798709   19973 round_trippers.go:414]     Content-Type: application/json
I0919 13:53:08.798714   19973 round_trippers.go:414]     Content-Length: 3199
I0919 13:53:08.798719   19973 round_trippers.go:414]     Date: Wed, 19 Sep 2018 13:53:08 GMT
I0919 13:53:08.798758   19973 request.go:897] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"prometheus-core-59778c7987-bl2h4","generateName":"prometheus-core-59778c7987-","namespace":"monitoring","selfLink":"/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4","uid":"7647d17a-bc11-11e8-bd71-06b8eceafd88","resourceVersion":"676465","creationTimestamp":"2018-09-19T13:39:41Z","deletionTimestamp":"2018-09-19T13:40:18Z","deletionGracePeriodSeconds":0,"labels":{"app":"prometheus","component":"core","pod-template-hash":"1533473543"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"prometheus-core-59778c7987","uid":"75aba047-bc11-11e8-bd71-06b8eceafd88","controller":true,"blockOwnerDeletion":true}],"finalizers":["foregroundDeletion"]},"spec":{"volumes":[{"name":"config-volume","configMap":{"name":"prometheus-core","defaultMode":420}},{"name":"rules-volume","configMap":{"name":"prometheus-rules","defaultMode":420}},{"name":"api-token","secret":{"secretName":"api-token","defaultMode":420}},{"name":"ca-crt","secret":{"secretName":"ca-crt","defaultMode":420}},{"name":"prometheus-k8s-token-trclf","secret":{"secretName":"prometheus-k8s-token-trclf","defaultMode":420}}],"containers":[{"name":"prometheus","image":"prom/prometheus:v1.7.0","args":["-storage.local.retention=12h","-storage.local.memory-chunks=500000","-config.file=/etc/prometheus/prometheus.yaml","-alertmanager.url=http://alertmanager:9093/"],"ports":[{"name":"webui","containerPort":9090,"protocol":"TCP"}],"resources":{"limits":{"cpu":"500m","memory":"500M"},"requests":{"cpu":"500m","memory":"500M"}},"volumeMounts":[{"name":"config-volume","mountPath":"/etc/prometheus"},{"name":"rules-volume","mountPath":"/etc/prometheus-rules"},{"name":"api-token","mountPath":"/etc/prometheus-token"},{"name":"ca-crt","mountPath":"/etc/prometheus-ca"},{"name":"prometheus-k8s-token-trclf","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"prometheus-k8s","serviceAccount":"prometheus-k8s","nodeName":"master1.infra.cde","securityContext":{},"schedulerName":"default-scheduler"},"status":{"phase":"Pending","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"},{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z","reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"ContainersReady","status":"False","lastProbeTime":null,"lastTransitionTime":null,"reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"}],"hostIP":"10.1.1.187","startTime":"2018-09-19T13:39:41Z","containerStatuses":[{"name":"prometheus","state":{"terminated":{"exitCode":0,"startedAt":null,"finishedAt":null}},"lastState":{},"ready":false,"restartCount":0,"image":"prom/prometheus:v1.7.0","imageID":""}],"qosClass":"Guaranteed"}}
pod "prometheus-core-59778c7987-bl2h4" force deleted
I0919 13:53:08.798864   19973 round_trippers.go:386] curl -k -v -XGET  -H "Accept: application/json" -H "User-Agent: kubectl/v1.11.0 (linux/amd64) kubernetes/91e7b4f" 'https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4'
I0919 13:53:08.801386   19973 round_trippers.go:405] GET https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4 200 OK in 2 milliseconds
I0919 13:53:08.801403   19973 round_trippers.go:411] Response Headers:
I0919 13:53:08.801409   19973 round_trippers.go:414]     Content-Type: application/json
I0919 13:53:08.801415   19973 round_trippers.go:414]     Content-Length: 3199
I0919 13:53:08.801420   19973 round_trippers.go:414]     Date: Wed, 19 Sep 2018 13:53:08 GMT
I0919 13:53:08.801465   19973 request.go:897] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"prometheus-core-59778c7987-bl2h4","generateName":"prometheus-core-59778c7987-","namespace":"monitoring","selfLink":"/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4","uid":"7647d17a-bc11-11e8-bd71-06b8eceafd88","resourceVersion":"676465","creationTimestamp":"2018-09-19T13:39:41Z","deletionTimestamp":"2018-09-19T13:40:18Z","deletionGracePeriodSeconds":0,"labels":{"app":"prometheus","component":"core","pod-template-hash":"1533473543"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"prometheus-core-59778c7987","uid":"75aba047-bc11-11e8-bd71-06b8eceafd88","controller":true,"blockOwnerDeletion":true}],"finalizers":["foregroundDeletion"]},"spec":{"volumes":[{"name":"config-volume","configMap":{"name":"prometheus-core","defaultMode":420}},{"name":"rules-volume","configMap":{"name":"prometheus-rules","defaultMode":420}},{"name":"api-token","secret":{"secretName":"api-token","defaultMode":420}},{"name":"ca-crt","secret":{"secretName":"ca-crt","defaultMode":420}},{"name":"prometheus-k8s-token-trclf","secret":{"secretName":"prometheus-k8s-token-trclf","defaultMode":420}}],"containers":[{"name":"prometheus","image":"prom/prometheus:v1.7.0","args":["-storage.local.retention=12h","-storage.local.memory-chunks=500000","-config.file=/etc/prometheus/prometheus.yaml","-alertmanager.url=http://alertmanager:9093/"],"ports":[{"name":"webui","containerPort":9090,"protocol":"TCP"}],"resources":{"limits":{"cpu":"500m","memory":"500M"},"requests":{"cpu":"500m","memory":"500M"}},"volumeMounts":[{"name":"config-volume","mountPath":"/etc/prometheus"},{"name":"rules-volume","mountPath":"/etc/prometheus-rules"},{"name":"api-token","mountPath":"/etc/prometheus-token"},{"name":"ca-crt","mountPath":"/etc/prometheus-ca"},{"name":"prometheus-k8s-token-trclf","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"prometheus-k8s","serviceAccount":"prometheus-k8s","nodeName":"master1.infra.cde","securityContext":{},"schedulerName":"default-scheduler"},"status":{"phase":"Pending","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"},{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z","reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"ContainersReady","status":"False","lastProbeTime":null,"lastTransitionTime":null,"reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"}],"hostIP":"10.1.1.187","startTime":"2018-09-19T13:39:41Z","containerStatuses":[{"name":"prometheus","state":{"terminated":{"exitCode":0,"startedAt":null,"finishedAt":null}},"lastState":{},"ready":false,"restartCount":0,"image":"prom/prometheus:v1.7.0","imageID":""}],"qosClass":"Guaranteed"}}
I0919 13:53:08.801758   19973 round_trippers.go:386] curl -k -v -XGET  -H "Accept: application/json" -H "User-Agent: kubectl/v1.11.0 (linux/amd64) kubernetes/91e7b4f" 'https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods?fieldSelector=metadata.name%3Dprometheus-core-59778c7987-bl2h4&resourceVersion=676465&watch=true'
I0919 13:53:08.803409   19973 round_trippers.go:405] GET https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods?fieldSelector=metadata.name%3Dprometheus-core-59778c7987-bl2h4&resourceVersion=676465&watch=true 200 OK in 1 milliseconds
I0919 13:53:08.803424   19973 round_trippers.go:411] Response Headers:
I0919 13:53:08.803430   19973 round_trippers.go:414]     Date: Wed, 19 Sep 2018 13:53:08 GMT
I0919 13:53:08.803436   19973 round_trippers.go:414]     Content-Type: application/json
-- Shinerrs
kubernetes

3 Answers

9/19/2018

After you issue the kubectl delete I would log into the nodes where the pods are running and debug with docker commands. (assuming your runtime is Docker)

docker logs <container-with-issue>
docker exec -it <container-with-with-issue> bash # maybe the application is hanging.

Are you mounting any volumes for Prometheus? It could be that it's trying to release an EBS volume and the AWS API is unresponsive.

Hope it helps!

-- Rico
Source: StackOverflow

9/19/2018

Some times a Kubernetes workers have problems like zombie process or kernel panics or IO waiting. But when you want to delete a pod that use Storage and have many IO/PS like Prometheus DB , your worker can not kill that pods.

I had same situation like you but on Container Linux without any cloud platform like AWS and Gcloud. i just rebooted my broke worker and after that delete them normally without --grace-period=0. --grace-period=0 is a very bad command when your nodes and pods are running without any problems.

workers can reboot when you use an K8S. This is a good fetcher of K8S.

For run Prometheus you should make some Prometheus with different config or use federation for scale Prometheus if you want to have a Monitoring System without IO problem .

-- omilun
Source: StackOverflow

9/20/2018

After some investigation and help from the Kubernetes community over on github. We found the solution. The answer is, in 1.11.0 there is a known bug in relation to this issue. after upgrading to 1.12.0 the issue was resolved. The issue is noted to be resolved in 1.11.1

Thanks to cduchesne https://github.com/kubernetes/kubernetes/issues/68829#issuecomment-422878108

-- Shinerrs
Source: StackOverflow