I have a cluster of 3 nodes running Kubernetes 1.6.1, each has 2 CPU and 4G RAM.
I constantly redeploy my application with the same Docker tag. To force a new rollout, I change the pod template hash by replacing the value of an environment variable that is passed to the container:
sed "s/THIS_STRING_IS_REPLACED_DURING_BUILD/$(date)/g" nginx-deployment.yml | kubectl replace -f -
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        env:
        - name: FOR_GODS_SAKE_PLEASE_REDEPLOY
          value: 'THIS_STRING_IS_REPLACED_DURING_BUILD'
After doing this a few hundred times, I can no longer redeploy: new pods stay in the Pending state. kubectl get events produces the following:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 50s 379 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient pods (3).
At the same time I can see about 200 Exited nginx containers on every Kube node.
In the kube-controller-manager logs I can see that PodGC is trying to delete some pods, but they are not found:
I0516 12:53:41.137311 1 gc_controller.go:175] Found unscheduled terminating Pod nginx-deployment-2927112463-qczvv not assigned to any Node. Deleting.
I0516 12:53:41.137320 1 gc_controller.go:62] PodGC is force deleting Pod: default:nginx-deployment-2927112463-qczvv
E0516 12:53:41.190592 1 gc_controller.go:177] pods "nginx-deployment-2927112463-qczvv" not found
I0516 12:53:41.195020 1 gc_controller.go:175] Found unscheduled terminating Pod nginx-deployment-3265736979-jrpzb not assigned to any Node. Deleting.
I0516 12:53:41.195048 1 gc_controller.go:62] PodGC is force deleting Pod: default:nginx-deployment-3265736979-jrpzb
E0516 12:53:41.238307 1 gc_controller.go:177] pods "nginx-deployment-3265736979-jrpzb" not found
Is there anything I can do to prevent that from happening?
I think your nodes have run out of resources, so the scheduler cannot find any node to schedule the pod on. Because the pod was never scheduled to a node, PodGC can't remove it.
I think you should double-check why you have run out of resources.
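The FailedScheduling event in the question names pods as the insufficient resource, so one thing worth checking is how many pods each node currently holds compared with its pod capacity. Below is a rough sketch, not a definitive diagnostic. The helper name count_pods_per_node is made up for illustration, and it assumes the node name is the 8th whitespace-separated column of kubectl get pods --all-namespaces -o wide --no-headers output, which may differ between kubectl versions.

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces -o wide --no-headers`
# output on stdin and prints "<node> <pod count>" for each node, sorted by node name.
# Assumption: the NODE column is the 8th whitespace-separated field.
count_pods_per_node() {
  awk '{ count[$8]++ } END { for (node in count) print node, count[node] }' | sort
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods --all-namespaces -o wide --no-headers | count_pods_per_node
# Compare the counts with the "pods" entry that each node reports:
#   kubectl describe nodes | grep -i 'pods'
```

If the per-node counts are close to the reported pod capacity, leftover pods from old rollouts are a likely reason why new ones stay Pending.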
Kubernetes allows you to tweak the kubelet's garbage collection flags. This can be done by changing the flags --maximum-dead-containers or --maximum-dead-containers-per-container. Read more about it in the docs here:
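As a sketch of what setting those two flags might look like: the values below are arbitrary examples rather than recommendations, and the variable name KUBELET_GC_ARGS is hypothetical. How extra arguments actually reach the kubelet depends on how your cluster was installed (systemd unit, environment file, and so on), so treat this only as an illustration of the flag syntax.

```shell
# Example values only; tune them for your nodes.
# --maximum-dead-containers: cap on dead containers kept per node.
# --maximum-dead-containers-per-container: cap on old instances kept per container.
KUBELET_GC_ARGS="--maximum-dead-containers=20 --maximum-dead-containers-per-container=1"

# Print the arguments so they can be copied into the kubelet's startup configuration.
echo "$KUBELET_GC_ARGS"
```

The kubelet has to be restarted on each node for changed flags to take effect.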