AWS has per-node pod IP restrictions; pods are stuck in ContainerCreating state

11/8/2019

As we all know, AWS has a per-node pod IP restriction, and Kubernetes doesn't take this into account while scheduling. Pods get scheduled onto nodes where no more pod IPs can be allocated, and they get stuck in the ContainerCreating state, as follows:

Normal   Scheduled               114s                 default-scheduler                        Successfully assigned default/whoami-deployment-9f9c86c4f-r4flx to ip-192-168-15-248.ec2.internal
Warning  FailedCreatePodSandBox  111s                 kubelet, ip-192-168-15-248.ec2.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8d4b5f98f9b600ad9ec486f994fa2f9223d5224842df7f78802616f014b52970" network for pod "whoami-deployment-9f9c86c4f-r4flx": NetworkPlugin cni failed to set up pod "whoami-deployment-9f9c86c4f-r4flx_default" network: add cmd: failed to assign an IP address to container
Normal   SandboxChanged          86s (x12 over 109s)  kubelet, ip-192-168-15-248.ec2.internal  Pod sandbox changed, it will be killed and re-created.
Warning  FailedCreatePodSandBox  61s (x4 over 76s)    kubelet, ip-192-168-15-248.ec2.internal  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e2a3c54ba7d9a33a45248f7c276f4a2d5b0c8ba6c3deb5184392156b35638553" network for pod "whoami-deployment-9f9c86c4f-r4flx": NetworkPlugin cni failed to set up pod "whoami-deployment-9f9c86c4f-r4flx_default" network: add cmd: failed to assign an IP address to container
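
(For context: with the AWS VPC CNI, my understanding is that the pod IP limit per node is roughly ENIs x (IPv4 addresses per ENI - 1) + 2, and that limit is reflected in the node's allocatable pod count, which can be checked with something like:)

kubectl get nodes -o custom-columns=NAME:.metadata.name,MAXPODS:.status.allocatable.pods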

So I tried to overcome the issue by tainting the nodes with key=value:NoSchedule, so that the default scheduler wouldn't schedule pods onto nodes that had already reached their pod IP limit, and I deleted all the pods that were stuck in the ContainerCreating state. I was hoping this would keep the scheduler from placing any more pods on the tainted nodes, and that is what happened. But since the pods could no longer be scheduled, I was also hoping that the cluster-autoscaler would scale up the ASG and my pods would run on new nodes, and that is what didn't happen.
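
For clarity, the taint was applied roughly like this (key=value is just a placeholder for the actual key/value pair I used, and the node name is one of the affected nodes):

kubectl taint nodes ip-192-168-15-248.ec2.internal key=value:NoSchedule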

When I describe the pod, I see:

Warning FailedScheduling 40s (x5 over 58s) default-scheduler 0/5 nodes are available: 5 node(s) had taints that the pod didn't tolerate.

Normal NotTriggerScaleUp 5s (x6 over 56s) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 node(s) had taints that the pod didn't tolerate

When I look at the cluster-autoscaler logs, I see:

I1108 16:30:47.521026 1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"whoami-deployment-9f9c86c4f-x5h4d", UID:"158cc806-0245-11ea-a67a-0efb4254edc4", APIVersion:"v1", ResourceVersion:"2483839", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 node(s) had taints that the pod didn't tolerate

Now, I tried an alternative way to mark my nodes as unschedulable: I removed the NoSchedule taint above and patched the nodes with:

kubectl patch nodes node1.internal -p '{"spec": {"unschedulable": true}}'
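
(If I'm not mistaken, this patch is equivalent to cordoning the node:)

kubectl cordon node1.internal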

And these are the logs I see from the cluster-autoscaler:

I1109 10:47:50.894680       1 static_autoscaler.go:138] Starting main loop
W1109 10:47:50.894719       1 static_autoscaler.go:562] Cluster has no ready nodes.
I1109 10:47:50.901157       1 event.go:209] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"7c949105-0153-11ea-9a39-12e5fc698b6e", APIVersion:"v1", ResourceVersion:"2629645", FieldPath:""}): type: 'Warning' reason: 'ClusterUnhealthy' Cluster has no ready nodes.

So my idea for overcoming the issue didn't work. How should I overcome this?

Kubernetes version: 1.14
Cluster Autoscaler version: 1.14.6

Let me know if you guys need more details.

-- sudip
kubernetes

0 Answers