After rebooting nodes, pods stuck in ContainerCreating state due to insufficient Weave IPs

7/31/2019

I have a 3-node Kubernetes cluster on 1.11, deployed with kubeadm and running Weave (CNI) version 2.5.1. I am giving Weave a CIDR with a range of 128 IPs. After two reboots of the nodes, some of the pods are stuck in ContainerCreating state.
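
For context, this is roughly how the range was provided at install time (the manifest URL below is an approximation of my command, with IPALLOC_RANGE set to the /25 that shows up in the IPAM status further down):

# Apply the Weave Net manifest with a custom allocation range via the IPALLOC_RANGE env var
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=192.168.13.0/25"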

Once you run kubectl describe pod <pod_name>, you will see the following errors:

Events:
Type     Reason                  Age                From                Message
----     ------                  ----               ----                -------
Normal   SandboxChanged          20m (x20 over 1h)  kubelet, 10.0.1.63  Pod sandbox changed, it will be killed and re-created.
Warning  FailedCreatePodSandBox  30s (x25 over 1h)  kubelet, 10.0.1.63  Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded

If I check how many containers are running and how many IP addresses are allocated to them, weave ps lists 26 entries:

[root@ip-10-0-1-63 centos]# weave ps | wc -l
26

The total number of IPs assigned to Weave on that node is 42.

[root@ip-10-0-1-212 centos]# kubectl exec -n kube-system -it weave-net-6x4cp -- /home/weave/weave --local status ipam
Defaulting container name to weave.
Use 'kubectl describe pod/weave-net-6x4cp -n kube-system' to see all of the containers in this pod.
6e:0d:f3:d7:f5:49(10.0.1.63)                42 IPs (32.8% of total) (42 active)
7a:24:6f:3c:1b:be(10.0.1.212)               40 IPs (31.2% of total) 
ee:00:d4:9f:9d:79(10.0.1.43)                46 IPs (35.9% of total) 

You can see all 42 IPs are active, so no more IPs are available to allocate to new containers (the three nodes' allocations of 42, 40 and 46 add up to the full 128-address /25 range). But out of those 42, only 26 are actually allocated to containers, and I am not sure where the remaining IPs are. This is happening on all three nodes.

Here is the output of weave status for your reference:

[root@ip-10-0-1-212 centos]# weave status

    Version: 2.5.1 (version 2.5.2 available - please upgrade!)

    Service: router
    Protocol: weave 1..2
       Name: 7a:24:6f:3c:1b:be(10.0.1.212)
    Encryption: disabled
PeerDiscovery: enabled
    Targets: 3
Connections: 3 (2 established, 1 failed)
      Peers: 3 (with 6 established connections)
TrustedSubnets: none

    Service: ipam
     Status: waiting for IP(s) to become available
      Range: 192.168.13.0/25
DefaultSubnet: 192.168.13.0/25

If you need any more information, I would be happy to provide it. Any clue?

-- Prafull Ladha
cni
kubeadm
kubernetes
weave

3 Answers

8/26/2019

If your Weave IPs are exhausted and some of them are not released after a reboot, you can delete the file /var/lib/weave/weave-netdata.db and restart the weave pods.
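
Roughly (the name=weave-net label below is the usual selector for the weave-net DaemonSet pods; adjust it for your cluster):

# On the affected node, remove Weave's IPAM persistence file:
rm -f /var/lib/weave/weave-netdata.db

# Then restart the weave pods so IPAM starts from a clean state:
kubectl -n kube-system delete pod -l name=weave-net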

In my case, I added a systemd unit that removes the /var/lib/weave/weave-netdata.db file on every reboot or shutdown of the system. Once the system comes back up, Weave allocates fresh IPs to all the pods, and the IP exhaustion was never seen again.
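
I no longer have the exact unit to hand, but a minimal sketch of one that does this (the unit name and path are illustrative) would be:

# /etc/systemd/system/weave-ipam-cleanup.service (illustrative name)
[Unit]
Description=Remove Weave IPAM persistence file at shutdown/reboot

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
# ExecStop runs while the system is shutting down or rebooting
ExecStop=/bin/rm -f /var/lib/weave/weave-netdata.db

[Install]
WantedBy=multi-user.target

Enable it once with systemctl daemon-reload && systemctl enable --now weave-ipam-cleanup.service.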

Posting this here in the hope that someone else will find it useful for their use case.

-- Prafull Ladha
Source: StackOverflow

8/1/2019

I guess that the 16 unaccounted-for IPs (42 active minus 26 in use) have been reserved for pod reuse. These are the maximum pods per node based on CIDR ranges:

  Maximum Pods per Node    CIDR Range per Node
  8                        /28
  9 to 16                  /27
  17 to 32                 /26
  33 to 64                 /25
  65 to 110                /24
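
As a rough sanity check on the table, the raw address count behind a prefix length is 2^(32 - prefix); for example, the /25 from the question holds 128 addresses. A throwaway shell check (variable name is arbitrary):

prefix=25
echo $(( 2 ** (32 - prefix) ))   # prints 128, the address count of a /25
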
-- Subramanian Manickam
Source: StackOverflow

7/31/2019

Not sure if we have the same problem, but before I reboot a node I drain it first, so all the pods on that node are evicted and it is safe to reboot. After the node is back up, you need to uncordon it again, and the node becomes available for scheduling pods.
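
A rough sketch of that sequence (the node name is a placeholder, and the exact drain flags can vary with your kubectl version):

# Evict pods and mark the node unschedulable before rebooting it:
kubectl drain <node-name> --ignore-daemonsets --delete-local-data

# ...reboot the node...

# Make the node schedulable again once it is back:
kubectl uncordon <node-name>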

My reference: https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/

-- Nicky Puff
Source: StackOverflow