NetworkPlugin cni failed to set up pod "xxxxx" network: failed to set bridge addr: "cni0" already has an IP address different from 10.x.x.x - Error

4/22/2020

I get this error after I start the worker node VMs (Kubernetes) from the AWS console. I am using PKS (Pivotal Container Service).

network for pod "xxxxx": NetworkPlugin cni failed to set up pod "xxxxx" network: failed to set bridge addr: "cni0" already has an IP address different from 10.x.x.x/xx

I suppose that Flannel assigns a subnet lease to each worker in the cluster, which expires after 24 hours - after that, the flannel.1 and cni0 /24 subnets no longer match, which causes this issue.
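One way to confirm that this is what happened is to compare the subnet flannel leased to the node against the address on cni0. Below is a minimal sketch, not PKS-specific; the helper names are mine, and the `/run/flannel/subnet.env` path is where a vanilla flannel install writes `FLANNEL_SUBNET`:

```shell
# Illustrative check: do flannel's leased subnet and cni0's address agree?
# On a typical flannel node:
#   flannel lease : FLANNEL_SUBNET in /run/flannel/subnet.env (e.g. 10.200.30.1/24)
#   bridge address: ip -4 addr show cni0

prefix24() {
  # 10.200.30.1/24 -> 10.200.30  (compare the first three octets of a /24)
  echo "$1" | cut -d/ -f1 | cut -d. -f1-3
}

check_subnets() {
  flannel_subnet="$1"   # e.g. the FLANNEL_SUBNET value
  cni0_addr="$2"        # e.g. the address reported on cni0
  if [ "$(prefix24 "$flannel_subnet")" = "$(prefix24 "$cni0_addr")" ]; then
    echo "in sync"
  else
    echo "out of sync"
  fi
}
```

If the two prefixes differ, pods fail with exactly the "already has an IP address different from" error above.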

I also know a workaround:

bosh ssh -d worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld" 
bosh ssh -d worker -c "sudo rm /var/vcap/store/docker/docker/network/files/local-kv.db" 
bosh ssh -d worker -c "sudo /var/vcap/bosh/bin/monit restart all"

However is there any permanent fix to this?

-- Dilip
amazon-web-services
flannel
kubernetes
pivotal-cloud-foundry
pivotaltracker

1 Answer

5/1/2020

TL;DR - recreate network

$ ip link set cni0 down
$ brctl delbr cni0  

Community solutions

This is a known issue, and there are some community solutions to fix it.

The solution from filipenv is:

On master and worker nodes:

$ kubeadm reset
$ systemctl stop kubelet
$ systemctl stop docker
$ rm -rf /var/lib/cni/
$ rm -rf /var/lib/kubelet/*
$ rm -rf /etc/cni/
$ ifconfig cni0 down
$ ifconfig flannel.1 down
$ ifconfig docker0 down

(You may need to manually unmount filesystems from /var/lib/kubelet before calling rm on that directory.) After doing that I started docker and kubelet back again and re-ran the kubeadm process.
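The unmount step can be scripted. This is a hedged sketch rather than anything from the answer; the function name is made up, and it just filters `mount` output for mount points under the kubelet directory:

```shell
# Illustrative helper: given `mount` output on stdin, print the mount
# points under /var/lib/kubelet so they can be unmounted before rm -rf.
# `mount` lines look like: "tmpfs on /var/lib/kubelet/pods/... type tmpfs (rw)"
kubelet_mountpoints() {
  awk '$3 ~ "^/var/lib/kubelet" {print $3}'
}

# Typical use (needs root):
#   mount | kubelet_mountpoints | xargs -r -n1 umount
```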

aysark, and the kubernetes-handbook recipe for "Pod stuck in Waiting or ContainerCreating", both recommend:

$ ip link set cni0 down
$ brctl delbr cni0  
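On hosts without bridge-utils installed, `ip link` can stand in for `brctl`. A small guarded sketch (the function name is mine, not from the answers); once the bridge is gone, restarting kubelet/flanneld recreates cni0 with the currently leased subnet:

```shell
# Sketch: tear down the cni0 bridge so the CNI plugin can recreate it
# with the subnet flannel currently leases. Requires root on a real node.
recreate_bridge() {
  br="${1:-cni0}"
  if command -v ip >/dev/null 2>&1 && ip link show "$br" >/dev/null 2>&1; then
    ip link set "$br" down
    ip link delete "$br" type bridge   # iproute2 equivalent of: brctl delbr
    echo "deleted $br"
  else
    echo "no such bridge: $br"
  fi
}
```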

Flannel's KB article

And there is an article in Flannel's KB: PKS Flannel network gets out of sync with docker bridge network (cni0)

WA1

WA1 is just like yours:

    bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld"
    bosh ssh -d <deployment_name> worker -c "sudo rm /var/vcap/store/docker/docker/network/files/local-kv.db"
    bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit restart all"

WA2

If WA1 didn't help, KB recommends:

    1. bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld"
    2. bosh ssh -d <deployment_name> worker -c "ifconfig | grep -A 1 flannel"
    3. On a master node, get access to etcd using the following KB.
    4. On a master node, run `etcdctlv2 ls /coreos.com/network/subnets/`.
    5. Remove all the worker subnet leases from etcd by running `etcdctlv2 rm /coreos.com/network/subnets/<worker_subnet>` for each of the worker subnets from step 2 above.
    6. bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit restart flanneld"
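The per-subnet `etcdctlv2 rm` can be looped instead of typed once per worker. A sketch, assuming `etcdctlv2` is the wrapper the KB refers to; the filter function is mine and just keeps keys under flannel's subnets prefix:

```shell
# Illustrative filter: given `etcdctlv2 ls` output on stdin, keep only
# keys under the flannel subnets prefix (e.g. /coreos.com/network/subnets/10.200.30.0-24).
subnet_keys() {
  grep '^/coreos.com/network/subnets/'
}

# Typical use on a master node:
#   etcdctlv2 ls /coreos.com/network/subnets/ | subnet_keys | \
#     while read -r key; do etcdctlv2 rm "$key"; done
```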
-- Yasen
Source: StackOverflow