I am deploying a Kubernetes cluster using kubespray. I changed the network plugin from calico to cilium.
Unfortunately, some of the cilium pods are stuck in CrashLoopBackOff.
```
$ kubectl --namespace kube-system get pods --selector k8s-app=cilium \
    --sort-by='.status.containerStatuses.restartCount' -o wide
NAME           READY   STATUS             RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
cilium-2gmwm   1/1     Running            0          14m   10.10.3.102   nodemaster1   <none>           <none>
cilium-9ccdp   1/1     Running            0          14m   10.10.3.110   node6         <none>           <none>
cilium-c9nh6   1/1     Running            0          14m   10.10.3.107   node3         <none>           <none>
cilium-r9w4z   0/1     CrashLoopBackOff   6          14m   10.10.3.109   node5         <none>           <none>
cilium-f8z2q   1/1     Running            0          14m   10.10.3.105   node1         <none>           <none>
cilium-d96cd   0/1     CrashLoopBackOff   7          14m   10.10.3.106   node2         <none>           <none>
cilium-jgmcf   0/1     CrashLoopBackOff   7          14m   10.10.3.103   nodemaster2   <none>           <none>
cilium-9zqnr   0/1     CrashLoopBackOff   7          14m   10.10.3.108   node4         <none>           <none>
cilium-llt9p   0/1     CrashLoopBackOff   7          14m   10.10.3.104   nodemaster3   <none>           <none>
```
When checking the logs of the crashing pods I can see this fatal error message:

```
level=fatal msg="The allocation CIDR is different from the previous cilium instance. This error is most likely caused by a temporary network disruption to the kube-apiserver that prevent Cilium from retrieve the node's IPv4/IPv6 allocation range. If you believe the allocation range is supposed to be different you need to clean up all Cilium state with the `cilium cleanup` command on this node. Be aware this will cause network disruption for all existing containers managed by Cilium running on this node and you will have to restart them." error="Unable to allocate internal IPv4 node IP 10.233.71.1: provided IP is not in the valid range. The range of valid IPs is 10.233.70.0/24." subsys=daemon
```
It seems that the node's internal IP (10.233.71.1 in this case) falls outside the valid allocation range 10.233.70.0/24, which only covers 10.233.70.0 through 10.233.70.255.
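The mismatch the error complains about can be checked with Python's standard `ipaddress` module:

```python
import ipaddress

# The error says the internal node IP 10.233.71.1 must fall inside the
# previously stored allocation range 10.233.70.0/24 -- it does not.
node_ip = ipaddress.ip_address("10.233.71.1")
alloc_range = ipaddress.ip_network("10.233.70.0/24")
print(node_ip in alloc_range)  # → False
```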
I tried to modify the main.yaml file of kubespray to change the subnets, but my multiple attempts only made the number of crashing pods go up and down...
For instance, on this run I tried:

```yaml
kube_service_addresses: 10.233.0.0/17
kube_pods_subnet: 10.233.128.0/17
kube_network_node_prefix: 18
```
As you can see, it did not work. If you have any ideas... :-)
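For what it's worth, that particular combination could not have worked for a purely arithmetic reason: a /17 pod subnet carved into /18 per-node blocks yields only two blocks, fewer than the nine nodes shown in the pod listing above. A quick check with the standard `ipaddress` module:

```python
import ipaddress

# kube_pods_subnet 10.233.128.0/17 split by kube_network_node_prefix 18
# gives only two per-node pod CIDRs -- not enough for a 9-node cluster.
pods = ipaddress.ip_network("10.233.128.0/17")
node_blocks = list(pods.subnets(new_prefix=18))
print(len(node_blocks))  # → 2
```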
I finally fixed the problem with the help of the Cilium devs!
You have to set the key clean-cilium-state from false to true in the kubespray file.
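As a sketch, assuming kubespray renders this key into the cilium-config ConfigMap (the exact file carrying the key may differ between kubespray versions), the resulting ConfigMap fragment would look like this:

```yaml
# Illustrative fragment of the rendered cilium-config ConfigMap.
# clean-cilium-state: "true" makes the agent wipe its stale state
# (including the previously stored allocation CIDR) on next start.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  clean-cilium-state: "true"
```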
After the deployment you have to revert this boolean. To do so, execute

```
kubectl edit configmap cilium-config -n kube-system
```

and change the key clean-cilium-state back from true to false.
Finally, you have to kill the cilium pods.

List the pods:

```
kubectl get pods -n kube-system
```

Kill the pods:

```
kubectl delete pods cilium-xxx cilium-xxx ...
```
This is now listed as an issue on the Cilium repo