Kube flannel in CrashLoopBackOff status

8/30/2018

We have just started to create our cluster on Kubernetes.

Now we are trying to deploy Tiller, but we get an error:

NetworkPlugin cni failed to set up pod "tiller-deploy-64c9d747bd-br9j7_kube-system" network: open /run/flannel/subnet.env: no such file or directory
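
As far as I understand, flanneld writes that file once it has acquired a subnet lease on the node, so a quick check on the affected node is simply:

cat /run/flannel/subnet.env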

After that I run:

kubectl get pods --all-namespaces -o wide

And get this response:

NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE       IP              NODE          NOMINATED NODE
kube-system   coredns-78fcdf6894-ksdvt               1/1       Running             2          7d        192.168.0.4     kube-master   <none>
kube-system   coredns-78fcdf6894-p4l9q               1/1       Running             2          7d        192.168.0.5     kube-master   <none>
kube-system   etcd-kube-master                       1/1       Running             2          7d        10.168.209.20   kube-master   <none>
kube-system   kube-apiserver-kube-master             1/1       Running             2          7d        10.168.209.20   kube-master   <none>
kube-system   kube-controller-manager-kube-master    1/1       Running             2          7d        10.168.209.20   kube-master   <none>
kube-system   kube-flannel-ds-amd64-42rl7            0/1       CrashLoopBackOff    2135       7d        10.168.209.17   node5         <none>
kube-system   kube-flannel-ds-amd64-5fx2p            0/1       CrashLoopBackOff    2164       7d        10.168.209.14   node2         <none>
kube-system   kube-flannel-ds-amd64-6bw5g            0/1       CrashLoopBackOff    2166       7d        10.168.209.15   node3         <none>
kube-system   kube-flannel-ds-amd64-hm826            1/1       Running             1          7d        10.168.209.20   kube-master   <none>
kube-system   kube-flannel-ds-amd64-thjps            0/1       CrashLoopBackOff    2160       7d        10.168.209.16   node4         <none>
kube-system   kube-flannel-ds-amd64-w99ch            0/1       CrashLoopBackOff    2166       7d        10.168.209.13   node1         <none>
kube-system   kube-proxy-d6v2n                       1/1       Running             0          7d        10.168.209.13   node1         <none>
kube-system   kube-proxy-lcckg                       1/1       Running             0          7d        10.168.209.16   node4         <none>
kube-system   kube-proxy-pgblx                       1/1       Running             1          7d        10.168.209.20   kube-master   <none>
kube-system   kube-proxy-rnqq5                       1/1       Running             0          7d        10.168.209.14   node2         <none>
kube-system   kube-proxy-wc959                       1/1       Running             0          7d        10.168.209.15   node3         <none>
kube-system   kube-proxy-wfqqs                       1/1       Running             0          7d        10.168.209.17   node5         <none>
kube-system   kube-scheduler-kube-master             1/1       Running             2          7d        10.168.209.20   kube-master   <none>
kube-system   kubernetes-dashboard-6948bdb78-97qcq   0/1       ContainerCreating   0          7d        <none>          node5         <none>
kube-system   tiller-deploy-64c9d747bd-br9j7         0/1       ContainerCreating   0          45m       <none>          node4         <none>

We have some flannel pods in CrashLoopBackOff status, for example kube-flannel-ds-amd64-42rl7.

When I run:

kubectl describe pod -n kube-system kube-flannel-ds-amd64-42rl7

I get status Running:

Name:               kube-flannel-ds-amd64-42rl7
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               node5/10.168.209.17
Start Time:         Wed, 22 Aug 2018 16:47:10 +0300
Labels:             app=flannel
                    controller-revision-hash=911701653
                    pod-template-generation=1
                    tier=node
Annotations:        <none>
Status:             Running
IP:                 10.168.209.17
Controlled By:      DaemonSet/kube-flannel-ds-amd64
Init Containers:
  install-cni:
    Container ID:  docker://eb7ee47459a54d401969b1770ff45b39dc5768b0627eec79e189249790270169
    Image:         quay.io/coreos/flannel:v0.10.0-amd64
    Image ID:      docker-pullable://quay.io/coreos/flannel@sha256:88f2b4d96fae34bfff3d46293f7f18d1f9f3ca026b4a4d288f28347fcb6580ac
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /etc/kube-flannel/cni-conf.json
      /etc/cni/net.d/10-flannel.conflist
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 22 Aug 2018 16:47:24 +0300
      Finished:     Wed, 22 Aug 2018 16:47:24 +0300
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-9wmch (ro)
Containers:
  kube-flannel:
    Container ID:  docker://521b457c648baf10f01e26dd867b8628c0f0a0cc0ea416731de658e67628d54e
    Image:         quay.io/coreos/flannel:v0.10.0-amd64
    Image ID:      docker-pullable://quay.io/coreos/flannel@sha256:88f2b4d96fae34bfff3d46293f7f18d1f9f3ca026b4a4d288f28347fcb6580ac
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/bin/flanneld
    Args:
      --ip-masq
      --kube-subnet-mgr
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 30 Aug 2018 10:15:04 +0300
      Finished:     Thu, 30 Aug 2018 10:15:08 +0300
    Ready:          False
    Restart Count:  2136
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:       kube-flannel-ds-amd64-42rl7 (v1:metadata.name)
      POD_NAMESPACE:  kube-system (v1:metadata.namespace)
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run from run (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-9wmch (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  run:
    Type:          HostPath (bare host directory volume)
    Path:          /run
    HostPathType:
  cni:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-cfg
    Optional:  false
  flannel-token-9wmch:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flannel-token-9wmch
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  beta.kubernetes.io/arch=amd64
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason   Age                  From            Message
  ----     ------   ----                 ----            -------
  Normal   Pulled   51m (x2128 over 7d)  kubelet, node5  Container image "quay.io/coreos/flannel:v0.10.0-amd64" already present on machine
  Warning  BackOff  1m (x48936 over 7d)  kubelet, node5  Back-off restarting failed container

Here is kube-controller-manager.yaml:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --address=127.0.0.1
    - --allocate-node-cidrs=true
    - --cluster-cidr=192.168.0.0/24
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --node-cidr-mask-size=24
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --use-service-account-credentials=true
    image: k8s.gcr.io/kube-controller-manager-amd64:v1.11.2
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10252
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-controller-manager
    resources:
      requests:
        cpu: 200m
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/kubernetes/controller-manager.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      name: flexvolume-dir
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/kubernetes/controller-manager.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      type: DirectoryOrCreate
    name: flexvolume-dir
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
status: {}

The OS is CentOS Linux release 7.5.1804.

Logs from one of the pods:

# kubectl logs --namespace kube-system kube-flannel-ds-amd64-5fx2p

main.go:475] Determining IP address of default interface
main.go:488] Using interface with name eth0 and address 10.168.209.14
main.go:505] Defaulting external address to interface address (10.168.209.14)
kube.go:131] Waiting 10m0s for node controller to sync
kube.go:294] Starting kube subnet manager
kube.go:138] Node controller sync successful
main.go:235] Created subnet manager: Kubernetes Subnet Manager - node2
main.go:238] Installing signal handlers
main.go:353] Found network config - Backend type: vxlan
vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
main.go:280] Error registering network: failed to acquire lease: node "node2" pod cidr not assigned
main.go:333] Stopping shutdownHandler...
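
To check whether the nodes actually got a podCIDR assigned, a command like this can be run from the master (just a diagnostic sketch):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'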

Where is the error?

-- Alexey Vashchenkov
flannel
kubernetes

3 Answers

4/17/2019

For flannel to work correctly, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init.
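
For example, a fresh init would look roughly like this (a sketch assuming the default flannel network of 10.244.0.0/16):

kubeadm init --pod-network-cidr=10.244.0.0/16

Whatever value you choose has to match the Network setting in the kube-flannel-cfg ConfigMap, which is 10.244.0.0/16 in the stock flannel manifest.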

-- abdelkhaliq bouharaoua
Source: StackOverflow

10/30/2019

Try this:

Failed to acquire lease simply means the pod didn't get a podCIDR. This happened to me as well; even though the manifest on the master node looked correct, it still wasn't working and flannel kept going into CrashLoopBackOff. This is what I did to fix it.

From the master node, first find out your flannel CIDR:

sudo cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep -i cluster-cidr

Output:

- --cluster-cidr=172.168.10.0/24

Then run the following from the master node:

kubectl patch node slave-node-1 -p '{"spec":{"podCIDR":"172.168.10.0/24"}}'

where slave-node-1 is the node on which acquiring the lease is failing, and the podCIDR is the CIDR you found with the previous command.
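
You can verify that the patch took effect (slave-node-1 being the example node name from above):

kubectl get node slave-node-1 -o jsonpath='{.spec.podCIDR}'

After that, the flannel pod on that node should come up on its next restart.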

Hope this helps.

-- Pankaj Pande
Source: StackOverflow

9/8/2019

I had a similar problem. I did the following steps to make it work (a command sketch follows the list):

  • Remove the node from the cluster by running kubeadm reset on the worker node.

  • Clear the iptables rules with iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X.

  • Clear the config file with rm -rf $HOME/.kube/config.

  • Reboot the worker node.

  • Disable swap on the worker node with swapoff -a.

  • Join the master node again.
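
Roughly, the commands look like this (just a sketch of the steps above; the actual join command has to be generated on the master):

# on the worker node, as root
kubeadm reset
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
rm -rf $HOME/.kube/config
reboot

# after the reboot, still on the worker node
swapoff -a

# on the master node, print a fresh join command
kubeadm token create --print-join-command

# then run the printed "kubeadm join ..." command on the worker node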

-- vahid-dan
Source: StackOverflow