Weave CrashLoopBackOff on fresh HA Cluster

4/17/2020

I created an HA cluster with kubeadm by following these guides: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/ https://medium.com/faun/configuring-ha-kubernetes-cluster-on-bare-metal-servers-with-kubeadm-1-2-1e79f0f7857b

I already have the etcd nodes up and running, and the API server is reachable through HAProxy and Keepalived. One master node is running with the weave-net network.

I use these subnets:

networking:
  podSubnet: 192.168.240.0/22
  serviceSubnet: 192.168.244.0/22

But when I join the second master node to the cluster, the weave pod created on it goes into CrashLoopBackOff.

I installed the weave-net plugin with this command:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=192.168.240.0/21"
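
As a side check on the address ranges above, the /21 passed as IPALLOC_RANGE can be compared with the two /22 subnets from the kubeadm configuration. The sketch below uses only Python's standard ipaddress module and the values quoted in this question. It shows how the ranges relate to each other; whether that relationship contributes to the crash is not established here.

```python
import ipaddress

# Values copied from the kubeadm config and the weave apply command above.
pod_subnet = ipaddress.ip_network("192.168.240.0/22")
service_subnet = ipaddress.ip_network("192.168.244.0/22")
ipalloc_range = ipaddress.ip_network("192.168.240.0/21")

# Address of the "kubernetes" ClusterIP service shown in the service listing.
api_service_ip = ipaddress.ip_address("192.168.244.1")

# First and last address covered by the weave allocation range.
print("IPALLOC_RANGE spans", ipalloc_range[0], "to", ipalloc_range[-1])

# Does the weave range contain the pod subnet, and does it touch the service subnet?
print("contains podSubnet:", pod_subnet.subnet_of(ipalloc_range))
print("overlaps serviceSubnet:", ipalloc_range.overlaps(service_subnet))
print("contains 192.168.244.1:", api_service_ip in ipalloc_range)
```

If the last two checks report an overlap, the route that weave installs for its range on a node also covers the service addresses, which may be worth ruling out when service IPs time out.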

I also found that /etc/cni/net.d isn't created by kubelet when I apply the weave configuration.

The Master Nodes

kubectl get nodes -o wide
NAME                          STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kubemaster01                  Ready    master   17h   v1.18.1   192.168.129.137   <none>        Ubuntu 18.04.4 LTS   4.15.0-96-generic   docker://19.3.8
kubemaster02                  Ready    master   83m   v1.18.1   192.168.129.138   <none>        Ubuntu 18.04.4 LTS   4.15.0-91-generic   docker://19.3.8

The Pods

root@kubemaster01:~# kubectl get pods,svc --all-namespaces  -o wide
NAMESPACE     NAME                                                      READY   STATUS             RESTARTS   AGE    IP                NODE                          NOMINATED NODE   READINESS GATES
kube-system   pod/coredns-66bff467f8-kh4mh                              0/1     Running            0          18h    192.168.240.3     kubemaster01                           <none>           <none>
kube-system   pod/coredns-66bff467f8-xhzjk                              0/1     Running            0          18h    192.168.240.2     kubemaster01                           <none>           <none>
kube-system   pod/kube-apiserver-kubemaster01                           1/1     Running            0          16h    192.168.129.137   kubemaster01                           <none>           <none>
kube-system   pod/kube-apiserver-kubemaster02                           1/1     Running            0          104m   192.168.129.138   kubemaster02                           <none>           <none>
kube-system   pod/kube-controller-manager-kubemaster01                  1/1     Running            0          16h    192.168.129.137   kubemaster01                           <none>           <none>
kube-system   pod/kube-controller-manager-kubemaster02                  1/1     Running            0          104m   192.168.129.138   kubemaster02                           <none>           <none>
kube-system   pod/kube-proxy-sct5x                                      1/1     Running            0          18h    192.168.129.137   kubemaster01                           <none>           <none>
kube-system   pod/kube-proxy-tsr65                                      1/1     Running            0          104m   192.168.129.138   kubemaster02                           <none>           <none>
kube-system   pod/kube-scheduler-kubemaster01                           1/1     Running            2          18h    192.168.129.137   kubemaster01                           <none>           <none>
kube-system   pod/kube-scheduler-kubemaster02                           1/1     Running            0          104m   192.168.129.138   kubemaster02                           <none>           <none>
kube-system   pod/weave-net-4zdg6                                       2/2     Running            0          3h     192.168.129.137   kubemaster01                           <none>           <none>
kube-system   pod/weave-net-bf8mq                                       1/2     CrashLoopBackOff   38         104m   192.168.129.138   kubemaster02                           <none>           <none>

NAMESPACE     NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default       service/kubernetes   ClusterIP   192.168.244.1    <none>        443/TCP                  20h   <none>
kube-system   service/kube-dns     ClusterIP   192.168.244.10   <none>        53/UDP,53/TCP,9153/TCP   18h   k8s-app=kube-dns

IP Routes in Master Nodes

root@kubemaster01:~# ip r
default via 192.168.128.1 dev ens3 proto static 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
192.168.128.0/21 dev ens3 proto kernel scope link src 192.168.129.137 
192.168.240.0/21 dev weave proto kernel scope link src 192.168.240.1 

root@kubemaster02:~# ip r
default via 192.168.128.1 dev ens3 proto static 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
192.168.128.0/21 dev ens3 proto kernel scope link src 192.168.129.138 

Description of the weave pod running on the second master node

root@kubemaster01:~# kubectl describe pod/weave-net-bf8mq -n kube-system
Name:                 weave-net-bf8mq
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 kubemaster02./192.168.129.138
Start Time:           Fri, 17 Apr 2020 12:28:09 -0300
Labels:               controller-revision-hash=79478b764c
                      name=weave-net
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   192.168.129.138
IPs:
  IP:           192.168.129.138
Controlled By:  DaemonSet/weave-net
Containers:
  weave:
    Container ID:  docker://93bff012aaebb34dc338001bf73798b5eeefe32a4d50b82731b0ef003c63c786
    Image:         docker.io/weaveworks/weave-kube:2.6.2
    Image ID:      docker-pullable://weaveworks/weave-kube@sha256:a1f58e75f24f02e1c2fa2a95b9e55a1b94930f455e75bd5f4799e1a55671971f
    Port:          <none>
    Host Port:     <none>
    Command:
      /home/weave/launch.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 17 Apr 2020 14:15:59 -0300
      Finished:     Fri, 17 Apr 2020 14:16:29 -0300
    Ready:          False
    Restart Count:  39
    Requests:
      cpu:      10m
    Readiness:  http-get http://127.0.0.1:6784/status delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      HOSTNAME:        (v1:spec.nodeName)
      IPALLOC_RANGE:  192.168.240.0/21
    Mounts:
      /host/etc from cni-conf (rw)
      /host/home from cni-bin2 (rw)
      /host/opt from cni-bin (rw)
      /host/var/lib/dbus from dbus (rw)
      /lib/modules from lib-modules (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-xp46t (ro)
      /weavedb from weavedb (rw)
  weave-npc:
    Container ID:   docker://4de9116cae90cf3f6d59279dd1531938b102adcdd1b76464e5bbe2f2b013b060
    Image:          docker.io/weaveworks/weave-npc:2.6.2
    Image ID:       docker-pullable://weaveworks/weave-npc@sha256:5694b0b77003780333ccd1fc79810469434779cd86e926a17675cc5b70470459
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 17 Apr 2020 12:28:24 -0300
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:  10m
    Environment:
      HOSTNAME:   (v1:spec.nodeName)
    Mounts:
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-xp46t (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  weavedb:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/weave
    HostPathType:  
  cni-bin:
    Type:          HostPath (bare host directory volume)
    Path:          /opt
    HostPathType:  
  cni-bin2:
    Type:          HostPath (bare host directory volume)
    Path:          /home
    HostPathType:  
  cni-conf:
    Type:          HostPath (bare host directory volume)
    Path:          /etc
    HostPathType:  
  dbus:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/dbus
    HostPathType:  
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  weave-net-token-xp46t:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  weave-net-token-xp46t
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     :NoSchedule
                 :NoExecute
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason   Age                  From                                  Message
  ----     ------   ----                 ----                                  -------
  Normal   Pulled   11m (x17 over 81m)   kubelet, kubemaster02.   Container image "docker.io/weaveworks/weave-kube:2.6.2" already present on machine
  Warning  BackOff  85s (x330 over 81m)  kubelet, kubemaster02.   Back-off restarting failed container

The logs complain about timeouts, but that is because there is no network running.

root@kubemaster02:~# kubectl logs weave-net-bf8mq -name weave -n kube-system
FATA: 2020/04/17 17:22:04.386233 [kube-peers] Could not get peers: Get https://192.168.244.1:443/api/v1/nodes: dial tcp 192.168.244.1:443: i/o timeout
Failed to get peers
root@kubemaster02:~# kubectl logs weave-net-bf8mq -name weave-npc -n kube-system | more
INFO: 2020/04/17 15:28:24.851287 Starting Weaveworks NPC 2.6.2; node name "kubemaster02"
INFO: 2020/04/17 15:28:24.851469 Serving /metrics on :6781
Fri Apr 17 15:28:24 2020 <5> ulogd.c:408 registering plugin `NFLOG'
Fri Apr 17 15:28:24 2020 <5> ulogd.c:408 registering plugin `BASE'
Fri Apr 17 15:28:24 2020 <5> ulogd.c:408 registering plugin `PCAP'
Fri Apr 17 15:28:24 2020 <5> ulogd.c:981 building new pluginstance stack: 'log1:NFLOG,base1:BASE,pcap1:PCAP'
WARNING: scheduler configuration failed: Function not implemented
DEBU: 2020/04/17 15:28:24.887619 Got list of ipsets: []
ERROR: logging before flag.Parse: E0417 15:28:54.923915   19321 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:321: Failed to list *v1.Pod: Get https://192.168.244.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 192.168.244.1:443: i/o timeout
ERROR: logging before flag.Parse: E0417 15:28:54.923895   19321 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:322: Failed to list *v1.NetworkPolicy: Get https://192.168.244.1:443/apis/networking.k8s.io/v1/networkpolicies?limit=500&resourceVersion=0: dial tcp 192.168.244.1:443: i/o timeout
ERROR: logging before flag.Parse: E0417 15:28:54.924071   19321 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:320: Failed to list *v1.Namespace: Get https://192.168.244.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 192.168.244.1:443: i/o timeout
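
Both containers fail on the same call, a TCP connection to 192.168.244.1:443. One way to confirm this from the node itself, independently of weave, is a plain TCP probe. The helper below is a minimal sketch using Python's standard socket module; the function name can_connect and the 3-second timeout are illustrative choices, not part of any Kubernetes or weave tooling.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 192.168.244.1:443 is the kubernetes service address from the logs above.
    print("service address reachable:", can_connect("192.168.244.1", 443))
```

Running it on kubemaster01 and kubemaster02 and comparing the results would show whether the timeout is specific to the second master.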

Any advice or suggestions?

Regards.

-- acancio
high-availability
kubernetes
weave

1 Answer

4/23/2020

The error was caused by a misconfiguration of CRI-O as the CRI runtime. Following this installation guide corrects the issue.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/

-- acancio
Source: StackOverflow