Calico pod readiness probe and liveness probe always fail in Kubernetes 1.15.4

10/17/2019

I hit this issue after upgrading Rancher from v2.2.8 to v2.3. The cluster was deployed with RKE v0.3.0. The first problem I got was "Readiness probe failed: HTTP probe failed with statuscode: 503" (see the events below), for which there is an existing issue: https://github.com/rancher/rancher/issues/23430

kube-system    Warning    Unhealthy    canal-s6nwj    Readiness probe failed: HTTP probe failed with statuscode: 503    31 minutes ago
kube-system    Warning    Unhealthy    canal-s8q9c    Readiness probe failed: HTTP probe failed with statuscode: 503    35 minutes ago
kube-system    Warning    Unhealthy    canal-z4h9h    Readiness probe failed: HTTP probe failed with statuscode: 503    35 minutes ago
kube-system    Warning    Unhealthy    canal-55f6l    Readiness probe failed: HTTP probe failed with statuscode: 503    40 minutes ago

After I followed the solution from that issue and added the crd.yml it mentions, I have been getting the errors below:

kube-system    Warning    Unhealthy    canal-8grhd    Readiness probe failed: Get http://localhost:9099/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)    a minute ago
kube-system    Warning    Unhealthy    canal-q2j9v    Liveness probe failed: Get http://localhost:9099/liveness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)    3 minutes ago
kube-system    Warning    Unhealthy    canal-tz42g    Readiness probe failed: Get http://localhost:9099/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)    3 minutes ago
kube-system    Warning    Unhealthy    canal-5svn7    Liveness probe failed: Get http://localhost:9099/liveness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)    3 minutes ago
kube-system    Warning    Unhealthy    canal-v7wmv    Readiness probe failed: Get http://localhost:9099/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)    4 minutes ago
kube-system    Warning    Unhealthy    canal-8grhd    Liveness probe failed: Get http://localhost:9099/liveness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)    6 minutes ago
kube-system    Warning    Unhealthy    canal-q2j9v    Readiness probe failed: Get http://localhost:9099/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)    8 minutes ago
kube-system    Warning    Unhealthy    canal-tz42g    Liveness probe failed: Get http://localhost:9099/liveness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)    8 minutes ago
kube-system    Warning    Unhealthy    canal-v7wmv    Liveness probe failed: Get http://localhost:9099/liveness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)    9 minutes ago
kube-system    Warning    Unhealthy    canal-5svn7    Readiness probe failed: Get http://localhost:9099/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)    13 minutes ago
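
The timeout events above are reported when the HTTP GET against port 9099 gets no answer within the probe timeout. As a minimal sketch for repeating that request by hand and timing it, assuming Python 3 is available on the affected node and that port 9099 is reachable on its localhost; the helper name `probe_once` is illustrative and not part of Calico or kubelet:

```python
import time
import urllib.error
import urllib.request

# Ignore any proxy settings from the environment; the target is local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_once(url, timeout=1.0):
    """Issue one HTTP GET and return (ok, detail, elapsed_seconds).

    A status code in the 200-399 range counts as success. Any other
    status, a timeout, or a connection error counts as a failure.
    """
    start = time.monotonic()
    try:
        with _opener.open(url, timeout=timeout) as resp:
            status = resp.getcode()
            ok = 200 <= status < 400
            detail = f"status {status}"
    except urllib.error.HTTPError as exc:
        # 4xx/5xx answers, such as the 503 in the first set of events.
        ok, detail = False, f"status {exc.code}"
    except OSError as exc:
        # Covers URLError, socket timeouts and refused connections.
        ok, detail = False, f"error: {exc}"
    return ok, detail, time.monotonic() - start
```

Calling `probe_once("http://localhost:9099/readiness", timeout=1.0)` a few times, and again with a larger timeout, would show whether the endpoint answers slowly or not at all.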

Output of kubectl describe for one of the canal pods:

Name:           canal-q2j9v
Namespace:      kube-system
Priority:       0
Node:           woker2/10.16.18.25
Start Time:     Wed, 16 Oct 2019 19:05:02 +0800
Labels:         controller-revision-hash=cd754f475
                k8s-app=canal
                pod-template-generation=2
Annotations:    scheduler.alpha.kubernetes.io/critical-pod:
Status:         Running
IP:             10.16.18.25
Controlled By:  DaemonSet/canal
Init Containers:
  install-cni:
    Container ID:  docker://262db0783d8b140e45d47faf8bdf2d1d6bd3f2d858d2c9d7985e16bf1a8f0f4d
    Image:         rancher/calico-cni:v3.7.4
    Image ID:      docker-pullable://rancher/calico-cni@sha256:5dc320eece42a8a1184bc5633e8779dcdd06b8a3ac010eefc93a9e38859b235a
    Port:          <none>
    Host Port:     <none>
    Command:
      /install-cni.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 16 Oct 2019 19:05:03 +0800
      Finished:     Wed, 16 Oct 2019 19:05:04 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      CNI_CONF_NAME:         10-canal.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'canal-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
      SLEEP:                 false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from canal-token-pzx9p (ro)
Containers:
  calico-node:
    Container ID:   docker://df955a12ef388c86828cbc6de7b9587f45bf3654578ad21789fbb1c16f38db8f
    Image:          rancher/calico-node:v3.7.4
    Image ID:       docker-pullable://rancher/calico-node@sha256:709c559e53021355a19efdb57981bebddd96f35628dc8b49a5f9af8561d8497c
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 16 Oct 2019 19:05:04 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      250m
    Liveness:   http-get http://localhost:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:  http-get http://localhost:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DATASTORE_TYPE:                     kubernetes
      WAIT_FOR_DATASTORE:                 true
      NODENAME:                            (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          none
      CLUSTER_TYPE:                       k8s,canal
      FELIX_IPTABLESREFRESHINTERVAL:      60
      IP:
      CALICO_IPV4POOL_CIDR:               192.168.0.0/16
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_LOGFILEPATH:                  none
      FELIX_LOGSEVERITYSYS:
      FELIX_LOGSEVERITYSCREEN:            Warning
      FELIX_HEALTHENABLED:                true
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/run/calico from var-run-calico (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from canal-token-pzx9p (ro)
  kube-flannel:
    Container ID:  docker://97d6b9a9e8e5736535182559660cd3c291fb748a0dd3f121eb98afa86817d622
    Image:         rancher/coreos-flannel:v0.11.0
    Image ID:      docker-pullable://rancher/coreos-flannel@sha256:bd76b84c74ad70368a2341c2402841b75950df881388e43fc2aca000c546653a
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/bin/flanneld
      --ip-masq
      --kube-subnet-mgr
    State:          Running
      Started:      Wed, 16 Oct 2019 19:05:05 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      POD_NAME:          canal-q2j9v (v1:metadata.name)
      POD_NAMESPACE:     kube-system (v1:metadata.namespace)
      FLANNELD_IFACE:    <set to the key 'canal_iface' of config map 'canal-config'>  Optional: false
      FLANNELD_IP_MASQ:  <set to the key 'masquerade' of config map 'canal-config'>   Optional: false
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from canal-token-pzx9p (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      canal-config
    Optional:  false
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  canal-token-pzx9p:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  canal-token-pzx9p
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     :NoSchedule
                 :NoExecute
                 CriticalAddonsOnly
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                    From             Message
  ----     ------     ----                   ----             -------
  Warning  Unhealthy  3m1s (x243 over 19h)   kubelet, woker2  Liveness probe failed: Get http://localhost:9099/liveness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  2m58s (x419 over 21h)  kubelet, woker2  Readiness probe failed: Get http://localhost:9099/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
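
The event counters above can be put in proportion to the number of probes that were sent. This is a rough sketch, assuming each probe ran once per period (10s) for the whole window shown in the Age column, which the output itself does not confirm; `failure_ratio` is an illustrative helper:

```python
import re

# Matches the count and window in an Age cell such as "3m1s (x243 over 19h)".
# Only single-unit windows (as in the rows above) are handled.
AGE_RE = re.compile(r"\(x(\d+) over (\d+)([smhd])\)")
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def failure_ratio(age_cell, period_seconds):
    """Return (failures, estimated_probes, ratio) for one event row."""
    match = AGE_RE.search(age_cell)
    if match is None:
        raise ValueError(f"unrecognised Age cell: {age_cell!r}")
    failures = int(match.group(1))
    window = int(match.group(2)) * UNIT_SECONDS[match.group(3)]
    probes = window // period_seconds  # assumes one probe per period
    return failures, probes, failures / probes


for name, cell in (("liveness", "3m1s (x243 over 19h)"),
                   ("readiness", "2m58s (x419 over 21h)")):
    failures, probes, ratio = failure_ratio(cell, period_seconds=10)
    print(f"{name}: {failures} of ~{probes} probes failed ({ratio:.1%})")
```

Under that assumption about 3.6% of liveness probes and 5.5% of readiness probes timed out. That would be consistent with Restart Count: 0 and Ready: True above, since the liveness threshold of 6 consecutive failures was apparently never reached.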

Has anyone else hit this issue?

-- Aisuko
calico
kubernetes
rancher
