Kubernetes CoreDNS in CrashLoopBackOff on worker node

4/21/2021

I've searched for CoreDNS CrashLoopBackOff issues, but nothing I found has helped me.

My Setup

k8s - v1.20.2
CoreDNS - 1.7.0
Installed with kubespray: https://kubernetes.io/ko/docs/setup/production-environment/tools/kubespray

My Problem

The CoreDNS pods on the master node are in a Running state, but the CoreDNS pods on the worker node are in CrashLoopBackOff.
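For reference, the node placement and restart count of each CoreDNS pod can be checked like this (a quick check assuming the default kube-system namespace; k8s-app=kube-dns is the upstream label and may differ on other installs):

kubectl get pods -n kube-system -o wide -l k8s-app=kube-dns
# or simply
kubectl get pods -n kube-system -o wide | grep coredns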


kubectl logs -f coredns-847f564ccf-msbvp -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s

The CoreDNS container runs the command "/coredns -conf /etc/resolv.conf" for a while and is then killed.
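The restart loop can also be watched directly with kubectl; a minimal check using the pod name from the logs above (-w streams status changes until interrupted):

kubectl -n kube-system get pod coredns-847f564ccf-msbvp -w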


Here is the Corefile

Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf {
          prefer_udp
        }
        cache 30
        loop
        reload
        loadbalance
    }
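This Corefile is stored in the coredns ConfigMap and mounted into the pod at /etc/coredns (see config-volume in the description further down); it can be inspected with:

kubectl -n kube-system get configmap coredns -o yaml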

And here are the events of one of the crashed pods

kubectl get event --namespace kube-system --field-selector involvedObject.name=coredns-847f564ccf-lqnxs
LAST SEEN   TYPE      REASON      OBJECT                         MESSAGE
4m55s       Warning   Unhealthy   pod/coredns-847f564ccf-lqnxs   Liveness probe failed: Get "http://10.216.50.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
9m59s       Warning   BackOff     pod/coredns-847f564ccf-lqnxs   Back-off restarting failed container
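Since the liveness probe is just an HTTP GET against the pod IP on port 8080, it can be reproduced by hand from the worker node that hosts the pod (a rough check; -m 5 mirrors the probe's 5s timeout):

curl -m 5 http://10.216.50.2:8080/health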

And here is the CoreDNS pod description

Containers:
  coredns:
    Container ID:  docker://a174cb3a3800181d1c7b78831bfd37bbf69caf60a82051d6fb29b4b9deeacce9
    Image:         k8s.gcr.io/coredns:1.7.0
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:73ca82b4ce829766d4f1f10947c3a338888f876fbed0540dc849c89ff256e90c
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 21 Apr 2021 21:51:44 +0900
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 21 Apr 2021 21:44:42 +0900
      Finished:     Wed, 21 Apr 2021 21:46:32 +0900
    Ready:          False
    Restart Count:  9943
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=0s timeout=5s period=10s #success=1 #failure=10
    Readiness:    http-get http://:8181/ready delay=0s timeout=5s period=10s #success=1 #failure=10
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-qqhn6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node-role.kubernetes.io/control-plane:NoSchedule
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                       From     Message
  ----     ------     ----                      ----     -------
  Normal   Pulled     18m (x9940 over 30d)      kubelet  Container image "k8s.gcr.io/coredns:1.7.0" already present on machine
  Warning  Unhealthy  8m37s (x99113 over 30d)   kubelet  Liveness probe failed: Get "http://10.216.50.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    3m35s (x121901 over 30d)  kubelet  Back-off restarting failed container

At this point, any suggestion at all would be helpful.

I found something weird. Testing from node1, I can reach the CoreDNS pod on node2, but I cannot reach the CoreDNS pod on node1 itself. I use Calico as the CNI; a quick Calico health check is shown after the list below.

in node1, coredns1 - 1.1.1.1
in node2, coredns2 - 2.2.2.2

From node1:

  • access 1.1.1.1:8080/health -> timeout
  • access 2.2.2.2:8080/health -> ok

From node2:

  • access 1.1.1.1:8080/health -> ok
  • access 2.2.2.2:8080/health -> timeout
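Since Calico provides the pod network here, I also checked the calico-node pods and the per-node BGP status (a rough check; calicoctl has to be installed separately, and calico-node is the usual DaemonSet name for a kubespray install):

kubectl -n kube-system get pods -o wide | grep calico-node
sudo calicoctl node status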
-- JovialCoding
coredns
kubernetes

1 Answer

5/23/2021

If containerd and the kubelet run behind a proxy, add the private IP range 10.0.0.0/8 to the NO_PROXY configuration so that they can still pull images and so that requests to in-cluster addresses (such as the kubelet's probes against pod IPs) are not routed through the proxy. E.g.:

[root@dev-systrdemo301z phananhtuan01]# cat /etc/systemd/system/containerd.service.d/proxy.conf
[Service]
Environment="HTTP_PROXY=dev-proxy.prod.xx.local:8300"
Environment="HTTPS_PROXY=dev-proxy.prod.xx.local:8300"
Environment="NO_PROXY=localhost,127.0.0.0/8,100.67.253.157/24,10.0.0.0/8"

Please refer to this article.

-- Tuan Phan
Source: StackOverflow