I have set up a Kubernetes cluster with one master (kube-master) and two worker nodes (kube-node-01 and kube-node-02).
Everything was running fine; now, after the Debian stretch -> buster upgrade, my CoreDNS pods are failing with CrashLoopBackOff for some reason.
I ran kubectl describe on one of the pods and the error is: Readiness probe failed: HTTP probe failed with statuscode: 503.
The readiness URL looks suspicious to me: http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3 ... there is no hostname!? Is that correct?
The liveness probe also has no hostname.
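For reference, this is roughly how I gathered the information above (the pod name is just an example; substitute the one shown by the first command):

kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system describe pod coredns-5c98db65d4-abcde
kubectl -n kube-system logs coredns-5c98db65d4-abcde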
All VMs can ping each other.
Any ideas?
I would start troubleshooting by verifying the kubelet agent on the master and worker nodes, in order to exclude any communication issues between the cluster nodes (assuming the rest of the core runtime Pods are up and running), since kubelet is the component that actually executes the liveness and readiness probes:
systemctl status kubelet -l
journalctl -u kubelet
The health check URLs mentioned in the question are fine; they are predefined in the CoreDNS deployment by design, and when no host is given the kubelet probes the Pod's own IP, so the missing hostname is expected.
Also ensure that the CNI plugin Pods are functioning and that the cluster overlay network is passing Pod-to-Pod traffic, as CoreDNS is very sensitive to any issue with overall cluster networking, for example:
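A quick check, assuming a typical kubeadm setup (the exact pod names depend on which CNI plugin you run, e.g. flannel, calico or weave):

kubectl -n kube-system get pods -o wide
kubectl -n kube-system logs <your-cni-pod>

The first command also shows which node each Pod is scheduled on, which helps spot a single broken node.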
In addition to @Hang Du's answer about the CoreDNS loopback issue, I encourage you to look at the official k8s DNS debugging documentation for more details on investigating CoreDNS problems, for example:
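A quick in-cluster sanity check along the lines of that documentation (busybox 1.28 is commonly used here because nslookup is broken in some later busybox images):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default

If this times out or returns SERVFAIL, the problem is in CoreDNS or the cluster network rather than in your application Pods.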
I hit a similar issue when upgrading my host machine to Ubuntu 18.04, which uses systemd-resolved as its local DNS resolver. The nameserver field in /etc/resolv.conf then points to the loopback address 127.0.0.53, which makes CoreDNS detect a forwarding loop and fail to start.
You can take a look at the details here: https://github.com/coredns/coredns/blob/master/plugin/loop/README.md#troubleshooting-loops-in-kubernetes-clusters
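One of the fixes described there, sketched for a kubeadm-managed cluster (paths are the usual defaults, adjust for your setup): point kubelet at the real upstream resolv.conf instead of the systemd-resolved stub by setting this in /var/lib/kubelet/config.yaml:

resolvConf: /run/systemd/resolve/resolv.conf

then restart kubelet and recreate the CoreDNS pods:

systemctl restart kubelet
kubectl -n kube-system delete pod -l k8s-app=kube-dns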