I have an issue where the CoreDNS pods on some nodes are in CrashLoopBackOff state due to errors trying to reach the Kubernetes internal service.
This is a new K8s cluster deployed using Kubespray; the network layer is Weave, with Kubernetes version 1.12.5 on OpenStack. I've already tested the connection to the endpoints and have no issue reaching 10.2.70.14:6443, for example, but telnet from the pods to 10.233.0.1:443 fails.
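For reference, this is roughly how I ran the check from inside a pod (the pod name and busybox image are just what I happened to use for the test):
$ kubectl run -it --rm nettest --image=busybox:1.28 --restart=Never -- sh
/ # telnet 10.2.70.14 6443    # API server endpoint: connects
/ # telnet 10.233.0.1 443     # kubernetes service cluster IP: fails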
Thanks in advance for the help
kubectl describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.233.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 10.2.70.14:6443,10.2.70.18:6443,10.2.70.27:6443 + 2 more...
Session Affinity: None
Events: <none>
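For completeness, the same endpoint list can also be queried directly, and it matches the master node IPs:
$ kubectl get endpoints kubernetes -o wide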
And from CoreDNS logs:
E0415 17:47:05.453762 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:311: Failed to list *v1.Service: Get https://10.233.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E0415 17:47:05.456909 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E0415 17:47:06.453258 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.233.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
Also, checking the kube-proxy logs on one of the problematic nodes revealed the following errors:
I0415 19:14:32.162909 1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/10.2.70.36:6443
I0415 19:14:32.162979 1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/10.2.70.36:6443: 1 ActiveConn, 0 InactiveConn
I0415 19:14:32.162989 1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/10.2.70.18:6443
I0415 19:14:32.163017 1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/10.2.70.18:6443: 1 ActiveConn, 0 InactiveConn
E0415 19:14:32.215707 1 proxier.go:430] Failed to execute iptables-restore for nat: exit status 1 (iptables-restore: line 7 failed
)
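Judging by the graceful_termination.go messages, kube-proxy seems to be running in IPVS mode. In case it helps, this is how I looked at the virtual service for 10.233.0.1 and the nat rules on a problematic node (assuming ipvsadm is installed there; these are only checks, not a fix):
$ sudo ipvsadm -Ln | grep -A 5 10.233.0.1      # real servers behind the cluster IP
$ sudo iptables-save -t nat | grep KUBE-SERVICES   # the nat rules kube-proxy fails to restore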
I had exactly the same problem, and it turned out that my Kubespray config was wrong, specifically the nginx ingress setting ingress_nginx_host_network. As it turns out, you have to set ingress_nginx_host_network: true (it defaults to false).
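In the Kubespray inventory this is just a group var; a minimal sketch of what I mean (the exact file path depends on how your inventory is laid out):
# inventory/mycluster/group_vars/k8s-cluster/addons.yml
ingress_nginx_enabled: true
ingress_nginx_host_network: true   # defaults to false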
If you do not want to rerun the whole Kubespray playbook, edit the nginx ingress DaemonSet directly:
$ kubectl -n ingress-nginx edit ds ingress-nginx-controller
Add --report-node-internal-ip-address to the controller args, and set hostNetwork and dnsPolicy at the pod spec level, next to serviceAccountName:
spec:
  template:
    spec:
      containers:
      - args:
        - /nginx-ingress-controller
        - --configmap=$(POD_NAMESPACE)/ingress-nginx
        - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
        - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
        - --annotations-prefix=nginx.ingress.kubernetes.io
        - --report-node-internal-ip-address # <- new
        ...
      serviceAccountName: ingress-nginx
      hostNetwork: true # <- new
      dnsPolicy: ClusterFirstWithHostNet # <- new
Then save and quit with :wq, and check the pod status with kubectl get pods --all-namespaces.
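If the CoreDNS pods are still in CrashLoopBackOff at that point, deleting them so they get recreated should make them pick up the now-working service network (the label selector below assumes the default Kubespray/CoreDNS labels):
$ kubectl -n kube-system delete pod -l k8s-app=kube-dns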
Source: https://github.com/kubernetes-sigs/kubespray/issues/4357