I've been working on installing a three-node Kubernetes cluster on CentOS 7 with flannel for some time now, but the CoreDNS pods cannot connect to the API server and restart constantly.
The reference HowTo document I followed is here.
I've disabled firewalld, loaded the br_netfilter kernel module, and enabled bridge-nf-call-iptables. The pod network is 10.244.0.0/16 (flannel hands out 10.244.0.0/24-sized subnets from node to node), and kubectl can reach containers even if the container is on a different node. Still, the CoreDNS pods report that they cannot connect to the API server with the error:
Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
I cannot see any route for 10.96.0.0 in the routing tables:
default via 172.16.0.1 dev eth0 proto static metric 100
10.1.0.0/24 dev eth1 proto kernel scope link src 10.1.0.202 metric 101
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev docker0 proto kernel scope link src 10.244.1.1
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
172.16.0.0/16 dev eth0 proto kernel scope link src 172.16.0.202 metric 100
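Incidentally, a routing table can be checked mechanically for subnets claimed by more than one interface, which is never healthy. A sketch (simulated here with the table above; on a live node, pipe `ip route` in instead of the variable):

```shell
# The pod-network routes from the table above, captured for illustration.
routes='10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev docker0 proto kernel scope link src 10.244.1.1
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink'
# Print any subnet that appears in more than one route:
echo "$routes" | awk '{print $1}' | sort | uniq -d
# → 10.244.1.0/24  (claimed by both docker0 and cni0)
```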
The behaviour is the same with both the 1.11.3-0 and 1.12.0-0 CentOS 7 packages. The cluster is initialized with:
kubeadm init --apiserver-advertise-address=172.16.0.201 --pod-network-cidr=10.244.0.0/16
--apiserver-advertise-address is given explicitly, since the server has another external IP which cannot be reached from the other hosts, and Kubernetes tends to select that IP as the API Server address. --pod-network-cidr is mandated by flannel.
The resulting iptables -L output after initialization, with no joined nodes:
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL all -- anywhere anywhere
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-FORWARD all -- anywhere anywhere /* kubernetes forwarding rules */
DOCKER-USER all -- anywhere anywhere
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL all -- anywhere anywhere
Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- anywhere anywhere
Chain KUBE-EXTERNAL-SERVICES (1 references)
target prot opt source destination
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
Chain KUBE-FORWARD (1 references)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* kubernetes forwarding rules */ mark match 0x4000/0x4000
Chain KUBE-SERVICES (1 references)
target prot opt source destination
REJECT udp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable
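A note on the two REJECT rules: "has no endpoints" means kube-proxy sees no Ready CoreDNS pods behind the kube-dns service, so it actively rejects DNS traffic instead of blackholing it; this is a symptom of CoreDNS crashing, not a cause. The affected service ports can be pulled out of the rule comments; simulated here with the two rules above (on a live node, `sudo iptables -L KUBE-SERVICES` could be piped in instead):

```shell
# The two REJECT rules from the output above, captured for illustration.
rules='REJECT udp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable'
# List the service ports kube-proxy currently has no Ready endpoints for:
echo "$rules" | grep -o 'kube-system/[a-z-]*:[a-z-]*' | sort -u
```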
It looks like the API Server is deployed as it should be:
$ kubectl get svc kubernetes -o=yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-10-25T06:58:46Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "6"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: 6b3e4099-d823-11e8-8264-a6f3f1f622f3
spec:
  clusterIP: 10.96.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
Then I applied the flannel network plugin with
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
As soon as the flannel network is applied, the CoreDNS pods start and begin emitting the same error:
Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
I found out that flanneld was using the wrong network interface, and changed it in the kube-flannel.yml file before deployment. However, the outcome is still the same.
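For reference, "changed it" means adding an `--iface` flag to the flanneld container args in kube-flannel.yml; flanneld otherwise binds the interface of the default route. A hedged sketch of that edit (`eth1` stands in for the cluster-facing NIC, and the edit is simulated on a string here rather than on the real file):

```shell
# Relevant fragment of the flanneld container spec from kube-flannel.yml.
snippet='        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr'
# Append an explicit interface to the container args (GNU sed):
echo "$snippet" | sed 's/- --kube-subnet-mgr/&\n        - --iface=eth1/'
```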
Any help is greatly appreciated.
I've run into this before. firewalld had port 6443 open to my real LAN IPs, but it still blocked other traffic, so I shut the firewall down:
systemctl stop firewalld
That worked, and all the exceptions coming from kubectl logs were gone, so the root cause is the firewall rules of your Linux servers.
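Stopping firewalld works, but it leaves the node wide open. A narrower option is to open only the ports the cluster needs. The port list below is an assumption based on the standard kubeadm control-plane ports plus flannel's VXLAN port, and the block prints the commands for review rather than running them:

```shell
# 6443: kube-apiserver, 10250: kubelet API, 8472/udp: flannel VXLAN overlay.
# Printed (not executed) so the list can be reviewed and adjusted first.
for p in 6443/tcp 10250/tcp 8472/udp; do
  echo "firewall-cmd --permanent --add-port=$p"
done
echo "firewall-cmd --reload"
```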
This is basically saying that your CoreDNS pod cannot talk to the kube-apiserver. The kube-apiserver is exposed inside the pod through these environment variables: KUBERNETES_SERVICE_HOST=10.96.0.1 and KUBERNETES_SERVICE_PORT_HTTPS=443.
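Client code inside the pod combines those two variables into the API server URL, which is exactly the address in the failing request. A minimal illustration (in a real pod the variables are injected by the kubelet; here they are hard-coded to the defaults quoted above):

```shell
# Hard-coded to the in-cluster defaults for illustration.
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_SERVICE_PORT_HTTPS=443
echo "https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT_HTTPS}/api/v1/namespaces"
# → https://10.96.0.1:443/api/v1/namespaces
```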
I believe that the routes you posted are routes on the host, since this is what you get when you run ip route in a pod's container:
root@xxxx-xxxxxxxxxx-xxxxx:/# ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
root@xxxx-xxxxxxxxxx-xxxxx:/#
In any case, you wouldn't see 10.96.0.1 there, since that address is exposed in the cluster using iptables. So what is that address? It happens to be a service in the default namespace called kubernetes. That service's ClusterIP is 10.96.0.1; it listens on port 443 and maps to targetPort 6443, which is where your kube-apiserver is running.
Since you can deploy pods, the kube-apiserver doesn't seem to be down, so that's not your problem. Most likely you are missing that service (or some iptables rule is preventing you from connecting to it). You can see it here, for example:
$ kubectl get svc kubernetes
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 92d
The full output is something like this:
$ kubectl get svc kubernetes -o=yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-07-23T21:10:22Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "24"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx
spec:
  clusterIP: 10.96.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
So if you are missing it, you can create it like this (note the pipe goes on the first line, before the heredoc body, so that the final EOF terminates the heredoc):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
spec:
  clusterIP: 10.96.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: None
  type: ClusterIP
EOF
I've solved the problem. The cause is a mixture of inexperience, lack of documentation, and some old, no-longer-correct information.
The person who will be using the installation told me that Docker's bridge needs to be in the same subnet as the flannel network, so I edited Docker's bridge configuration.
However, once Kubernetes switched to CNI, this requirement became not only unnecessary but plain wrong. Having both cni0 and docker0 on the same network with the same IP address always felt off, but since I'm a complete beginner with Kubernetes, I ignored my hunch.
As a result, I reset Docker's network to its default, tore down the cluster, and rebuilt it. Now everything is working as it should.
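For completeness, "reset Docker's network to its default" amounted to removing the custom bridge address and rebuilding. A sketch of that cleanup (the `bip` key is an assumption about how the bridge was customized; the edit is simulated on a temp file, and the destructive host steps are left as comments):

```shell
# Simulate removing a custom "bip" (bridge IP) from /etc/docker/daemon.json.
tmp=$(mktemp)
cat > "$tmp" <<'JSON'
{
  "bip": "10.244.1.1/24",
  "log-driver": "json-file"
}
JSON
sed -i '/"bip"/d' "$tmp"   # drop the custom bridge; docker0 reverts to its default
cat "$tmp"
# Then, on the real host: systemctl restart docker; kubeadm reset;
# kubeadm init ...; kubectl apply -f kube-flannel.yml
```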
TL;DR: Never, ever touch Docker's network parameters if you are setting up a recent Kubernetes release. Just install Docker, init Kubernetes, and deploy flannel; Kubernetes and CNI will take care of the container-to-flannel transport.