I have been following steps provided at https://kubernetes.io/docs/setup/independent/high-availability/ to bring up an HA cluster. I am using CoreOS nodes (VERSION=1688.5.3) and Kubernetes version v1.10.
I have followed the option of running all three etcd members on the master nodes. For the load balancer, I have used a containerized keepalived from https://github.com/alterway/docker-keepalived; the keepalived.conf supplied to the container is the one given in the k8s HA guide itself.
When I reach the step of configuring the CNI network (https://kubernetes.io/docs/setup/independent/high-availability/#install-cni-network), the flannel-ds pods go into CrashLoopBackOff with the error: "Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-fjn6w': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-fjn6w: dial tcp 10.96.0.1:443: i/o timeout"
What could be the issue behind this? Here are the relevant iptables rules on the master node where the flannel-ds pod is running.
The flannel pod is trying to retrieve its configuration from the API server via the service IP 10.96.0.1, which kube-proxy is supposed to DNAT to the endpoint (API server) IP:
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-SIIK55AX7MK5ONR7
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-GBLS75FLCCJBNQB6
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-2CDZMOLH2PKAG52U
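As an aside, the probabilities in those rules cascade: the first rule catches 1/3 of the traffic, the second catches 1/2 of the remaining 2/3 (another 1/3), and the last rule takes the rest, so each endpoint gets an even share. A quick sanity check of that arithmetic:

```shell
# Cascaded match probabilities kube-proxy generates for 3 endpoints:
# rule 1 matches 1/3, rule 2 matches 1/2 of the remainder, rule 3 is unconditional
awk 'BEGIN { p1=1/3; p2=(1-p1)*0.5; p3=1-p1-p2; printf "%.4f %.4f %.4f\n", p1, p2, p3 }'
# → 0.3333 0.3333 0.3333
```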
But the packet counters show these rules are never being hit at all:
0 0 KUBE-SEP-SIIK55AX7MK5ONR7 all -- * * 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */ statistic mode random probability 0.33332999982
0 0 KUBE-SEP-GBLS75FLCCJBNQB6 all -- * * 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */ statistic mode random probability 0.50000000000
0 0 KUBE-SEP-2CDZMOLH2PKAG52U all -- * * 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */
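Those zero counters can be watched live while reproducing the failure; if a curl to 10.96.0.1:443 never increments them, the packets are being dropped before they ever reach the NAT rules. Standard iptables invocation, with the chain name taken from the rules above:

```shell
# Print per-rule packet/byte counters for the kubernetes service chain;
# re-run while curling https://10.96.0.1:443 to see whether the chain is hit
iptables -t nat -L KUBE-SVC-NPX46M4PTMTKRN6Y -v -n
```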
A curl to the service IP times out, but a curl to the API server address directly gets a response:
master # curl -k https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/
curl: (7) Failed to connect to 10.96.0.1 port 443: Connection timed out
master # curl -k https://10.106.73.226:6443/api/v1/namespaces/kube-system/pods/
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "pods is forbidden: User \"system:anonymous\" cannot list pods in the namespace \"kube-system\"",
"reason": "Forbidden",
"details": {
"kind": "pods"
},
"code": 403
Also note, the Service's endpoint has been set correctly to the API server address:
master # kubectl describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.96.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 10.106.73.226:6443
Session Affinity: ClientIP
Events: <none>
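One thing worth noting in the output above is Session Affinity: ClientIP; that is why the full iptables dump further down contains -m recent rules. If affinity is suspected of interfering, it can be switched off with a standard kubectl patch (a diagnostic step, not a known fix):

```shell
# Disable client-IP session affinity on the kubernetes Service; kube-proxy
# then regenerates the NAT rules without the -m recent matches
kubectl patch svc kubernetes -n default -p '{"spec":{"sessionAffinity":"None"}}'
```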
master # kubectl cluster-info
Kubernetes master is running at https://10.106.73.226:6443
KubeDNS is running at https://10.106.73.226:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
I tried adding a DNAT iptables rule to manually map the service IP to the API server address, but it didn't seem to help, although I am not sure I added the rule to the correct chain.
EDIT 1 -- Full iptables
master ~ # iptables -S -t nat
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-N KUBE-MARK-DROP
-N KUBE-MARK-MASQ
-N KUBE-NODEPORTS
-N KUBE-POSTROUTING
-N KUBE-SEP-PE4UL45OLJLNLYYS
-N KUBE-SERVICES
-N KUBE-SVC-NPX46M4PTMTKRN6Y
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A PREROUTING -d 10.96.0.1/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.106.73.226:6443
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-PE4UL45OLJLNLYYS -s 10.106.73.226/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-PE4UL45OLJLNLYYS -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-PE4UL45OLJLNLYYS --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 10.106.73.226:6443
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-PE4UL45OLJLNLYYS --mask 255.255.255.255 --rsource -j KUBE-SEP-PE4UL45OLJLNLYYS
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-PE4UL45OLJLNLYYS
NOTE: I added the rule -A PREROUTING -d 10.96.0.1/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.106.73.226:6443
manually, hoping to map 10.96.0.1 to the API server IP, but it changed neither the behaviour of the curl requests nor that of the flannel pod.
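One likely reason the manual PREROUTING rule has no effect: packets generated on the node itself (the curl above, and host-network pods such as flannel) do not traverse PREROUTING at all; they go through the nat OUTPUT chain instead. A sketch of the equivalent diagnostic rule, using the same addresses as above:

```shell
# Locally generated packets skip PREROUTING, so DNAT them in OUTPUT instead.
# kube-proxy's KUBE-SERVICES chain is already hooked into OUTPUT, so this is
# only a diagnostic aid to rule the NAT path in or out, not a fix.
iptables -t nat -I OUTPUT 1 -d 10.96.0.1/32 -p tcp -m tcp --dport 443 \
  -j DNAT --to-destination 10.106.73.226:6443
```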
The current state of pods on the master:
master ~ # kubectl get pods -o wide --all-namespaces
NAME READY STATUS RESTARTS AGE IP NODE
etcd-master 1/1 Running 0 13d 10.106.73.226 master
kube-apiserver-master 1/1 Running 0 13d 10.106.73.226 master
kube-controller-manager-master 1/1 Running 1 13d 10.106.73.226 master
kube-dns-86f4d74b45-dkzlk 0/3 ContainerCreating 0 13d <none> master
kube-flannel-ds-j5fxd 0/1 CrashLoopBackOff 3550 13d 10.106.73.226 master
kube-proxy-pml47 1/1 Running 0 13d 10.106.73.226 master
kube-scheduler-master 1/1 Running 0 13d 10.106.73.226 master
All your settings look good, including routes and sysctl values.
The only thing I can guess at is an issue somewhere in the firewall rules. Please make sure that forwarded traffic is accepted in the FORWARD chain.
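A read-only check of the current policy will confirm this first; Docker 1.13+ sets the FORWARD policy to DROP by default, which commonly causes exactly these timeouts:

```shell
# If this prints "-P FORWARD DROP", forwarded pod traffic is being discarded
iptables -S FORWARD | head -n 1
# Per-rule counters in the filter table's FORWARD chain
iptables -L FORWARD -v -n
```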
You can check it like this:
Write the following content to /var/lib/iptables/rules-save (overwrite the file if it exists; note that iptables-restore requires the trailing COMMIT line):
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
Restart the firewall.
Restart docker.
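On CoreOS Container Linux those steps might look like the following; the iptables-restore.service unit name is an assumption based on the stock Container Linux firewall setup:

```shell
# Overwrite the saved ruleset with ACCEPT policies (trailing COMMIT is mandatory)
cat > /var/lib/iptables/rules-save <<'EOF'
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
EOF

systemctl restart iptables-restore.service   # "start firewall"
systemctl restart docker                     # "start docker"
```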
That is the only reason I can think of for the problem you are having with services.