Kubernetes HA: Flannel throws SubnetManager error

5/10/2018

I have been following steps provided at https://kubernetes.io/docs/setup/independent/high-availability/ to bring up an HA cluster. I am using CoreOS nodes (VERSION=1688.5.3) and Kubernetes version v1.10.

I have followed the option of running all three etcd members on the master nodes. For the load balancer, I have used a containerized keepalived as found at https://github.com/alterway/docker-keepalived. The keepalived.conf file supplied to the keepalived container is the one given in the k8s HA guide itself.

When I reach the step of configuring the CNI network (https://kubernetes.io/docs/setup/independent/high-availability/#install-cni-network), the flannel-ds pods go into CrashLoopBackoff with the error: "Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-fjn6w': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-fjn6w: dial tcp 10.96.0.1:443: i/o timeout"

What could be the issue behind this? Here are the relevant iptables rules on the master node on which the flannel-ds pod is running.

The flannel pod is trying to retrieve its configuration from the API server via the service IP 10.96.0.1, which is supposed to be DNATed to the API server's endpoint address:
    -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
    -A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-SIIK55AX7MK5ONR7
    -A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-GBLS75FLCCJBNQB6
    -A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-2CDZMOLH2PKAG52U

But the packet counters show these rules are never hit:
    0     0 KUBE-SEP-SIIK55AX7MK5ONR7  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/kubernetes:https */ statistic mode random probability 0.33332999982
    0     0 KUBE-SEP-GBLS75FLCCJBNQB6  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/kubernetes:https */ statistic mode random probability 0.50000000000
    0     0 KUBE-SEP-2CDZMOLH2PKAG52U  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/kubernetes:https */
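For reference, zeroed counters like the ones above can be re-checked with something like the following (a diagnostic sketch; the `conntrack` command assumes the conntrack-tools package is available on the node):

```shell
# Dump the service chain with packet/byte counters (-v) and numeric addresses (-n)
iptables -t nat -L KUBE-SVC-NPX46M4PTMTKRN6Y -v -n

# Zero the counters, retry the curl, then re-dump to see whether any rule fired
iptables -t nat -Z KUBE-SVC-NPX46M4PTMTKRN6Y

# Check whether conntrack ever recorded a DNATed flow for the service IP
conntrack -L -d 10.96.0.1
```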


A curl to the service IP times out, while a curl directly to the API server address (10.106.73.226:6443) gets a response:

    master # curl -k https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/
    curl: (7) Failed to connect to 10.96.0.1 port 443: Connection timed out

    master # curl -k https://10.106.73.226:6443/api/v1/namespaces/kube-system/pods/
    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {

      },
      "status": "Failure",
      "message": "pods is forbidden: User \"system:anonymous\" cannot list pods in the namespace \"kube-system\"",
      "reason": "Forbidden",
      "details": {
        "kind": "pods"
      },
      "code": 403
    }
Also note, the service endpoints have been set correctly to the API server address:

    master # kubectl describe svc kubernetes
    Name:              kubernetes
    Namespace:         default
    Labels:            component=apiserver
                       provider=kubernetes
    Annotations:       <none>
    Selector:          <none>
    Type:              ClusterIP
    IP:                10.96.0.1
    Port:              https  443/TCP
    TargetPort:        6443/TCP
    Endpoints:         10.106.73.226:6443
    Session Affinity:  ClientIP
    Events:            <none>

    master # kubectl cluster-info
    Kubernetes master is running at https://10.106.73.226:6443
    KubeDNS is running at https://10.106.73.226:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

I tried manually adding a DNAT iptables rule to map the service IP to the API server address, but it didn't seem to help, although I am not sure whether I added the rule to the correct chain.

EDIT 1 -- Full iptables

master ~ # iptables -S -t nat 
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-N KUBE-MARK-DROP
-N KUBE-MARK-MASQ
-N KUBE-NODEPORTS
-N KUBE-POSTROUTING
-N KUBE-SEP-PE4UL45OLJLNLYYS
-N KUBE-SERVICES
-N KUBE-SVC-NPX46M4PTMTKRN6Y
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A PREROUTING -d 10.96.0.1/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.106.73.226:6443
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-PE4UL45OLJLNLYYS -s 10.106.73.226/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-PE4UL45OLJLNLYYS -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-PE4UL45OLJLNLYYS --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 10.106.73.226:6443
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-PE4UL45OLJLNLYYS --mask 255.255.255.255 --rsource -j KUBE-SEP-PE4UL45OLJLNLYYS
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-PE4UL45OLJLNLYYS

NOTE: I added the rule -A PREROUTING -d 10.96.0.1/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.106.73.226:6443 manually hoping to map 10.96.0.1 to the apiserver IP, but this did not change the behaviour of the curl requests or that of the flannel pod
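One possible explanation for why the manual PREROUTING rule has no effect on the curl test: packets generated on the host itself never traverse PREROUTING, which only sees packets arriving on a network interface; locally generated traffic is matched in the nat OUTPUT chain instead. A manual DNAT tested from the master would therefore need to be added to OUTPUT as well (a sketch only, not a recommended fix, since kube-proxy's KUBE-SERVICES chain is already hooked into both PREROUTING and OUTPUT):

```shell
# PREROUTING only matches packets entering from the network;
# traffic originating on this host hits the nat OUTPUT chain.
iptables -t nat -A OUTPUT -d 10.96.0.1/32 -p tcp --dport 443 \
  -j DNAT --to-destination 10.106.73.226:6443
```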

The current state of pods on the master:

master ~ # kubectl get pods -o wide --all-namespaces
NAME                                  READY     STATUS              RESTARTS   AGE       IP              NODE
etcd-master                           1/1       Running             0          13d       10.106.73.226   master
kube-apiserver-master                 1/1       Running             0          13d       10.106.73.226   master
kube-controller-manager-master        1/1       Running             1          13d       10.106.73.226   master
kube-dns-86f4d74b45-dkzlk             0/3       ContainerCreating   0          13d       <none>          master
kube-flannel-ds-j5fxd                 0/1       CrashLoopBackOff    3550       13d       10.106.73.226   master
kube-proxy-pml47                      1/1       Running             0          13d       10.106.73.226   master
kube-scheduler-master                 1/1       Running             0          13d       10.106.73.226   master
-- Rxth
coreos
flannel
high-availability
kubernetes

1 Answer

5/16/2018

All your settings look good, including routes and sysctl values.

The only thing I can guess is an issue somewhere in the firewall rules. Please make sure that you accept forwarded traffic in the FORWARD chain.

You can check it like this:

  1. Stop K8s.
  2. Stop firewall.
  3. Stop docker.
  4. Write the following content to the file /var/lib/iptables/rules-save (overwrite it if the file exists):

    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    COMMIT
  5. Start firewall.
  6. Start docker.
  7. Start K8s.
  8. Check services.
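On CoreOS the steps above might look roughly like this (the unit names, in particular `iptables-restore.service`, are assumptions and may differ per setup):

```shell
# 1-3: stop K8s, the firewall, and docker
systemctl stop kubelet
systemctl stop iptables-restore.service
systemctl stop docker

# 4: write permissive default policies for the filter table
# (COMMIT is required for iptables-restore to apply the table)
cat > /var/lib/iptables/rules-save <<'EOF'
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
EOF

# 5-7: bring everything back up
systemctl start iptables-restore.service
systemctl start docker
systemctl start kubelet

# 8: check services
kubectl get pods --all-namespaces
```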

That is the only cause I can imagine for the problem you are seeing with services.

-- Anton Kostenko
Source: StackOverflow