Pods not accessible (timeout) from master on a 3-node cluster created in AWS EC2

5/11/2021

I have a 3-node cluster in AWS EC2 (CentOS 8 AMI).

When I try to access pods scheduled on a worker node from the master:

kubectl exec -it kube-flannel-ds-amd64-lfzpd -n kube-system /bin/bash
Error from server: error dialing backend: dial tcp 10.41.12.53:10250: i/o timeout
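A quick port check from the master (a sketch, assuming nmap-ncat is installed) can confirm whether the kubelet port on worker1 is unreachable outside of kubectl as well:

nc -zv -w 5 10.41.12.53 10250   # -z only scans, -w 5 sets a 5 second timeout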

kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE     IP             NODE             NOMINATED NODE   READINESS GATES
kube-system   coredns-54ff9cd656-8mpbx         1/1     Running   2          7d21h   10.244.0.7     master           <none>           <none>
kube-system   coredns-54ff9cd656-xcxvs         1/1     Running   2          7d21h   10.244.0.6     master           <none>           <none>
kube-system   etcd-master                      1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>
kube-system   kube-apiserver-master            1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>
kube-system   kube-controller-manager-master   1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>
kube-system   kube-flannel-ds-amd64-8zgpw      1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>
kube-system   kube-flannel-ds-amd64-lfzpd      1/1     Running   2          7d21h   10.41.12.53    worker1          <none>           <none>
kube-system   kube-flannel-ds-amd64-nhw5j      1/1     Running   2          7d21h   10.41.15.9     worker3          <none>           <none>
kube-system   kube-flannel-ds-amd64-s6nms      1/1     Running   2          7d21h   10.41.15.188   worker2          <none>           <none>
kube-system   kube-proxy-47s8k                 1/1     Running   2          7d21h   10.41.15.9     worker3          <none>           <none>
kube-system   kube-proxy-6lbvq                 1/1     Running   2          7d21h   10.41.15.188   worker2          <none>           <none>
kube-system   kube-proxy-vhmfp                 1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>
kube-system   kube-proxy-xwsnk                 1/1     Running   2          7d21h   10.41.12.53    worker1          <none>           <none>
kube-system   kube-scheduler-master            1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>

kubectl get nodes
NAME             STATUS   ROLES    AGE     VERSION
master           Ready    master   7d21h   v1.13.10
worker1          Ready    <none>   7d21h   v1.13.10
worker2          Ready    <none>   7d21h   v1.13.10
worker3          Ready    <none>   7d21h   v1.13.10

I tried the steps below on all nodes, but no luck so far (the equivalent commands, assuming firewalld as the active firewall, are sketched after the list):

1. iptables -w -P FORWARD ACCEPT on all nodes
2. Turn on masquerade
3. Open port 10250/tcp
4. Open port 8472/udp
5. Start kubelet
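On CentOS 8 with firewalld (an assumption about the firewall in use), the steps above correspond roughly to:

# run on every node
iptables -w -P FORWARD ACCEPT
firewall-cmd --permanent --add-masquerade
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=8472/udp
firewall-cmd --reload
systemctl start kubelet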

Any pointers would be helpful.

-- Vikram Ranabhatt
amazon-ec2
centos
kubernetes

2 Answers

5/12/2021

Flannel does not support nftables, and since you are using CentOS 8 (which uses the nftables backend for iptables), you can't fall back to legacy iptables.
Your best bet in this situation would be to switch to Calico.
You would have to update the Calico DaemonSet with:

....
        env:
          - name: FELIX_IPTABLESBACKEND
            value: "NFT"
....

or use version 3.12 or newer, as it adds
Autodetection of iptables backend

Previous versions of Calico required you to specify the host’s iptables backend (one of NFT or Legacy). With this release, Calico can now autodetect the iptables variant on the host by setting the Felix configuration parameter IptablesBackend to Auto. This is useful in scenarios where you don’t know what the iptables backend might be such as in mixed deployments. For more information, see the documentation for iptables dataplane configuration
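For example (a sketch, assuming the default calico-node DaemonSet in kube-system, a configured calicoctl, and Calico 3.12+ for the Auto value), either of these should switch the backend:

# set the env var directly on the calico-node DaemonSet
kubectl -n kube-system set env daemonset/calico-node FELIX_IPTABLESBACKEND=Auto

# or patch the Felix configuration resource
calicoctl patch felixconfiguration default -p '{"spec":{"iptablesBackend":"Auto"}}'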

Or switch to Ubuntu 20.04. Ubuntu doesn't use nftables yet.

-- p10l
Source: StackOverflow

5/19/2021

The issue was due to inbound ports in the SG. After I added these ports to the SG, I was able to get past that issue:

  2222
  24007
  24008
  49152-49251

My original installer script does not need these steps when running on VMs or standalone machines. Since SGs are specific to EC2, the ports have to be allowed in the inbound rules there. The point to note is that all my nodes (master and workers) are in the same SG; even then the ports have to be opened in an inbound rule, because that's the way SGs work.
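For reference, the inbound rules can be added with the AWS CLI roughly like this (sg-0123456789abcdef0 is a placeholder for the shared SG id, TCP is an assumption about the protocol, and --source-group makes the rule self-referencing so it only matches instances in the same SG):

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 2222 --source-group sg-0123456789abcdef0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 24007-24008 --source-group sg-0123456789abcdef0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 49152-49251 --source-group sg-0123456789abcdef0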

-- Vikram Ranabhatt
Source: StackOverflow