Kubernetes service unreachable from master node on EC2

4/3/2018

I created a k8s cluster on AWS using kubeadm, with 1 master and 1 worker, following the guide available here.

Then, I started one Elasticsearch container:

kubectl run elastic --image=elasticsearch:2 --replicas=1

It was deployed successfully on the worker node. Then, I tried to expose it as a service on the cluster:

kubectl expose deploy/elastic --port 9200

And it was exposed successfully:

NAMESPACE     NAME                                                     READY     STATUS    RESTARTS   AGE
default       elastic-664569cb68-flrrz                                 1/1       Running   0          16m
kube-system   etcd-ip-172-31-140-179.ec2.internal                      1/1       Running   0          16m
kube-system   kube-apiserver-ip-172-31-140-179.ec2.internal            1/1       Running   0          16m
kube-system   kube-controller-manager-ip-172-31-140-179.ec2.internal   1/1       Running   0          16m
kube-system   kube-dns-86f4d74b45-mc24s                                3/3       Running   0          17m
kube-system   kube-flannel-ds-fjkkc                                    1/1       Running   0          16m
kube-system   kube-flannel-ds-zw4pq                                    1/1       Running   0          17m
kube-system   kube-proxy-4c8lh                                         1/1       Running   0          17m
kube-system   kube-proxy-zkfwn                                         1/1       Running   0          16m
kube-system   kube-scheduler-ip-172-31-140-179.ec2.internal            1/1       Running   0          16m

NAMESPACE     NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       elastic      ClusterIP   10.96.141.188   <none>        9200/TCP        16m
default       kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP         17m
kube-system   kube-dns     ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP   17m

NAMESPACE     NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
kube-system   kube-flannel-ds   2         2         2         2            2           beta.kubernetes.io/arch=amd64   17m
kube-system   kube-proxy        2         2         2         2            2           <none>                          17m

NAMESPACE     NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default       elastic    1         1         1            1           16m
kube-system   kube-dns   1         1         1            1           17m

NAMESPACE     NAME                  DESIRED   CURRENT   READY     AGE
default       elastic-664569cb68    1         1         1         16m
kube-system   kube-dns-86f4d74b45   1         1         1         17m

But when I try to curl http://10.96.141.188:9200 from the master node, I get a timeout. Everything indicates that the generated cluster IP is not reachable from the master node; it works only from the worker node.
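
One sanity check that separates a Service problem from a network problem is whether the Service has endpoints at all (the pod IP in the sample output below is a placeholder):

kubectl get endpoints elastic
# NAME      ENDPOINTS         AGE
# elastic   10.244.1.2:9200   16m

If an endpoint is listed but the cluster IP still times out from one node only, the Service wiring is fine and suspicion falls on the pod network between the nodes.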

I tried everything I could find:

  • Added a bunch of rules to iptables:

iptables -P FORWARD ACCEPT
iptables -I FORWARD 1 -i cni0 -j ACCEPT -m comment --comment "flannel subnet"
iptables -I FORWARD 1 -o cni0 -j ACCEPT -m comment --comment "flannel subnet"
iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE

  • Disabled firewalld
  • Opened all ports in the EC2 security group (from everywhere)
  • Used different Docker versions (1.13.1, 17.03, 17.06, 17.12)
  • Used different k8s versions (1.9.0 to 1.9.6)
  • Used different CNIs (Flannel and Weave)
  • Passed extra parameters to the kubeadm init command (--node-name with the FQDN and --apiserver-advertise-address with the public master IP)

But none of this worked. It appears to be an AWS-specific issue, since the same tutorial works fine on a Linux Academy Cloud Server.
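
A useful way to narrow this down is to check whether pod traffic crosses nodes at all, independent of the Service. The sketch below assumes Flannel's default 10.244.0.0/16 pod network, and 10.244.1.2 is a placeholder for the pod IP reported by kubectl get pods -o wide:

# does the master have a route to the worker's pod subnet over the overlay?
ip route | grep 10.244

# can the master reach the pod on the worker directly?
curl -m 5 http://10.244.1.2:9200

If the pod IP times out exactly like the cluster IP, the node-to-node overlay network is broken rather than the Service.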

Is there anything else I could try?

Note: currently I'm using Docker 1.13 and k8s 1.9.6 (with Flannel 0.9.1) on CentOS 7.

-- DanielSP
amazon-ec2
kubeadm
kubernetes

2 Answers

4/4/2018

kubectl run elastic --image=elasticsearch:2 --replicas=1

As best I can tell, you did not inform Kubernetes that the elasticsearch:2 image listens on any port(s), and it will not infer that by itself. You would have experienced the same problem if you had run that image under Docker without specifying the --publish or --publish-all options.

Thus, when the ClusterIP attempts to forward traffic from port 9200 to the Pods matching its selector, those packets fall into /dev/null because the container is not listening for them.
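
As a sketch of the fix this answer suggests (reusing the kubectl run workflow from the question), the --port flag records the container port on the generated pod spec, so the Service's target port has something declared to point at:

kubectl run elastic --image=elasticsearch:2 --replicas=1 --port=9200

kubectl expose deploy/elastic --port 9200 then targets a port the Deployment actually declares.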

Added a bunch of rules to iptables

Definitely don't do that. If you look, there are already a ton of iptables rules managed by kube-proxy; in fact, its primary job in life is to own the iptables rules on the Node it runs on. Your rules only confuse both kube-proxy and any person who follows along behind you trying to work out where those random rules came from. If you haven't already made them permanent, either undo them or just reboot the machine to flush those tables. Leaving your ad-hoc rules in place will 100% not make your troubleshooting process any easier.
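
If the rules were added exactly as shown in the question, each can be removed with iptables -D using the same specification it was inserted with; the final policy reset is a guess that assumes Docker's usual default of DROP on FORWARD (when in doubt, reboot instead):

iptables -D FORWARD -i cni0 -j ACCEPT -m comment --comment "flannel subnet"
iptables -D FORWARD -o cni0 -j ACCEPT -m comment --comment "flannel subnet"
iptables -t nat -D POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE
iptables -P FORWARD DROP   # only if DROP was the original policy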

-- mdaniel
Source: StackOverflow

4/4/2018

I finally found the problem. According to this page, Flannel needs UDP ports 8285 and 8472 open on both the master and the worker nodes. Interestingly, this is not mentioned in the official kubeadm documentation.
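
Assuming both nodes share a single security group (sg-0123456789abcdef0 below is a placeholder ID), the rules can be added with the AWS CLI so that the traffic is allowed only from instances in the same group:

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol udp --port 8285 --source-group sg-0123456789abcdef0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol udp --port 8472 --source-group sg-0123456789abcdef0

Port 8285 is used by Flannel's udp backend and 8472 by the default vxlan backend, so with Flannel 0.9.1's default configuration 8472 is the one that matters.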

-- DanielSP
Source: StackOverflow