Pods on different nodes can't ping each other

8/11/2018

I set up a 1-master, 2-node k8s cluster according to the documentation. A pod can ping a pod on the same node but can't ping a pod on the other node.

To demonstrate the problem, I deployed the Deployment below with 3 replicas. Two of the pods sit on the same node, and the third sits on the other node.

$ cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: nginx-svc
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80

$ kubectl get nodes
NAME                                          STATUS    ROLES     AGE       VERSION
ip-172-31-21-115.us-west-2.compute.internal   Ready     master    20m       v1.11.2
ip-172-31-26-62.us-west-2.compute.internal    Ready     <none>    19m       v1.11.2
ip-172-31-29-204.us-west-2.compute.internal   Ready     <none>    14m       v1.11.2

$ kubectl get pods -o wide
NAME                               READY     STATUS    RESTARTS   AGE       IP           NODE                                          NOMINATED NODE
nginx-deployment-966857787-22qq7   1/1       Running   0          11m       10.244.2.3   ip-172-31-29-204.us-west-2.compute.internal   <none>
nginx-deployment-966857787-lv7dd   1/1       Running   0          11m       10.244.1.2   ip-172-31-26-62.us-west-2.compute.internal    <none>
nginx-deployment-966857787-zkzg6   1/1       Running   0          11m       10.244.2.2   ip-172-31-29-204.us-west-2.compute.internal   <none>

$ kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   21m
nginx-svc    ClusterIP   10.105.205.10   <none>        80/TCP    11m

Everything looks fine.

Let me show you the containers.

# docker exec -it 489b180f512b /bin/bash
root@nginx-deployment-966857787-zkzg6:/# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8951
        inet 10.244.2.2  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::cc4d:61ff:fe8a:5aeb  prefixlen 64  scopeid 0x20<link>

root@nginx-deployment-966857787-zkzg6:/# ping 10.244.2.3
PING 10.244.2.3 (10.244.2.3) 56(84) bytes of data.
64 bytes from 10.244.2.3: icmp_seq=1 ttl=64 time=0.066 ms
64 bytes from 10.244.2.3: icmp_seq=2 ttl=64 time=0.055 ms
^C

So it pings its neighbor pod on the same node.

root@nginx-deployment-966857787-zkzg6:/# ping 10.244.1.2
PING 10.244.1.2 (10.244.1.2) 56(84) bytes of data.
^C
--- 10.244.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1059ms

But it can't ping its replica on the other node.
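
One way to see where the packets go missing (a diagnostic sketch; the flannel.1 interface suggests the default VXLAN backend, which wraps pod-to-pod traffic in UDP on port 8472) is to run tcpdump on the destination node's eth0 while the ping is running:

# tcpdump -ni eth0 udp port 8472   # on the node hosting 10.244.1.2

If the sending node emits the encapsulated packets but they never show up on the receiver, something between the hosts (a firewall, or on AWS a security group) is dropping the UDP traffic.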

Here are the host interfaces, first on ip-172-31-29-204:

# ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8951
        inet 10.244.2.1  netmask 255.255.255.0  broadcast 0.0.0.0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.31.29.204  netmask 255.255.240.0  broadcast 172.31.31.255

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8951
        inet 10.244.2.0  netmask 255.255.255.255  broadcast 0.0.0.0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0

veth09fb984a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8951
        inet6 fe80::d819:14ff:fe06:174c  prefixlen 64  scopeid 0x20<link>

veth87b3563e: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8951
        inet6 fe80::d09c:d2ff:fe7b:7dd7  prefixlen 64  scopeid 0x20<link>

And on the second node (ip-172-31-26-62):

# ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8951
        inet 10.244.1.1  netmask 255.255.255.0  broadcast 0.0.0.0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.31.26.62  netmask 255.255.240.0  broadcast 172.31.31.255

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8951
        inet 10.244.1.0  netmask 255.255.255.255  broadcast 0.0.0.0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0

veth9733e2e6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8951
        inet6 fe80::8003:46ff:fee2:abc2  prefixlen 64  scopeid 0x20<link>
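
For reference, with flannel's VXLAN backend each host is expected to carry a route for the other node's pod subnet via flannel.1; illustrative output for the first node (a sketch, not captured from this cluster) would look like:

# ip route show | grep 10.244
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 dev cni0 proto kernel scope link src 10.244.2.1

If those routes are present, plain routing is fine and the failure is in packet delivery between the hosts.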

Processes on the nodes:

# ps auxww|grep kube
root      4059  0.1  2.8  43568 28316 ?        Ssl  00:31   0:01 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf
root      4260  0.0  3.4 358984 34288 ?        Ssl  00:31   0:00 /opt/bin/flanneld --ip-masq --kube-subnet-mgr
root      4455  1.1  9.6 760868 97260 ?        Ssl  00:31   0:14 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni

Because of this network problem, the ClusterIP is also unreachable:

$ curl 10.105.205.10:80
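
The curl hangs with no response. That is consistent with the pod-to-pod failure rather than a kube-proxy problem: a ClusterIP is just an iptables DNAT (programmed by kube-proxy) to one of the backend pod IPs, so when the chosen pod lives on the other node the traffic still has to cross the flannel overlay. The generated rules can be inspected with something like:

# iptables-save | grep nginx-svc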

Any suggestions?

Thanks.

-- Barry Scott
kubernetes

2 Answers

8/14/2018

I found the problem.

Flannel uses UDP ports 8285 and 8472, which were being blocked by the AWS security groups. I had only opened TCP ports.

I enabled UDP ports 8285 and 8472, as well as TCP 6443, 10250, and 10256.
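
If the security groups are managed with the AWS CLI, the change looks roughly like this (sg-0123456789abcdef0 is a placeholder for the nodes' security group; 172.31.0.0/16 matches the VPC addresses above):

$ # replace sg-0123456789abcdef0 with your cluster's security group ID
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
      --protocol udp --port 8285 --cidr 172.31.0.0/16
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
      --protocol udp --port 8472 --cidr 172.31.0.0/16

Port 8285 is used by flannel's udp backend and 8472 by its vxlan backend, so opening both covers either configuration.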

-- Barry Scott
Source: StackOverflow

8/12/2018

The Docker virtual bridge interface docker0 currently has the IP 172.17.0.1 on both hosts.

But as per the docker/flannel integration guide, the docker0 virtual bridge should be inside the flannel network on each host.

A high-level workflow of the flannel/docker networking integration is below:

  • Flannel writes /run/flannel/subnet.env, based on the network configuration in etcd, during flanneld startup.
  • Docker reads /run/flannel/subnet.env, sets the --bip flag accordingly during dockerd startup, and assigns docker0 an IP from the flannel network (see the example below).
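
For illustration, on the host that leased 10.244.1.0/24 the file and the matching dockerd flags would look roughly like this (values inferred from the interface listings above, not captured from this cluster):

# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=8951
FLANNEL_IPMASQ=true

# dockerd --bip=10.244.1.1/24 --mtu=8951

With those flags docker0 would get 10.244.1.1 instead of the default 172.17.0.1.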

Refer to the docker/flannel integration doc for more details: http://docker-k8s-lab.readthedocs.io/en/latest/docker/docker-flannel.html#restart-docker-daemon-with-flannel-network

-- Ansil
Source: StackOverflow