I set up 1 master 2 nodes k8s cluster in according to documentation. A pod can ping the other pod on the same node but can't ping the pod on the other node.
To demonstrate the problem I deployed below deployments which has 3 replica. While two of them sits on the same node, the other pod sits on the other node.
$ cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 2
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
---
kind: Service
apiVersion: v1
metadata:
name: nginx-svc
spec:
selector:
app: nginx
ports:
- protocol: TCP
port: 80
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-21-115.us-west-2.compute.internal Ready master 20m v1.11.2
ip-172-31-26-62.us-west-2.compute.internal Ready 19m v1.11.2
ip-172-31-29-204.us-west-2.compute.internal Ready 14m v1.11.2
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
nginx-deployment-966857787-22qq7 1/1 Running 0 11m 10.244.2.3 ip-172-31-29-204.us-west-2.compute.internal
nginx-deployment-966857787-lv7dd 1/1 Running 0 11m 10.244.1.2 ip-172-31-26-62.us-west-2.compute.internal
nginx-deployment-966857787-zkzg6 1/1 Running 0 11m 10.244.2.2 ip-172-31-29-204.us-west-2.compute.internal
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 443/TCP 21m
nginx-svc ClusterIP 10.105.205.10 80/TCP 11m
Everything looks fine.
Let me show you containers.
# docker exec -it 489b180f512b /bin/bash
root@nginx-deployment-966857787-zkzg6:/# ifconfig
eth0: flags=4163 mtu 8951
inet 10.244.2.2 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::cc4d:61ff:fe8a:5aeb prefixlen 64 scopeid 0x20
root@nginx-deployment-966857787-zkzg6:/# ping 10.244.2.3
PING 10.244.2.3 (10.244.2.3) 56(84) bytes of data.
64 bytes from 10.244.2.3: icmp\_seq=1 ttl=64 time=0.066 ms
64 bytes from 10.244.2.3: icmp\_seq=2 ttl=64 time=0.055 ms
^C
So it pings its neighbor pod on the same node.
root@nginx-deployment-966857787-zkzg6:/# ping 10.244.1.2
PING 10.244.1.2 (10.244.1.2) 56(84) bytes of data.
^C
--- 10.244.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1059ms
And can't ping its replica on the other node.
Here is host interfaces:
# ifconfig
cni0: flags=4163 mtu 8951
inet 10.244.2.1 netmask 255.255.255.0 broadcast 0.0.0.0
docker0: flags=4099 mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
eth0: flags=4163 mtu 9001
inet 172.31.29.204 netmask 255.255.240.0 broadcast 172.31.31.255
flannel.1: flags=4163 mtu 8951
inet 10.244.2.0 netmask 255.255.255.255 broadcast 0.0.0.0
lo: flags=73 mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
veth09fb984a: flags=4163 mtu 8951
inet6 fe80::d819:14ff:fe06:174c prefixlen 64 scopeid 0x20
veth87b3563e: flags=4163 mtu 8951
inet6 fe80::d09c:d2ff:fe7b:7dd7 prefixlen 64 scopeid 0x20
# ifconfig
cni0: flags=4163 mtu 8951
inet 10.244.1.1 netmask 255.255.255.0 broadcast 0.0.0.0
docker0: flags=4099 mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
eth0: flags=4163 mtu 9001
inet 172.31.26.62 netmask 255.255.240.0 broadcast 172.31.31.255
flannel.1: flags=4163 mtu 8951
inet 10.244.1.0 netmask 255.255.255.255 broadcast 0.0.0.0
lo: flags=73 mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
veth9733e2e6: flags=4163 mtu 8951
inet6 fe80::8003:46ff:fee2:abc2 prefixlen 64 scopeid 0x20
Processes on the nodes:
# ps auxww|grep kube
root 4059 0.1 2.8 43568 28316 ? Ssl 00:31 0:01 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf
root 4260 0.0 3.4 358984 34288 ? Ssl 00:31 0:00 /opt/bin/flanneld --ip-masq --kube-subnet-mgr
root 4455 1.1 9.6 760868 97260 ? Ssl 00:31 0:14 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni
Because of this network problem clusterIP is also unreachable:
$ curl 10.105.205.10:80
Any suggestion?
Thanks.
I found the problem.
Flannel uses UDP port 8285 and 8472 which was being blocked by AWS security groups. I had only opened TCP ports.
I enable UDP port 8285 and UDP port 8472 as well as TCP 6443, 10250, 10256.
The docker virtual bridge interface docker0
is now have IP 172.17.0.1
on both host.
But as per the docker/flannel integration guide, the docker0
virtual bridge should be in flannel network on each host.
A highlevel workflow of flannel/docker networking integrations below
/run/flannel/subnet.env
as per the etcd network configuration during flanneld
startup./run/flannel/subnet.env
and set --bip
flag during dockerd
startup and assign IP from flannel network to docker0
Refer docker/flannel integration doc for more details: http://docker-k8s-lab.readthedocs.io/en/latest/docker/docker-flannel.html#restart-docker-daemon-with-flannel-network